Published January 21, 2024 © GPL3+

How to Handle Frequently used Functions using R3 vs. R4

To get faster access to the values of mathematical functions, precalculated values are often stored in RAM or ROM. Does this speed things up on the UNO R3 and the UNO R4?

Advanced, full instructions provided, 2 hours

Story

You often see sketches which start with lines like this:

const PROGMEM char sine256[]  = {128,130,133,136,139,143,146,149,152,

which start a long table of sine values. Such a table is very useful: calculating the values at run-time is possible, but it is very slow on 8-bit controllers. Fast access to sine values can be crucial in audio projects when you do not want to output those annoying square-wave sounds. To save precious space in SRAM, it is better to store such constants in FLASH memory with PROGMEM. Let us compare what the new R4 with its RA4M1 controller has to offer.
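As a minimal sketch of how such a table is read back, the following uses a hypothetical 8-entry table (`sine8`) and a hypothetical helper (`sineAt`); neither comes from the original sketch. Because plain `char` is signed by default with avr-gcc, values above 127 do not fit, so this sketch assumes `uint8_t` instead. The `#else` branch is only a host fallback so that the same file also compiles off-target.

```cpp
#include <stdint.h>
#if defined(ARDUINO)
#include <Arduino.h>   // provides PROGMEM and pgm_read_byte on the boards
#else
// Host fallback (assumption): lets the same file compile off-target.
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

// Hypothetical 8-entry table: round(128 + 127 * sin(2*PI*i/8)).
// uint8_t instead of char, so that values above 127 are representable.
const uint8_t sine8[8] PROGMEM = {128, 218, 255, 218, 128, 38, 1, 38};

// Returns the table value for a phase index; the index wraps modulo 8.
uint8_t sineAt(uint8_t phase) {
  return pgm_read_byte(&sine8[phase & 7]);
}
```

On the R3 the table must be read through `pgm_read_byte`; a plain `sine8[i]` would read from the SRAM address space instead.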

Example 1: Sine function

For this benchmark I implemented two tables of sine values: one calculated at compile-time (by the compiler on your PC), uploaded and stored in FLASH memory, and another one calculated during setup and stored in SRAM. Both were compared to values calculated at run-time.

[Figure: the same plot as above, but using the Renesas RA4M1 of the Arduino UNO R4]

[The complete graphic (sine curves, boxes and text) was produced with the code in the attachment, using the Serial Plotter of the old Arduino IDE; this would not be possible with IDE 2.x. Both pictures were produced with the same code, although some #ifdef directives were necessary.]

What exactly does this graphic show?

On the left you see a plot of the sine function from 0 to 360 degrees (0 to 2*PI). Only 180 values are plotted. The curves are shifted slightly against each other; otherwise they would overlap and you would see only one graph.

On the right you see a histogram of the times needed to read the values from FLASH, to read them from SRAM, and to calculate them. As expected, calculation takes much longer than reading values from memory, but the ratio is much lower on the new R4 because it has a built-in floating-point unit. Another difference can be seen: on the R3, access to data stored in FLASH memory is slightly slower than access to data in SRAM, while on the R4 access to data in FLASH is significantly faster than reading data from SRAM.
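The actual benchmark is in the attached sinus.zip. Purely as an illustration of how such timings can be taken, here is a rough sketch that compares reading an SRAM table with calculating the values at run-time. All names (`STEPS`, `sramTab`, `sink`, `fillSramTab`, `timeLookup`, `timeCalculation`) are hypothetical, and the `#else` branch is only a host stand-in for `micros()`.

```cpp
#if defined(ARDUINO)
#include <Arduino.h>
#else
// Host stand-ins (assumption) so the sketch can be tried off-target.
#include <math.h>
#include <chrono>
static unsigned long micros() {
  using namespace std::chrono;
  return (unsigned long)duration_cast<microseconds>(
             steady_clock::now().time_since_epoch()).count();
}
#endif

const int STEPS = 180;                      // hypothetical: one value every 2 degrees
const float STEP_RAD = 6.2831853f / STEPS;  // angle increment in radians
float sramTab[STEPS];                       // table filled at startup, lives in SRAM
volatile float sink;                        // volatile: keeps the loops from being optimized away

// Fill the SRAM table once, as one would do in setup().
void fillSramTab() {
  for (int i = 0; i < STEPS; i++) sramTab[i] = sin(i * STEP_RAD);
}

// Time STEPS table reads, in microseconds.
unsigned long timeLookup() {
  unsigned long t0 = micros();
  for (int i = 0; i < STEPS; i++) sink = sramTab[i];
  return micros() - t0;
}

// Time STEPS run-time calculations, in microseconds.
unsigned long timeCalculation() {
  unsigned long t0 = micros();
  for (int i = 0; i < STEPS; i++) sink = sin(i * STEP_RAD);
  return micros() - t0;
}
```

On the 16 MHz UNO R3, `micros()` has a resolution of 4 microseconds, so timing a whole loop is more meaningful than timing a single access.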

If you just want 360 values of the sine function, you may use the following code. The file sin.h includes itself recursively. As the limit of recursive includes is 397 and __COUNTER__ is incremented each time it appears, you need to produce two values on each inclusion. The whole procedure looks like this:

/*
  Calculate the sine table at compile-time
*/

#define M 360
#define N (2 * M)
#define RAD (TWO_PI / N)

float sinTab[] = {
#include "sin.h"
};

void setup() {
  Serial.begin(9600);
  for (int i = 0; i < M; i++)
    Serial.println(sinTab[i]);
}

void loop() {}

and this is the file sin.h

// each call of this file
// generates two values.
sin(__COUNTER__ * RAD),
// increment __COUNTER__ once more:
#if __COUNTER__ > 1000
// dummy
#endif
sin(__COUNTER__ * RAD),
#if __COUNTER__ < N
#include __FILE__
#endif
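To see what the recursion produces, the bookkeeping of sin.h can be replayed in ordinary code. The following is a host-side emulation under the assumption that `__COUNTER__` starts at 0 and is used nowhere else; it is not the preprocessor itself, and `Expansion` and `emulateSinH` are hypothetical names.

```cpp
// Summary of one complete expansion of sin.h.
struct Expansion {
  int values;       // number of sin(...) entries generated
  int depth;        // number of nested inclusions of sin.h
  int lastCounter;  // counter value used for the last entry
};

// Replays sin.h line by line; 'counter' plays the role of __COUNTER__.
Expansion emulateSinH(int n) {
  Expansion e = {0, 0, 0};
  int counter = 0;
  bool again = true;
  while (again) {
    e.depth++;                  // sin.h is entered once more
    e.lastCounter = counter++;  // sin(__COUNTER__ * RAD),
    e.values++;
    counter++;                  // #if __COUNTER__ > 1000 (value is discarded)
    e.lastCounter = counter++;  // sin(__COUNTER__ * RAD),
    e.values++;
    again = (counter++ < n);    // #if __COUNTER__ < N
  }
  return e;
}
```

For N = 720 this yields 362 entries at a nesting depth of 181. The counter values used for entries are 0, 2, 4, ..., 722, and since RAD is TWO_PI / 720, consecutive entries are exactly one degree apart. The sketch prints only the first M = 360 of them.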

Example 2: Fibonacci Series

The first version shows the standard SRAM solution:

long fib[] = {
  // no PROGMEM
  0, 1,
#include "fibo.h"
};

void setup() {
  Serial.begin(9600);
  Serial.println(__FILE__);
  int n = sizeof fib / sizeof fib[0];
  for (int i = 0; i < n; i++) {
    Serial.print(i);
    Serial.print("\t");
    Serial.println(fib[i]);
    // read from SRAM, no PROGMEM
  }
}

void loop() {}

and this is the include file:

fib[__COUNTER__ / 3] + fib[__COUNTER__ / 3 + 1],
#if __COUNTER__ < 3 * 44
#include __FILE__
#endif
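The index arithmetic of fibo.h can be checked with a small host-side emulation. This is a sketch under the assumption that `__COUNTER__` starts at 0; `buildFib` is a hypothetical name, and `int32_t` stands in for the 32-bit `long` of the boards.

```cpp
#include <stdint.h>

// Replays fibo.h; 'counter' plays the role of __COUNTER__.
// Returns the number of table entries written (including the seeds 0 and 1).
int buildFib(int32_t *fib, int capacity) {
  fib[0] = 0;
  fib[1] = 1;
  int n = 2;
  int counter = 0;
  bool again = true;
  while (again && n < capacity) {
    int a = counter++ / 3;         // fib[__COUNTER__ / 3]
    int b = counter++ / 3 + 1;     // fib[__COUNTER__ / 3 + 1]
    fib[n++] = fib[a] + fib[b];    // the new entry is the sum of the two before it
    again = (counter++ < 3 * 44);  // #if __COUNTER__ < 3 * 44
  }
  return n;
}
```

Each inclusion consumes three counter values, so dividing by 3 gives the index of the older of the two predecessors. The recursion stops after 45 inclusions, which gives 47 entries in total; the last one, 1836311903, is the largest Fibonacci number that still fits into a signed 32-bit long. At 4 bytes per entry the table occupies 188 bytes.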

Now compare it to the FLASH version:

const long PROGMEM fib[] = {
  //       ^^^^^^^
  0, 1,
#include "fibo.h"
};

void setup() {
  Serial.begin(9600);
  Serial.println(__FILE__);
  int n = sizeof fib / sizeof fib[0];
  for (int i = 0; i < n; i++) {
    Serial.print(i);
    Serial.print("\t");
    Serial.println(pgm_read_dword(&fib[i]));
    // read from FLASH because of PROGMEM
  }
}

void loop() {}

The include file is exactly the same.

Memory usage:

UNO R3, SRAM version: 3100 bytes FLASH, 464 bytes SRAM

UNO R4, SRAM version: 54392 bytes FLASH, 4744 bytes SRAM

UNO R3, FLASH version: 2040 bytes FLASH, 276 bytes SRAM

UNO R4, FLASH version: 54168 bytes FLASH, 4552 bytes SRAM

As expected, on the R4 it normally does not make much sense to move constant data from SRAM to FLASH memory: in this example the FLASH version saves only 192 bytes of SRAM and 224 bytes of FLASH.

Have fun.

Code

sinus.zip

Credits

Klausj