In this tutorial we show how to import an already trained artificial neural network (ANN) from an AI framework like TensorFlow®/Keras® into AIfES® and perform an inference. There are several ways to do this, and we demonstrate them with a simple XOR example that is already included in the AIfES® library (the bundled examples, however, show only one possible way). The variants shown in this tutorial also prepare you to further train an already pre-trained network on your Arduino® board.
Install AIfES® in the Arduino IDE
To follow the examples you have to download and install AIfES® (search for "aifes") with the Arduino Library Manager.
The XOR example
We use the XOR problem as an example; more details about it can be found in this article. An XOR gate is replicated and trained with an ANN. The XOR truth table is shown in the image below.
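As a reminder, XOR outputs 1 exactly when the two inputs differ:
Input 1  Input 2  Output
0        0        0
0        1        1
1        0        1
1        1        0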
The ANN structure used for the replication (2 inputs, one hidden layer with 3 neurons, 1 output) is shown in the picture below. The sigmoid function is used as the activation function in the entire ANN.
To get the input data into the AIfES® model, a tensor has to be created first. Detailed information about the AIfES® tensor can be found in our documentation. In this example we want to perform an inference for all combinations of the XOR truth table in one step.
We start with the tensor for the inputs. First, the input data is stored in a 1D or 2D float array; in our example we have named the array input_data. After that we have to describe the shape of our tensor. For this we use a uint16_t array, which is named input_shape in our example. The first value specifies how many data sets are passed and the second how many inputs the ANN has. In our example we have 4 data sets and our ANN has 2 inputs.
The next step is the creation of the tensor of data type aitensor_t, which we called input_tensor. The following parameters are passed to this tensor:
.dtype: Specifies which data type is to be used. aif32 stands for 32-bit float. AIfES® will soon support multiple data types (e.g. 32-bit or 8-bit integer).
.dim: Describes the number of dimensions. In our case we have 2 dimensions. For future CNNs, for example, a third dimension can follow.
.shape: Pointer to our shape array input_shape
.data: Pointer to our input data array input_data
#define INPUTS 2
#define DATA_COUNT 4
// Option 1: 1D array (values in row-major order)
float input_data[] = {0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f};
// Option 2: 2D array (use one of the two options, not both)
float input_data[DATA_COUNT][INPUTS] = {
{0.0f, 0.0f},
{0.0f, 1.0f},
{1.0f, 0.0f},
{1.0f, 1.0f}
};
uint16_t input_shape[] = {DATA_COUNT, INPUTS};
aitensor_t input_tensor;
input_tensor.dtype = aif32;
input_tensor.dim = 2;
input_tensor.shape = input_shape;
input_tensor.data = input_data;
For the results or output, we also need to create a tensor.
If only one inference (forward pass) is to be made for one data set, the AIfES® function aialgo_forward_model() can be used, as shown in this AIfES® example (line 143). With this function there is no need to worry about the output tensor, since only the pointer to an aitensor_t needs to be passed. The disadvantage is that the memory area is allocated inside the AIfES® function (a minimal sketch of this variant follows below). It is safer and more elegant to create a separate output tensor, which is what we do now.
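Such a call could look like this (a minimal sketch based on the linked example; it assumes the model has already been built, compiled and given its inference memory, as described later in this tutorial):
// Forward pass without a self-managed output tensor;
// the returned tensor points to memory managed by AIfES
aitensor_t *output_tensor;
output_tensor = aialgo_forward_model(&model, &input_tensor);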
The ANN has 1 output neuron and 4 data sets are input, so we expect 4 results. The array output_data for the calculated output therefore has a length of 4. If our ANN had multiple outputs, we would have to create a corresponding 1D or 2D array. The shape has also changed compared to the input tensor. After the inference, the results are stored in the output_data array.
#define OUTPUTS 1
#define DATA_COUNT 4
float output_data[DATA_COUNT*OUTPUTS];
uint16_t output_shape[] = {DATA_COUNT, OUTPUTS};
aitensor_t output_tensor_1;
output_tensor_1.dtype = aif32;
output_tensor_1.dim = 2;
output_tensor_1.shape = output_shape;
output_tensor_1.data = output_data;
Weights
In order to map an ANN in AIfES®, you need the structure and, of course, the trained weights. In AIfES® there are two ways to bring the weights into the model:
LayeredWeights: The weights are transferred layer by layer.
FlatWeights: All weights are passed in one array.
The LayeredWeights method is more clearly arranged, but you have to make sure that the weights are assigned to the correct layer. The FlatWeights method is less explicit but much more practical, because the weights can be passed with one line of code. It also makes storing the weights, as well as further training, easier.
Of course, you have to pass the weights in the correct order so that the ANN can be calculated correctly. Many AI frameworks use a fixed order here, and so does AIfES®.
The following figure shows the arrangement of the weights and the bias weights. The weights of the hidden layer are named Wh and the bias weights Bh; the output layer has the identifiers Wout and Bout. The sigmoid activation function is used for the entire ANN.
The following figure shows the arrangement of the weights for the LayeredWeights and FlatWeights methods.
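In code, this order corresponds to the following flat layout (a commented sketch; the value count matches the FlatWeights array printed later in this tutorial):
// FlatWeights layout for the 2-3-1 XOR network:
// Wh (2x3 hidden weights, row-major), Bh (3 hidden bias weights),
// Wout (3x1 output weights), Bout (1 output bias weight)
// => [Wh11, Wh12, Wh13, Wh21, Wh22, Wh23, Bh1, Bh2, Bh3, Wout1, Wout2, Wout3, Bout1]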
Extract the weights from the Keras model
The AIfES® library already has an example of how to extract the weights from Keras. You can find the example with this ANN structure in your Arduino IDE under:
File -> Examples -> AIfES for Arduino -> 0_Universal -> 0_XOR -> 4_XOR_Inference_keras
The Python script can be found here.
Keras has the same arrangement of weights, and they can be extracted layer by layer. With .get_weights() the weights can be extracted from the trained model.
# Python code
# get the weights
weights = model.get_weights()
# get the weights from the different layers
hidden_weights = weights[0]
hidden_bias = weights[1]
output_weights = weights[2]
output_bias = weights[3]
To be able to use them in AIfES® we print them directly in C array style. The example corresponds to the LayeredWeights method, and the indexing of the weights matches AIfES®. The names of the arrays have been changed in the code shown below. For the FlatWeights method, the weights would simply be arranged one after the other in one array.
# Python code
# print the weights for AIfES
print("float weights_hidden_layer[] = {")
print(str(hidden_weights[0, 0]) + "f,")
print(str(hidden_weights[0, 1]) + "f,")
print(str(hidden_weights[0, 2]) + "f,")
print(str(hidden_weights[1, 0]) + "f,")
print(str(hidden_weights[1, 1]) + "f,")
print(str(hidden_weights[1, 2]) + "f")
print("};")
print("")
print("float bias_weights_hidden_layer[] = {")
print(str(hidden_bias[0]) + "f,")
print(str(hidden_bias[1]) + "f,")
print(str(hidden_bias[2]) + "f")
print("};")
print("")
print("float weights_output_layer[] = {")
print(str(output_weights[0, 0]) + "f,")
print(str(output_weights[1, 0]) + "f,")
print(str(output_weights[2, 0]) + "f")
print("};")
print("")
print("float bias_weights_output_layer[] = {")
print(str(output_bias[0]) + "f")
print("};")
In C, the arrays would look like this:
// LayeredWeights
float weights_hidden_layer[] = {-10.1164f, -8.4212f, 5.4396f, 7.297f,
-7.6482f, -9.0155f};
float bias_weights_hidden_layer[] = {-2.9653f, 2.3677f, -1.5968f};
float weights_output_layer[] = {12.0305f, -6.5858f, 11.9371f};
float bias_weights_output_layer[] = {-5.4247f};
// FlatWeights
float FlatWeights[] = {-10.1164f, -8.4212f, 5.4396f, 7.297f, -7.6482f,
-9.0155f, -2.9653f, 2.3677f, -1.5968f, 12.0305f, -6.5858f, 11.9371f,
-5.4247f};
Especially for large networks it can be useful to store the weights in flash/program memory instead of SRAM. You can use PROGMEM here, but you should perform a (void*) typecast when passing the pointer to the AIfES® layer; this will be described later. For further training, the weights should of course be stored in SRAM.
// LayeredWeights
const float weights_hidden_layer[] PROGMEM = {-10.1164f, -8.4212f,
5.4396f, 7.297f, -7.6482f, -9.0155f};
const float bias_weights_hidden_layer[] PROGMEM = {-2.9653f, 2.3677f,
-1.5968f};
const float weights_output_layer[] PROGMEM = {12.0305f, -6.5858f,
11.9371f};
const float bias_weights_output_layer[] PROGMEM = {-5.4247f};
// FlatWeights
const float FlatWeights[] PROGMEM = {-10.1164f, -8.4212f, 5.4396f, 7.297f,
-7.6482f, -9.0155f, -2.9653f, 2.3677f, -1.5968f, 12.0305f, -6.5858f,
11.9371f, -5.4247f};
Layer creation in AIfES
This chapter describes how to create an AIfES® model. In our documentation you will also find a tutorial on this. AIfES® has a structure similar to Keras when describing the individual layers.
Input layer:
To describe our example ANN we start with the input layer. Similar to tensors, we configure the shape of the inputs as a uint16_t array. Because the neural network takes only one sample as input (and not all samples at once), the first element of the shape array is 1. This number will become even more interesting for other structures 😉. The second number is the number of inputs, which in our example is 2. The ailayer_input_t object is created for the input layer. The dimension (.input_dim) is two-dimensional, and the pointer to the shape array is passed via .input_shape.
#define INPUTS 2
// Input layer
uint16_t input_layer_shape[] = {1, INPUTS};
ailayer_input_t input_layer;
input_layer.input_dim = 2;
input_layer.input_shape = input_layer_shape;
Hidden (dense) layer:
Next, the hidden/dense layer is described. Our example ANN has only one hidden layer with 3 neurons. We use the ailayer_dense_t data type to describe the hidden layer.
#define NEURONS 3
ailayer_dense_t hidden_layer;
hidden_layer.neurons = NEURONS;
Here there is a difference between the LayeredWeights and the FlatWeights methods. With the FlatWeights method we would be finished at this point. In the LayeredWeights method, we need to pass the weights of this layer. As explained before, a (void*) typecast must be performed here.
// LayeredWeights method
#define NEURONS 3
const float weights_hidden_layer[] PROGMEM = {-10.1164f, -8.4212f,
5.4396f, 7.297f, -7.6482f, -9.0155f};
const float bias_weights_hidden_layer[] PROGMEM = {-2.9653f, 2.3677f,
-1.5968f};
ailayer_dense_t hidden_layer;
hidden_layer.neurons = NEURONS;
hidden_layer.weights.data = (void*)weights_hidden_layer;
hidden_layer.bias.data = (void*)bias_weights_hidden_layer;
Activation function of the hidden layer:
Next, the activation function for the hidden layer is set. Wikipedia has a nice overview of the different functions. In our example we use the sigmoid function, which is described as follows:
ailayer_sigmoid_f32_t sigmoid_layer_1;
Of course, AIfES® has several activation functions. The Leaky ReLU and ELU functions have an alpha value which must also be set. The Softmax activation function should only be used for the output layer. Here is an overview:
ailayer_relu_f32_t relu_layer;
ailayer_sigmoid_f32_t sigmoid_layer;
ailayer_tanh_f32_t tanh_layer;
ailayer_softsign_f32_t softsign_layer;
ailayer_leaky_relu_f32_t leaky_relu_layer;
ailayer_elu_f32_t elu_layer;
//Alpha values
leaky_relu_layer.alpha = 0.01f;
elu_layer.alpha = 1.0f;
//Softmax
ailayer_softmax_f32_t softmax_layer;
Output (dense) layer:
The output layer is also a dense layer. The description is the same as for the hidden layer. In our example network we have one output neuron. For deeper networks with several hidden layers, each layer is described with a dense and an activation layer.
#define OUTPUTS 1
ailayer_dense_t output_layer;
output_layer.neurons = OUTPUTS;
With the LayeredWeights method, the weights of the layer must be passed here again.
// LayeredWeights method
#define OUTPUTS 1
const float weights_output_layer[] PROGMEM = {12.0305f, -6.5858f,
11.9371f};
const float bias_weights_output_layer[] PROGMEM = {-5.4247f};
ailayer_dense_t output_layer;
output_layer.neurons = OUTPUTS;
output_layer.weights.data = (void*) weights_output_layer;
output_layer.bias.data = (void*) bias_weights_output_layer;
Activation function of the output layer:
As with the hidden layer, a final activation function must be specified for the output layer.
ailayer_sigmoid_f32_t sigmoid_layer_2;
Pack and compile the AIfES® model
After all layers have been created, they are packed together into an AIfES® model. For this we need the model itself, of type aimodel_t. We also need a layer pointer of type ailayer_t * to which we pass the individual layers one after the other in order to connect them.
aimodel_t model;
ailayer_t *x;
For each AIfES® layer there is a separate function that establishes the connection so that a complete model can be created. The layers are passed in the desired order, using the pointer x. Finally, the entire model is initialized/compiled. Hardware accelerators can also be activated via these functions; AIfES® supports e.g. the Arm CMSIS-DSP accelerators of the Cortex® series. How these are used is described in this example.
// Passing the layers to the AIfES model
model.input_layer = ailayer_input_f32_default(&input_layer);
x = ailayer_dense_f32_default(&hidden_layer, model.input_layer);
x = ailayer_sigmoid_f32_default(&sigmoid_layer_1, x);
x = ailayer_dense_f32_default(&output_layer, x);
model.output_layer = ailayer_sigmoid_f32_default(&sigmoid_layer_2, x);
// Compile the model
aialgo_compile_model(&model);
Also each activation function has its own connection function. Here is an overview:
ailayer_relu_f32_default();
ailayer_sigmoid_f32_default();
ailayer_softmax_f32_default();
ailayer_leaky_relu_f32_default();
ailayer_elu_f32_default();
ailayer_tanh_f32_default();
ailayer_softsign_f32_default();
Distribute the weights (only for the FlatWeights method)
With the LayeredWeights method, the weights were already passed when the dense layers were created. With the FlatWeights method this is done after compiling the model: all weights are passed in one step, which is comparable to AIfES® training (explained in the next tutorial). For this reason, this method is particularly well suited for further training. First, the size of the required parameter (weights) memory is calculated with the function aialgo_sizeof_parameter_memory(), which returns the required size in bytes.
// Parameter memory size in Byte
uint32_t parameter_memory_size = aialgo_sizeof_parameter_memory(&model);
From the required memory size, the number of required weights can also be calculated back to check the length of the FlatWeights array. A float corresponds to 4 bytes, which can be retrieved via sizeof(float). A check of the length could be done e.g. like this:
// Calculate the number of float weights
uint32_t FlatWeight_array_length = parameter_memory_size / sizeof(float);
Serial.print(F("Length FlatWeight array: "));
Serial.println(FlatWeight_array_length);
//Check the array length
if(FlatWeight_array_length != sizeof(FlatWeights)/sizeof(float))
{
Serial.println(F("ERROR!: Number of weights wrong"));
}
Finally, the FlatWeights array is passed to the model and distributed. Again, a (void*) typecast is performed because the array was stored in flash memory via PROGMEM.
aialgo_distribute_parameter_memory(&model, (void*) FlatWeights, parameter_memory_size);
Memory for the inference
For the inference, temporary variables are needed to store intermediate results of the layers. First we calculate the required memory in bytes:
uint32_t memory_size = aialgo_sizeof_inference_memory(&model);
The memory can be reserved with malloc at program runtime or defined as an array.
void *memory_ptr = malloc(memory_size);
// Alternative if "malloc" should not be used (use one of the two)
byte memory_ptr[memory_size];
Assign the memory for intermediate results of an inference to the model:
aialgo_schedule_inference_memory(&model, memory_ptr, memory_size);
Inference and output
The inference itself is performed with one function call. The AIfES® model, the input tensor and the output tensor are passed to it.
aialgo_inference_model(&model, &input_tensor, &output_tensor_1);
The results are read from the data memory of the output tensor. For a well-trained XOR network, the four outputs should be close to 0, 1, 1 and 0:
Serial.print("Output 0: ");
Serial.println(output_data[0],6);
Serial.print("Output 1: ");
Serial.println(output_data[1],6);
Serial.print("Output 2: ");
Serial.println(output_data[2],6);
Serial.print("Output 3: ");
Serial.println(output_data[3],6);
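Putting the steps together, the complete FlatWeights inference flow of this tutorial looks like this (a condensed sketch using only the names and functions from the snippets above):
// 1. Build and compile the model
model.input_layer = ailayer_input_f32_default(&input_layer);
x = ailayer_dense_f32_default(&hidden_layer, model.input_layer);
x = ailayer_sigmoid_f32_default(&sigmoid_layer_1, x);
x = ailayer_dense_f32_default(&output_layer, x);
model.output_layer = ailayer_sigmoid_f32_default(&sigmoid_layer_2, x);
aialgo_compile_model(&model);
// 2. Distribute the trained weights (FlatWeights method)
uint32_t parameter_memory_size = aialgo_sizeof_parameter_memory(&model);
aialgo_distribute_parameter_memory(&model, (void*) FlatWeights, parameter_memory_size);
// 3. Schedule the working memory for the inference
uint32_t memory_size = aialgo_sizeof_inference_memory(&model);
void *memory_ptr = malloc(memory_size);
aialgo_schedule_inference_memory(&model, memory_ptr, memory_size);
// 4. Run the inference
aialgo_inference_model(&model, &input_tensor, &output_tensor_1);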
Examples in the attached source code
In the attached source code you will find two functions that show the content of this tutorial:
- Inference_LayeredWeights()
- Inference_FlatWeights()
And another function, explained below:
- Inference_FlatWeights_bytes()
Sometimes it can be helpful to save the weights as a byte (uint8) array, e.g. to read them from a file or to update the weights with a byte stream. This is also possible in AIfES®, and for this purpose the example function Inference_FlatWeights_bytes() has been created. It is based on the Inference_FlatWeights() function with a few minor changes.
Here the FlatWeights array is not stored in flash memory, so we can convert it to a byte (uint8) array.
float FlatWeights[] = {-10.1164f, -8.4212f, 5.4396f, 7.297f, -7.6482f,
-9.0155f, -2.9653f, 2.3677f, -1.5968f, 12.0305f, -6.5858f, 11.9371f,
-5.4247f};
First a uint8 array is created. The length of the byte array is calculated by the AIfES® function aialgo_sizeof_parameter_memory(), which was already used in the first example. Then a uint8 pointer is created and a typecast to the float array is performed. Finally the values are copied into the uint8 array:
// Array for the weights in uint8
uint8_t FlatWeights_byte[parameter_memory_size];
// Typecast uint_8 pointer
uint8_t *FlatWeights_byte_ptr = (uint8_t *) FlatWeights;
uint32_t i = 0;
// Copy the values into the uint8 array
for (i = 0; i < parameter_memory_size; i++) {
FlatWeights_byte[i] = FlatWeights_byte_ptr[i];
}
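The same copy could also be done in one call with the C standard function memcpy (equivalent to the loop above):
// Copy parameter_memory_size bytes from the float array into the byte array
memcpy(FlatWeights_byte, FlatWeights, parameter_memory_size);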
The transfer to the AIfES® model is done in the same way as before, but now the byte (uint8) array is used.
aialgo_distribute_parameter_memory(&model, (void*) FlatWeights_byte,
parameter_memory_size);
What's next
- A tutorial for training
- AIfES®-Express functions: simplified functions to perform inference and training with one function call