In this tutorial we show how to import an already trained artificial neural network (ANN) from an AI framework like TensorFlow®/Keras® into AIfES® and perform an inference. There are several ways to do this, and we demonstrate them with a simple XOR example that is already included in the AIfES® library (the bundled examples, however, show only one possible way). The variants shown in this tutorial also prepare you to further train an already pre-trained network on your Arduino® board.
Install AIfES® in the Arduino IDE
To follow the examples you have to download and install AIfES® (search for "aifes") with the Arduino Library Manager.
The XOR example
We use the XOR problem as an example; more details about it can be found in this article. An XOR gate is replicated and trained with an ANN. The XOR truth table is shown in the image below.
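As a reminder, XOR outputs 1 exactly when the two inputs differ:
Input 1  Input 2  Output
0        0        0
0        1        1
1        0        1
1        1        0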
The ANN structure used for the replication (2 inputs, one hidden layer with 3 neurons, 1 output) is shown in the picture below. The sigmoid function is used as the activation function in the entire ANN.
To get the input data into the AIfES® model, a tensor has to be created first. Detailed information about the AIfES® tensor can be found in our documentation. In this example we want to perform an inference for all combinations of the XOR truth table in one step.
We start with the tensor for the inputs. First, the input data is stored in a 1D or 2D float array; in our example we have named the array input_data. After that we have to describe the shape of our tensor. For this we use a uint16_t array, which is named input_shape in our example. The first value specifies how many data sets are passed and the second how many inputs the ANN has. In our example we have 4 data sets and our ANN has 2 inputs.
The next step is the creation of the tensor of data type aitensor_t, which we called input_tensor. The following parameters are passed to this tensor:
.dtype: Specifies which data type is to be used. aif32 stands for 32-bit float. AIfES® will soon support multiple data types (e.g. 32-bit or 8-bit integer).
.dim: Describes the number of dimensions. In our case we have 2 dimensions. For future CNNs, for example, a third dimension can follow.
.shape: Pointer to our shape array input_shape
.data: Pointer to our input data array input_data
#define INPUTS 2
#define DATA_COUNT 4
// Option 1: 1D array (values in row-major order)
float input_data[] = {0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f};
// Option 2: 2D array (use one of the two options, not both)
float input_data[DATA_COUNT][INPUTS] = {
{0.0f, 0.0f},
{0.0f, 1.0f},
{1.0f, 0.0f},
{1.0f, 1.0f}
};
uint16_t input_shape[] = {DATA_COUNT, INPUTS};
aitensor_t input_tensor;
input_tensor.dtype = aif32;
input_tensor.dim = 2;
input_tensor.shape = input_shape;
input_tensor.data = input_data;
For the results or output, we also need to create a tensor.
If only one inference (forward pass) is to be made for one data set, the AIfES® function aialgo_forward_model() can be used, as shown in this AIfES® example (line 143). With this function there is no need to worry about the output tensor, since only the pointer to an aitensor_t needs to be passed. The disadvantage is that the memory area is allocated inside the AIfES® function (a minimal sketch of this variant follows below). It is safer and more elegant to create a separate output tensor, which is what we do now.
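Such a call could look like this (a minimal sketch based on the linked example; it assumes the model has already been built, compiled and given its inference memory, as described later in this tutorial):
// Forward pass without a self-managed output tensor;
// the returned tensor points to memory managed by AIfES
aitensor_t *output_tensor;
output_tensor = aialgo_forward_model(&model, &input_tensor);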
The ANN has 1 output neuron and 4 data sets are input, so we expect 4 results. The array output_data for the calculated output therefore has a length of 4. If our ANN had multiple outputs, we would have to create a corresponding 1D or 2D array. The shape has also changed compared to the input tensor. After the inference, the results are stored in the output_data array.
#define OUTPUTS 1
#define DATA_COUNT 4
float output_data[DATA_COUNT*OUTPUTS];
uint16_t output_shape[] = {DATA_COUNT, OUTPUTS};
aitensor_t output_tensor_1;
output_tensor_1.dtype = aif32;
output_tensor_1.dim = 2;
output_tensor_1.shape = output_shape;
output_tensor_1.data = output_data;
Weights
In order to map an ANN in AIfES®, you need the structure and, of course, the trained weights. In AIfES® there are two ways to bring the weights into the model:
LayeredWeights: The weights are transferred layer by layer.
FlatWeights: All weights are passed in one array.
The LayeredWeights method is more clearly arranged, but you have to make sure that the weights are assigned to the correct layer. The FlatWeights method is less explicit but much more practical, because the weights can be passed with one line of code. It also makes storing the weights, as well as further training, easier.
Of course, you have to pass the weights in the correct order so that the ANN can be calculated correctly. Many AI frameworks use a fixed order here, and so does AIfES®.
The following figure shows the arrangement of the weights and the bias weights. The weights of the hidden layer are named Wh and the bias weights Bh; the output layer has the identifiers Wout and Bout. The sigmoid activation function is used for the entire ANN.
The following figure shows the arrangement of the weights for the LayeredWeights and FlatWeights methods.
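In code, this order corresponds to the following flat layout (a commented sketch; the value count matches the FlatWeights array printed later in this tutorial):
// FlatWeights layout for the 2-3-1 XOR network:
// Wh (2x3 hidden weights, row-major), Bh (3 hidden bias weights),
// Wout (3x1 output weights), Bout (1 output bias weight)
// => [Wh11, Wh12, Wh13, Wh21, Wh22, Wh23, Bh1, Bh2, Bh3, Wout1, Wout2, Wout3, Bout1]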
Extract the weights from the Keras model
The AIfES® library already has an example of how to extract the weights from Keras. You can find the example with this ANN structure in your Arduino IDE under:
File -> Examples -> AIfES for Arduino -> 0_Universal -> 0_XOR -> 4_XOR_Inference_keras
The Python script can be found here.
Keras has the same arrangement of weights, and they can be extracted layer by layer. With .get_weights() the weights can be extracted from the trained model.
# Python code
# get the weights
weights = model.get_weights()
# get the weights from the different layers
hidden_weights = weights[0]
hidden_bias = weights[1]
output_weights = weights[2]
output_bias = weights[3]
To be able to use them in AIfES® we print them directly in C array style. The example corresponds to the LayeredWeights method, and the indexing of the weights matches AIfES®. The names of the arrays have been changed in the code shown below. For the FlatWeights method, the weights would simply be arranged one after the other in one array.
# Python code
# print the weights for AIfES
print("float weights_hidden_layer[] = {")
print(str(hidden_weights[0, 0]) + "f,")
print(str(hidden_weights[0, 1]) + "f,")
print(str(hidden_weights[0, 2]) + "f,")
print(str(hidden_weights[1, 0]) + "f,")
print(str(hidden_weights[1, 1]) + "f,")
print(str(hidden_weights[1, 2]) + "f")
print("};")
print("")
print("float bias_weights_hidden_layer[] = {")
print(str(hidden_bias[0]) + "f,")
print(str(hidden_bias[1]) + "f,")
print(str(hidden_bias[2]) + "f")
print("};")
print("")
print("float weights_output_layer[] = {")
print(str(output_weights[0, 0]) + "f,")
print(str(output_weights[1, 0]) + "f,")
print(str(output_weights[2, 0]) + "f")
print("};")
print("")
print("float bias_weights_output_layer[] = {")
print(str(output_bias[0]) + "f")
print("};")
In C, the arrays would look like this:
// LayeredWeights
float weights_hidden_layer[] = {-10.1164f, -8.4212f, 5.4396f, 7.297f,
-7.6482f, -9.0155f};
float bias_weights_hidden_layer[] = {-2.9653f, 2.3677f, -1.5968f};
float weights_output_layer[] = {12.0305f, -6.5858f, 11.9371f};
float bias_weights_output_layer[] = {-5.4247f};
// FlatWeights
float FlatWeights[] = {-10.1164f, -8.4212f, 5.4396f, 7.297f, -7.6482f,
-9.0155f, -2.9653f, 2.3677f, -1.5968f, 12.0305f, -6.5858f, 11.9371f,
-5.4247f};
Especially for large networks it can be useful to store the weights in flash/program memory instead of SRAM. You can use PROGMEM here, but you should perform a (void*) typecast when passing the pointer to the AIfES® layer; this will be described later. For further training, the weights should of course be stored in SRAM.
// LayeredWeights
const float weights_hidden_layer[] PROGMEM = {-10.1164f, -8.4212f,
5.4396f, 7.297f, -7.6482f, -9.0155f};
const float bias_weights_hidden_layer[] PROGMEM = {-2.9653f, 2.3677f,
-1.5968f};
const float weights_output_layer[] PROGMEM = {12.0305f, -6.5858f,
11.9371f};
const float bias_weights_output_layer[] PROGMEM = {-5.4247f};
// FlatWeights
const float FlatWeights[] PROGMEM = {-10.1164f, -8.4212f, 5.4396f, 7.297f,
-7.6482f, -9.0155f, -2.9653f, 2.3677f, -1.5968f, 12.0305f, -6.5858f,
11.9371f, -5.4247f};
Layer creation in AIfES
This chapter describes how to create an AIfES® model. In our documentation you will also find a tutorial on this. AIfES® has a structure similar to Keras when describing the individual layers.
Input layer:
To describe our example ANN we start with the input layer. Similar to tensors, we configure the shape of the inputs as a uint16_t array. Because the neural network takes only one sample as input (and not all samples at once), the first element of the shape array is 1. This number will become even more interesting for other structures 😉. The second number is the number of inputs, which in our example is 2. The ailayer_input_t object is created for the input layer. The dimension (.input_dim) is two-dimensional, and the pointer to the shape array is passed via .input_shape.
#define INPUTS 2
// Input layer
uint16_t input_layer_shape[] = {1, INPUTS};
ailayer_input_t input_layer;
input_layer.input_dim = 2;
input_layer.input_shape = input_layer_shape;
Hidden (dense) layer:
Next, the hidden/dense layer is described. Our example ANN has only one hidden layer with 3 neurons. We use the ailayer_dense_t data type to describe the hidden layer.
#define NEURONS 3
ailayer_dense_t hidden_layer;
hidden_layer.neurons = NEURONS;
Here there is a difference between the LayeredWeights and the FlatWeights methods. With the FlatWeights method we would be finished at this point. In the LayeredWeights method, we need to pass the weights of this layer. As explained before, a (void*) typecast must be performed here.
// LayeredWeights method
#define NEURONS 3
const float weights_hidden_layer[] PROGMEM = {-10.1164f, -8.4212f,
5.4396f, 7.297f, -7.6482f, -9.0155f};
const float bias_weights_hidden_layer[] PROGMEM = {-2.9653f, 2.3677f,
-1.5968f};
ailayer_dense_t hidden_layer;
hidden_layer.neurons = NEURONS;
hidden_layer.weights.data = (void*)weights_hidden_layer;
hidden_layer.bias.data = (void*)bias_weights_hidden_layer;
Activation function of the hidden layer:
Next, the activation function for the hidden layer is set. Wikipedia has a nice overview of the different functions. In our example we use the sigmoid function, which is described as follows:
ailayer_sigmoid_f32_t sigmoid_layer_1;
Of course, AIfES® has several activation functions. The Leaky ReLU and ELU functions have an alpha value which must also be set. The Softmax activation function should only be used for the output layer. Here is an overview:
ailayer_relu_f32_t relu_layer;
ailayer_sigmoid_f32_t sigmoid_layer;
ailayer_tanh_f32_t tanh_layer;
ailayer_softsign_f32_t softsign_layer;
ailayer_leaky_relu_f32_t leaky_relu_layer;
ailayer_elu_f32_t elu_layer;
//Alpha values
leaky_relu_layer.alpha = 0.01f;
elu_layer.alpha = 1.0f;
//Softmax
ailayer_softmax_f32_t softmax_layer;
Output (dense) layer:
The output layer is also a dense layer. The description is the same as for the hidden layer. In our example network we have one output neuron. For deeper networks with several hidden layers, each layer is described with a dense and an activation layer.
#define OUTPUTS 1
ailayer_dense_t output_layer;
output_layer.neurons = OUTPUTS;
With the LayeredWeights method, the weights of the layer must be passed here again.
// LayeredWeights method
#define OUTPUTS 1
const float weights_output_layer[] PROGMEM = {12.0305f, -6.5858f,
11.9371f};
const float bias_weights_output_layer[] PROGMEM = {-5.4247f};
ailayer_dense_t output_layer;
output_layer.neurons = OUTPUTS;
output_layer.weights.data = (void*) weights_output_layer;
output_layer.bias.data = (void*) bias_weights_output_layer;
Activation function of the output layer:
As with the hidden layer, a final activation function must be specified for the output layer.
ailayer_sigmoid_f32_t sigmoid_layer_2;
Pack and compile the AIfES® model
After all layers have been created, they are packed together into an AIfES® model. For this we need the model itself, of type aimodel_t. We also need a layer pointer of type ailayer_t * to which we pass the individual layers one after the other in order to connect them.
aimodel_t model;
ailayer_t *x;
For each AIfES® layer there is a separate function that establishes the connection so that a complete model can be created. The layers are passed in the desired order, using the pointer x. Finally, the entire model is initialized/compiled. Hardware accelerators can also be activated via these functions; AIfES® supports e.g. the Arm CMSIS-DSP accelerators of the Cortex® series. How these are used is described in this example.
// Passing the layers to the AIfES model
model.input_layer = ailayer_input_f32_default(&input_layer);
x = ailayer_dense_f32_default(&hidden_layer, model.input_layer);
x = ailayer_sigmoid_f32_default(&sigmoid_layer_1, x);
x = ailayer_dense_f32_default(&output_layer, x);
model.output_layer = ailayer_sigmoid_f32_default(&sigmoid_layer_2, x);
// Compile the model
aialgo_compile_model(&model);
Also each activation function has its own connection function. Here is an overview:
ailayer_relu_f32_default();
ailayer_sigmoid_f32_default();
ailayer_softmax_f32_default();
ailayer_leaky_relu_f32_default();
ailayer_elu_f32_default();
ailayer_tanh_f32_default();
ailayer_softsign_f32_default();
Distribute the weights (only for the FlatWeights method)
With the LayeredWeights method, the weights were already passed when the dense layers were created. With the FlatWeights method this is done after compiling the model: all weights are passed in one step, which is comparable to AIfES® training (explained in the next tutorial). For this reason, this method is particularly well suited for further training. First, the size of the required parameter (weights) memory is calculated with the function aialgo_sizeof_parameter_memory(), which returns the required size in bytes.
// Parameter memory size in Byte
uint32_t parameter_memory_size = aialgo_sizeof_parameter_memory(&model);
From the required memory size, the number of required weights can also be calculated back to check the length of the FlatWeights array. A float corresponds to 4 bytes, which can be retrieved via sizeof(float). A check of the length could be done e.g. like this:
// Calculate the number of float weights
uint32_t FlatWeight_array_length = parameter_memory_size / sizeof(float);
Serial.print(F("Length FlatWeight array: "));
Serial.println(FlatWeight_array_length);
//Check the array length
if(FlatWeight_array_length != sizeof(FlatWeights)/sizeof(float))
{
Serial.println(F("ERROR!: Number of weights wrong"));
}
Finally, the FlatWeights array is passed to the model and distributed. Again, a (void*) typecast is performed because the array was stored in flash memory via PROGMEM.
aialgo_distribute_parameter_memory(&model, (void*) FlatWeights, parameter_memory_size);
Memory for the inference
For the inference, temporary variables are needed to store intermediate results of the layers. First we calculate the required memory in bytes:
uint32_t memory_size = aialgo_sizeof_inference_memory(&model);
The memory can be reserved with malloc at program runtime or defined as an array.
void *memory_ptr = malloc(memory_size);
// Alternative if "malloc" should not be used (use one of the two)
byte memory_ptr[memory_size];
Assign the memory for intermediate results of an inference to the model:
aialgo_schedule_inference_memory(&model, memory_ptr, memory_size);
Inference and output
The inference itself is performed with one function call. The AIfES® model, the input tensor and the output tensor are passed to it.
aialgo_inference_model(&model, &input_tensor, &output_tensor_1);
The results are read from the data memory of the output tensor. For a well-trained XOR network, the four outputs should be close to 0, 1, 1 and 0:
Serial.print("Output 0: ");
Serial.println(output_data[0],6);
Serial.print("Output 1: ");
Serial.println(output_data[1],6);
Serial.print("Output 2: ");
Serial.println(output_data[2],6);
Serial.print("Output 3: ");
Serial.println(output_data[3],6);
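Putting the steps together, the complete FlatWeights inference flow of this tutorial looks like this (a condensed sketch using only the names and functions from the snippets above):
// 1. Build and compile the model
model.input_layer = ailayer_input_f32_default(&input_layer);
x = ailayer_dense_f32_default(&hidden_layer, model.input_layer);
x = ailayer_sigmoid_f32_default(&sigmoid_layer_1, x);
x = ailayer_dense_f32_default(&output_layer, x);
model.output_layer = ailayer_sigmoid_f32_default(&sigmoid_layer_2, x);
aialgo_compile_model(&model);
// 2. Distribute the trained weights (FlatWeights method)
uint32_t parameter_memory_size = aialgo_sizeof_parameter_memory(&model);
aialgo_distribute_parameter_memory(&model, (void*) FlatWeights, parameter_memory_size);
// 3. Schedule the working memory for the inference
uint32_t memory_size = aialgo_sizeof_inference_memory(&model);
void *memory_ptr = malloc(memory_size);
aialgo_schedule_inference_memory(&model, memory_ptr, memory_size);
// 4. Run the inference
aialgo_inference_model(&model, &input_tensor, &output_tensor_1);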
Examples in the attached source code
In the attached source code you will find two functions that show the content of this tutorial:
- Inference_LayeredWeights()
- Inference_FlatWeights()
And another function, explained below:
- Inference_FlatWeights_bytes()
Sometimes it can be helpful to save the weights as a byte (uint8) array, e.g. to read them from a file or to update the weights with a byte stream. This is also possible in AIfES®, and for this purpose the example function Inference_FlatWeights_bytes() has been created. It is based on the Inference_FlatWeights() function with a few minor changes.
Here the FlatWeights array is not stored in flash memory, so we can convert it to a byte (uint8) array.
float FlatWeights[] = {-10.1164f, -8.4212f, 5.4396f, 7.297f, -7.6482f,
-9.0155f, -2.9653f, 2.3677f, -1.5968f, 12.0305f, -6.5858f, 11.9371f,
-5.4247f};
First a uint8 array is created. The length of the byte array is calculated by the AIfES® function aialgo_sizeof_parameter_memory(), which was already used in the first example. Then a uint8 pointer is created and a typecast to the float array is performed. Finally the values are copied into the uint8 array:
// Array for the weights in uint8
uint8_t FlatWeights_byte[parameter_memory_size];
// Typecast uint_8 pointer
uint8_t *FlatWeights_byte_ptr = (uint8_t *) FlatWeights;
uint32_t i = 0;
// Copy the values into the uint8 array
for (i = 0; i < parameter_memory_size; i++) {
FlatWeights_byte[i] = FlatWeights_byte_ptr[i];
}
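The same copy could also be done in one call with the C standard function memcpy (equivalent to the loop above):
// Copy parameter_memory_size bytes from the float array into the byte array
memcpy(FlatWeights_byte, FlatWeights, parameter_memory_size);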
The transfer to the AIfES® model is done in the same way as before, but now the byte (uint8) array is used.
aialgo_distribute_parameter_memory(&model, (void*) FlatWeights_byte,
parameter_memory_size);
What's next
- A tutorial for training
- AIfES®-Express functions: simplified functions to perform inference and training with one function call