I have been steadily improving this robot platform for a few years.
The most recent work added a Walabot radar sensor for a robotic assistant project.
https://www.hackster.io/user462411/walabot-security-robot-with-alexa-command-and-control-c979e6
Since I am always looking for new adventures, I found that the Autonomous Robot Challenge presented an opportunity to push my robot platform to greater usefulness. I am excited to add AI features to help perform an autonomous search and rescue behavior. The AI will be used to classify data from a camera, a microphone, the Walabot radar sensor and an infrared temperature sensor.
Introduction to Neural Networks
Most of what I know about neural networks comes from this free online book: Neural Networks and Deep Learning by Michael Nielsen. Without the book I would have been almost completely lost trying to figure out how the donkeycar software works and how to solve the other problems involved in this project. The brevity and completeness of the book remind me of "The C Programming Language" by Kernighan and Ritchie. Reading a few chapters of the book is a good introduction to what makes the donkeycar software work.
I have been moved
I moved my place of residence during this project. This was quite a disturbance to workflow, thought processes and overall progress. The new place is a better place to do project work and for that I am thankful.
Donkeycar as a first working example
The donkeycar project serves as a great example for future work with neural networks. There is simply not enough project time to start from a completely blank slate. I hope that adding features to the donkeycar project will teach me neural network theory and practice quickly.
Adding More Neural Network power to the Project
I will show what the Ultra96 board can do to add more neural network processing power to the design. I will benchmark key pieces of neural network code on the quad core ARM A53 processor alone, then compare that execution time with the same code accelerated by FPGA fabric working alongside the same processor.
Testing Donkeycar on the bench
Building Donkeycar
Modifying Donkeycar software
The Donkeycar software as downloaded is a stateless design. The software does not develop a sense of location on the racetrack, at least not in a variable that we can use directly. This is a bit of a challenge since I want to deliver a package on a certain section of the racetrack. What should I do about this?
Options as I see them:
- Install a process that watches a GPS and drops the package at the right spot on the racetrack. Probably the easiest option.
- Create a neural network that signals when to drop the package, running in parallel with the throttle and steering angle neural networks. Probably the second easiest option. The drop signal may need filtering, as it could be noisy in a stateless design, unless a special sign is used as a drop cue.
- Heavily modify the Donkey Car software to develop a sense of where donkeycar is along the racetrack. This is the most difficult option but the one with the greatest advantage with respect to creating an autonomous robot.
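The filtering mentioned in the second option could be as simple as requiring the drop score to stay high for several frames in a row. Here is a minimal sketch, assuming the drop network outputs one score between 0 and 1 per camera frame. The class name DropDebouncer, the threshold and the frame count are all hypothetical and are not part of the Donkeycar software.

```cpp
#include <cassert>

// Hypothetical filter for a noisy per-frame "drop the package" score.
// Assumption: the drop network emits one score in [0, 1] per camera frame.
class DropDebouncer {
public:
    DropDebouncer(float threshold, int frames_required)
        : threshold_(threshold), frames_required_(frames_required) {
        assert(frames_required > 0);
    }

    // Feed one score per frame. Returns true exactly once: on the frame
    // where the score has been above the threshold for frames_required
    // consecutive frames. Any low score resets the streak.
    bool update(float score) {
        if (fired_) {
            return false;  // the package can only be dropped once
        }
        streak_ = (score > threshold_) ? streak_ + 1 : 0;
        if (streak_ >= frames_required_) {
            fired_ = true;
            return true;
        }
        return false;
    }

private:
    float threshold_;
    int frames_required_;
    int streak_ = 0;
    bool fired_ = false;
};
```

For example, DropDebouncer filter(0.8f, 3) ignores a single spurious high score and only fires after three high scores in a row. Note that this adds a small amount of state next to the otherwise stateless networks.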
Ultra96 board "hello world"
The first step with the Ultra96 board is to get started with the design tools. Lucky for us there is a good introduction project. Thanks Adam Taylor!
Introduction to the Ultra96 board and design tools: https://www.hackster.io/adam-taylor/accelerating-your-ultra96-developments-806a72
This project code will be modified to benchmark key pieces of neural network code. An example of such code is the activation function, here the logistic function 1/(1 + e^-x).
This function and others like it are useful for computing neural networks. The derivative of the function is needed for training the neural network. Several techniques will be demonstrated to speed up the calculation. Even a simple implementation of the exponential function needs several floating point operations. Deep learning makes us want this to be very fast!
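As a point of reference, here is a minimal sketch of the logistic function and its derivative in plain C++. The function names are my own choice for illustration. The derivative uses the standard identity that, for a = logistic(z), the slope is a * (1 - a), so training code can reuse the already computed output instead of calling exp a second time.

```cpp
#include <cassert>
#include <cmath>

// Logistic (sigmoid) activation: maps any real input into the range 0 to 1.
inline float logistic(float z) {
    return 1.0f / (1.0f + std::exp(-z));
}

// Derivative of the logistic function, written in terms of its output
// a = logistic(z). Reusing a avoids a second call to exp.
inline float logistic_prime_from_output(float a) {
    assert(a >= 0.0f && a <= 1.0f);
    return a * (1.0f - a);
}
```

The slope is largest at z = 0, where the output is 0.5 and the derivative is 0.25, and it shrinks toward zero for large positive or negative inputs.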
Logistic Function Test1
In this first test the logistic function is put into the previously used "hello world" example matrix multiply project where it would go in neural network calculations. The matrix multiply example project is copied and the logistic function is added to the accelerated and non-accelerated code.
First the mmult.cpp file is changed on line 78 to add the logistic function.
void mmult_accel(float A[N*N], float B[N*N], float C[N*N])
{
    float _A[N][N], _B[N][N];
#pragma HLS array_partition variable=_A block factor=8 dim=2
#pragma HLS array_partition variable=_B block factor=8 dim=1
    // Copy the inputs into local, partitioned buffers.
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE
            _A[i][j] = A[i * N + j];
            _B[i][j] = B[i * N + j];
        }
    }
    // Matrix multiply, with the logistic function applied to each result.
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE
            float result = 0;
            for (int k = 0; k < N; k++) {
                float term = _A[i][k] * _B[k][j];
                result += term;
            }
            C[i * N + j] = 1.0/(1.0 + expf(-result)); // accelerated logistic function
        }
    }
}
Second main.cpp is changed on line 88 to add the logistic function.
void mmult_golden(float *A, float *B, float *C)
{
    for (int row = 0; row < N; row++) {
        for (int col = 0; col < N; col++) {
            float result = 0.0;
            for (int k = 0; k < N; k++) {
                result += A[row*N+k] * B[k*N+col];
            }
            C[row*N+col] = 1.0/(1.0 + expf(-result)); // non-accelerated logistic function
        }
    }
}
The test results:
104,658 processor clock cycles were added by computing the logistic function in the non-accelerated code.
310 processor clock cycles were added by computing the logistic function in the accelerated portion of the code.
The logistic function portion of the code has been sped up about 337 times!
Any further speedup would have to come from the matrix multiply itself. There is not much left to shrink in the accelerated logistic function!
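The arithmetic behind these numbers can be sketched in plain C++. This is only an illustration of the method: run the multiply with and without the logistic step, take the difference as the added cost, and divide the two added costs to get the speedup. The helper names mmult_ref, time_ns and speedup are made up for this sketch, and std::chrono is an assumed stand-in for a processor cycle counter.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference n x n matrix multiply on the CPU (hypothetical helper).
// When apply_logistic is true, each output element is passed through
// the logistic function, as in the modified mmult_golden.
inline void mmult_ref(const std::vector<float>& a, const std::vector<float>& b,
                      std::vector<float>& c, int n, bool apply_logistic) {
    const std::size_t count = static_cast<std::size_t>(n) * n;
    assert(a.size() == count && b.size() == count && c.size() == count);
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            float result = 0.0f;
            for (int k = 0; k < n; ++k) {
                result += a[row * n + k] * b[k * n + col];
            }
            c[row * n + col] =
                apply_logistic ? 1.0f / (1.0f + std::exp(-result)) : result;
        }
    }
}

// Elapsed wall-clock time of one call to f, in nanoseconds.
// Assumption: std::chrono stands in for a hardware cycle counter.
template <typename F>
long long time_ns(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Speedup = cost added on the CPU / cost added on the accelerator.
inline double speedup(double cpu_added, double accel_added) {
    assert(accel_added > 0.0);
    return cpu_added / accel_added;
}
```

Timing mmult_ref once with apply_logistic set to false and once with it set to true, then subtracting, gives the cost added by the logistic step. With the cycle counts reported above, speedup(104658.0, 310.0) comes out between 337 and 338.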
DropBox link for the zipped up archive of the SDSoC project
Making a long story short
In the future I will use a lookup table method to speed up the processing of the activation function. This means the index to the table will be an integer. This also means I will experiment with using integer arithmetic for most of my neural network code. There will be development points early on where I use a mixed system of floating point and fixed point arithmetic, but I suspect benchmarking and other practical experience will lead me to prefer mostly fixed point systems. If this sounds like intuition or guesswork to you then I call you very wise indeed.
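To make the lookup table idea concrete, here is a minimal sketch under assumptions of my own choosing: a 256 entry table, inputs stored as fixed point integers with 4 fractional bits (so the integer 16 means 1.0), and inputs outside the table range clamped to the nearest end. None of these sizes come from a finished design, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kTableSize = 256;  // assumed number of table entries
constexpr int kFracBits = 4;     // assumed fixed point format: z = z_q / 16
constexpr int kHalf = kTableSize / 2;

// Build the table once using floating point. Entry i holds
// logistic((i - 128) / 16), covering inputs from -8.0 to +7.9375.
inline std::array<float, kTableSize> make_logistic_table() {
    std::array<float, kTableSize> table{};
    for (int i = 0; i < kTableSize; ++i) {
        const float z = static_cast<float>(i - kHalf) / (1 << kFracBits);
        table[i] = 1.0f / (1.0f + std::exp(-z));
    }
    return table;
}

// Look up logistic(z) for a fixed point input z_q = z * 16.
// The index is a plain integer: no exp and no divide at run time.
inline float logistic_lut(const std::array<float, kTableSize>& table,
                          std::int32_t z_q) {
    // Clamp first so that large inputs saturate instead of overflowing.
    if (z_q < -kHalf) z_q = -kHalf;
    if (z_q > kHalf - 1) z_q = kHalf - 1;
    assert(z_q + kHalf >= 0 && z_q + kHalf < kTableSize);
    return table[static_cast<std::size_t>(z_q + kHalf)];
}
```

Clamping is a reasonable choice here because the logistic function is already very close to 0 or 1 beyond the ends of the table. A fully fixed point version would store integers in the table as well; this sketch keeps float outputs to match the mixed system described above.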
In the scientific literature on neural networks there is much discussion of the merits of fixed point versus floating point neural networks. Work on binary neural networks further reduces the values used in neural networks to -1 and +1 (implemented in a binary circuit as the 1s and 0s with which we may be more familiar). I will not try to settle the argument or take sides in these discussions.
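For readers curious how -1 and +1 values end up as 1s and 0s, here is a small sketch of a dot product on 32 such values packed into one machine word. It assumes the common convention that bit 1 stands for +1 and bit 0 stands for -1; the function name is hypothetical. Two values multiply to +1 when their bits agree and to -1 when they differ, so the dot product reduces to an XNOR followed by a bit count.

```cpp
#include <cassert>
#include <cstdint>

// Dot product of two 32 element vectors whose entries are -1 or +1,
// packed one entry per bit (assumed encoding: bit 1 = +1, bit 0 = -1).
inline int binary_dot32(std::uint32_t a, std::uint32_t b) {
    std::uint32_t agree = ~(a ^ b);  // XNOR: 1 wherever the two bits match
    int matches = 0;
    while (agree != 0) {
        agree &= agree - 1;  // clear the lowest set bit
        ++matches;
    }
    // Each match contributes +1 and each mismatch contributes -1:
    // matches - (32 - matches) = 2 * matches - 32.
    return 2 * matches - 32;
}
```

This is only meant to show the encoding, not to argue for or against binary networks.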
Going Further
Things you will want to know about developing with floating point
Things you will want to know about developing with fixed point