Several traffic sign classifiers have been proposed, but most published work targets high-end GPUs, which consume more power. Embedded systems demand lower power, and customers may also prefer low-cost options. This project therefore introduces a solution better suited to ADAS (Advanced Driver Assistance Systems) without sacrificing accuracy.
Strategy:
1. Develop a high-level model for the different options and choose the best candidate to implement on the FPGA.
2. Because a real-time solution is required, running the network on the CPU part of the Xilinx SoC will not be sufficient, so the FPGA flow must be followed.
3. Xilinx DPUs may be a strong option, but on the Kria the DPU implements an integer unit rather than a floating-point unit, which raises real concerns about model accuracy.
So, in this work, complete FPGA code is going to be developed.
The chosen network is LeNet because its model size is very small compared to other convolutional neural networks, so it can fit on the FPGA without utilization problems. Keep in mind that floating-point operations are needed as well, and it is a huge headache to multiply, add, or compute the hyperbolic tangent (tanh) function (needed in the activation layers) on numbers represented in IEEE-754 format.
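To give a feel for why IEEE-754 arithmetic is costly in logic, here is a minimal Python sketch of a single-precision multiply done by hand on the sign, exponent, and mantissa fields. It is an illustration only, not the project's FPGA code: the function names are hypothetical, the result is truncated instead of rounded to nearest, and zeros, subnormals, infinities, NaNs, and overflow are deliberately ignored.

```python
import struct

def float32_fields(x):
    """Round x to IEEE-754 single precision and split it into bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 fraction bits; the leading 1 is implicit
    return sign, exponent, mantissa

def float32_from_fields(sign, exponent, mantissa):
    """Reassemble the three fields into a Python float."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def multiply_float32(a, b):
    """Multiply two normal float32 values field by field (simplified)."""
    sa, ea, ma = float32_fields(a)
    sb, eb, mb = float32_fields(b)
    sign = sa ^ sb                          # signs combine with XOR
    exponent = ea + eb - 127                # add exponents, remove one bias
    # Restore the implicit leading 1, then form the 48-bit product.
    product = ((1 << 23) | ma) * ((1 << 23) | mb)
    if product >> 47:                       # significand product in [2, 4)
        product >>= 1                       # renormalize to [1, 2)
        exponent += 1
    mantissa = (product >> 23) & 0x7FFFFF   # truncate; no rounding here
    return float32_from_fields(sign, exponent, mantissa)

print(multiply_float32(1.5, 2.0))    # 3.0
print(multiply_float32(-0.5, 0.75))  # -0.375
```

Even this stripped-down version needs a 24x24-bit multiplier, an exponent adder, and a normalization step. A full implementation also needs rounding and special-case handling, and tanh is harder still.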
This network is a cascade of six layers plus the output layer: the first three are convolutional layers, followed by fully connected layers.
The input image must be a 32x32 grayscale image, normalized so that its pixel values lie between 0 and 1 (floating-point numbers). It passes through the first convolutional layer, which has 6 feature maps.
A dimensionality reduction layer (average pooling) is then applied, followed by a hyperbolic tangent (tanh) layer.
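The first stage described above can be sketched in plain Python as a software reference. Several details here are assumptions not stated in this text: 8-bit source pixels normalized by dividing by 255, 5x5 kernels with stride 1 and no padding (as in the classic LeNet-5), and 2x2 average pooling. The kernel weights are random placeholders, not trained values, and Python floats are double precision, not the single precision used on the FPGA.

```python
import math
import random

def normalize(pixels_u8):
    """Map 8-bit pixel values to floats in [0, 1] (assumed normalization)."""
    return [[p / 255.0 for p in row] for row in pixels_u8]

def conv2d_valid(image, kernel, bias):
    """Stride-1, no-padding sliding-window sum for one feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[bias + sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
             for c in range(out_w)]
            for r in range(out_h)]

def average_pool_2x2(fmap):
    """Halve each dimension by averaging non-overlapping 2x2 windows."""
    return [[(fmap[r][c] + fmap[r][c + 1]
              + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def tanh_layer(fmap):
    """Apply tanh element-wise."""
    return [[math.tanh(v) for v in row] for row in fmap]

def first_stage(pixels_u8, kernels, biases):
    """Normalize, convolve into one map per kernel, pool, then apply tanh."""
    image = normalize(pixels_u8)
    return [tanh_layer(average_pool_2x2(conv2d_valid(image, k, b)))
            for k, b in zip(kernels, biases)]

random.seed(0)
pixels = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
kernels = [[[random.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(5)]
           for _ in range(6)]
biases = [0.0] * 6
features = first_stage(pixels, kernels, biases)
print(len(features), len(features[0]), len(features[0][0]))  # 6 14 14
```

Under these assumptions, a 32x32 input becomes six 28x28 maps after the convolution and six 14x14 maps after pooling. A reference like this is mainly useful for checking the output of each hardware layer against known-good values.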
In this project, each layer is implemented on the FPGA. So far, the whole network has not been implemented at once, so more optimizations are needed to implement the full network without going back and forth between the host machine and the FPGA.
Attached is the tanh layer, synthesized and implemented successfully on the Kria. As a proof of concept, the remainder of the network is implemented on a PC, and communication is done over an Ethernet interface.
This may mimic the Xilinx DPUs, where a program written for the CPU part (or for the host machine, in the case of Alveo boards) communicates back and forth with the DPU in the programmable logic part.
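The host-side half of this back-and-forth pattern could look roughly like the sketch below. It is not the project's actual protocol. The wire format (a 4-byte little-endian count followed by that many float32 values) and all function names are hypothetical, and a local socket pair with a Python thread stands in for the Ethernet link and the FPGA tanh layer so the example runs without hardware. With a real board, the host would connect a TCP or UDP socket to the board's address instead.

```python
import math
import socket
import struct
import threading

def recv_exact(sock, n):
    """Read exactly n bytes, since recv may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)

def send_floats(sock, values):
    """Hypothetical framing: uint32 count, then count float32 values."""
    payload = struct.pack(f"<{len(values)}f", *values)
    sock.sendall(struct.pack("<I", len(values)) + payload)

def recv_floats(sock):
    """Inverse of send_floats."""
    (count,) = struct.unpack("<I", recv_exact(sock, 4))
    return list(struct.unpack(f"<{count}f", recv_exact(sock, 4 * count)))

def fake_tanh_device(sock):
    """Stand-in for the FPGA tanh layer: one request, one reply."""
    values = recv_floats(sock)
    send_floats(sock, [math.tanh(v) for v in values])
    sock.close()

def offload_tanh(values):
    """Send activations to the stand-in device and wait for the results."""
    host, device = socket.socketpair()
    worker = threading.Thread(target=fake_tanh_device, args=(device,))
    worker.start()
    try:
        send_floats(host, values)
        return recv_floats(host)
    finally:
        host.close()
        worker.join()

result = offload_tanh([0.0, 1.0, -2.0])
print([round(v, 4) for v in result])
```

Every layer that runs on the device costs one such round trip, which is why a full on-chip network would avoid the transfer overhead.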