DeepPicarMicro Crams NVIDIA's PilotNet Autonomous Vehicle Neural Network Into a Raspberry Pi Pico
Clever optimization approaches take a CNN model designed for high-end GPUs and run it on the low-cost RP2040.
A trio of scientists from the University of Kansas have published a paper on DeepPicarMicro, an autonomous vehicle testbed, which crams a fully-functional convolutional neural network (CNN) onto a Raspberry Pi Pico microcontroller board.
"Running deep neural networks (DNNs) on tiny Microcontroller Units (MCUs) is challenging due to their limitations in computing, memory, and storage capacity," the team admits. "Fortunately, recent advances in both MCU hardware and machine learning software frameworks make it possible to run fairly complex neural networks on modern MCUs, resulting in a new field of study widely known as tinyML. However, there have been few studies to show the potential for tinyML applications in cyber physical systems (CPS)."
That's where DeepPicarMicro comes in: a cyber physical system testbed for an autonomous radio-controlled model car based around a Raspberry Pi Pico and its RP2040 microcontroller. While far from the least capable microcontroller around, the RP2040 still poses a challenge: it has just two Arm Cortex-M0+ processor cores running at a default speed of 133MHz and 264kB of static RAM (SRAM), far below the specifications of a typical autonomous vehicle edge AI system, and particularly of the PilotNet architecture the team sought to port, which was developed for use with NVIDIA's high-end, resource-rich autonomous vehicle platforms.
To get PilotNet running on the Raspberry Pi Pico, it had to be optimized for the RP2040. For this, the team used a series of techniques, beginning with eight-bit quantization, which reduced the layer memory requirements to 100kB or less at the cost of a drop in accuracy from 87.6 percent to 86.9 percent. Per-frame processing times, however, jumped to three seconds, far too slow for an autonomous vehicle system.
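To illustrate why eight-bit quantization shrinks memory use at only a small cost in precision, here is a minimal sketch of affine (asymmetric) quantization in plain Python. It is not the team's toolchain, which the article does not describe; the function names and the sample weights are invented for illustration.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers with a shared scale and zero point."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 inside the representable range
    scale = (hi - lo) / 255.0 or 1.0      # fall back to 1.0 if every value is zero
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Recover approximate float values from the 8-bit representation."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical weights, not taken from any real model.
weights = [-0.51, -0.2, 0.0, 0.13, 0.77]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# A 32-bit float takes 4 bytes per value and an int8 takes 1 byte,
# so the stored tensor is roughly a quarter of its original size.
float32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
```

Each value is stored in one byte instead of four, and the rounding error per value stays within about one quantization step (`scale`), which is consistent with the small accuracy drop reported above.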
The team then replaced two-dimensional convolutional layers with depthwise-separable layers, designed to reduce the number of multiply-accumulate (MAC) operations required of the processor, before using a neural architecture search (NAS) to find the highest-accuracy model that would fit in the RP2040's resources and run quickly enough.
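The saving from depthwise-separable layers can be estimated with simple arithmetic. The sketch below counts MAC operations for a standard convolution and for its depthwise-separable replacement (a per-channel depthwise convolution followed by a 1x1 pointwise convolution). The layer shape used here is an illustrative assumption, not a figure from the paper, and the function names are hypothetical.

```python
def conv_macs(out_h, out_w, kernel, in_ch, out_ch):
    """MACs for a standard 2D convolution producing an out_h x out_w x out_ch map."""
    return out_h * out_w * kernel * kernel * in_ch * out_ch


def separable_macs(out_h, out_w, kernel, in_ch, out_ch):
    """MACs for a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = out_h * out_w * kernel * kernel * in_ch   # one filter per input channel
    pointwise = out_h * out_w * in_ch * out_ch            # 1x1 conv mixes the channels
    return depthwise + pointwise


# Illustrative layer: 5x5 kernel, 24 input channels, 36 output channels,
# 14x47 output feature map (assumed values, for demonstration only).
standard = conv_macs(14, 47, 5, 24, 36)
separable = separable_macs(14, 47, 5, 24, 36)
ratio = separable / standard   # equals 1/out_ch + 1/kernel**2
```

For this assumed shape the separable version needs roughly 7 percent of the MACs of the standard layer, which is why the substitution helps so much on a processor without hardware acceleration.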
Of the 349 models found through NAS, the team selected 16 for real-world testing. "An important observation is that a model's accuracy alone was not a sufficient indicator to predict the system's true performance in the track," the team notes. "For example, the best model (#2) we tested completed nine laps without a single crash, but another similarly accurate, in terms of validation loss and accuracy, model (#7) was only able to complete five laps without [a] crash."
"Note also that models #6 and #7 achieve good accuracy yet perform worse than significantly less accurate model #4. When we consider latency into account, however, it is clear that these highly accurate models did not work well as their latencies are significantly higher than others."
For their future work, the researchers propose the evaluation of more complex neural network architectures across a range of microcontroller models, along with finer-grained methods for estimating the real-world control performance in order to find the best optimization strategies for tinyML.
The team's work is available as a preprint on Cornell's arXiv server.