Pixelated Neural Networks
Running deep neural network operations in image sensor pixels is enabling state-of-the-art computer vision applications for tinyML.
Computer vision provides a very dense source of information about the world, so it should come as no surprise that this technology is being used in a wide range of applications, from surveillance to wildlife monitoring and autonomous driving, to name a few. But the richness of this data is a double-edged sword: while it enables the development of many fantastic new technologies, it also requires a lot of computing horsepower to make any sense of. And that often means high costs, poor energy efficiency, and limited portability. To improve this state of affairs and bring computer vision to more applications, a number of efforts have been undertaken in recent years to move the processing closer to the image sensor, where it can operate more efficiently.
These efforts have generally fallen into one of three broad categories: near-sensor processing, in-sensor processing, or in-pixel processing. In the first case, a specialized processing chip is located on the same circuit board as the image sensor, which saves a trip to the cloud for processing, but still presents a data transfer bottleneck between the sensor and processor. In-sensor processing moves the processing a step closer by placing it within the image sensor itself, but it does not fully eliminate the data transfer bottleneck seen with near-sensor processing. As a better path forward, in-pixel processing techniques have been developed that move processing directly into each individual pixel of the image sensor, eliminating data transfer delays.
While this method offers a lot of promise, present implementations tend to rely on emerging technologies that are not yet production-ready, or they do not support the types of operations that a real-world machine learning model requires, like multi-bit, multi-channel convolution operations, batch normalization, and Rectified Linear Units. These solutions look impressive on paper, but where the rubber meets the road, they are not useful for anything more than solving toy problems.
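Those operations are the standard building blocks of a convolutional network's early layers. Purely as an illustrative numpy sketch (not the circuit-level implementation used in any of these sensors), here is roughly what that computation looks like for a single 3x3 neighborhood of pixels, using made-up inputs and weights:

```python
import numpy as np

# Illustrative only: the math one output location of a real first layer needs,
# with made-up 8-bit inputs and multi-bit weights.
rng = np.random.default_rng(0)

patch = rng.integers(0, 256, size=(3, 3, 3)).astype(np.float32)           # 3x3 RGB pixel neighborhood
weights = rng.integers(-128, 128, size=(3, 3, 3, 16)).astype(np.float32)  # multi-bit weights, 16 output channels
bias = rng.normal(size=16).astype(np.float32)

# Multi-bit, multi-channel convolution for one spatial position
conv_out = np.tensordot(patch, weights, axes=([0, 1, 2], [0, 1, 2])) + bias

# Batch normalization at inference time: per-channel scale and shift
running_mean = np.zeros(16, dtype=np.float32)  # stand-ins for learned running statistics
running_var = np.ones(16, dtype=np.float32)
gamma, beta = np.ones(16, dtype=np.float32), np.zeros(16, dtype=np.float32)
bn_out = gamma * (conv_out - running_mean) / np.sqrt(running_var + 1e-5) + beta

# Rectified Linear Unit
relu_out = np.maximum(bn_out, 0.0)
print(relu_out.shape)  # (16,) -- one activation per output channel
```

An in-pixel scheme has to support all three of these steps, for every pixel neighborhood, if it is going to run the front end of a production model rather than a simplified demo.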
In-pixel processing suitable for real-world applications looks to be a few steps closer to becoming a reality as a result of recent work by a team at the University of Southern California. Called Processing-in-Pixel-in-Memory, their method incorporates network weights and activations at the individual pixel level to enable highly parallelized computing inside the image sensor, including the operations, like convolutions, that deep neural networks rely on. In fact, sensors implementing these techniques are capable of performing all of the operations required to process the first few layers of a modern deep neural network. No toy MNIST digit classification problems to see here, folks.
The researchers tested their approach by building a MobileNetV2 model trained on a visual wake words dataset. They found that data transfer delays were reduced by a whopping 21 times when compared with standard near-sensor and in-sensor implementations. That efficiency also translated into a lower energy budget, with the energy-delay product reduced by 11 times. Importantly, these efficiency gains were achieved without any substantive reduction in model accuracy.
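The team's actual training pipeline and hardware cut point are not reproduced here, but a rough sketch of that kind of setup, assuming PyTorch and torchvision are available and picking a hypothetical split after a few early blocks, could look something like this:

```python
import torch
from torchvision import models

# Rough sketch, not the team's code: a MobileNetV2 backbone with a two-class
# head, as used for visual wake words tasks (person / no person).
model = models.mobilenet_v2(num_classes=2).eval()

# Hypothetical split: treat the first few feature blocks as the stage that
# would run in-pixel, and everything after as the off-sensor portion.
in_pixel_stage = model.features[:7]

frame = torch.randn(1, 3, 224, 224)  # stand-in for a raw RGB sensor readout

with torch.no_grad():
    off_sensor_activations = in_pixel_stage(frame)
    logits = model(frame)

print(frame.numel(), "values in the raw frame")
print(off_sensor_activations.numel(), "values leaving the sensor after the early layers")
print(logits.shape)  # two logits: wake word present or not
```

Running a sketch like this shows the early layers shrinking the raw frame to a much smaller activation map, which is exactly the kind of reduction that makes the off-sensor link so much cheaper.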
Since the first few layers of the model are processed in-pixel, only a small amount of compressed data needs to be sent to an off-sensor processor. This not only eliminates data transfer bottlenecks, but also means that inexpensive microcontrollers can be paired with these image sensors, allowing advanced visual algorithms to run on ever-smaller platforms without sacrificing quality. Make sure to keep your eyes on this work in the future to see what changes it may bring to tinyML applications.