
Bringing Big AI to Tiny Devices

StreamTinyNet enables multi-frame video analysis on resource-constrained devices, like the Arduino Nicla Vision, to find temporal patterns.

Nick Bild
3 months ago · Machine Learning & AI
Arduino Nicla Vision (📷: Arduino)

The latest and greatest artificial intelligence (AI) applications tend to hog a lot of resources, from large computing clusters to massive amounts of energy. For this reason, the algorithms generally run in large data centers accessible via the public internet. This architecture works well enough for many applications, but for others where real-time responses are required, the latency it introduces is unacceptable. Furthermore, we often need to provide AI-based tools with highly sensitive information like intellectual property, and sending that kind of data over the internet to an unknown remote data center raises serious privacy and security concerns.

Fortunately, advances in both hardware and software algorithms have enabled the emergence of tinyML. Using tinyML techniques, many advanced AI algorithms can run on severely resource-constrained hardware platforms, sometimes with just a few tens of kilobytes of memory. Even traditionally resource-intensive computer vision applications have made their way onto low-power platforms in recent years.

These computer vision applications are most frequently used for tasks like image classification or object detection. But whatever the task, they all analyze a single image frame at a time, which disregards crucial temporal patterns that can only be observed across multiple frames. Consider trying to recognize a dynamic hand gesture from a single frame, for example.

For the first time, researchers at the Polytechnic University of Milan in Italy have developed a framework for analyzing video streams on highly constrained hardware platforms. Called StreamTinyNet, the team's approach analyzes multiple video frames to achieve much greater accuracy than single-frame tinyML algorithms. Yet thanks to the system's unique design, it does not require significantly more memory or processing power than those less capable solutions.

At the core of StreamTinyNet is a convolutional neural network that first processes each frame of the video individually to extract important features. This essentially creates a summary of each frame that captures only essential details to reduce its size. Once the features from all frames are extracted, the network analyzes them together to understand the sequence and timing of events. This utilizes a function that looks at the changes between frames to identify patterns over time, like a hand moving to form a gesture. The processed data is then passed through a fully connected neural network, which ultimately classifies the input into one of several possible categories.
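To make that pipeline a bit more concrete, here is a minimal PyTorch-style sketch of the three stages: a small CNN summarizes each frame, the per-frame summaries are concatenated in time order, and a fully connected classifier produces the final label. The frame count, layer sizes, and class count below are illustrative assumptions, not the researchers' actual configuration.

```python
import torch
import torch.nn as nn

class StreamTinyNetSketch(nn.Module):
    """Illustrative three-stage pipeline: per-frame CNN features,
    temporal analysis across frames, fully connected classifier.
    All sizes are assumed for illustration only."""

    def __init__(self, num_frames=8, num_classes=5):
        super().__init__()
        self.num_frames = num_frames
        # Stage 1: small CNN applied to each frame independently,
        # producing a compact per-frame feature vector.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),          # -> 16-value summary per frame
        )
        # Stages 2 and 3: analyze the sequence of per-frame summaries
        # together and classify the whole clip.
        self.classifier = nn.Sequential(
            nn.Linear(16 * num_frames, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, clip):
        # clip: (batch, num_frames, 1, H, W) grayscale video snippet
        b, t, c, h, w = clip.shape
        assert t == self.num_frames
        feats = self.frame_encoder(clip.view(b * t, c, h, w))  # (b*t, 16)
        feats = feats.view(b, t * 16)  # concatenate summaries in time order
        return self.classifier(feats)  # (b, num_classes)

# Example: classify a batch of two 8-frame, 96x96 clips.
model = StreamTinyNetSketch()
logits = model(torch.randn(2, 8, 1, 96, 96))
print(logits.shape)  # torch.Size([2, 5])
```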

To validate their approach in a real-world experiment, the researchers ported StreamTinyNet to an Arduino Nicla Vision development board. It has a modest amount of computational resources, with an STM32H747AII6 dual-core Arm Cortex-M7/M4 microcontroller, two megabytes of flash memory, and one megabyte of RAM. Running on this platform, the algorithm was capable of performing gesture detection at a very impressive 15 frames per second while requiring only about 300 kilobytes of RAM.
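One plausible way a multi-frame model stays within roughly 300 kilobytes is to keep only the compact per-frame feature vectors in a rolling buffer rather than storing raw frames. The sketch below, reusing the illustrative StreamTinyNetSketch model from the previous snippet, simulates that kind of streaming loop; the buffer length and feature size are assumptions, not figures from the paper.

```python
from collections import deque
import torch

# Illustrative sizes carried over from the sketch above (not from the paper).
NUM_FRAMES, FEAT_DIM = 8, 16

model = StreamTinyNetSketch(num_frames=NUM_FRAMES)  # defined in the previous snippet
model.eval()

# Rolling buffer of per-frame feature vectors: each camera frame is encoded
# once, and only its small summary vector is kept in memory afterwards.
feature_buffer = deque(maxlen=NUM_FRAMES)

def process_frame(frame):
    """frame: a (1, H, W) grayscale tensor from the camera."""
    with torch.no_grad():
        feat = model.frame_encoder(frame.unsqueeze(0))  # (1, FEAT_DIM)
    feature_buffer.append(feat)
    if len(feature_buffer) < NUM_FRAMES:
        return None  # not enough history to classify yet
    feats = torch.cat(list(feature_buffer), dim=1)      # (1, NUM_FRAMES * FEAT_DIM)
    return model.classifier(feats).argmax(dim=1).item()

# Simulate a short 15 FPS stream of 96x96 frames.
for _ in range(20):
    prediction = process_frame(torch.randn(1, 96, 96))
```

Because each new frame is encoded once and then discarded, the per-step compute and memory stay roughly constant no matter how long the stream runs, which is the kind of property that makes video analysis feasible on microcontroller-class hardware.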

Looking ahead, the team intends to continue enhancing StreamTinyNet. Next up, they plan to address issues with sensor drift and also explore adaptive frame rates to optimize energy consumption.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.