Improve Your Memory with This Simple Trick

MCUNetV2 shuffles pieces of neural networks in and out of RAM to reduce memory usage while simultaneously improving inference accuracy.

Nick Bild
3 years ago • Machine Learning & AI
(📷: J. Lin et al.)

TinyML enables the development of cost-effective and privacy-protecting machine learning applications on ubiquitous, low-power compute resources. Glancing over the specifications of a typical microcontroller that these applications run on, you will quickly realize that the “tiny” designation is no exaggeration. There may be no more than a few tens of kilobytes of RAM available to run those machine learning algorithms. Such algorithms, neural networks in particular, are not exactly known for being tiny, as many thousands or even millions of parameters are commonly needed to achieve good performance.

Given all of this, machine learning and tiny compute resources hardly seem like a perfect match. Nevertheless, the benefits of running these algorithms on resource-constrained devices are so great that many efforts have been undertaken to improve TinyML techniques. One such effort to surface recently comes from a team of researchers at the MIT-IBM Watson AI Lab. Their method, called MCUNetV2, has been shown to shrink the memory required by certain types of neural networks while simultaneously improving inference accuracy, which is not a bad combination.

Focusing on convolutional neural networks, which are commonly used to classify images, the team first analyzed how microcontroller memory is used during inference. They discovered that the first few blocks of a network require roughly an order of magnitude more memory than the rest of the network. This front-loading of memory demand creates a bottleneck. To alleviate the problem, the researchers designed a new network architecture that employs patch-by-patch inference scheduling.
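To get an intuition for why memory is front-loaded, a back-of-the-envelope calculation is enough: early convolutional blocks operate on high-resolution feature maps with few channels, and it is the resolution that dominates. The short Python sketch below estimates per-block peak activation memory for a MobileNetV2-style backbone. It is a toy illustration, not the researchers' profiler, and the layer shapes and int8 assumption are illustrative.

```python
# Toy estimate of per-block peak activation memory for a
# MobileNetV2-style backbone (illustrative shapes, not MCUNetV2's).

# (channels, height, width) of each block's output feature map
blocks = [
    (16, 112, 112),  # early blocks: high resolution, few channels
    (24, 56, 56),
    (32, 28, 28),
    (64, 14, 14),
    (96, 14, 14),
    (160, 7, 7),     # late blocks: low resolution, many channels
    (320, 7, 7),
]

def activation_kb(c, h, w, bytes_per_elem=1):  # assume int8 activations
    return c * h * w * bytes_per_elem / 1024

prev = (3, 224, 224)  # RGB input image
for i, shape in enumerate(blocks):
    # A block's peak is roughly input + output feature maps, since
    # both must be resident in RAM while the block executes.
    peak = activation_kb(*prev) + activation_kb(*shape)
    print(f"block {i}: ~{peak:6.1f} kB peak activation memory")
    prev = shape
```

Run it and the first two blocks come out in the hundreds of kilobytes, while the last blocks need only a few tens of kilobytes each, exactly the kind of imbalance the team observed.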

Patch-by-patch scheduling performs operations on only a small spatial region (roughly 25%) of the feature map at any given time, which dramatically reduces peak memory usage. The scheduling must be implemented carefully, however: a suboptimal solution leads to an excess of overlapping patches, and therefore additional computational overhead. To minimize this overlapping of patches, or receptive fields, the team devised a neural architecture search algorithm that automatically redistributes receptive fields to later stages of the network.
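The mechanics are easy to see on a single layer. The NumPy sketch below (a toy illustration under assumed sizes, not the team's implementation) runs one 3x3 convolution over a 2x2 grid of output tiles, keeping only one small input region in memory at a time; the one-pixel halo read around each tile is the receptive-field overlap in miniature.

```python
# Toy patch-by-patch inference for a single 3x3 convolution
# (illustrative sizes; not the MCUNetV2 implementation).
import numpy as np

def conv3x3(x, k):
    """Valid 3x3 convolution (cross-correlation) on a 2-D array."""
    h, w = x.shape[0] - 2, x.shape[1] - 2
    out = np.zeros((h, w), dtype=x.dtype)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * x[i:i + h, j:j + w]
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 64)).astype(np.float32)
kernel = rng.standard_normal((3, 3)).astype(np.float32)
padded = np.pad(feat, 1)  # zero padding for "same" output size

# Whole-map inference: the entire 66x66 input must be resident at once.
full = conv3x3(padded, kernel)

# Patch-by-patch: a 2x2 grid of 32x32 output tiles, each needing only
# its 34x34 input region (tile plus one-pixel halo) in RAM at a time.
# In a real system the input would be streamed in tile by tile.
tile = 32
out = np.zeros_like(full)
for r in range(0, 64, tile):
    for c in range(0, 64, tile):
        region = padded[r:r + tile + 2, c:c + tile + 2]  # overlapping read
        out[r:r + tile, c:c + tile] = conv3x3(region, kernel)

assert np.allclose(out, full)
print(f"peak input buffer: {(tile + 2)**2} vs {66 * 66} elements")
```

With one layer, the halo causes only overlapping reads; stack several layers per patch, and the halo, along with the redundant computation it brings, grows with each layer's receptive field. That growing overlap is precisely what the architecture search pushes toward later, smaller stages of the network.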

Using MCUNetV2, memory usage was found to be reduced by four to eight times when compared with existing methods. This reduction in memory use did not come at the cost of accuracy, however. On the contrary, the team set a new record for ImageNet classification accuracy on a microcontroller, at 71.8%. Proving that it is not a one-trick pony, MCUNetV2 also achieved better than 90% accuracy on the visual wake words dataset while using under 32 kilobytes of memory.

In the future, the researchers are looking to further automate the optimization of parameters and the inference schedule, so that MCUNetV2 can be used even by those without machine learning expertise. They envision the technique enabling improved methods for tasks such as monitoring sleep and joint movement, sports coaching, identification in agriculture, and smarter manufacturing.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.