Neural Networks Are Just Trying to Fit In (to RAM)
Edge machine learning is red hot these days, but on-device training is still a challenge. Clever work at MIT may help to change that.
The era of devices that execute sets of instructions fully defined in advance by human software engineers is rapidly giving way to a wave of intelligent devices that can learn to perform better through experience. This new class of devices uses machine learning on the edge to power innovations in self-driving vehicles, medical diagnostics, and predictive maintenance, to name just a few application areas. Recent advancements have made it possible for these algorithms to run on resource-constrained hardware, like microcontrollers, that offer the portability and energy efficiency needed to build practical intelligent devices. This has led to many successes, but while these limited hardware platforms can run machine learning algorithms quite successfully, they have not fared nearly as well at the more challenging task of training them.
As such, when it is time to update the algorithm so that it can learn from new experiences, large compute resources — often in the cloud — are required. This slows the rate at which an intelligent device can learn and introduces privacy concerns, since potentially sensitive information must be sent over public networks for processing on cloud computing resources. A team of researchers at MIT recently published work in this area that seeks to make training machine learning models on resource-constrained hardware more practical. Using some clever techniques, they have shown that it is possible to train a neural network with less than 256 KB of RAM.
When training a neural network, the strength of the connections between the neurons is continually adjusted as new example data is presented. There may be millions of these connections, or weights, in a neural network, and they all need to occupy memory in order to be operated on. In the world of edge computing, that much memory simply is not available. The MIT team sidestepped this problem with a technique called sparse updating, which identifies the weights that matter most during each round of training and only operates on those. The remaining weights are temporarily frozen, so no gradients or optimizer state need to be kept in memory for them during that round.
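To make the idea concrete, here is a minimal sketch of sparse updating in Python. It is illustrative only and is not the MIT team's actual implementation: the importance mask below simply keeps the weights with the largest gradient magnitudes for a single layer, whereas the published method chooses which layers and tensors to update through a more careful analysis of their contribution to accuracy.

```python
# Minimal sketch of sparse updating (illustrative; not the Tiny Training
# Engine's actual code). A toy NumPy "layer" is updated only where an
# importance mask selects weights; the rest are frozen this round.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 4)).astype(np.float32)  # layer weights
grads = rng.standard_normal((8, 4)).astype(np.float32)    # gradients from backprop

# Hypothetical importance score: keep the top 25% of weights by gradient
# magnitude; everything else is temporarily frozen for this update.
k = int(0.25 * grads.size)
threshold = np.partition(np.abs(grads).ravel(), -k)[-k]
mask = np.abs(grads) >= threshold

learning_rate = 0.01
weights[mask] -= learning_rate * grads[mask]  # update only the selected weights
# Frozen weights are untouched, so no update buffers or optimizer state
# need to be stored for them.
```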
This has an advantage over previous on-device training efforts that only allow updates to a single layer of the network to avoid memory overruns. Such methods cause model accuracy to suffer, whereas the new method avoids that problem. The researchers also implemented a second optimization to further reduce resource consumption during training. Most models store weights as 32-bit numbers, which require four bytes of memory each. Using a method known as quantization, weights were reduced to 8 bits, or just one quarter of their normal size. To avoid the drop in accuracy that can come with this rounding, a technique called quantization-aware scaling was also implemented. In addition to reducing the resources needed for training, quantization also has the benefit of speeding up inference times on the resulting model.
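The sketch below shows the basic arithmetic of 8-bit weight quantization with a per-tensor scale factor, which is where the four-to-one memory saving comes from. The scale calculation here is an assumption chosen for illustration; the team's quantization-aware scaling additionally rescales gradients so that training on quantized weights remains stable, which this toy example does not attempt to reproduce.

```python
# Minimal sketch of int8 weight quantization (illustrative only).
import numpy as np

def quantize_int8(w_fp32: np.ndarray):
    """Map float32 weights to int8 values plus a per-tensor scale factor."""
    scale = np.abs(w_fp32).max() / 127.0
    w_int8 = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)
    return w_int8, scale

def dequantize(w_int8: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return w_int8.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal(1000).astype(np.float32)
q, s = quantize_int8(w)
print("memory: 4000 bytes as fp32 -> 1000 bytes as int8")
print("max rounding error:", np.abs(dequantize(q, s) - w).max())
```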
The team packaged up their enhancements into a system called the Tiny Training Engine and put it through its paces on a microcontroller platform. Using computer vision and an algorithm designed to recognize people in images, they were able to perform on-device training with just 157 KB of memory. The entire training process needed only 10 minutes to complete, which is up to 20 times faster than alternative approaches. This work demonstrates the feasibility of continual on-device learning, which has significant ramifications for the future of intelligent device design.