Neural Networks That Design Neural Networks
Researchers created a new framework that leverages LLMs and vision transformers to optimize AI algorithms for use on tiny devices.
Building a nuclear reactor and standing up a brand new data center should not be prerequisites for launching an artificial intelligence (AI) application, but that is the world we find ourselves in today. Cutting-edge AI algorithms like large language models (LLMs) and text-to-image generators often require massive amounts of computational resources that limit them to running in remote cloud computing environments. Not only does this make them far less accessible, but it also raises privacy concerns and introduces latency that rules out real-time operation.
All of these issues could be dealt with by running the algorithms on edge computing and tinyML hardware, but that is easier said than done. These systems have severe resource constraints that prevent large models from executing on them. To address this issue, many optimization techniques — like pruning and knowledge distillation — have been introduced. However, the application of these techniques sometimes seems a bit haphazard — slice a little here, trim a little there, and see what happens.
When a model is pruned, it does improve inference speeds, but it can also hurt accuracy, so optimization techniques must be applied with care. For this reason, a group led by researchers at the University of Rennes has developed a framework for creating efficient neural network architectures. It takes the guesswork out of the optimization process and produces highly accurate models that can even comfortably run on a microcontroller.
The framework combines three separate techniques — an LLM-guided neural architecture search, knowledge distillation from vision transformers, and an explainability module. By leveraging the generative capabilities of open-source LLMs such as Llama and Qwen, the system efficiently explores a hierarchical search space to design candidate model architectures. Each candidate is evaluated and refined through Pareto optimization, balancing three critical factors: accuracy, computational cost (MACs), and memory footprint.
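To make the selection step concrete, here is a minimal sketch of the kind of Pareto filtering the researchers describe, where each LLM-proposed architecture is kept only if no other candidate beats it on accuracy, MACs, and memory simultaneously. The Candidate fields and the example pool below are illustrative assumptions, not values or code from the paper.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A candidate architecture proposed by the LLM (illustrative fields only)."""
    name: str
    accuracy: float   # validation top-1 accuracy (higher is better)
    macs: float       # multiply-accumulate operations (lower is better)
    memory_kb: float  # peak memory footprint in KB (lower is better)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    at_least_as_good = (a.accuracy >= b.accuracy and a.macs <= b.macs and a.memory_kb <= b.memory_kb)
    strictly_better = (a.accuracy > b.accuracy or a.macs < b.macs or a.memory_kb < b.memory_kb)
    return at_least_as_good and strictly_better

def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

# Hypothetical pool of LLM-proposed architectures.
pool = [
    Candidate("net-a", accuracy=0.72, macs=90e6, memory_kb=310),
    Candidate("net-b", accuracy=0.70, macs=95e6, memory_kb=330),   # dominated by net-a
    Candidate("net-c", accuracy=0.74, macs=120e6, memory_kb=400),  # more accurate but costlier
]
print([c.name for c in pareto_front(pool)])  # ['net-a', 'net-c']
```

Only the non-dominated candidates survive each round, which is what keeps the search from drifting toward models that are accurate but far too large for a microcontroller.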
Once promising architectures are identified, they are fine-tuned using a logits-based knowledge distillation method. Specifically, a powerful pre-trained ViT-B/16 model acts as the teacher, helping the new, lightweight models learn to generalize better, all without bloating their size.
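The paper's exact training recipe is not reproduced here, but logits-based distillation generally follows a standard pattern: the student is trained to match the teacher's softened output distribution while still fitting the ground-truth labels. The sketch below shows that pattern in PyTorch; the temperature, weighting, and variable names are assumptions for illustration, not the researchers' settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Logits-based KD: blend a soft-target KL term with hard-label cross-entropy.

    T (temperature) and alpha (mixing weight) are illustrative defaults.
    """
    # Soften both distributions, then pull the student toward the teacher via KL divergence.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale to keep gradient magnitudes comparable across temperatures
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Usage sketch: the teacher (e.g. a pre-trained ViT-B/16) runs frozen, in eval mode,
# and only the lightweight student model is updated.
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
```

Because only the teacher's output logits are used, the student inherits some of the ViT's generalization without inheriting any of its parameters, which is why the distilled models stay small.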
The researchers tested their approach on the CIFAR-100 dataset and deployed their models on the highly constrained STM32H7 microcontroller. Their three new models — LMaNet-Elite, LMaNet-Core, and QwNet-Core — achieved 74.5%, 74.2%, and 73% top-1 accuracy, respectively. All of them outperform state-of-the-art competitors like MCUNet and XiNet, while keeping their memory usage under 320KB and computational cost below 100 million MACs.
Beyond just performance, the framework also emphasizes transparency. The explainability module sheds light on how and why certain architecture decisions are made, which is an important step toward trustworthy and interpretable AI on tiny devices.
This approach of using AI to optimize other AI algorithms could ultimately prove to be an important tool for making these algorithms more accessible, more efficient, and more transparent. And that might bring powerful, privacy-preserving AI applications directly to the devices that we carry with us every day.