Tiny Deep Learning Is No Longer a Contradiction
OnnxStream drastically cuts the amount of RAM needed by deep learning models, allowing Stable Diffusion to run on a Raspberry Pi Zero 2 W.
Stable Diffusion is undoubtedly one of the most popular generative AI tools of the moment, and has played a major role in bringing machine learning into the public eye. This deep learning text-to-image model is capable of generating some very impressive photorealistic images, given only a textual description from the user. As a latent diffusion model, Stable Diffusion has transformed the way AI systems comprehend and produce visual content, making it more accessible and user-friendly for a broader audience.
This model has also helped to democratize advanced machine learning capabilities: it has been open sourced under a permissive license, and is capable of running on relatively modest, consumer-grade hardware. A somewhat modern GPU with at least 8 GB of VRAM is enough to get your own instance of the Stable Diffusion model up and running. Massive cloud infrastructures and Big Tech budgets are not required.
But what about someone who does not even have a recent GPU available to them? Just how low can you go in terms of computational resources and still generate images with Stable Diffusion? An engineer by the name of Vito Plantamura set out on a quest to find out. Spoiler alert: no fancy GPU is necessary. In fact, a computer that had halfway decent specs back when Nickelback was still topping the charts should do it.
Amazingly, Plantamura found a way to get a one billion parameter Stable Diffusion model running on the Raspberry Pi Zero 2 W. While we love this single-board computer, the 1 GHz Arm Cortex-A53 processor and 512 MB of SDRAM available on the Pi Zero 2 W do not exactly lend themselves well to running deep learning applications. But with a bit of creative thinking, it turns out that this $15 computer can get the job done.
To achieve this feat, Plantamura developed a tool called OnnxStream. Inference engines are generally designed with one primary goal in mind: speed. And this speed comes at the cost of high memory utilization. OnnxStream, on the other hand, streams model weights in as they are needed, rather than fetching everything up front. In this case, the Raspberry Pi's 512 MB turned out to be more than enough. A paltry 260 MB proved to be sufficient.
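To get a feel for why streaming helps, consider the sketch below. This is not OnnxStream's actual code (OnnxStream is a C++ library with its own API); it is a toy Python illustration of the general idea, assuming a hypothetical on-disk layout with one weight file per layer. The point is that peak memory stays near the size of a single layer's weights, rather than the whole model.

```python
# Toy illustration of the weight-streaming idea (not OnnxStream's actual code).
# Instead of loading every layer's weights into RAM up front, each layer's
# weights are read from disk right before they are needed and released right
# after use, so peak memory stays close to the size of a single layer.

import numpy as np

def run_layer(x, weights):
    # Stand-in for a real operator (e.g., a matmul inside a transformer block).
    return x @ weights

def streamed_inference(x, weight_files):
    # weight_files: hypothetical list of .npy paths, one file per layer, on disk.
    for path in weight_files:
        w = np.load(path)   # fetch this layer's weights only now
        x = run_layer(x, w) # use them once...
        del w               # ...then let them be freed before the next layer
    return x
```

In OnnxStream itself, this idea is applied to ONNX models and combined with other memory-saving techniques, which together are what let a roughly one billion parameter model squeeze into about 260 MB.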
This does slow processing down, of course. Using OnnxStream, models typically run about 0.5 to 2 times slower than on a comparable system with more memory. However, OnnxStream consumes about 55 times less memory than those systems. And that could open up some fantastic opportunities in tinyML, running models on hardware that would have previously been totally inadequate for the job.
Running Stable Diffusion on a Raspberry Pi Zero 2 W is probably not the best idea if you have a far more capable laptop that you are SSHing into the Pi from, but it is a very impressive accomplishment nonetheless. And it may unlock new use cases for powerful machine learning applications on resource-constrained devices. Plantamura has open sourced OnnxStream and made it available on GitHub. Be sure to check it out for all the details that you need to get your own impressive tinyML applications up and running.