The Little Computer That Could

The powerful Stable Diffusion XL 1.0 image generator can now run on a Raspberry Pi Zero 2 W with 512 MB of RAM using OnnxStream.

Nick Bild
Machine Learning & AI
This image was generated by Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 W (📷: Vito Plantamura)

Without a doubt, image generation with modern AI tools has streamlined and accelerated the creative process. Artists, designers, and content creators can use these tools to swiftly produce visual content, prototypes, and mock-ups. This efficiency not only saves time but also empowers professionals to explore a wider range of design concepts and iterate on their ideas rapidly. Whether one is generating concept art for a video game, creating product mock-ups for e-commerce, or crafting marketing materials, AI-driven image generation simplifies complex tasks and boosts productivity.

A number of online tools enable custom image generation; however, they often come with usage restrictions and costs that make them unsuitable for power users. As a result, these users may want to run the algorithms on their own hardware, but that can be a challenge. These powerful models generally require more computational horsepower than is available to those who want to leverage them.

An engineer named Vito Plantamura has been hard at work in recent months on a project called OnnxStream that allows the powerful Stable Diffusion algorithm to run on small, inexpensive hardware platforms. In fact, over the summer we reported on Plantamura’s success in getting Stable Diffusion 1.5 to run on a Raspberry Pi Zero 2 W single-board computer with just 512 MB of RAM, using OnnxStream.

That was certainly an impressive feat of engineering, and you might expect that it was about as far as anyone could possibly push the humble Raspberry Pi Zero 2 W. But Plantamura has proven that to be untrue with the latest updates to OnnxStream. He has demonstrated that the same hardware platform can run the far more computationally intensive Stable Diffusion XL 1.0 algorithm.

Stable Diffusion XL 1.0 produces images that are four times larger than those generated by Stable Diffusion 1.5 (1024 × 1024 pixels versus 512 × 512), and as would be expected, the processing overhead increases accordingly. The minimum recommended VRAM for this updated algorithm is 12 GB, so fitting it into 512 MB was no small task.

The optimizations used to run version 1.5 of the algorithm were carried over, but some additional tricks were required. The U-Net model, which is a critical component in translating a user’s prompt into an image, was reduced in size with a UINT8 dynamic quantization that targeted a subset of the large intermediate tensors. The situation with the VAE decoder was more complicated. UINT8 quantization resulted in images of very poor quality, but 16-bit precision required 4.4 GB of RAM — way too much for the Raspberry Pi Zero 2 W.
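To get a feel for how dynamic quantization shrinks those intermediate tensors, here is a minimal NumPy sketch of the basic UINT8 scheme (an illustration of the general technique, not OnnxStream's actual code): the scale and zero point are computed at run time from each tensor's observed range, and each float32 value collapses to a single byte.

```python
import numpy as np

def quantize_uint8_dynamic(x: np.ndarray):
    """Dynamically quantize a float tensor to UINT8 (illustrative sketch).

    The scale and zero point are derived at run time from the tensor's
    observed min/max, which is what makes the quantization "dynamic".
    """
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_uint8(q, scale, zero_point):
    """Recover an approximate float tensor from its UINT8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

# A stand-in for one of the large intermediate tensors:
x = np.random.randn(64, 64).astype(np.float32)
q, s, z = quantize_uint8_dynamic(x)
x_hat = dequantize_uint8(q, s, z)
print(q.nbytes, x.nbytes)  # 4096 16384 -- one quarter of the memory
```

The quantized tensor occupies a quarter of the float32 footprint, at the cost of a reconstruction error bounded by roughly one quantization step, which is why Plantamura applied it only to a subset of tensors where that loss was tolerable.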

Plantamura’s clever solution to this problem involved the use of tiled decoding. Using this strategy, the image to be generated is split into a 5 by 5 grid, with the decoding process taking place separately for each tile. Each tile overlaps its left and top neighbors by 25%, allowing adjacent tiles to be blended together and preventing sharp seams between the tiles from breaking up the image.
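The idea behind tiled decoding can be sketched in a few lines of NumPy (a simplified illustration of the technique, not OnnxStream's actual implementation): the array is covered by an n-by-n grid of overlapping tiles, a stand-in `decode_fn` only ever sees one tile at a time (so peak memory scales with a tile, not the full image), and linear weight ramps blend the overlapping strips.

```python
import numpy as np

def _ramp(size, ov):
    # 1-D weight window: 1 in the middle, linear ramps of length `ov`
    # at both ends (never reaching 0, so total weight stays positive).
    w = np.ones(size)
    r = np.linspace(0.0, 1.0, ov + 2)[1:-1]
    w[:ov] = r
    w[-ov:] = r[::-1]
    return w

def tiled_decode(latent, decode_fn, n=5, overlap=0.25):
    """Decode one tile at a time and blend the overlaps (illustrative)."""
    H, W = latent.shape
    # Tile size such that n tiles, each overlapping by `overlap`, span the image.
    t_h = int(np.ceil(H / (n - (n - 1) * overlap)))
    t_w = int(np.ceil(W / (n - (n - 1) * overlap)))
    ys = np.linspace(0, H - t_h, n).round().astype(int)
    xs = np.linspace(0, W - t_w, n).round().astype(int)
    win = np.outer(_ramp(t_h, int(overlap * t_h)),
                   _ramp(t_w, int(overlap * t_w)))
    out = np.zeros((H, W))
    acc = np.zeros((H, W))
    for y in ys:
        for x in xs:
            tile = decode_fn(latent[y:y + t_h, x:x + t_w])
            out[y:y + t_h, x:x + t_w] += tile * win
            acc[y:y + t_h, x:x + t_w] += win
    return out / acc  # normalize by the accumulated blend weights

# With an identity "decoder", the blended result reproduces the input
# exactly, showing that the ramps introduce no seams of their own:
img = np.random.rand(160, 160)
rebuilt = tiled_decode(img, lambda t: t)
print(np.allclose(rebuilt, img))  # True
```

In the real pipeline, `decode_fn` would be the 16-bit VAE decoder, so only one tile's worth of activations needs to be resident at once, bringing the 4.4 GB requirement down to something a 512 MB board can handle.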

While you can generate images with under 300 MB of RAM on a Raspberry Pi Zero 2 W, you probably do not want to. Processing time is approximately eleven hours, which does not exactly allow for rapid design iteration. I was able to show that Stable Diffusion 1.5 could run on an OKdo ROCK 5 Model A in a few minutes, however, so if your budget is just a little bit higher, that could be a much more practical option.
