
An LLM on a Stick

Binh Pham built an LLM into a USB stick powered by a Raspberry Pi Zero and gave it a user interface that is incredibly easy to use.

Nick Bild
1 month ago • Machine Learning & AI
An LLM runs on this USB stick (📷: Binh Pham)

The days in which generative artificial intelligence (AI) applications could only run on powerful, expensive computing platforms have come to an end, thanks to advances in algorithm design and clever optimization techniques. Combined with the field’s current trend of open-sourcing trained models, these advances have opened up countless new opportunities for everyone to experiment with cutting-edge AI tools. This has, in turn, led to many efforts to simplify the use of these tools, such as the llamafile and llama.cpp projects.

In a similar vein, an interesting concept for running local large language models (LLMs) was recently demonstrated by Binh Pham of the Build With Binh YouTube channel. Pham’s idea was to put an entire LLM, including the hardware required for running inference and the user interface, on a USB stick. By plugging the stick into a computer, one can interact with the LLM simply by creating a text file; no technical skills are required.

Inside the 3D-printed shell of this somewhat-oversized USB stick is a Raspberry Pi Zero single-board computer and a shield that adds a male USB port for interfacing with a host computer. Much of the work of getting an LLM to run on this platform has already been handled by the llama.cpp project, so Pham leveraged that.

It was not entirely straightforward, however, as the Pi Zero is showing its age. Its processor is built on the ARMv6 architecture, yet llama.cpp leverages ARMv8-specific instructions for optimization, so compilation failed out of the box. After a lot of research and debugging, Pham was able to track these instructions down in the code and remove them to get a working build of the software.
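To see the kind of problem Pham ran into, consider how SIMD code in projects like llama.cpp is typically fenced behind architecture guards. The snippet below is purely an illustrative sketch, not Pham’s actual patch or llama.cpp’s real source: the NEON path uses vaddvq_f32, an instruction that only exists on ARMv8’s 64-bit AArch64 mode, so a target without it has to fall back to (or be rewritten as) plain scalar C.

    #include <stddef.h>

    #if defined(__ARM_NEON)
    #include <arm_neon.h>

    /* NEON path: four floats per iteration. vaddvq_f32 is an AArch64
     * (ARMv8) instruction, so this branch will not compile for a
     * 32-bit ARMv6 target like the Pi Zero. */
    static float dot_product(const float *a, const float *b, size_t n) {
        float32x4_t acc = vdupq_n_f32(0.0f);
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
            acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
        float sum = vaddvq_f32(acc);  /* horizontal add, ARMv8-only */
        for (; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }
    #else
    /* Portable scalar fallback: slower, but runs on any CPU,
     * including the Pi Zero's ARMv6 core. */
    static float dot_product(const float *a, const float *b, size_t n) {
        float sum = 0.0f;
        for (size_t i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }
    #endif

Pham’s fix amounted to steering the build down portable paths like the second one wherever ARMv8-only instructions appeared.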

After clearing that hurdle, Pham turned his attention to building a user interface that was as simple as possible. He ultimately settled on a system in which the Pi Zero presents itself to the host computer as a USB drive. The user creates a file on that drive, and the name of that file is fed into a small storytelling LLM as the prompt. The model then generates a story from the prompt, and the results are written back into the file.
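In rough terms, the device-side logic could look something like the loop below. This is a minimal sketch, not Pham’s code: it assumes the gadget-mode storage is visible to the Pi as an ordinary directory, that a llama.cpp command-line binary and a tiny storytelling model are on board, and it glosses over the synchronization headaches of sharing a filesystem with the host. All paths and names (/mnt/usb_share, ./llama-cli, stories15M.gguf) are illustrative placeholders.

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define MOUNT_DIR "/mnt/usb_share"   /* assumed gadget-mode mount point */
    #define LLAMA_BIN "./llama-cli"      /* assumed llama.cpp CLI binary    */
    #define MODEL     "stories15M.gguf"  /* assumed tiny storytelling model */

    int main(void) {
        for (;;) {
            DIR *dir = opendir(MOUNT_DIR);
            if (!dir) { sleep(1); continue; }
            struct dirent *entry;
            while ((entry = readdir(dir)) != NULL) {
                if (entry->d_name[0] == '.')
                    continue;  /* skip dotfiles, "." and ".." */

                char path[512];
                snprintf(path, sizeof path, "%s/%s", MOUNT_DIR, entry->d_name);

                /* Only treat still-empty files as fresh prompts. */
                FILE *f = fopen(path, "rb");
                if (!f) continue;
                fseek(f, 0, SEEK_END);
                long size = ftell(f);
                fclose(f);
                if (size > 0) continue;

                /* The file name becomes the prompt; the model's output
                 * fills the file. (Naive quoting; a real version would
                 * sanitize file names first.) */
                char cmd[1024];
                snprintf(cmd, sizeof cmd, "%s -m %s -p \"%s\" > \"%s\"",
                         LLAMA_BIN, MODEL, entry->d_name, path);
                (void)system(cmd);
            }
            closedir(dir);
            sleep(1);  /* poll roughly once per second */
        }
    }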

This is an interesting interface, but it targets a very specific use case, so it will not translate well to every application. The much bigger issue is the system’s performance. A tiny 15M parameter model works well enough, processing each token in about 200 milliseconds. But even a 77M parameter model pushes that time to 2.5 seconds per token; at that rate, a modest 200-token story would take more than eight minutes to generate. Furthermore, these tiny models are not especially good, greatly limiting their utility for practical uses.

It would be nice to see this project updated to use a Raspberry Pi Zero 2, which should be pretty much a drop-in replacement. It would significantly speed up processing and allow for larger, more useful models. Furthermore, since the newer board’s processor uses the ARMv8 architecture, no source code hacks to llama.cpp would be necessary. It is easy to imagine that a few years from now even better hardware will be available. At that point, LLMs on a stick might really catch on, though probably with a different user interface than the one Pham imagined.
