Bringing GPT-4o in From the Cloud to the Edge

How to use the latest GPT-4o LLM to train specialized tinyML models to deploy on microcontroller-sized hardware at the edge.

A MobileNetV2 model trained using a GPT-4o model running on an iPhone (📷: Edge Impulse)

The latest large language models (LLMs), like OpenAI's flagship GPT-4o, live up to their name. They are anything but small. Smaller alternatives like Microsoft's Phi-3, colloquially known as small language models (SLMs), can, depending on the task, be just as capable, but can be run using much less computing power, and hence much more cheaply. But what if you're trying to operate at the edge with not just much less, but almost no, compute power? Then you might need to take a different approach, which is exactly what Edge Impulse has just done.

The new approach from Edge Impulse is to use GPT-4o to train a Small AI model, one that's two million times smaller than GPT-4o itself and that runs directly on device at the edge.

This is a very different approach from something like Picovoice's recently released PicoLLM framework, which is intended to be chained with existing tinyML models that act as triggers for the more resource-intensive SLM.
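The trigger pattern can be illustrated as a simple cascade. This is a minimal, hypothetical sketch, not the PicoLLM framework's actual API: `trigger` stands in for a tinyML detector such as a wake-word or motion model, `big_model` stands in for the SLM, and frames are modeled as plain numbers so the example is self-contained.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def cascade(
    frames: Iterable[float],
    trigger: Callable[[float], float],
    big_model: Callable[[float], str],
    threshold: float = 0.5,
) -> Tuple[List[Optional[str]], int]:
    """Run a cheap trigger on every frame; invoke the big model only when it fires.

    `trigger` is a hypothetical stand-in for a tinyML detector and `big_model`
    a hypothetical stand-in for an SLM. Returns the per-frame outputs and the
    number of times the expensive model was actually called.
    """
    outputs: List[Optional[str]] = []
    big_calls = 0
    for frame in frames:
        if trigger(frame) >= threshold:
            # Expensive path: only reached when the cheap trigger fires.
            outputs.append(big_model(frame))
            big_calls += 1
        else:
            # Cheap path: the big model is not run for this frame at all.
            outputs.append(None)
    return outputs, big_calls


if __name__ == "__main__":
    # Frames are modeled as plain "activity scores" to keep the sketch self-contained.
    frames = [0.1, 0.9, 0.2, 0.7]
    outputs, calls = cascade(
        frames,
        trigger=lambda f: f,  # hypothetical stand-in for a tinyML model's score
        big_model=lambda f: f"described frame {f}",  # hypothetical SLM stand-in
    )
    print(outputs)  # [None, 'described frame 0.9', None, 'described frame 0.7']
    print(calls)    # 2
```

Here the expensive model runs on only two of the four frames; the savings grow the less often the trigger fires, but the larger model still has to live somewhere it can be run.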

Both approaches allow you to move inferencing out of the cloud and to the edge, but the Edge Impulse approach potentially lets you reduce the compute required much further.

That's because what they're doing isn't quite the same as the architectures we've seen before now, which use tinyML models to select key frames to feed into a larger SLM or LLM. Here a full-scale LLM running in the cloud is used to classify and label data, and that data is then used to train a "traditional" tinyML model, such as a MobileNetV2, which can be deployed to the edge and run on microcontroller-sized hardware inside a couple of hundred KB of RAM.
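The labeler-then-small-model pipeline can be sketched in a few lines. This is a minimal, hypothetical illustration, not Edge Impulse's implementation: `fake_llm_labeler` stands in for prompting GPT-4o (no real API call is made), samples are small feature vectors rather than images, and a nearest-centroid classifier is swapped in for MobileNetV2 so the example runs with the standard library alone.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def fake_llm_labeler(sample: Vector) -> str:
    """Hypothetical stand-in for asking GPT-4o to label one sample."""
    return "bright" if sum(sample) > 1.0 else "dark"


def label_dataset(
    samples: Sequence[Vector], labeler: Callable[[Vector], str]
) -> List[Tuple[Vector, str]]:
    """Step 1 (cloud, done once): the large model labels the unlabeled data."""
    return [(s, labeler(s)) for s in samples]


def train_nearest_centroid(labeled: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Step 2: fit a tiny model on the machine-generated labels.

    A nearest-centroid classifier stands in for MobileNetV2 here: each class
    is represented by the mean of its training vectors.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(x / counts[label] for x in acc) for label, acc in sums.items()}


def predict(centroids: Dict[str, Vector], sample: Vector) -> str:
    """Step 3 (on device): classify locally, with no call to the large model."""

    def dist2(c: Vector) -> float:
        # Squared Euclidean distance from the sample to a class centroid.
        return sum((a - b) ** 2 for a, b in zip(c, sample))

    return min(centroids, key=lambda label: dist2(centroids[label]))


if __name__ == "__main__":
    unlabeled = [(0.9, 0.8), (0.7, 0.9), (0.1, 0.2), (0.2, 0.0)]
    labeled = label_dataset(unlabeled, fake_llm_labeler)
    model = train_nearest_centroid(labeled)  # only this small dict ships to the device
    print(predict(model, (0.85, 0.7)))  # bright
    print(predict(model, (0.05, 0.1)))  # dark
```

The point of the design is that the large model is only needed at labeling time; what gets deployed is just the small trained model. It also means any systematic mistakes the labeler makes would be baked into the small model, rather than coming from human annotators.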

It's a genuinely fascinating shortcut: use the larger, more resource-intensive model as a labeler to train much smaller tinyML models that can then be used on device. It's going to be intriguing to see whether models trained this way perform differently, with different perceptual holes, from models trained directly on human-labeled data, and whether these AI-trained models are more or less flexible than their human-trained counterparts when presented with different and divergent data.

Going forward, I think we're going to see more of these mixed architectures as tinyML and Small AI converge, and as people try to figure out what works and what doesn't.

aallan

Scientist, author, hacker, maker, and journalist. Building, breaking, and writing. For hire. You can reach me at 📫 alasdair@babilim.co.uk.
