Let's Talk Offline

JetsonGPT is a ChatGPT-like voice assistant that does not require an internet connection or external APIs for operation.

JetsonGPT is a voice assistant that does not require an internet connection (📷: N. Shakhizat)

Chatbots based on large language models (LLMs), such as GPT-4 or LaMDA, have emerged as groundbreaking advancements in natural language processing and artificial intelligence. These models are designed to understand and generate human-like text based on the vast amount of data they have been trained on. With billions of parameters, they excel at various language tasks, including text completion, translation, summarization, sentiment analysis, and even creative writing.

While the rise to prominence of LLMs has only happened recently, their impact on the world has already been significant and transformative. They have revolutionized the way people interact with technology, enabling more natural and intuitive human-computer interfaces. These models have empowered businesses and organizations to automate tasks, streamline workflows, and enhance productivity. Moreover, they have expanded access to information and knowledge, enabling users to obtain instant answers to their questions and discover relevant content efficiently.

After interacting with a chatbot like OpenAI’s ChatGPT or Google’s Bard, many people are left wondering why their voice assistants seem so primitive by comparison. It is safe to say that the major commercial voice assistant products will be powered by LLMs in the near future, but until that time, we are left with a keyboard and web-based interface if we want a richer experience.

The device's hardware (📷: N. Shakhizat)

This present state of affairs has not sat well with hardware hackers, who have developed all sorts of LLM-powered voice assistants of their own. But in general, these devices rely on an internet connection and send data to the cloud. This introduces both latency and privacy concerns.

An interesting project write-up by Nurgaliyev Shakhizat details how one can build an LLM-powered voice assistant without relying on a network connection or external APIs. Called JetsonGPT, this device does everything you would expect a voice assistant to do: it listens for a wake word, accepts a voice prompt from a user, then speaks the language model’s response in a natural-sounding voice.

Of course, running the model locally means that the project requires beefier hardware than API-based solutions. In this case, Shakhizat chose Seeed Studio’s reComputer Jetson-20-1-H2 with a Jetson Xavier NX 16 GB module. Capable of performing up to 21 trillion operations per second, this platform is small but mighty, with 48 Tensor Cores, a six-core Carmel ARM CPU, and two NVIDIA Deep Learning Accelerator engines. A Seeed Studio ReSpeaker USB Mic Array and a Bluetooth speaker rounded out the bill of materials.

At a high level, JetsonGPT works by continually sampling data from the microphone array to look for a predefined wake word. Once the wake word is detected, the system captures the user’s voice prompt. That audio sample is then converted to text through an automatic speech recognition pipeline.

The text of the user prompt is then fed into a FastChat-based LLM that uses model weights downloaded from Hugging Face. The model produces a text response, which is passed to a text-to-speech model that generates an audio file for playback on the Bluetooth speaker.
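The flow described above can be sketched in a few lines of Python. This is only an illustrative outline, not Shakhizat's actual code: the class name `VoiceAssistant`, its fields, and the `demo` function are hypothetical, and each stage (wake word detection, speech recognition, the LLM, text-to-speech, and playback) is passed in as a plain function so that simple stand-ins can be swapped for real models. It also assumes, for simplicity, that the frame immediately following the wake word contains the whole prompt.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class VoiceAssistant:
    """Hypothetical wiring of the stages; every callable is an assumption."""
    detect_wake_word: Callable[[bytes], bool]   # audio frame -> wake word heard?
    transcribe: Callable[[bytes], str]          # speech recognition stage
    generate_reply: Callable[[str], str]        # language model stage
    synthesize: Callable[[str], bytes]          # text-to-speech stage
    play: Callable[[bytes], None]               # speaker output

    def run(self, frames: Iterable[bytes]) -> List[str]:
        """Scan frames for the wake word, then answer the frame that follows it."""
        replies: List[str] = []
        stream = iter(frames)
        for frame in stream:
            if not self.detect_wake_word(frame):
                continue  # keep listening
            # Simplification: the next frame is treated as the full voice prompt.
            prompt_audio = next(stream, None)
            if prompt_audio is None:
                break  # stream ended before a prompt arrived
            prompt_text = self.transcribe(prompt_audio)
            reply_text = self.generate_reply(prompt_text)
            self.play(self.synthesize(reply_text))
            replies.append(reply_text)
        return replies


def demo() -> List[str]:
    """Run the pipeline with trivial stand-ins instead of real models."""
    played: List[bytes] = []
    assistant = VoiceAssistant(
        detect_wake_word=lambda frame: frame == b"WAKE",
        transcribe=lambda audio: audio.decode("utf-8"),
        generate_reply=lambda prompt: f"You said: {prompt}",
        synthesize=lambda text: text.encode("utf-8"),
        play=played.append,
    )
    return assistant.run([b"noise", b"WAKE", b"hello", b"noise"])
```

Keeping each stage behind a simple function boundary means that a slow component, most likely the LLM on hardware like this, could be replaced or tuned without touching the rest of the loop.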

Even on a powerful hardware platform like the Jetson Xavier NX, LLMs can be more than a handful. To improve performance, Shakhizat overclocked both the CPU and the GPU. This was sufficient to give acceptable performance, but there is still a significant delay between prompting the system and receiving a response. A lot of effort is currently going into squeezing more performance out of smaller LLMs, however, so perhaps this will improve in the near future.

Be sure to check out the project write-up to learn how you can create your own LLM-powered voice assistant.

nickbild

R&D, creativity, and building the next big thing you never knew you wanted are my specialties.
