Improving Voice Assistant Koala-ty
This DIY smart speaker leverages ChatGPT on the backend to enable more complex conversations than other voice assistants can provide.
The global market size for voice assistants was $1.5 billion in 2020, but is growing rapidly and is expected to rise to $14.2 billion by 2030. These intelligent personal assistants, powered by artificial intelligence, are increasing in popularity because they can perform a wide range of tasks, from setting reminders and playing music to answering questions and controlling smart home devices.
One of the main benefits of voice assistants is their convenience. Users can simply speak to their device and get an instant response, without having to type or click anything. This makes them ideal for tasks that require hands-free operation, such as driving or cooking. Voice assistants can also be used to access information quickly, such as weather forecasts, news headlines, and sports scores.
However, despite their many benefits, voice assistants also have their limitations. One common frustration for users is the assistants' inability to understand complex questions or requests. For many, the initial excitement about the technology starts to wear off after turning the lights on and asking for tomorrow’s weather a few times. All too often, a more complex request is met with an irrelevant response.
With the appearance of large language models, such as those that power the likes of ChatGPT and Bard, a glimmer of hope has appeared on the horizon for the development of a more capable voice assistant. After all, these models excel at understanding natural language prompts, and at responding to those requests with relevant information, in a human-like manner. If only we could talk to these models rather than interacting with them solely through our keyboards.
There may not be a commercial device available just yet that makes this possible, but there are many examples of personal projects that do exactly this. And if you are comfortable plugging a few components into a single-board computer and copying some code from GitHub, you should be able to get one up and running yourself. The latest entrant into the field is DaVinci — The ChatGPT Virtual Assistant created by a hobbyist known as DevMiser.
The device, which looks a bit like a startled koala bear, listens for a wake word, like most voice assistants, then accepts a voice command from the user. That prompt is converted to text using the Picovoice Leopard speech-to-text engine, then sent to the ChatGPT API, where it is run against a GPT-3.5 model. The response from the ChatGPT API is converted to a natural-sounding synthetic human voice by Amazon Polly.
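That pipeline is straightforward to sketch in code. The snippet below is a minimal illustration rather than DevMiser's actual script: it assumes the wake word stage has already saved the spoken request to a WAV file, it uses the pre-1.0 interface of the openai Python package, and the environment variable names, file paths, and Polly voice are placeholder assumptions.

```python
# Illustrative sketch of the DaVinci pipeline: speech-to-text with Picovoice Leopard,
# ChatGPT for the answer, and Amazon Polly for the spoken reply.
# Key names, file paths, and the Polly voice are placeholder assumptions.
import os

import boto3       # pip install boto3
import openai      # pip install openai (pre-1.0 interface assumed)
import pvleopard   # pip install pvleopard

# Transcribe the recorded request (assumes the wake word logic already saved request.wav)
leopard = pvleopard.create(access_key=os.environ["PICOVOICE_ACCESS_KEY"])
transcript, _words = leopard.process_file("request.wav")

# Send the transcript to the ChatGPT API, run against a GPT-3.5 model
openai.api_key = os.environ["OPENAI_API_KEY"]
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": transcript}],
)
answer = chat["choices"][0]["message"]["content"]

# Turn the text reply into natural-sounding speech with Amazon Polly
polly = boto3.client("polly")
speech = polly.synthesize_speech(Text=answer, OutputFormat="mp3", VoiceId="Joanna")
with open("reply.mp3", "wb") as f:
    f.write(speech["AudioStream"].read())

# Playback of reply.mp3 can then be handled by any audio player on the Pi.
```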
Inside the case is a Raspberry Pi 4 single-board computer to handle all the computations and wireless connectivity with external APIs. A USB microphone allows the device to capture the speaker’s requests, while a USB speaker plays back the response from the ChatGPT API. There are also a few optional LEDs that can be included to indicate the state of the system, such as when it has been triggered by a wake word.
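As a rough idea of how those status LEDs might be driven, the sketch below uses the gpiozero library; the BCM pin numbers and the states they represent are assumptions for illustration, not the project's actual wiring, which is defined in the repo.

```python
# Hypothetical status LED handling on the Raspberry Pi using gpiozero.
# The BCM pin numbers here are assumptions made for illustration.
from gpiozero import LED

listening_led = LED(17)  # lit while waiting for the wake word
thinking_led = LED(27)   # lit while a request is being processed

def on_wake_word_detected():
    """Switch the indicators when the wake word triggers the assistant."""
    listening_led.off()
    thinking_led.on()

def on_response_played():
    """Return to the idle/listening state after the reply has been spoken."""
    thinking_led.off()
    listening_led.on()
```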
There are several external services that the device relies on, so a number of accounts need to be set up, and access keys created, before DaVinci can go to work. But with the initial setup out of the way, it should be a simple matter of plugging in a few USB components and copying a Python script from the GitHub repo to build your own copy of the project. Naturally, the case is optional, but if you want the koala aesthetic for your own build, the STL design files are also yours for the downloading.
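Before the first run, a quick pre-flight check along these lines can confirm the credentials for those services are in place. The environment variable names are assumptions for illustration; the repo documents exactly how DaVinci expects its keys to be supplied.

```python
# Pre-flight check that the external-service credentials exist.
# The environment variable names below are assumptions; see the project
# repo for the exact configuration DaVinci expects.
import os
import sys

REQUIRED_KEYS = [
    "PICOVOICE_ACCESS_KEY",   # Picovoice Leopard speech-to-text
    "OPENAI_API_KEY",         # ChatGPT API
    "AWS_ACCESS_KEY_ID",      # Amazon Polly text-to-speech
    "AWS_SECRET_ACCESS_KEY",
]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    sys.exit(f"Missing credentials: {', '.join(missing)}")
print("All external-service keys found, ready to run the assistant.")
```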