Build a Bear with ChatGPT-Driven Voice Assistant Capabilities
Adafruit's M. LeBlanc-Williams used the power of ChatGPT to create a voice assistant teddy reminiscent of Teddy Ruxpin.
ChatGPT is taking the world by storm, and people are continuing to discover interesting new uses for the technology. As the name suggests, its original primary purpose was to act as a very sophisticated chatbot. But it can do much more, including writing code and simple articles. It can also mirror some of the capabilities of voice assistants like Siri and Alexa when combined with speech-to-text and text-to-speech services. Adafruit's M. LeBlanc-Williams used that capability to create a voice assistant teddy reminiscent of Teddy Ruxpin.
Teddy Ruxpin is an animatronic teddy bear toy for children first released in 1985, when it became an instant success. Using a cassette player built into its back, the original Teddy Ruxpin could read stories aloud to children while moving around. Because cassette tapes have two audio tracks for stereo sound, the toy could use one track for the story and the other for data. That data contained movement commands, letting Teddy Ruxpin move along with the audio in a choreographed manner. The technology was quite impressive for the time, but it doesn't come close to what LeBlanc-Williams achieved here.
LeBlanc-Williams started with a Peek-A-Boo Teddy Bear from a company called GUND. This is a toy similar to Teddy Ruxpin, but more rudimentary. It can only recite a handful of different pre-recorded audio clips while moving its mouth and pulling up a sheet with its arms to play peek-a-boo. For this project to work, LeBlanc-Williams had to replace the original electronic components with more powerful hardware.
That hardware included a Raspberry Pi 4 Model B single-board computer, an Adafruit Motor HAT, a USB microphone, and an Adafruit I2S 3W Class D amplifier board. The Motor HAT drives the motors that actuate the bear's mouth and arms, while the amplifier board pumps out audio through the bear's built-in speaker. Anyone reproducing this build can use a Mini External USB Stereo Speaker instead of the amplifier board to make things a little easier (but also a bit bulkier).
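The mouth movement amounts to pulsing one DC motor channel on the Motor HAT back and forth. A minimal sketch of that idea, assuming Adafruit's `adafruit_motorkit` library with the mouth motor on channel 1; the channel assignment and timing values here are illustrative guesses, not details from the project:

```python
import time

def wiggle_mouth(motor, cycles=3, throttle=0.6, dwell=0.15):
    """Open and close the mouth by pulsing a Motor HAT DC motor channel.

    On the real bear, `motor` would be something like
    adafruit_motorkit.MotorKit().motor1 (an assumption); any object
    with a `throttle` attribute works, which also makes this testable.
    """
    for _ in range(cycles):
        motor.throttle = throttle    # drive forward: mouth opens
        time.sleep(dwell)
        motor.throttle = -throttle   # drive backward: mouth closes
        time.sleep(dwell)
    motor.throttle = 0               # stop the motor when done
```

Setting `throttle` to a signed fraction of full speed is how the MotorKit API drives a DC motor, so the same pattern would move the arms for the peek-a-boo blanket.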
On the software side, this project requires APIs from both OpenAI and Azure. From the former, it uses ChatGPT and Whisper, an automatic speech recognition (speech-to-text) service. From the latter, it uses Speech Services to turn ChatGPT's text output into audio.
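Those three cloud calls might look something like the sketch below, assuming the openai Python SDK (legacy 0.x interface) and the azure-cognitiveservices-speech package. The model names, the system prompt, and the function names are assumptions for illustration, not the project's actual code:

```python
# SDK imports are kept inside each function so the pure helper below
# can run without either package installed.

def build_messages(question):
    """Wrap the user's spoken question in a ChatGPT message list.

    The system prompt here is a made-up example, not the project's.
    """
    return [
        {"role": "system", "content": "You are a friendly talking teddy bear."},
        {"role": "user", "content": question},
    ]

def transcribe(wav_path):
    """Speech-to-text with OpenAI's Whisper API."""
    import openai  # assumes the legacy 0.x openai SDK
    with open(wav_path, "rb") as audio:
        return openai.Audio.transcribe("whisper-1", audio)["text"]

def ask_chatgpt(question):
    """Send the transcribed question to ChatGPT and return its reply."""
    import openai
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        messages=build_messages(question),
    )
    return response["choices"][0]["message"]["content"]

def speak(text, key, region):
    """Text-to-speech through Azure Speech Services."""
    import azure.cognitiveservices.speech as speechsdk
    config = speechsdk.SpeechConfig(subscription=key, region=region)
    speechsdk.SpeechSynthesizer(speech_config=config) \
        .speak_text_async(text).get()
```

Both services are billed per use, so anyone reproducing the build needs their own OpenAI and Azure API keys.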
The workflow goes like this: the user presses a button on the bear's foot to indicate that they're ready to ask a question. They can then speak that question aloud and Whisper will convert it into a text prompt to feed into ChatGPT. While it is "thinking," the Motor HAT moves the blanket up to cover the bear's face. ChatGPT will return a text response, which Azure will use to create an audio file. As that audio plays, the Motor HAT moves the bear's mouth. That movement isn't synced with the speech, but it is close enough for this purpose.
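The steps above can be sketched as a single question-and-answer cycle. Since the article doesn't name the project's actual helper functions, every hardware and cloud step is passed in as a plain callable here; the sequencing is the point:

```python
def run_once(listen, transcribe, ask, synthesize, play,
             cover_face, uncover_face, move_mouth_while):
    """One button-press cycle: record, transcribe, ask, speak.

    All eight parameters are hypothetical stand-ins for the bear's
    real hardware and API helpers.
    """
    wav = listen()                    # record the spoken question
    question = transcribe(wav)        # Whisper: speech -> text prompt
    cover_face()                      # blanket up while "thinking"
    answer = ask(question)            # ChatGPT: prompt -> text reply
    audio = synthesize(answer)        # Azure: reply -> audio file
    uncover_face()                    # blanket back down
    move_mouth_while(lambda: play(audio))  # flap mouth as audio plays
    return answer
```

Passing the steps in as callables also makes the loop easy to test on a desk, with fakes standing in for the microphone, motors, and cloud services.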
As you can see in the video, this works well, and with the power of ChatGPT the upgraded bear can answer just about any question.