Tristam R. Rolls His Own Slick ESP32 Smart Speaker for Private, Local Home Assistant Voice Control

Built around an Espressif ESP32-LyraT board, this compact smart speaker system performs all its operations on the local network.

Semi-anonymous product manager and maker Tristam R. has added a voice assistant to his smart home without relying on off-the-shelf solutions, instead rolling his own privacy-respecting version based on an Espressif ESP32 microcontroller with Home Assistant support.

"Last year (2023) was Home Assistant's Year of the Voice," Tristam writes of the project, "so I thought there’d be no better way to start 2024 than by building my own Home Assistant powered smart speaker. It's based on an ESP32-LyraT dev board, which doesn't seem to be widely available [but] you can grab one on AliExpress or Amazon."

If you're the privacy-conscious type, this local voice assistant offers spoken home automation without sending your voice to remote servers. (📹: Tristam R.)

The Espressif ESP32-LyraT development board is built specifically with voice-controlled projects in mind, pairing an ESP32-WROVER-B module and microSD storage expansion with integrated audio capabilities, including on-board amplification and two microphones for near- or far-field recording. Critical to a voice assistant project is its ability to recognize "wake words": words or short phrases that bring the device out of sleep and start active recognition of spoken commands, as with "hey Siri" or "Alexa."

Elsewhere in the build is an Adafruit NeoPixel Stick, which serves as a highly visible indicator of when the wake word has been detected, and a Dayton Audio DMA45-4 1.5" driver for the device's responses. Everything is housed in a custom-built 3D-printable chassis, providing a mounting point for the driver, a cut-out for the LED strip, and an upper section that covers the ESP32-LyraT board while leaving its capacitive touch inputs exposed and its microphones free to listen for commands.

For software, Tristam turned to the ESPHome firmware with a voice pipeline configured for control of Home Assistant through spoken commands. The pipeline uses the open-source Whisper model to perform local speech recognition and Home Assistant's in-house Piper speech synthesis model for responses. Both models run on the Home Assistant server itself, rather than on the ESP32 microcontroller.
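
To make that flow concrete, here is a minimal Python sketch of the three server-side stages: speech recognition, command handling, and speech synthesis. It is an illustration only, not Tristam's configuration or any real ESPHome, Home Assistant, Whisper, or Piper API. Every name in it (`VoicePipeline`, `fake_stt`, `fake_intent`, `fake_tts`) is hypothetical, and the stand-in functions simply pass text around where the real system would process audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical three-stage pipeline; not a Home Assistant class."""
    speech_to_text: Callable[[bytes], str]   # role played by Whisper in the build
    handle_intent: Callable[[str], str]      # role played by Home Assistant
    text_to_speech: Callable[[str], bytes]   # role played by Piper in the build

    def run(self, audio: bytes) -> bytes:
        # Each stage feeds the next: audio -> text -> reply text -> reply audio.
        text = self.speech_to_text(audio)
        reply = self.handle_intent(text)
        return self.text_to_speech(reply)


# Stand-ins (assumptions): the "audio" is just UTF-8 text so the sketch runs
# without any speech models installed.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_intent(text: str) -> str:
    if "light" in text.lower():
        return "Turned on the light"
    return "Sorry, I didn't understand that"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_intent, fake_tts)
reply_audio = pipeline.run(b"turn on the kitchen light")
print(reply_audio.decode("utf-8"))
```

In the build described here, the ESP32 would sit at either end of such a chain, capturing microphone audio and playing back the reply, while the stages in the middle run on the Home Assistant server.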

The finishing touch: a fabric cover for the front, which hides the speaker and bare LED strip. "Print the frame STL that's on Printables and then glue some black (or whatever colour you want) fabric onto it," Tristam writes of this part of the build process. "I used [some] superglue and some clothes pegs to hold the fabric in place while the glue cures. You should now have a local voice assistant that'll let you control your smart home without the worry of Mr. Bezos listening in or the reliance on an internet connection!"

The full build guide is available on Tristam's website, while the STL files for the case have been uploaded to Printables under the Creative Commons Attribution-NonCommercial 4.0 International license.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.