Erik Bjorgan Makes Voice Cloning Easy with the Applio- and Piper-Based TextyMcSpeechy
Deployable on modest hardware like a Raspberry Pi once trained, TextyMcSpeechy delivers custom text-to-speech with minimal fuss.
Maker Erik Bjorgan's Raspberry Pi has a new trick up its sleeve: it can talk back in his voice, or anyone else's, thanks to a software workflow he's dubbed TextyMcSpeechy.
"I was looking for a simple way to clone my voice and use it with Piper for text-to-speech [TTS]. When I couldn't find anything simple, I made this," Bjorgan explains of the TextyMcSpeechy project. "With this you can make a TTS voice out of your own voice, or make a TTS voice out of thousands of existing RVC models by morphing a generic dataset using Applio."
TextyMcSpeechy is a purely software-based approach to convincing speech generation, built on two existing projects: Piper, an on-device neural text-to-speech system, and Applio, a voice conversion tool based on Retrieval-based Voice Conversion (RVC). Combining the two, Bjorgan's tool can create custom Piper models that mimic your voice, or anyone else's.
The project's secret sauce is Applio: by using it to convert an existing voice dataset into the target voice, it's possible to train Piper to mimic that voice without the target speaker ever having recorded the dataset. "It is best if the person speaking in the dataset has a voice similar in tone and accent to the target voice," Bjorgan warns. "Keep in mind that some datasets include audio from multiple speakers."
While the training part of the process requires a suitably powerful workstation, with an NVIDIA GPU recommended as an accelerator, the speech generation can take place on modest hardware like a Raspberry Pi single-board computer. "I'm going to use it to haunt my smart home with dozens of celebrity robots via Home Assistant's open AI conversations integration," Bjorgan says of his plans for the software.
TextyMcSpeechy is available in Bjorgan's GitHub repository under the permissive MIT license, and he has shared more details in a Reddit thread.