
Picovoice Aims to Deliver Snappier LLM-Powered AI Voice Assistants with Orca

By beginning speech synthesis while the LLM is still generating tokens for its response, Orca eliminates awkward delays.

Gareth Halfacree
8 months ago • Machine Learning & AI

Lightweight machine-learning voice recognition and synthesis specialist Picovoice is aiming to make artificial intelligence (AI) systems driven by large language models (LLMs) like OpenAI's ChatGPT more natural by giving them a voice that doesn't pause before delivering its response.

"Latency is a major drawback of LLM-based voice assistants," Picovoice explains of the problem its Orca streaming text-to-speech engine aims to solve. "The awkward silence when waiting for the AI agent's response defeats the use of cutting-edge genAI [generative artificial intelligence] to create humanlike interactions. The root cause is the combined delay of the LLM generating the response token-by-token and then the text-to-speech (TTS) synthesizing the audio."

Orca, which uses a "Plan Ahead, Don't Rush It" approach, aims to eliminate awkward pauses in voice responses from LLM assistants. (📹: Picovoice)

Currently, the company explains, the voice assistant industry has focused on speeding up the speech synthesis stage while ignoring the far longer delay in the LLM that generates the text to feed it. The LLM responds to a prompt by chaining tokens into a plausible, if not always factual, response.

Orca, by contrast, takes what Picovoice calls a "Plan Ahead, Don't Rush It" (PADRI) approach to the problem. Rather than waiting for the LLM to finish generating its response in full, Orca begins speaking during generation, meaning a near-two-second pause present in OpenAI's own text-to-speech service is cut to 0.19s.
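To see why starting early matters more than raw synthesis speed, here is a minimal Python sketch that compares time to first audio for the two pipeline shapes. It is a toy simulation, not Picovoice's or OpenAI's code: the function names, the `lookahead` parameter, and every timing value are hypothetical, chosen only to show the arithmetic.

```python
from typing import Iterator, List, Tuple


def llm_token_stream(tokens: List[str], seconds_per_token: float) -> Iterator[Tuple[float, str]]:
    """Simulated LLM: yield (arrival_time_in_seconds, token) pairs on a virtual clock."""
    for index, token in enumerate(tokens, start=1):
        yield index * seconds_per_token, token


def time_to_first_audio_batch(tokens: List[str], seconds_per_token: float,
                              tts_latency: float) -> float:
    """TTS waits for the complete response, then needs tts_latency to produce audio."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    arrival_times = [t for t, _ in llm_token_stream(tokens, seconds_per_token)]
    return arrival_times[-1] + tts_latency


def time_to_first_audio_streaming(tokens: List[str], seconds_per_token: float,
                                  tts_latency: float, lookahead: int) -> float:
    """TTS starts as soon as `lookahead` tokens have arrived (hypothetical parameter)."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    for count, (arrival, _) in enumerate(llm_token_stream(tokens, seconds_per_token), start=1):
        if count >= lookahead:
            return arrival + tts_latency
    # Response shorter than the lookahead window: behaves like the batch case.
    return len(tokens) * seconds_per_token + tts_latency


# Hypothetical numbers, not measurements from the article.
response = ["token"] * 40
batch = time_to_first_audio_batch(response, seconds_per_token=0.05, tts_latency=0.1)
streaming = time_to_first_audio_streaming(response, seconds_per_token=0.05,
                                          tts_latency=0.1, lookahead=2)
print(f"batch: {batch:.2f}s, streaming: {streaming:.2f}s")
```

In this sketch the batch delay grows with the length of the response, while the streaming delay depends only on the lookahead window and the synthesizer's own startup time. A slower synthesizer that starts early can therefore still speak first.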

"Orca isn't necessarily faster than OpenAI's TTS [at speech synthesis]," the company explains. "It may even be slower because OpenAI TTS runs on a data-center-grade NVIDIA GPU, while Orca TTS in [our] demo runs on a consumer-grade x86 AMD CPU. Yet, since Orca can start much earlier, it finishes reading before OpenAI TTS can even start."

More information on Orca is available on the Picovoice website.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.