Picovoice Launches On-Device Speech-to-Text Engines, Cheetah and Leopard

Available free of charge for up to 100 hours of transcription a month, these trainable on-device transcription models offer high accuracy.

Offline edge AI speech interaction specialist Picovoice has announced the launch of its "Speech-to-Text 2.0" engines, offering private-by-design audio transcription at the edge — and, it claims, undercutting its cloud-based rivals in price.

Picovoice is best known for its voice activation offering, which allows edge devices including single-board computers and microcontrollers to recognize wake-words and commands via custom-trained recognition models. Its latest update adds full-text transcription to that feature set, operating, as with its other offerings, on-device without transferring any data to remote servers.

The new speech transcription service is split across two engines: Leopard, which offers non-streaming recognition for pre-recorded audio; and Cheetah, which provides real-time transcription at some cost in accuracy, with a 14.34 per cent word error rate (WER) to Leopard's 11 per cent. Both models, the company claims, come in at under 20MB, making them suitable for resource-constrained devices, though not microcontrollers.
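For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The article does not describe how Picovoice measured its figures, so the following is only a minimal sketch of that conventional definition; the function name and the sample sentences are illustrative and not part of any Picovoice SDK.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences only: one substituted word and one dropped word.
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over lazy dog"
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

On this definition, a WER of 11 per cent corresponds to roughly 11 word-level errors for every 100 words in the reference text.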

The base models have a 300,000-word vocabulary, Picovoice says, with support for "type-and-train" customization via the Picovoice Console. At launch, the platform supports the English language only; the company has confirmed plans to add German, French, and Spanish this year and will address "further language support" starting in 2023 based on market demand.

The company's last major software update for its voice recognition engines brought with it support for running locally on selected Arduino and compatible microcontrollers, as well as more powerful single-board computers like the Raspberry Pi family. Late last year the platform received a free usage tier, offering unlimited voice interactions for up to three simultaneous users — even for commercial use.

The new Picovoice Speech-to-Text 2.0 engines launch with their own free usage tier, which again permits commercial use and covers up to 100 hours of audio transcription per month. Above that, the cheapest subscription is priced at $999, with a 75 per cent discount for "early-stage startups." Even without the discount, the company points out, that makes the service between a sixth and a twentieth the cost of its rivals at the 10,000-hour level.

The transcription services follow voice recognition support for selected Arduino microcontroller boards — but require more powerful SBCs. (📹: Picovoice)

The speech-to-text platform has launched with support for Linux, Windows, and macOS desktops and laptops, Android and iOS mobile devices, the Raspberry Pi 3 and Raspberry Pi 4 single-board computer ranges, and NVIDIA's Jetson Nano. As with language support, additional platforms are due to be added this year — along with additional software development kits (SDKs) atop the launch Python, C, iOS, Android, Go, React Native, Flutter, Java, and .NET support.

More details, and the form to fill in for a free-tier account, are available on the Picovoice website.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.