Espressif Aims to Promote the AIoT with an ESP-BOX, ChatGPT Demo Project
Using offline speech recognition and OpenAI's ChatGPT APIs, you can turn an ESP-BOX into an interactive tool for conversation and more.
Espressif's Ali Hassan Shah is looking to put the company's microcontrollers at the forefront of the Artificial Intelligence of Things (AIoT) revolution, penning a guide to infusing projects with OpenAI's popular yet divisive ChatGPT large language model using the ESP-BOX all-in-one development kit range.
"The world is witnessing a technological revolution, and OpenAI is at the forefront of this change," Shah claims. "One of its most exciting innovations is ChatGPT — that utilizes natural language processing to create more engaging and intuitive user experiences. The integration of OpenAI APIs with IoT devices has opened up a world of possibilities. ChatGPT with ESP-BOX [is] a powerful combination that can take IoT devices to the next level."
OpenAI's ChatGPT is a large language model that generates conversational responses to natural-language queries. It's extremely convincing, and has been used to power projects ranging from automated newspapers to robot control. Its shortcomings, however, have given others cause for concern — in particular, it has no way of knowing whether the responses it provides are factually accurate.
For projects where that limitation is understood, though, it's a powerful tool — and one Espressif is promoting for use with its ESP-BOX family of all-in-one microcontroller development kits. Designed for edge-AI work and powered by the ESP32-S3 microcontroller module, the ESP-BOX includes a display and an integrated far-field microphone array. Hook that latter feature up to Espressif's ESP-SR offline speech recognition framework and OpenAI's ChatGPT, and you have a recipe for a device that can really understand you — or, at least, one that gives that impression.
"The OpenAI API provides numerous functions that developers can leverage to enhance their applications. In our project, we utilized the Audio-to-Text and Completion APIs and implemented them using C-language code based on ESP-IDF," Shah explains. This was then connected to a text-to-speech system — Espressif's offline speech synthesis engine currently only supporting the Chinese language — and the LVGL library to drive the display.
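To make the flow concrete, here is a minimal Python sketch of the two OpenAI REST calls the project chains together: an audio-to-text (transcription) request followed by a text-completion request. This is not Espressif's C/ESP-IDF code — the firmware issues equivalent HTTP requests on-device — and the model names (`whisper-1`, `text-davinci-003`) are assumptions based on OpenAI's public API at the time of writing.

```python
# Illustrative sketch only: builds (but does not send) the two HTTP requests
# the ESP-BOX demo chains together. Endpoint paths and field names follow
# OpenAI's public HTTP API; model names are assumptions.
import json
import urllib.request

OPENAI_BASE = "https://api.openai.com/v1"


def build_transcription_request(api_key: str, wav_bytes: bytes) -> urllib.request.Request:
    """Multipart/form-data POST for the audio-to-text endpoint."""
    boundary = "esp-box-demo-boundary"
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        "whisper-1\r\n"  # assumed transcription model
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="speech.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + wav_bytes + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{OPENAI_BASE}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def build_completion_request(api_key: str, prompt: str) -> urllib.request.Request:
    """JSON POST for the text-completion endpoint, fed the transcribed speech."""
    payload = {
        "model": "text-davinci-003",  # assumed completion model
        "prompt": prompt,
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{OPENAI_BASE}/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

On the device itself, the same requests would typically be made with ESP-IDF's `esp_http_client` component, with the completion's text response handed on to the speech-synthesis and LVGL display stages.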
Shah's full write-up, with source code snippets, is available on the Espressif blog.