Speech-to-text is useful in a wide range of projects. From hands-free control in smart homes to accessible tools for people with disabilities, converting spoken words into text opens up many possibilities. Whether you're building voice-activated automation, transcribing notes on the go, or adding voice input to chatbots, speech-to-text can simplify user interactions. With the ESP32 Dev Board and the Deepgram Speech-to-Text API, you can add this capability to your own IoT projects.
Why Choose the ESP32 for Speech-to-Text?
The ESP32 is a versatile microcontroller with built-in Wi-Fi and Bluetooth, making it well suited to IoT applications. Its dual-core processor and memory are enough to record audio and handle the network connection, while the Deepgram Speech-to-Text API does the actual speech recognition in the cloud. This keeps the ESP32's processing demands minimal.
How It Works
The ESP32 captures audio input through the INMP441 microphone and stores the recorded audio on an SD card. The stored audio file is then read from the SD card and sent to the Deepgram Speech-to-Text API. The API processes the audio data and returns the transcribed text, which can then be used for various applications like home automation, note-taking, or even chatbot interactions.
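The article does not say which file format the recording uses. As a minimal sketch, assuming the audio is saved as uncompressed 16-bit mono PCM at 16 kHz in a WAV file, the 44-byte header written at the start of the file could be built like this. The function name `makeWavHeader` and the default parameters are illustrative assumptions, not taken from the project's repository.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Write `n` bytes of `value` into `dst` in little-endian order.
static void putLE(uint8_t* dst, uint32_t value, int n) {
    for (int i = 0; i < n; ++i) {
        dst[i] = static_cast<uint8_t>((value >> (8 * i)) & 0xFF);
    }
}

// Build the canonical 44-byte RIFF/WAVE header for uncompressed PCM audio.
// dataBytes is the size of the raw sample data that follows the header.
// The defaults (16 kHz, 16-bit, mono) are assumptions for this sketch.
std::array<uint8_t, 44> makeWavHeader(uint32_t dataBytes,
                                      uint32_t sampleRate = 16000,
                                      uint16_t bitsPerSample = 16,
                                      uint16_t channels = 1) {
    std::array<uint8_t, 44> h{};
    const uint16_t blockAlign =
        static_cast<uint16_t>(channels * bitsPerSample / 8);
    const uint32_t byteRate = sampleRate * blockAlign;

    std::memcpy(&h[0], "RIFF", 4);
    putLE(&h[4], 36 + dataBytes, 4);   // file size minus the first 8 bytes
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    putLE(&h[16], 16, 4);              // size of the fmt chunk for PCM
    putLE(&h[20], 1, 2);               // audio format 1 = PCM
    putLE(&h[22], channels, 2);
    putLE(&h[24], sampleRate, 4);
    putLE(&h[28], byteRate, 4);
    putLE(&h[32], blockAlign, 2);
    putLE(&h[34], bitsPerSample, 2);
    std::memcpy(&h[36], "data", 4);
    putLE(&h[40], dataBytes, 4);       // size of the sample data
    return h;
}
```

On the ESP32, the returned bytes would be written to the SD card file first, followed by the samples read from the microphone. If the total length is not known up front, one common approach is to write a placeholder header and rewrite it once recording stops.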
Hardware Setup
Connect the INMP441 Microphone:
- Connect the I2S pins (WS, SD, and SCK) of the INMP441 to the corresponding pins on the ESP32 Dev Board.
- Ensure proper power and ground connections.
I2S MIC ESP32
GND -> GND
VDD -> 3.3V
SD -> D35
SCK -> D33
WS -> D22
L/R -> 3.3V
Connect the SD Card Module:
- Connect the SD card module to the SPI pins of the ESP32 (MOSI, MISO, SCK, and CS).
- Insert the 8GB SD card into the module.
SD Card Module ESP32
GND -> GND
Vcc -> VIn
MISO -> D19
MOSI -> D23
SCK -> D18
CS -> D5
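The two wiring tables above can be captured as constants in the sketch so the pin numbers live in one place. This is a minimal example, assuming the `Dxx` labels on the dev board correspond to GPIO numbers; the names in the `pins` namespace and the helper `allDistinct` are hypothetical and not taken from the project's code.

```cpp
#include <array>
#include <cstddef>

// Pin numbers from the wiring tables (assuming Dxx means GPIO xx).
namespace pins {
constexpr int MIC_SD  = 35;  // I2S data out of the INMP441; GPIO 35 is input-only, which suits a data-in line
constexpr int MIC_SCK = 33;  // I2S bit clock
constexpr int MIC_WS  = 22;  // I2S word select
constexpr int SD_MISO = 19;  // SPI data from the SD card module
constexpr int SD_MOSI = 23;  // SPI data to the SD card module
constexpr int SD_SCK  = 18;  // SPI clock
constexpr int SD_CS   = 5;   // SPI chip select
}  // namespace pins

constexpr std::array<int, 7> kUsedPins = {
    pins::MIC_SD, pins::MIC_SCK, pins::MIC_WS,
    pins::SD_MISO, pins::SD_MOSI, pins::SD_SCK, pins::SD_CS};

// Return true when no GPIO number appears twice in the list.
constexpr bool allDistinct(const std::array<int, 7>& p) {
    for (std::size_t i = 0; i < p.size(); ++i) {
        for (std::size_t j = i + 1; j < p.size(); ++j) {
            if (p[i] == p[j]) return false;
        }
    }
    return true;
}

// Fail the build if the microphone and SD card are accidentally given the same pin.
static_assert(allDistinct(kUsedPins), "two signals share one GPIO");
```

The compile-time check is optional, but it catches a wiring table typo before the sketch is uploaded.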
To use the Deepgram Speech-to-Text API, you need an API key. Follow these steps to create one:
Sign Up for a Deepgram Account:
- Visit Deepgram's website and create a free account.
- A new account comes with $200 of free credit.
- Then click "Create API Key".
- Give the key a name, then copy the generated API key and save it somewhere safe.
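To show where the API key ends up, here is a rough sketch of the HTTP request head for uploading a recorded file. It assumes Deepgram's pre-recorded endpoint (`POST /v1/listen` on `api.deepgram.com`), the `Authorization: Token <key>` header scheme, and a WAV upload. The function name `buildListenRequestHead` is hypothetical, and the code in the project's repository may build the request differently, for example through the `HTTPClient` library.

```cpp
#include <cstddef>
#include <string>

// Build the request line and headers for uploading audio to Deepgram.
// apiKey     : the key created in the Deepgram console
// audioBytes : size of the audio file that will follow as the request body
// query      : optional URL query string without the leading '?'
std::string buildListenRequestHead(const std::string& apiKey,
                                   std::size_t audioBytes,
                                   const std::string& query = "") {
    std::string head = "POST /v1/listen";
    if (!query.empty()) {
        head += "?" + query;
    }
    head += " HTTP/1.1\r\n";
    head += "Host: api.deepgram.com\r\n";
    head += "Authorization: Token " + apiKey + "\r\n";  // the key goes here, never in the URL
    head += "Content-Type: audio/wav\r\n";               // assumes the recording is a WAV file
    head += "Content-Length: " + std::to_string(audioBytes) + "\r\n";
    head += "Connection: close\r\n";
    head += "\r\n";                                      // blank line ends the headers
    return head;
}
```

On the ESP32 this text would be sent over a TLS connection to port 443, followed by the file contents streamed from the SD card. Keeping the key in a separate header file that is excluded from version control avoids publishing it by accident.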
Open the Arduino IDE and install the following libraries:
- ESP32 Core for Arduino (via the Boards Manager, version 3.4.0)
- HTTPClient for sending HTTP requests to the Deepgram API (built-in library)
- ArduinoJson for parsing JSON responses from the API (needs to be installed)
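Deepgram's JSON response nests the recognized text under `results`, then `channels[0]`, then `alternatives[0]`, in a field named `transcript`. With ArduinoJson you would walk that path on the parsed document. To illustrate the idea without the library, the sketch below is a simplified stand-in that pulls out the first `transcript` string with plain string searching. The function name `extractTranscript` is hypothetical, and this is not a general JSON parser.

```cpp
#include <cstddef>
#include <string>

// Return the first "transcript" string value found in the response,
// or an empty string if the field is missing or malformed.
// Handles simple escapes such as \" and \\ but does not decode \uXXXX sequences.
std::string extractTranscript(const std::string& json) {
    const std::string key = "\"transcript\"";
    std::size_t pos = json.find(key);
    if (pos == std::string::npos) return "";

    pos = json.find(':', pos + key.size());   // separator after the key
    if (pos == std::string::npos) return "";
    pos = json.find('"', pos + 1);            // opening quote of the value
    if (pos == std::string::npos) return "";

    std::string out;
    for (std::size_t i = pos + 1; i < json.size(); ++i) {
        const char c = json[i];
        if (c == '\\' && i + 1 < json.size()) {
            const char next = json[++i];      // consume the escaped character
            if (next == 'n') out += '\n';
            else if (next == 't') out += '\t';
            else out += next;                 // covers \" \\ and \/
        } else if (c == '"') {
            return out;                       // closing quote reached
        } else {
            out += c;
        }
    }
    return "";                                // string was never closed
}
```

For real projects a proper parser such as ArduinoJson is the safer choice, since it validates the document and handles every escape form.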
The code files for this project are available in the GitHub repository attached to this article.
Detailed Tutorial Video
The complete step-by-step process of doing speech-to-text with Deepgram is explained in our tutorial video, so you can refer to it to clear up any doubts and to see a practical working demo.
#HappyMaking