Tinkerdoodle DIY
Published © MIT

Deep Learning Speech Commands Recognition on ESP32

Train a neural network model in 10 minutes and use it on an ESP32 with MicroPython to control a light switch. Everything is done in the browser.

Beginner • Full instructions provided • 15 minutes • 13,838

Things used in this project

Hardware components

M5Stack M5StickC ESP32-PICO Mini IoT Development Board
×1
SG90 Micro-servo motor
×1

Software apps and online services

Tinkerdoodle online IDE

Story


Schematics

Connect the SG90 micro servo to the M5StickC

Code

Model training using TensorFlow.js

JavaScript
Code snippet for model training. Refer to the page source of https://www.tinkerdoodle.cc/user/junfeng/speech-commands.html for full code.
// Full code and UI: https://www.tinkerdoodle.cc/user/junfeng/speech-commands.html
// Code is FYI only. You can just use the UI to train your model.
//
model = tf.sequential();
// Dense layer takes the output from base model as features.
model.add(tf.layers.dense({inputShape: [numFlattenFeatures], units: labels.length}));
// Softmax layer for classification.
model.add(tf.layers.softmax());
model.compile({
    optimizer: tf.train.adam(),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
});
var epochs = 20;
var currentEpoch = 1;
var info = await model.fit(x, y, {
    epochs,
    batchSize: 16,
    callbacks: {
        onBatchEnd: (batch, logs) => {
            if (batch === 0) {
                setMessage(`Epoch ${currentEpoch} out of ${epochs}.`);
                currentEpoch++;
            }
        }
    }
});
var finalAccuracy = info.history.acc[epochs - 1].toFixed(4);
tf.dispose([x, y]);
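
To make the classifier head above concrete, here is a minimal plain-Python sketch of what a dense layer with one unit per label followed by softmax computes for a single feature vector. It is an illustration only, not the TensorFlow.js implementation; the function name `dense_softmax` and the toy weights, biases, and features are hypothetical.

```python
import math

def dense_softmax(features, weights, biases):
    """Return class probabilities for one flattened feature vector.

    weights[i][j] connects feature i to label j, mirroring a dense layer
    with one unit per label followed by a softmax layer.
    """
    num_classes = len(biases)
    # Dense layer: one weighted sum (logit) per label.
    logits = [
        sum(f * weights[i][j] for i, f in enumerate(features)) + biases[j]
        for j in range(num_classes)
    ]
    # Softmax: subtract the largest logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy values: 3 features, 2 labels.
probs = dense_softmax(
    [1.0, 0.5, -0.5],
    [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.8]],
    [0.0, 0.1],
)
print(probs)  # probabilities sum to 1; the first label scores higher here
```

Training with `categoricalCrossentropy` adjusts the weights and biases so that the probability assigned to the correct label approaches 1.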

MicroPython program running on ESP32

Python
The code is based on the M5StickC pinout. If you use a different ESP32 board, change the pin numbers accordingly. Refer to https://tinkerdoodle.cc/user/_/notebooks/Shared/Junfeng/Speech%20Commands%20Model.ipynb for the full code and instructions on flashing the firmware.
# Full code: https://tinkerdoodle.cc/user/_/notebooks/Shared/Junfeng/Speech%20Commands%20Model.ipynb
#
# Demo program to control a servo connected to M5StickC using speech commands.
# It supports saving samples for model fine-tuning.
# To change the label, press the right side button.
# To save a sample, press the front button.
import gc
import m5stickc_lcd
import speech_model
from machine import I2S, PWM, Pin, reset

# Use the M5StickC built-in microphone.
mic = I2S(I2S.NUM0, ws=Pin(0), sdin=Pin(34), mode=I2S.MASTER_PDW,
    dataformat=I2S.B16, channelformat=I2S.ONLY_RIGHT,
    samplerate=16000, dmacount=16, dmalen=256)
lcd = m5stickc_lcd.ST7735()
# M5StickC is capable of running one model inference every 224 ms.
# 7168 bytes / (16000 samples/s * 2 bytes/sample) = 0.224 s
buffer = bytearray(7168)
servo = PWM(Pin(26), freq=50, duty=70)
label = ''
label_index = -1

def save_feature(pin):
    if label_index != -1:
        speech_model.save(speech_model.labels[label_index])
    else:
        speech_model.save(label)
    reset()

def select_label(pin):
    global label_index
    label_index = (label_index + 1) % len(speech_model.labels)
    lcd.fill(0)
    lcd.text(speech_model.labels[label_index], 10, 30, 0xffff)
    lcd.show()

# Use the front and right side buttons for sample capture.
Pin(37, Pin.IN).irq(handler=save_feature, trigger=Pin.IRQ_FALLING)
Pin(39, Pin.IN).irq(handler=select_label, trigger=Pin.IRQ_FALLING)
lcd.text('Ready!', 10, 10, 0xffff)
lcd.show()
gc.collect()

while True:
    mic.readinto(buffer)
    l, prob = speech_model.predict(buffer)
    gc.collect()
    if l == '[OTHER]' or prob <= 70:
        continue
    label = l
    speech_model.snapshot()
    if label == 'k': # Update to your own label.
        servo.duty(100)
    elif label == 'g': # Update to your own label.
        servo.duty(40)
    lcd.fill(0)
    lcd.text(label, 10, 30, 0xffff)
    lcd.text(str(prob), 10, 50, 0xffff)
    lcd.show()

# Cleanup: only reached if the loop above is changed to exit (e.g. with a break).
mic.deinit()
lcd.fill(0)
lcd.text('Done', 10, 10, 0xffff)
lcd.show()
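
The numbers in the program above can be checked with a little arithmetic. The sketch below reproduces the 224 ms audio window from the buffer size and estimates the servo pulse width for each duty value used. The helper names are hypothetical, and the pulse-width calculation assumes a 10-bit duty range (0 to 1023), as in the standard MicroPython ESP32 port; the firmware used in this project may differ.

```python
SAMPLE_RATE_HZ = 16000   # samplerate passed to I2S
BYTES_PER_SAMPLE = 2     # I2S.B16 means 16-bit samples
BUFFER_BYTES = 7168      # size of the bytearray read on each loop iteration

PWM_FREQ_HZ = 50         # freq passed to PWM
DUTY_MAX = 1023          # assumption: 10-bit duty range

def buffer_seconds(num_bytes, sample_rate=SAMPLE_RATE_HZ,
                   sample_bytes=BYTES_PER_SAMPLE):
    """Duration of audio held in a buffer of num_bytes bytes."""
    return num_bytes / (sample_rate * sample_bytes)

def pulse_ms(duty, freq=PWM_FREQ_HZ, duty_max=DUTY_MAX):
    """Approximate high-pulse width in milliseconds for a PWM duty value."""
    period_ms = 1000 / freq
    return duty / duty_max * period_ms

window = buffer_seconds(BUFFER_BYTES)
pulses = {duty: round(pulse_ms(duty), 2) for duty in (40, 70, 100)}

print(window)  # 0.224 seconds of audio per inference
print(pulses)  # pulse widths in ms for the duty values 40, 70 and 100
```

Under that assumption the three duty values map to pulses of roughly 0.78 ms, 1.37 ms, and 1.96 ms within the 20 ms period. If you change the buffer size or the servo positions, rerunning the same arithmetic shows the new window length and pulse widths.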

Credits

Tinkerdoodle DIY
