Snoring is estimated to affect 57% of men and 40% of women in the United States, and it occurs in up to 27% of children. These statistics show that snoring is widespread, but its severity and health implications vary. Snoring can be light, occasional, and unconcerning, or it may be the sign of a serious underlying sleep-related breathing disorder. Snoring is caused by the rattling and vibration of tissues near the airway in the back of the throat. During sleep the muscles relax, narrowing the airway, and as we inhale and exhale the moving air causes the tissue to flutter and make noise. Obstructive sleep apnea is a breathing disorder in which the airway gets blocked or collapses during sleep, causing repeated lapses in breathing. Snoring is one of the most common symptoms of obstructive sleep apnea. Unless someone else tells them, most people who snore are not aware of it, and this is part of why sleep apnea is under-diagnosed.

In this project I have built a proof of concept of a non-invasive, low-powered edge device that monitors sleep and vibrates when it detects snoring.
Development Environment

We are using Edge Impulse Studio for feature generation and for TensorFlow Lite model creation and training. We need to sign up for a free account at https://studio.edgeimpulse.com and create a project to get started. macOS is used for the local development work.
Data Collection

We have used AudioSet, a large-scale dataset of manually annotated audio events, to download snoring and other sounds that may occur during the night. AudioSet consists of an expanding ontology of 632 audio event classes and a collection of human-labeled 10-second sound clips drawn from YouTube videos. The audio is extracted from the YouTube videos of the selected events and converted into Waveform Audio File Format (WAV), 16-bit depth, mono channel, at a 16 kHz sample rate. The following categories selected from the AudioSet ontology are downloaded. The first column is the category ID and the second column is the category label.
/m/01d3sd Snoring
/m/07yv9 Vehicle
/m/01jt3m Toilet flush
/m/06mb1 Rain
/m/03m9d0z Wind
/m/07c52 Television
/m/06bz3 Radio
/m/028v0c Silence
/m/03vt0 Insect
/m/07qjznl Tick-tock
/m/0bt9lr Dog
/m/01hsr_ Sneeze
/m/01b_21 Cough
/m/07ppn3j Sniff
/m/07pbtc8 Walk, footsteps
/m/02fxyj Humming
/m/07q6cd_ Squeak
/m/0btp2 Traffic noise, roadway noise
/m/09l8g Human Voice
/m/07pggtn Chirp, tweet
/t/dd00002 Baby cry, infant cry
/m/04rlf Music
The datasets are divided into two categories, Snoring and Noise. Two CSV files, snoring.csv and noise.csv, are created by filtering the balanced train, unbalanced train, and evaluation segment-list CSV files, which contain the YouTube clip IDs, start/end times, and other metadata, and can be downloaded from here.
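For reference, the filtering step can be scripted. Below is a minimal sketch of how snoring.csv and noise.csv could be produced from the AudioSet segment lists; the file names and the "YTID, start_seconds, end_seconds, positive_labels" column layout are assumptions based on the public AudioSet release, so adjust them if your copies differ.

# filter_audioset.py: minimal sketch for producing snoring.csv and noise.csv
# from the AudioSet segment lists (file names and layout are assumptions).
SNORING = {"/m/01d3sd"}
NOISE = {
    "/m/07yv9", "/m/01jt3m", "/m/06mb1", "/m/03m9d0z", "/m/07c52", "/m/06bz3",
    "/m/028v0c", "/m/03vt0", "/m/07qjznl", "/m/0bt9lr", "/m/01hsr_", "/m/01b_21",
    "/m/07ppn3j", "/m/07pbtc8", "/m/02fxyj", "/m/07q6cd_", "/m/0btp2", "/m/09l8g",
    "/m/07pggtn", "/t/dd00002", "/m/04rlf",
}
SOURCES = ["balanced_train_segments.csv",
           "unbalanced_train_segments.csv",
           "eval_segments.csv"]

def rows(path):
    with open(path) as f:
        for line in f:
            if line.startswith("#"):    # skip header comments
                continue
            # each data line: YTID, start_seconds, end_seconds, "label1,label2,..."
            ytid, start, end, labels = line.strip().split(", ", 3)
            yield ytid, start, end, set(labels.strip('"').split(","))

with open("snoring.csv", "w") as snoring, open("noise.csv", "w") as noise:
    for src in SOURCES:
        for ytid, start, end, labels in rows(src):
            out = "{}, {}, {}\n".format(ytid, start, end)  # format expected by download.sh
            if labels & SNORING:
                snoring.write(out)
            elif labels & NOISE:
                noise.write(out)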
The bash script below (download.sh) is used to download the video clips and extract the audio as WAV files. Please install youtube-dl and ffmpeg before running it.
#!/bin/bash

SAMPLE_RATE=16000

# fetch_youtube_clip(videoID, startTime, endTime)
fetch_youtube_clip() {
    echo "Fetching $1 ($2 to $3)..."
    outname="$1_$2"
    if [ -f "${outname}.wav" ]; then
        echo "File already exists."
        return
    fi
    youtube-dl https://youtube.com/watch?v=$1 \
        --quiet --extract-audio --audio-format wav \
        --output "$outname.%(ext)s"
    if [ $? -eq 0 ]; then
        # resample to 16 kHz mono and trim to the labeled segment
        yes | ffmpeg -loglevel quiet -i "./$outname.wav" -ar $SAMPLE_RATE \
            -ac 1 -ss "$2" -to "$3" "./${outname}_out.wav"
        mv "./${outname}_out.wav" "./$outname.wav"
    else
        sleep 1
    fi
}

# read "videoID, start, end" lines from stdin, skipping comment lines
grep -E '^[^#]' | while read line
do
    fetch_youtube_clip $(echo "$line" | sed -E 's/, / /g')
done
To execute the script, make it executable (chmod +x download.sh) and run the commands below.
$ cat noise.csv | ./download.sh
$ cat snoring.csv | ./download.sh
The datasets are uploaded to Edge Impulse Studio using the Edge Impulse Uploader. Please follow the instructions here to install the Edge Impulse CLI tools, then execute the commands below.
$ edge-impulse-uploader --category split --label snoring snoring/*.wav
$ edge-impulse-uploader --category split --label noise noise/*.wav
The commands above also split the datasets into training and testing samples. We can see the uploaded datasets on the Edge Impulse Studio's Data Acquisition page.
The Snoring audio clips contain background noise in between the snoring events; this is removed by splitting the clips into segments that contain only snoring. The Noise category audio clips are used without any modification.
We could do the splitting by selecting each sample and clicking on Split sample in the drop-down menu, but that is time-consuming and tedious work. Luckily, there is an Edge Impulse SDK API that can be used to automate the process.
import json
import requests
import logging
import threading

API_KEY = "<Insert Edge Impulse API key here, from Dashboard > Keys>"
projectId = "<Your project ID, can be found at the Edge Impulse dashboard>"

headers = {
    "Accept": "application/json",
    "x-api-key": API_KEY
}

def segment(tid, ids):
    for sampleId in ids:
        # ask the Studio to propose 1500 ms segments for this sample
        url1 = "https://studio.edgeimpulse.com/v1/api/{}/raw-data/{}/find-segments".format(projectId, sampleId)
        payload1 = {
            "shiftSegments": True,
            "segmentLengthMs": 1500
        }
        response1 = requests.request("POST", url1, json=payload1, headers=headers)
        resp1 = json.loads(response1.text)
        segments = resp1["segments"]
        if len(segments) == 0:
            continue
        # apply the proposed segments, which splits the sample
        payload2 = {"segments": segments}
        url2 = "https://studio.edgeimpulse.com/v1/api/{}/raw-data/{}/segment".format(projectId, sampleId)
        response2 = requests.request("POST", url2, json=payload2, headers=headers)
        logging.info('{} {} {}'.format(tid, sampleId, response2.text))

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    # list all samples in the given category
    querystring = {"category": "testing", "excludeSensors": "true"}
    url = "https://studio.edgeimpulse.com/v1/api/{}/raw-data".format(projectId)
    response = requests.request("GET", url, headers=headers, params=querystring)
    resp = json.loads(response.text)
    id_list = list(map(lambda s: s["id"], resp["samples"]))

    # split the work across 8 threads
    div = 8
    n = int(len(id_list) / div)
    threads = list()
    for i in range(div):
        if i == (div - 1):
            ids = id_list[n*i:]
        else:
            ids = id_list[n*i: n*(i+1)]
        x = threading.Thread(target=segment, args=(i, ids))
        threads.append(x)
        x.start()

    for thread in threads:
        thread.join()
    logging.info("Finished")
Training

Go to the Impulse Design > Create Impulse page, click on Add a processing block, and choose Spectrogram, which is a visual representation of the signal strength ("loudness") of a signal over time at the various frequencies present in a particular waveform. On the same page, click on Add a learning block and choose Neural Network (Keras), which learns patterns from data and can apply these to new data. We have chosen a 1000 ms Window size and a 125 ms Window increase. Now click on the Save Impulse button.
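To get a feel for what these windowing parameters mean: a 1000 ms window sliding in 125 ms steps over a 1500 ms snoring segment yields several overlapping windows, so each segment contributes multiple training examples. A quick sketch of the arithmetic (plain Python, only an illustration; the Studio may round the count slightly differently):

# number of sliding windows produced from one audio segment (illustration only)
segment_ms = 1500    # length of a segmented snoring clip
window_ms = 1000     # Window size chosen in the impulse
increase_ms = 125    # Window increase (stride) chosen in the impulse

num_windows = (segment_ms - window_ms) // increase_ms + 1
print(num_windows)   # -> 5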
Now go to the Impulse Design > Spectrogram page, change the parameters as shown in the image below, and click on the Save parameters button. We have chosen Frame length = 0.02 s, Frame stride = 0.01538 s, Frequency bands = 128 (FFT size), and Noise floor = -54 dB. The Noise floor is used to filter out background noise in the spectrogram. The block first divides the window into multiple overlapping frames; the size and number of frames can be adjusted with the Frame length and Frame stride parameters. For example, with a window of 1000 ms, a frame length of 20 ms, and a stride of 15.38 ms, it creates 64 time frames. Each time frame is then divided into frequency bins using an FFT (Fast Fourier Transform) and its power spectrum is computed. The number of frequency bins equals the Frequency bands parameter divided by 2, plus 1. The number of features generated by the Spectrogram block equals the number of time frames times the number of frequency bins.
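The resulting feature count can be checked with a quick calculation (plain Python, assuming the frame count is floored as described above):

# spectrogram output size for the chosen parameters (illustration of the arithmetic above)
window_ms = 1000.0
frame_length_ms = 20.0
frame_stride_ms = 15.38
fft_size = 128    # the Frequency bands parameter

time_frames = int((window_ms - frame_length_ms) // frame_stride_ms) + 1   # 64
freq_bins = fft_size // 2 + 1                                             # 65
features = time_frames * freq_bins                                        # 4160
print(time_frames, freq_bins, features)

This matches the 64x65 single-channel input that the model reshapes in the next step.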
Clicking on the Save parameters button redirects to another page where we should click on the Generate features button. It usually takes a couple of minutes to complete feature generation. We can see a 3D visualization of the generated features in the Feature explorer.
Now go to the Impulse Design > NN Classifier page, select Switch to Keras (expert) mode from the drop-down menu, and define the model architecture. There are many off-the-shelf audio classification models available, but they have large numbers of parameters and hence are not suitable for microcontrollers with 256 KB or less of memory. After many trials we arrived at the model architecture shown below.
import sys

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Reshape, Conv2D, Flatten, ReLU, Dropout, MaxPooling2D, Dense
from tensorflow.keras.optimizers.schedules import InverseTimeDecay
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers.experimental import preprocessing

sys.path.append('./resources/libraries')
import ei_tensorflow.training

channels = 1
columns = 65
rows = int(input_length / (columns * channels))

# normalize the features using statistics learned from the training set
norm_layer = preprocessing.Normalization()
norm_layer.adapt(train_dataset.map(lambda x, _: x))

# model architecture
model = Sequential()
model.add(Reshape((rows, columns, channels), input_shape=(input_length, )))
model.add(preprocessing.Resizing(24, 24, interpolation='nearest'))
model.add(norm_layer)
model.add(Conv2D(16, kernel_size=3))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(ReLU(6.0))
model.add(Conv2D(32, kernel_size=3))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(ReLU(6.0))
model.add(MaxPooling2D(pool_size=2, strides=2, padding='same'))
model.add(Dropout(0.7))
model.add(Flatten())
model.add(Dense(64))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(ReLU(6.0))
#model.add(Dropout(0.50))
model.add(Dense(32))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(ReLU(6.0))
#model.add(Dropout(0.50))
model.add(Dense(classes, activation='softmax', name='y_pred'))

BATCH_SIZE = 64
lr_schedule = InverseTimeDecay(
    0.0005,
    decay_steps=train_sample_count // BATCH_SIZE * 15,
    decay_rate=1,
    staircase=False)

def get_optimizer():
    return Adam(lr_schedule)

train_dataset = train_dataset.batch(BATCH_SIZE, drop_remainder=False)
validation_dataset = validation_dataset.batch(BATCH_SIZE, drop_remainder=False)
callbacks.append(BatchLoggerCallback(BATCH_SIZE, train_sample_count))

# train the neural network
model.compile(loss='categorical_crossentropy', optimizer=get_optimizer(), metrics=['accuracy'])
print(model.summary())
model.fit(train_dataset, epochs=70, validation_data=validation_dataset, verbose=2, callbacks=callbacks)
While defining the model architecture we have tried to optimize it for the TinyML use case. Since the 64x65 single-channel spectrogram input would lead to a large number of trainable parameters and the compiled model would not fit into the available microcontroller RAM, we resize the spectrogram to 24x24, which is a sweet spot for model size versus accuracy. Also, we use the restricted-range activation ReLU6, which clamps the output to [0, 6], so that post-training quantization does not degrade the accuracy. The model summary is given below.
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
reshape (Reshape) (None, 64, 65, 1) 0
_________________________________________________________________
resizing (Resizing) (None, 24, 24, 1) 0
_________________________________________________________________
normalization (Normalization (None, 24, 24, 1) 3
_________________________________________________________________
conv2d (Conv2D) (None, 22, 22, 16) 160
_________________________________________________________________
re_lu (ReLU) (None, 22, 22, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 20, 20, 32) 4640
_________________________________________________________________
re_lu_1 (ReLU) (None, 20, 20, 32) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 10, 10, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 10, 10, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 3200) 0
_________________________________________________________________
dense (Dense) (None, 64) 204864
_________________________________________________________________
re_lu_2 (ReLU) (None, 64) 0
_________________________________________________________________
dense_1 (Dense) (None, 32) 2080
_________________________________________________________________
re_lu_3 (ReLU) (None, 32) 0
_________________________________________________________________
y_pred (Dense) (None, 2) 66
=================================================================
Total params: 211,813
Trainable params: 211,810
Non-trainable params: 3
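As an aside on the ReLU6 choice above: a bounded activation range keeps the int8 quantization step small, so less precision is lost per activation. A rough back-of-the-envelope sketch (plain Python, assuming simple affine int8 quantization over the observed activation range):

# rough illustration: quantization step size for a bounded vs. an unbounded activation range
def int8_step(range_min, range_max, levels=256):
    return (range_max - range_min) / (levels - 1)

print(int8_step(0.0, 6.0))    # ReLU6-bounded range: ~0.024 per step
print(int8_step(0.0, 40.0))   # an unbounded activation that happened to reach 40: ~0.16 per step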
Now click on the Start training button and wait around an hour for the training to complete. We can see the training output below. The model has 94.6% accuracy.
We can test the model on the test dataset by going to the Model testing page and clicking on the Classify all button. The model has 88.58% accuracy on the test dataset.
Since we will be deploying the model to an Arduino Nano 33 BLE Sense, on the Deployment page we will choose the Create Library > Arduino option. For the Select optimization option we will choose Enable EON Compiler, which reduces the memory usage of the model, and we will opt for the Quantized (int8) model. Now click on the Build button and in a few seconds the library bundle will be downloaded to the local computer.
Hardware Setup

We will be using the Arduino Nano 33 BLE Sense, which has an onboard microphone. Since the 5V pin comes disconnected by default on the Arduino Nano 33 BLE Sense, to power the vibration motor from the 5V pin we need to make a solder bridge between the two pads marked VUSB (highlighted by the red rectangle in the image below).
The vibration motor is connected using a Grove connector soldered directly to the Arduino Nano 33 BLE Sense header pins. The schematics can be found in the Schematics section.
Please follow the instructions here to download and install the Arduino IDE. After installation, open the Arduino IDE and install the board package for the Arduino Nano 33 BLE Sense by going to Tools > Board > Boards Manager. Search for the board package as shown below and install it.
After the board package installation is complete, choose Arduino Nano 33 BLE from the Tools > Board > Arduino Mbed OS Nano Boards menu. Also, select the serial port of the connected development board from the Tools > Port menu. We also need to install the RingBuffer library using the Library Manager (Tools > Manage Libraries...).
Below is the code for inferencing. The application captures audio continuously using a double buffer.
// If your target is limited in memory remove this macro to save 10K RAM
#define EIDSP_QUANTIZE_FILTERBANK 0
/**
 * Define the number of slices per model window. E.g. a model window of 1000 ms
 * with slices per model window set to 4. Results in a slice size of 250 ms.
 * For more info: https://docs.edgeimpulse.com/docs/continuous-audio-sampling
 */
#define EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW 3
/* Includes ---------------------------------------------------------------- */
#include <PDM.h>
#include <Scheduler.h>
#include <RingBuf.h>
#include <snore_detection_inferencing.h>
/** Audio buffers, pointers and selectors */
typedef struct {
    signed short *buffers[2];
    unsigned char buf_select;
    unsigned char buf_ready;
    unsigned int buf_count;
    unsigned int n_samples;
} inference_t;

static inference_t inference;
static bool record_ready = false;
static signed short *sampleBuffer;
static bool debug_nn = false; // Set this to true to see e.g. features generated from the raw signal
static int print_results = -(EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW);

bool alert = false;
RingBuf<uint8_t, 10> last_ten_predictions;
int greenLED = 23;
int vibratorPin = 3; // Vibration motor connected to D3 PWM pin
bool is_motor_running = false;
// Runs on a second Scheduler thread: pulses the vibration motor while alert is set
void run_vibration()
{
    if (alert)
    {
        is_motor_running = true;
        for (int i = 0; i < 2; i++)
        {
            analogWrite(vibratorPin, 30);
            delay(1000);
            analogWrite(vibratorPin, 0);
            delay(1500);
        }
        is_motor_running = false;
    } else {
        if (is_motor_running)
        {
            analogWrite(vibratorPin, 0);
        }
    }
    yield();
}
/**
 * @brief      Printf function uses vsnprintf and output using Arduino Serial
 *
 * @param[in]  format  Variable argument list
 */
void ei_printf(const char *format, ...) {
    static char print_buf[1024] = { 0 };

    va_list args;
    va_start(args, format);
    int r = vsnprintf(print_buf, sizeof(print_buf), format, args);
    va_end(args);

    if (r > 0) {
        Serial.write(print_buf);
    }
}
/**
 * @brief      PDM buffer full callback
 *             Get data and call audio thread callback
 */
static void pdm_data_ready_inference_callback(void)
{
    int bytesAvailable = PDM.available();

    // read into the sample buffer
    int bytesRead = PDM.read((char *)&sampleBuffer[0], bytesAvailable);

    if (record_ready == true) {
        for (int i = 0; i < bytesRead >> 1; i++) {
            inference.buffers[inference.buf_select][inference.buf_count++] = sampleBuffer[i];

            if (inference.buf_count >= inference.n_samples) {
                inference.buf_select ^= 1;
                inference.buf_count = 0;
                inference.buf_ready = 1;
            }
        }
    }
}
/**
 * @brief      Init inferencing struct and setup/start PDM
 *
 * @param[in]  n_samples  The n samples
 *
 * @return     False if buffer allocation failed, true otherwise
 */
static bool microphone_inference_start(uint32_t n_samples)
{
    inference.buffers[0] = (signed short *)malloc(n_samples * sizeof(signed short));
    if (inference.buffers[0] == NULL) {
        return false;
    }

    inference.buffers[1] = (signed short *)malloc(n_samples * sizeof(signed short));
    if (inference.buffers[1] == NULL) {
        free(inference.buffers[0]);
        return false;
    }

    sampleBuffer = (signed short *)malloc((n_samples >> 1) * sizeof(signed short));
    if (sampleBuffer == NULL) {
        free(inference.buffers[0]);
        free(inference.buffers[1]);
        return false;
    }

    inference.buf_select = 0;
    inference.buf_count = 0;
    inference.n_samples = n_samples;
    inference.buf_ready = 0;

    // configure the data receive callback
    PDM.onReceive(&pdm_data_ready_inference_callback);
    PDM.setBufferSize((n_samples >> 1) * sizeof(int16_t));

    // initialize PDM with:
    // - one channel (mono mode)
    // - a 16 kHz sample rate
    if (!PDM.begin(1, EI_CLASSIFIER_FREQUENCY)) {
        ei_printf("Failed to start PDM!");
    }

    // set the gain, defaults to 20
    PDM.setGain(127);

    record_ready = true;

    return true;
}
/**
 * @brief      Wait on new data
 *
 * @return     True when finished
 */
static bool microphone_inference_record(void)
{
    bool ret = true;

    if (inference.buf_ready == 1) {
        ei_printf(
            "Error sample buffer overrun. Decrease the number of slices per model window "
            "(EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)\n");
        ret = false;
    }

    while (inference.buf_ready == 0) {
        delay(1);
    }

    inference.buf_ready = 0;

    return ret;
}
/**
 * Get raw audio signal data
 */
static int microphone_audio_signal_get_data(size_t offset, size_t length, float *out_ptr)
{
    numpy::int16_to_float(&inference.buffers[inference.buf_select ^ 1][offset], out_ptr, length);
    return 0;
}

/**
 * @brief      Stop PDM and release buffers
 */
static void microphone_inference_end(void)
{
    PDM.end();
    free(inference.buffers[0]);
    free(inference.buffers[1]);
    free(sampleBuffer);
}
void setup()
{
    Serial.begin(115200);
    pinMode(greenLED, OUTPUT);
    digitalWrite(greenLED, LOW);
    pinMode(vibratorPin, OUTPUT); // sets the pin as output

    // summary of inferencing settings (from model_metadata.h)
    ei_printf("Inferencing settings:\n");
    ei_printf("\tInterval: %.2f ms.\n", (float)EI_CLASSIFIER_INTERVAL_MS);
    ei_printf("\tFrame size: %d\n", EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE);
    ei_printf("\tSample length: %d ms.\n", EI_CLASSIFIER_RAW_SAMPLE_COUNT / 16);
    ei_printf("\tNo. of classes: %d\n", sizeof(ei_classifier_inferencing_categories) /
                                            sizeof(ei_classifier_inferencing_categories[0]));

    run_classifier_init();
    if (microphone_inference_start(EI_CLASSIFIER_SLICE_SIZE) == false) {
        ei_printf("ERR: Failed to setup audio sampling\r\n");
        return;
    }

    // run the vibration handler on a second thread
    Scheduler.startLoop(run_vibration);
}
void loop()
{
    bool m = microphone_inference_record();
    if (!m) {
        ei_printf("ERR: Failed to record audio...\n");
        return;
    }

    signal_t signal;
    signal.total_length = EI_CLASSIFIER_SLICE_SIZE;
    signal.get_data = &microphone_audio_signal_get_data;
    ei_impulse_result_t result = {0};

    EI_IMPULSE_ERROR r = run_classifier_continuous(&signal, &result, debug_nn);
    if (r != EI_IMPULSE_OK) {
        ei_printf("ERR: Failed to run classifier (%d)\n", r);
        return;
    }

    if (++print_results >= (EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)) {
        // print the predictions
        ei_printf("Predictions ");
        ei_printf("(DSP: %d ms., Classification: %d ms., Anomaly: %d ms.)",
                  result.timing.dsp, result.timing.classification, result.timing.anomaly);
        ei_printf(": \n");

        for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
            ei_printf(" %s: %.5f\n", result.classification[ix].label,
                      result.classification[ix].value);

            // class index 1 is "snoring"; only count high-confidence detections
            if (ix == 1 && !is_motor_running && result.classification[ix].value > 0.9) {
                if (last_ten_predictions.isFull()) {
                    uint8_t k;
                    last_ten_predictions.pop(k);
                }

                last_ten_predictions.push(ix);

                uint8_t count = 0;
                for (uint8_t j = 0; j < last_ten_predictions.size(); j++) {
                    count += last_ten_predictions[j];
                    //ei_printf("%d, ", last_ten_predictions[j]);
                }
                //ei_printf("\n");

                ei_printf("Snoring\n");
                digitalWrite(greenLED, HIGH);

                // trigger the motor once enough snoring detections have accumulated
                if (count >= 5) {
                    ei_printf("Trigger vibration motor\n");
                    alert = true;
                }
            } else {
                ei_printf("Noise\n");
                digitalWrite(greenLED, LOW);
                alert = false;
            }
            print_results = 0;
        }
    }
}
#if !defined(EI_CLASSIFIER_SENSOR) || EI_CLASSIFIER_SENSOR != EI_CLASSIFIER_SENSOR_MICROPHONE
#error "Invalid model for current sensor."
#endif
To run the inferencing sketch, clone the application repository using the command below.
$ git clone https://github.com/metanav/Snoring_Guardian.git
Import the library bundle Snoring_Guardian/Snoring_detection_inferencing.zip using the menu Sketch > Include Library > Add .ZIP Library in the Arduino IDE. Open the inferencing sketch from the menu File > Examples > Snoring_detection_inferencing > tflite_micro_snoring_detection, then compile and upload the firmware to the connected development board. We can see the inferencing output using Tools > Serial Monitor with the baud rate set to 115200 bps.
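Instead of the Arduino IDE Serial Monitor, the inferencing output can also be watched with a short pyserial script; a minimal sketch is shown below, where the serial port name is an assumption and should be replaced with the port shown under Tools > Port.

# read_serial.py: minimal sketch for watching the inferencing output over serial.
# Requires pyserial (pip install pyserial); the port name below is an assumption.
import serial

PORT = "/dev/cu.usbmodem14101"   # replace with the port shown under Tools > Port
BAUD = 115200

with serial.Serial(PORT, BAUD, timeout=1) as ser:
    while True:
        line = ser.readline().decode("utf-8", errors="replace").strip()
        if line:
            print(line)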
Demo

Casing

The final version of the device is placed inside a pouch bag together with a power bank. There is a small opening in the pouch which lets sound reach the microphone, which is positioned near the opening.
This project presents a solution to a real-life problem that may seem funny but deserves careful attention. It is an easy-to-use and convenient device that respects the user's privacy by running the inferencing at the edge. This project also shows that, with the signal processing done right, a simple neural network can solve a complex problem and run on a low-powered, resource-constrained device. Although the TensorFlow Lite Micro model runs quite well, there is still room for improvement. With more training data the model can be made more accurate and robust.
This project was created for the TensorFlow Microcontroller Challenge. I would like to thank Google's TensorFlow Micro team for providing me a complimentary Google I/O kit supplied by SparkFun Electronics.