Snoring is estimated to affect 57% of men and 40% of women in the United States, and it occurs in up to 27% of children. These statistics show that snoring is widespread, but its severity and health implications vary. Snoring can be light, occasional, and unconcerning, or it may be the sign of a serious underlying sleep-related breathing disorder. Snoring is caused by the rattling and vibration of tissues near the airway in the back of the throat. During sleep the muscles relax, narrowing the airway, and as we inhale and exhale the moving air causes the tissue to flutter and make noise. Obstructive sleep apnea is a breathing disorder in which the airway gets blocked or collapses during sleep, causing repeated lapses in breathing. Snoring is one of the most common symptoms of obstructive sleep apnea. Unless someone else tells them, most people who snore are not aware of it, and this is part of why sleep apnea is under-diagnosed.

In this project I have built a proof of concept of a non-invasive, low-powered edge device that monitors sleep and vibrates when it detects snoring.
Development Environment

We are using Edge Impulse Studio for feature generation and for TensorFlow Lite model creation and training. We need to sign up for a free account at https://studio.edgeimpulse.com and create a project to get started. macOS is used for the local development work.
Data Collection

We have used AudioSet, a large-scale dataset of manually annotated audio events, to download snoring and other sounds that may occur during the night. AudioSet consists of an expanding ontology of 632 audio event classes and a collection of human-labeled 10-second sound clips drawn from YouTube videos. The audio is extracted from the YouTube videos of the selected events and converted into Waveform Audio File Format (WAV), 16-bit depth, mono channel, at a 16 kHz sample rate. The following categories selected from the AudioSet ontology are downloaded. The first column is the category ID and the second column is the category label.
/m/01d3sd Snoring
/m/07yv9 Vehicle
/m/01jt3m Toilet flush
/m/06mb1 Rain
/m/03m9d0z Wind
/m/07c52 Television
/m/06bz3 Radio
/m/028v0c Silence
/m/03vt0 Insect
/m/07qjznl Tick-tock
/m/0bt9lr Dog
/m/01hsr_ Sneeze
/m/01b_21 Cough
/m/07ppn3j Sniff
/m/07pbtc8 Walk, footsteps
/m/02fxyj Humming
/m/07q6cd_ Squeak
/m/0btp2 Traffic noise, roadway noise
/m/09l8g Human Voice
/m/07pggtn Chirp, tweet
/t/dd00002 Baby cry, infant cry
/m/04rlf Music
The datasets are divided into two categories, Snoring and Noise. Two CSV files, snoring.csv and noise.csv, are created by filtering the balanced train, unbalanced train, and evaluation segment-list CSV files, which contain the YouTube clip IDs, start/end times, and other metadata, and can be downloaded from here.
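For reference, the filtering step can be scripted. Below is a minimal sketch of how snoring.csv and noise.csv could be produced from the AudioSet segment lists; the file names and the "YTID, start_seconds, end_seconds, positive_labels" column layout are assumptions based on the public AudioSet release, so adjust them if your copies differ.

# filter_audioset.py: minimal sketch for producing snoring.csv and noise.csv
# from the AudioSet segment lists (file names and layout are assumptions).
SNORING = {"/m/01d3sd"}
NOISE = {
    "/m/07yv9", "/m/01jt3m", "/m/06mb1", "/m/03m9d0z", "/m/07c52", "/m/06bz3",
    "/m/028v0c", "/m/03vt0", "/m/07qjznl", "/m/0bt9lr", "/m/01hsr_", "/m/01b_21",
    "/m/07ppn3j", "/m/07pbtc8", "/m/02fxyj", "/m/07q6cd_", "/m/0btp2", "/m/09l8g",
    "/m/07pggtn", "/t/dd00002", "/m/04rlf",
}
SOURCES = ["balanced_train_segments.csv",
           "unbalanced_train_segments.csv",
           "eval_segments.csv"]

def rows(path):
    with open(path) as f:
        for line in f:
            if line.startswith("#"):    # skip header comments
                continue
            # each data line: YTID, start_seconds, end_seconds, "label1,label2,..."
            ytid, start, end, labels = line.strip().split(", ", 3)
            yield ytid, start, end, set(labels.strip('"').split(","))

with open("snoring.csv", "w") as snoring, open("noise.csv", "w") as noise:
    for src in SOURCES:
        for ytid, start, end, labels in rows(src):
            out = "{}, {}, {}\n".format(ytid, start, end)  # format expected by download.sh
            if labels & SNORING:
                snoring.write(out)
            elif labels & NOISE:
                noise.write(out)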
The bash script below (download.sh) is used to download the video clips and extract the audio as WAV files. Please install youtube-dl and ffmpeg before running it.
#!/bin/bash

SAMPLE_RATE=16000

# fetch_youtube_clip(videoID, startTime, endTime)
fetch_youtube_clip() {
    echo "Fetching $1 ($2 to $3)..."
    outname="$1_$2"
    if [ -f "${outname}.wav" ]; then
        echo "File already exists."
        return
    fi
    youtube-dl https://youtube.com/watch?v=$1 \
        --quiet --extract-audio --audio-format wav \
        --output "$outname.%(ext)s"
    if [ $? -eq 0 ]; then
        # resample to 16 kHz mono and trim to the labeled segment
        yes | ffmpeg -loglevel quiet -i "./$outname.wav" -ar $SAMPLE_RATE \
            -ac 1 -ss "$2" -to "$3" "./${outname}_out.wav"
        mv "./${outname}_out.wav" "./$outname.wav"
    else
        sleep 1
    fi
}

# read "videoID, start, end" lines from stdin, skipping comment lines
grep -E '^[^#]' | while read line
do
    fetch_youtube_clip $(echo "$line" | sed -E 's/, / /g')
done
To execute the script, make it executable (chmod +x download.sh) and run the commands below.
$ cat noise.csv | ./download.sh
$ cat snoring.csv | ./download.sh
The datasets are uploaded to Edge Impulse Studio using the Edge Impulse Uploader. Please follow the instructions here to install the Edge Impulse CLI tools, then execute the commands below.
$ edge-impulse-uploader --category split --label snoring snoring/*.wav
$ edge-impulse-uploader --category split --label noise noise/*.wav
The commands above also split the datasets into training and testing samples. We can see the uploaded datasets on the Edge Impulse Studio's Data Acquisition page.
The Snoring audio clips contain background noise in between the snoring events; this is removed by splitting the clips into segments that contain only snoring. The Noise category audio clips are used without any modification.
We could do the splitting by selecting each sample and clicking on Split sample in the drop-down menu, but that is time-consuming and tedious work. Luckily, there is an Edge Impulse SDK API that can be used to automate the process.
import json
import requests
import logging
import threading

API_KEY = "<Insert Edge Impulse API key here, from Dashboard > Keys>"
projectId = "<Your project ID, can be found at the Edge Impulse dashboard>"

headers = {
    "Accept": "application/json",
    "x-api-key": API_KEY
}

def segment(tid, ids):
    for sampleId in ids:
        # ask the Studio to propose 1500 ms segments for this sample
        url1 = "https://studio.edgeimpulse.com/v1/api/{}/raw-data/{}/find-segments".format(projectId, sampleId)
        payload1 = {
            "shiftSegments": True,
            "segmentLengthMs": 1500
        }
        response1 = requests.request("POST", url1, json=payload1, headers=headers)
        resp1 = json.loads(response1.text)
        segments = resp1["segments"]
        if len(segments) == 0:
            continue
        # apply the proposed segments, which splits the sample
        payload2 = {"segments": segments}
        url2 = "https://studio.edgeimpulse.com/v1/api/{}/raw-data/{}/segment".format(projectId, sampleId)
        response2 = requests.request("POST", url2, json=payload2, headers=headers)
        logging.info('{} {} {}'.format(tid, sampleId, response2.text))

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    # list all samples in the given category
    querystring = {"category": "testing", "excludeSensors": "true"}
    url = "https://studio.edgeimpulse.com/v1/api/{}/raw-data".format(projectId)
    response = requests.request("GET", url, headers=headers, params=querystring)
    resp = json.loads(response.text)
    id_list = list(map(lambda s: s["id"], resp["samples"]))

    # split the work across 8 threads
    div = 8
    n = int(len(id_list) / div)
    threads = list()
    for i in range(div):
        if i == (div - 1):
            ids = id_list[n*i:]
        else:
            ids = id_list[n*i: n*(i+1)]
        x = threading.Thread(target=segment, args=(i, ids))
        threads.append(x)
        x.start()

    for thread in threads:
        thread.join()
    logging.info("Finished")
Training

Go to the Impulse Design > Create Impulse page, click on Add a processing block, and choose Spectrogram, which is a visual representation of the signal strength ("loudness") of a signal over time at the various frequencies present in a particular waveform. On the same page, click on Add a learning block and choose Neural Network (Keras), which learns patterns from data and can apply these to new data. We have chosen a 1000 ms Window size and a 125 ms Window increase. Now click on the Save Impulse button.
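To get a feel for what these windowing parameters mean: a 1000 ms window sliding in 125 ms steps over a 1500 ms snoring segment yields several overlapping windows, so each segment contributes multiple training examples. A quick sketch of the arithmetic (plain Python, only an illustration; the Studio may round the count slightly differently):

# number of sliding windows produced from one audio segment (illustration only)
segment_ms = 1500    # length of a segmented snoring clip
window_ms = 1000     # Window size chosen in the impulse
increase_ms = 125    # Window increase (stride) chosen in the impulse

num_windows = (segment_ms - window_ms) // increase_ms + 1
print(num_windows)   # -> 5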
Now go to the Impulse Design > Spectrogram page, change the parameters as shown in the image below, and click on the Save parameters button. We have chosen Frame length = 0.02 s, Frame stride = 0.01538 s, Frequency bands = 128 (FFT size), and Noise floor = -54 dB. The Noise floor is used to filter out background noise in the spectrogram. The block first divides the window into multiple overlapping frames; the size and number of frames can be adjusted with the Frame length and Frame stride parameters. For example, with a window of 1000 ms, a frame length of 20 ms, and a stride of 15.38 ms, it creates 64 time frames. Each time frame is then divided into frequency bins using an FFT (Fast Fourier Transform) and its power spectrum is computed. The number of frequency bins equals the Frequency bands parameter divided by 2, plus 1. The number of features generated by the Spectrogram block equals the number of time frames times the number of frequency bins.
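The resulting feature count can be checked with a quick calculation (plain Python, assuming the frame count is floored as described above):

# spectrogram output size for the chosen parameters (illustration of the arithmetic above)
window_ms = 1000.0
frame_length_ms = 20.0
frame_stride_ms = 15.38
fft_size = 128    # the Frequency bands parameter

time_frames = int((window_ms - frame_length_ms) // frame_stride_ms) + 1   # 64
freq_bins = fft_size // 2 + 1                                             # 65
features = time_frames * freq_bins                                        # 4160
print(time_frames, freq_bins, features)

This matches the 64x65 single-channel input that the model reshapes in the next step.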
Clicking on the Save parameters button redirects to another page where we should click on the Generate features button. It usually takes a couple of minutes to complete feature generation. We can see a 3D visualization of the generated features in the Feature explorer.
Now go to the Impulse Design > NN Classifier page, select Switch to Keras (expert) mode from the drop-down menu, and define the model architecture. There are many off-the-shelf audio classification models available, but they have large numbers of parameters and hence are not suitable for microcontrollers with 256 KB or less of memory. After many trials we arrived at the model architecture shown below.
import sys

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Reshape, Conv2D, Flatten, ReLU, Dropout, MaxPooling2D, Dense
from tensorflow.keras.optimizers.schedules import InverseTimeDecay
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers.experimental import preprocessing

sys.path.append('./resources/libraries')
import ei_tensorflow.training

channels = 1
columns = 65
rows = int(input_length / (columns * channels))

# normalize the features using statistics learned from the training set
norm_layer = preprocessing.Normalization()
norm_layer.adapt(train_dataset.map(lambda x, _: x))

# model architecture
model = Sequential()
model.add(Reshape((rows, columns, channels), input_shape=(input_length, )))
model.add(preprocessing.Resizing(24, 24, interpolation='nearest'))
model.add(norm_layer)
model.add(Conv2D(16, kernel_size=3))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(ReLU(6.0))
model.add(Conv2D(32, kernel_size=3))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(ReLU(6.0))
model.add(MaxPooling2D(pool_size=2, strides=2, padding='same'))
model.add(Dropout(0.7))
model.add(Flatten())
model.add(Dense(64))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(ReLU(6.0))
#model.add(Dropout(0.50))
model.add(Dense(32))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(ReLU(6.0))
#model.add(Dropout(0.50))
model.add(Dense(classes, activation='softmax', name='y_pred'))

BATCH_SIZE = 64
lr_schedule = InverseTimeDecay(
    0.0005,
    decay_steps=train_sample_count // BATCH_SIZE * 15,
    decay_rate=1,
    staircase=False)

def get_optimizer():
    return Adam(lr_schedule)

train_dataset = train_dataset.batch(BATCH_SIZE, drop_remainder=False)
validation_dataset = validation_dataset.batch(BATCH_SIZE, drop_remainder=False)
callbacks.append(BatchLoggerCallback(BATCH_SIZE, train_sample_count))

# train the neural network
model.compile(loss='categorical_crossentropy', optimizer=get_optimizer(), metrics=['accuracy'])
print(model.summary())
model.fit(train_dataset, epochs=70, validation_data=validation_dataset, verbose=2, callbacks=callbacks)
While defining the model architecture we have tried to optimize it for the TinyML use case. Since the 64x65 single-channel spectrogram input would lead to a large number of trainable parameters and the compiled model would not fit into the available microcontroller RAM, we resize the spectrogram to 24x24, which is a sweet spot for model size versus accuracy. Also, we use the restricted-range activation ReLU6, which clamps the output to [0, 6], so that post-training quantization does not degrade the accuracy. The model summary is given below.
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
reshape (Reshape) (None, 64, 65, 1) 0
_________________________________________________________________
resizing (Resizing) (None, 24, 24, 1) 0
_________________________________________________________________
normalization (Normalization (None, 24, 24, 1) 3
_________________________________________________________________
conv2d (Conv2D) (None, 22, 22, 16) 160
_________________________________________________________________
re_lu (ReLU) (None, 22, 22, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 20, 20, 32) 4640
_________________________________________________________________
re_lu_1 (ReLU) (None, 20, 20, 32) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 10, 10, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 10, 10, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 3200) 0
_________________________________________________________________
dense (Dense) (None, 64) 204864
_________________________________________________________________
re_lu_2 (ReLU) (None, 64) 0
_________________________________________________________________
dense_1 (Dense) (None, 32) 2080
_________________________________________________________________
re_lu_3 (ReLU) (None, 32) 0
_________________________________________________________________
y_pred (Dense) (None, 2) 66
=================================================================
Total params: 211,813
Trainable params: 211,810
Non-trainable params: 3
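As an aside on the ReLU6 choice above: a bounded activation range keeps the int8 quantization step small, so less precision is lost per activation. A rough back-of-the-envelope sketch (plain Python, assuming simple affine int8 quantization over the observed activation range):

# rough illustration: quantization step size for a bounded vs. an unbounded activation range
def int8_step(range_min, range_max, levels=256):
    return (range_max - range_min) / (levels - 1)

print(int8_step(0.0, 6.0))    # ReLU6-bounded range: ~0.024 per step
print(int8_step(0.0, 40.0))   # an unbounded activation that happened to reach 40: ~0.16 per step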
Now click on the Start training button and wait around an hour for the training to complete. We can see the training output below. The model has 94.6% accuracy.
We can test the model on the test dataset by going to the Model testing page and clicking on the Classify all button. The model has 88.58% accuracy on the test dataset.
Since we will be deploying the model to an Arduino Nano 33 BLE Sense, on the Deployment page we will choose the Create Library > Arduino option. For the Select optimization option we will choose Enable EON Compiler, which reduces the memory usage of the model, and we will opt for the Quantized (int8) model. Now click on the Build button and in a few seconds the library bundle will be downloaded to the local computer.
Hardware Setup

We will be using the Arduino Nano 33 BLE Sense, which has an onboard microphone. Since the 5V pin comes disconnected by default on the Arduino Nano 33 BLE Sense, to power the vibration motor from the 5V pin we need to make a solder bridge between the two pads marked VUSB (highlighted by the red rectangle in the image below).
The vibration motor is connected using a Grove connector soldered directly to the Arduino Nano 33 BLE Sense header pins. The schematics can be found in the Schematics section.
Please follow the instructions here to download and install the Arduino IDE. After installation, open the Arduino IDE and install the board package for the Arduino Nano 33 BLE Sense by going to Tools > Board > Boards Manager. Search for the board package as shown below and install it.
After the board package installation is complete, choose Arduino Nano 33 BLE from the Tools > Board > Arduino Mbed OS Nano Boards menu. Also, select the serial port of the connected development board from the Tools > Port menu. We also need to install the RingBuffer library using the Library Manager (Tools > Manage Libraries...).
Below is the code for inferencing. The application captures audio continuously using a double buffer.
// If your target is limited in memory remove this macro to save 10K RAM
#define EIDSP_QUANTIZE_FILTERBANK 0
/**
 * Define the number of slices per model window. E.g. a model window of 1000 ms
 * with slices per model window set to 4. Results in a slice size of 250 ms.
 * For more info: https://docs.edgeimpulse.com/docs/continuous-audio-sampling
 */
#define EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW 3
/* Includes ---------------------------------------------------------------- */
#include <PDM.h>
#include <Scheduler.h>
#include <RingBuf.h>
#include <snore_detection_inferencing.h>
/** Audio buffers, pointers and selectors */
typedef struct {
    signed short *buffers[2];
    unsigned char buf_select;
    unsigned char buf_ready;
    unsigned int buf_count;
    unsigned int n_samples;
} inference_t;

static inference_t inference;
static bool record_ready = false;
static signed short *sampleBuffer;
static bool debug_nn = false; // Set this to true to see e.g. features generated from the raw signal
static int print_results = -(EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW);

bool alert = false;
RingBuf<uint8_t, 10> last_ten_predictions;
int greenLED = 23;
int vibratorPin = 3; // Vibration motor connected to D3 PWM pin
bool is_motor_running = false;
// Runs on a second Scheduler thread: pulses the vibration motor while alert is set
void run_vibration()
{
    if (alert)
    {
        is_motor_running = true;
        for (int i = 0; i < 2; i++)
        {
            analogWrite(vibratorPin, 30);
            delay(1000);
            analogWrite(vibratorPin, 0);
            delay(1500);
        }
        is_motor_running = false;
    } else {
        if (is_motor_running)
        {
            analogWrite(vibratorPin, 0);
        }
    }
    yield();
}
/**
 * @brief      Printf function uses vsnprintf and output using Arduino Serial
 *
 * @param[in]  format  Variable argument list
 */
void ei_printf(const char *format, ...) {
    static char print_buf[1024] = { 0 };

    va_list args;
    va_start(args, format);
    int r = vsnprintf(print_buf, sizeof(print_buf), format, args);
    va_end(args);

    if (r > 0) {
        Serial.write(print_buf);
    }
}
/**
 * @brief      PDM buffer full callback
 *             Get data and call audio thread callback
 */
static void pdm_data_ready_inference_callback(void)
{
    int bytesAvailable = PDM.available();

    // read into the sample buffer
    int bytesRead = PDM.read((char *)&sampleBuffer[0], bytesAvailable);

    if (record_ready == true) {
        for (int i = 0; i < bytesRead >> 1; i++) {
            inference.buffers[inference.buf_select][inference.buf_count++] = sampleBuffer[i];

            if (inference.buf_count >= inference.n_samples) {
                inference.buf_select ^= 1;
                inference.buf_count = 0;
                inference.buf_ready = 1;
            }
        }
    }
}
/**
 * @brief      Init inferencing struct and setup/start PDM
 *
 * @param[in]  n_samples  The n samples
 *
 * @return     False if buffer allocation failed, true otherwise
 */
static bool microphone_inference_start(uint32_t n_samples)
{
    inference.buffers[0] = (signed short *)malloc(n_samples * sizeof(signed short));
    if (inference.buffers[0] == NULL) {
        return false;
    }

    inference.buffers[1] = (signed short *)malloc(n_samples * sizeof(signed short));
    if (inference.buffers[1] == NULL) {
        free(inference.buffers[0]);
        return false;
    }

    sampleBuffer = (signed short *)malloc((n_samples >> 1) * sizeof(signed short));
    if (sampleBuffer == NULL) {
        free(inference.buffers[0]);
        free(inference.buffers[1]);
        return false;
    }

    inference.buf_select = 0;
    inference.buf_count = 0;
    inference.n_samples = n_samples;
    inference.buf_ready = 0;

    // configure the data receive callback
    PDM.onReceive(&pdm_data_ready_inference_callback);
    PDM.setBufferSize((n_samples >> 1) * sizeof(int16_t));

    // initialize PDM with:
    // - one channel (mono mode)
    // - a 16 kHz sample rate
    if (!PDM.begin(1, EI_CLASSIFIER_FREQUENCY)) {
        ei_printf("Failed to start PDM!");
    }

    // set the gain, defaults to 20
    PDM.setGain(127);

    record_ready = true;

    return true;
}
/**
 * @brief      Wait on new data
 *
 * @return     True when finished
 */
static bool microphone_inference_record(void)
{
    bool ret = true;

    if (inference.buf_ready == 1) {
        ei_printf(
            "Error sample buffer overrun. Decrease the number of slices per model window "
            "(EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)\n");
        ret = false;
    }

    while (inference.buf_ready == 0) {
        delay(1);
    }

    inference.buf_ready = 0;

    return ret;
}
/**
 * Get raw audio signal data
 */
static int microphone_audio_signal_get_data(size_t offset, size_t length, float *out_ptr)
{
    numpy::int16_to_float(&inference.buffers[inference.buf_select ^ 1][offset], out_ptr, length);
    return 0;
}

/**
 * @brief      Stop PDM and release buffers
 */
static void microphone_inference_end(void)
{
    PDM.end();
    free(inference.buffers[0]);
    free(inference.buffers[1]);
    free(sampleBuffer);
}
void setup()
{
    Serial.begin(115200);
    pinMode(greenLED, OUTPUT);
    digitalWrite(greenLED, LOW);
    pinMode(vibratorPin, OUTPUT); // sets the pin as output

    // summary of inferencing settings (from model_metadata.h)
    ei_printf("Inferencing settings:\n");
    ei_printf("\tInterval: %.2f ms.\n", (float)EI_CLASSIFIER_INTERVAL_MS);
    ei_printf("\tFrame size: %d\n", EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE);
    ei_printf("\tSample length: %d ms.\n", EI_CLASSIFIER_RAW_SAMPLE_COUNT / 16);
    ei_printf("\tNo. of classes: %d\n", sizeof(ei_classifier_inferencing_categories) /
                                            sizeof(ei_classifier_inferencing_categories[0]));

    run_classifier_init();
    if (microphone_inference_start(EI_CLASSIFIER_SLICE_SIZE) == false) {
        ei_printf("ERR: Failed to setup audio sampling\r\n");
        return;
    }

    // run the vibration handler on a second thread
    Scheduler.startLoop(run_vibration);
}
void loop()
{
    bool m = microphone_inference_record();
    if (!m) {
        ei_printf("ERR: Failed to record audio...\n");
        return;
    }

    signal_t signal;
    signal.total_length = EI_CLASSIFIER_SLICE_SIZE;
    signal.get_data = &microphone_audio_signal_get_data;
    ei_impulse_result_t result = {0};

    EI_IMPULSE_ERROR r = run_classifier_continuous(&signal, &result, debug_nn);
    if (r != EI_IMPULSE_OK) {
        ei_printf("ERR: Failed to run classifier (%d)\n", r);
        return;
    }

    if (++print_results >= (EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)) {
        // print the predictions
        ei_printf("Predictions ");
        ei_printf("(DSP: %d ms., Classification: %d ms., Anomaly: %d ms.)",
                  result.timing.dsp, result.timing.classification, result.timing.anomaly);
        ei_printf(": \n");

        for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
            ei_printf(" %s: %.5f\n", result.classification[ix].label,
                      result.classification[ix].value);

            // class index 1 is "snoring"; only count high-confidence detections
            if (ix == 1 && !is_motor_running && result.classification[ix].value > 0.9) {
                if (last_ten_predictions.isFull()) {
                    uint8_t k;
                    last_ten_predictions.pop(k);
                }

                last_ten_predictions.push(ix);

                uint8_t count = 0;
                for (uint8_t j = 0; j < last_ten_predictions.size(); j++) {
                    count += last_ten_predictions[j];
                    //ei_printf("%d, ", last_ten_predictions[j]);
                }
                //ei_printf("\n");

                ei_printf("Snoring\n");
                digitalWrite(greenLED, HIGH);

                // trigger the motor once enough snoring detections have accumulated
                if (count >= 5) {
                    ei_printf("Trigger vibration motor\n");
                    alert = true;
                }
            } else {
                ei_printf("Noise\n");
                digitalWrite(greenLED, LOW);
                alert = false;
            }
            print_results = 0;
        }
    }
}
#if !defined(EI_CLASSIFIER_SENSOR) || EI_CLASSIFIER_SENSOR != EI_CLASSIFIER_SENSOR_MICROPHONE
#error "Invalid model for current sensor."
#endif
To run the inferencing sketch, clone the application repository using the command below.
$ git clone https://github.com/metanav/Snoring_Guardian.git
Import the library bundle Snoring_Guardian/Snoring_detection_inferencing.zip using the menu Sketch > Include Library > Add .ZIP Library in the Arduino IDE. Open the inferencing sketch from the menu File > Examples > Snoring_detection_inferencing > tflite_micro_snoring_detection, then compile and upload the firmware to the connected development board. We can see the inferencing output using Tools > Serial Monitor with the baud rate set to 115200 bps.
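Instead of the Arduino IDE Serial Monitor, the inferencing output can also be watched with a short pyserial script; a minimal sketch is shown below, where the serial port name is an assumption and should be replaced with the port shown under Tools > Port.

# read_serial.py: minimal sketch for watching the inferencing output over serial.
# Requires pyserial (pip install pyserial); the port name below is an assumption.
import serial

PORT = "/dev/cu.usbmodem14101"   # replace with the port shown under Tools > Port
BAUD = 115200

with serial.Serial(PORT, BAUD, timeout=1) as ser:
    while True:
        line = ser.readline().decode("utf-8", errors="replace").strip()
        if line:
            print(line)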
Demo

Casing

The final version of the device is placed inside a pouch bag together with a power bank. There is a small opening in the pouch which lets sound reach the microphone, which is positioned near the opening.
This project presents a solution to a real-life problem that may seem funny but deserves careful attention. It is an easy-to-use and convenient device that respects the user's privacy by running the inferencing at the edge. This project also shows that, with the signal processing done right, a simple neural network can solve a complex problem and run on a low-powered, resource-constrained device. Although the TensorFlow Lite Micro model runs quite well, there is still room for improvement. With more training data the model can be made more accurate and robust.
This project was created for the TensorFlow Microcontroller Challenge. I would like to thank Google's TensorFlow Micro team for providing me a complimentary Google I/O kit supplied by SparkFun Electronics.