If you’ve been a parent for longer than 24 hours, you know the expression “sleep like a baby” is just another lie. Babies don’t sleep soundly, like, at all. They grunt and groan, cough and sigh, scooch and squirm and wiggle. Everything is fun and games until your sweet little one starts waking up crying in the middle of the night, every night. Babies have the liberty to wake up and sleep whenever they want; their parents usually don't. This is where E-nanny comes to the rescue.
Working
E-nanny is based on a Keyword Spotting (KWS) algorithm. I know what you are thinking right now: isn't this an audio classification problem rather than KWS? I thought the same at first, but KWS seems to give better accuracy. Perhaps a baby's cry has a rhythm similar to our natural languages. Once the "event" is detected, the device initiates a series of "counter measures" to put the baby back to sleep before the parents wake up. Beware! If the baby wakes up due to hunger or, say, a nappy that needs changing, there is nothing my device can do and you'll have to put him back to sleep yourself. But babies more often wake up for far sillier reasons, say a nightmare, so the device will work in most cases.
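To make the flow concrete, here is a minimal sketch of the detect-then-react idea. All names here are hypothetical placeholders, not the actual attached code:

```cpp
// Sketch of the detect-then-react loop (names are hypothetical placeholders).
const float CRY_THRESHOLD = 0.8f;  // assumed confidence needed to call it a "cry" event

void startCountermeasures() {
  // Placeholder: the real device would start its soothing routine here.
}

void handleClassification(float cryScore) {
  if (cryScore > CRY_THRESHOLD) {  // only react to confident detections
    startCountermeasures();
  }
}
```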
Step 1: Collecting Data
For the audio corpus I have relied entirely on open-source sound databases like Freesound. All the audio I have used is copyright-free and will be available on my GitHub page. I have collected 3 types of audio:
1. Cry
2. Cough
3. Noise (which I recorded myself)
All in all, I have around 15 minutes of data (10 minutes of crying, 5 minutes of noise, and a minuscule amount of coughing).
Step 2: Cleaning Data
I am using Audacity for cleaning. Since I have mainly used free audio recorded by someone else, I made sure no unwanted sound was included and cropped the clips wherever I found such cases. Leaving long pauses between cries (babies do cry like that) would also confuse the NN, so I removed long pauses wherever they were present, and amplified the audio wherever the amplitude was weak. I then exported all the audio to WAV files at a 16 kHz sampling rate (16 kHz is required by Edge Impulse for KWS). Finally, I renamed the files in the format "label.number.wav", where label can be Cry, Cough, or Noise. This makes labelling easy later inside Edge Impulse; a small renaming sketch follows after the screenshots below.
Before:
After:
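If you have many clips, renaming them by hand gets tedious. Here is a minimal C++17 sketch of the batch rename (my own addition, not part of the project files; the folder and label are command-line placeholders):

```cpp
// rename_clips.cpp -- batch-rename WAV clips to "Label.N.wav" (C++17).
// Build: g++ -std=c++17 rename_clips.cpp -o rename_clips
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

int main(int argc, char* argv[]) {
  if (argc != 3) {
    std::cerr << "usage: rename_clips <folder> <label>\n";  // e.g. rename_clips ./cry Cry
    return 1;
  }
  const fs::path folder = argv[1];
  const std::string label = argv[2];

  // Collect first, then rename, so we never modify the directory while iterating it.
  std::vector<fs::path> clips;
  for (const auto& entry : fs::directory_iterator(folder)) {
    if (entry.path().extension() == ".wav") clips.push_back(entry.path());
  }

  int n = 1;
  for (const auto& clip : clips) {
    fs::rename(clip, folder / (label + "." + std::to_string(n++) + ".wav"));
  }
  std::cout << "renamed " << clips.size() << " files\n";
  return 0;
}
```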
Now it's time for the interesting part: training your model. Since there is a good amount of material on this part on the Edge Impulse site itself, I won't dive deep into an explanation. If you are a beginner, watching this video will be helpful.
After creating a new project, upload your audio corpus from the last step.
Start by creating a new impulse as follows:
Now go to the MFE option and generate the features.
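For context (this is background, not a setting you need to change): MFE stands for Mel-filterbank energy. The block slices each audio window into short frames, computes a spectrum for each frame, and pools it into filter banks spaced on the mel scale, which resolves low frequencies more finely, roughly like human hearing. The standard mel mapping is

m = 2595 · log10(1 + f / 700)

so most of the resolution goes to the low end of the spectrum, where the structure of a cry lives.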
Now it's time to train our model
OK, training is complete, and our model looks promising, as the data is neatly clustered. The accuracy, however, is not great. The main issue is the cough class, which has a minuscule amount of data. I'll try to improve the accuracy in the next part of the project.
Finally, let's deploy our model. For that, download it as an Arduino library.
I am using an ESP32 and an I2S microphone, but any 32-bit microcontroller and microphone will work. For interfacing the INMP441 with the ESP32, I have taken reference from atomic14's code.
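For reference, below is a minimal I2S setup sketch along the lines of atomic14's approach. The pin numbers are assumptions, so match them to your own wiring; the calls use the classic driver/i2s.h API (on very old cores the format constant is spelled I2S_COMM_FORMAT_I2S instead of I2S_COMM_FORMAT_STAND_I2S):

```cpp
#include <driver/i2s.h>

// Pin mapping is an assumption -- change it to match your wiring.
#define I2S_MIC_SCK 26  // INMP441 SCK (bit clock)
#define I2S_MIC_WS  22  // INMP441 WS  (word select)
#define I2S_MIC_SD  21  // INMP441 SD  (serial data out)

void i2sInit() {
  i2s_config_t cfg = {};
  cfg.mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX);
  cfg.sample_rate = 16000;                         // must match the 16 kHz training data
  cfg.bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT; // INMP441 sends 24-bit samples in 32-bit slots
  cfg.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT;  // L/R pin tied to GND -> left channel only
  cfg.communication_format = I2S_COMM_FORMAT_STAND_I2S;
  cfg.intr_alloc_flags = ESP_INTR_FLAG_LEVEL1;
  cfg.dma_buf_count = 4;
  cfg.dma_buf_len = 1024;

  i2s_pin_config_t pins = {};
  // On newer cores, also set pins.mck_io_num = I2S_PIN_NO_CHANGE;
  pins.bck_io_num = I2S_MIC_SCK;
  pins.ws_io_num = I2S_MIC_WS;
  pins.data_out_num = I2S_PIN_NO_CHANGE;  // receive only
  pins.data_in_num = I2S_MIC_SD;

  i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pins);
}

// Read one chunk and scale the 32-bit slots down to 16-bit PCM for the classifier.
size_t readAudio(int16_t* out, size_t samples) {
  static int32_t raw[512];
  if (samples > 512) samples = 512;  // read at most one chunk per call
  size_t bytesRead = 0;
  i2s_read(I2S_NUM_0, raw, samples * sizeof(int32_t), &bytesRead, portMAX_DELAY);
  size_t n = bytesRead / sizeof(int32_t);
  for (size_t i = 0; i < n; i++) {
    out[i] = raw[i] >> 16;  // keep the top 16 bits as PCM
  }
  return n;
}
```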
In PlatformIO, create a new project based on any ESP32 board. After the project is created, unpack the zip file you downloaded from Edge Impulse into the lib folder. (My model is attached below.) The final code is also attached below; the sketch that follows outlines its core loop.
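As a hedged sketch of the standard Edge Impulse inference pattern (the inferencing header name is generated from your project, so e_nanny_inferencing.h is an assumption, and i2sInit()/readAudio() are the helpers sketched above):

```cpp
#include <Arduino.h>
#include <e_nanny_inferencing.h>  // assumption: header name generated by your EI project

static int16_t sampleBuffer[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

// Callback the classifier uses to pull audio from our buffer as floats.
static int getSignalData(size_t offset, size_t length, float* out) {
  numpy::int16_to_float(&sampleBuffer[offset], out, length);
  return 0;
}

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  i2sInit();  // the I2S helper from the earlier snippet
}

void loop() {
  // Fill one full classifier window with microphone samples.
  size_t got = 0;
  while (got < EI_CLASSIFIER_RAW_SAMPLE_COUNT) {
    got += readAudio(&sampleBuffer[got], EI_CLASSIFIER_RAW_SAMPLE_COUNT - got);
  }

  signal_t signal;
  signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
  signal.get_data = &getSignalData;

  ei_impulse_result_t result;
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) return;

  // Light the on-board LED while the "Cry" score is above a confidence threshold.
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    if (strcmp(result.classification[i].label, "Cry") == 0) {
      digitalWrite(LED_BUILTIN, result.classification[i].value > 0.8f ? HIGH : LOW);
    }
  }
}
```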
E-nanny in Action
I played a random YouTube video of a baby crying on my laptop while inference was running on the ESP32. As the ESP32 detects the baby crying, it turns on the on-board (blue) LED.