Before we can use the MEMS microphone together with the Raspberry Pi, we need to set up some hardware and software components. Please have a look at this guide on hackster.io to get everything working. When you return, you should have a Raspberry Pi running Raspbian OS and a working microphone.
The hardware connection should be as follows: Either use the Infineon Shield 2 Go or make your own custom connection with wires:
From now on, we assume you are in a command prompt on your Raspberry Pi, either via ssh or on a monitor. The Raspberry Pi also needs an internet connection during the development phase of the AI. After deploying back to your device, you can disconnect it from the internet. We also assume you have basic knowledge of a command-line environment (changing directories, editing files, etc.).
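If you want to quickly confirm that the microphone works before continuing, a short test recording from the command line is enough. This is only a sketch; the card and device index plughw:1,0 is an assumption, so check the output of arecord -l for the actual index of the I2S microphone on your system:

# List the available capture devices and note the card number of the I2S microphone
arecord -l
# Record 3 seconds of 16 kHz mono audio (replace plughw:1,0 with your card,device index)
arecord -D plughw:1,0 -c 1 -r 16000 -f S16_LE -d 3 test.wav
# Play it back, or copy test.wav to another machine if no speaker is attached
aplay test.wav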
Creating an account and connecting your device
Since we are going to use Edge Impulse for this project, the first step is going to be registering an account on their webpage.
After that, we first need to install some dependencies on the Raspberry Pi:
curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -
sudo apt install -y \
gcc g++ make build-essential nodejs sox gstreamer1.0-tools \
gstreamer1.0-plugins-good gstreamer1.0-plugins-base \
gstreamer1.0-plugins-base-apps
npm config set user root && sudo npm install edge-impulse-linux -g --unsafe-perm
Before executing the first command, it is always a good idea to have a brief look at a script you have downloaded from the web and want to run. Make sure it is indeed what you expect it to be and does not compromise your system.
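One way to do this (just a sketch; the file name nodesource_setup.sh is arbitrary) is to download the script to a file first, read it, and only execute it once you are satisfied:

# Download the setup script to a file instead of piping it straight into bash
curl -sL https://deb.nodesource.com/setup_12.x -o nodesource_setup.sh
# Inspect the script before running it
less nodesource_setup.sh
# Execute it only if you are happy with its contents
sudo bash nodesource_setup.sh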
The next step is connecting your device to Edge Impulse:
edge-impulse-linux --disable-camera
Since we only want to use a microphone, we don't want to get prompted for a camera selection.
When asked for your Edge Impulse account credentials, enter them and then select the snd_rpi_i2s_card microphone.
If you ever want to reset your local configuration on the Raspberry Pi (for example if you selected the wrong microphone), the command is:
edge-impulse-linux --disable-camera --clean
This will start the same prompt as before, but reset everything you already specified. No ML data is lost however.
After setting up the connection on the Raspberry Pi, verify it in your Edge Impulse Page.
Log into your account and locate the Raspberry Pi under Devices.
Now that we have our device set up, we need to think about the data for our keyword detection AI. We should use a keyword that has at least three syllables, like "Hey Siri" or "OK Google". We selected "Hello World" as our phrase, since it is not that common in normal conversation and has three distinct syllables.
We want to distinguish our keyword from two other types of audio: noise and unknown other audio. Therefore we need three different types of samples.
For the noise and unknown audio we can use the dataset provided by Edge Impulse. It contains more than enough noise and unknown samples for our purpose. Download the file and unzip it. To import the data into Edge Impulse, go to "Data acquisition" in your project page and click on the upload button next to "Collect Data".
We noticed that 3 to 4 minutes of each sample type are enough for our basic model. Since each file contains one second of audio, we now want to select about 200 files from the unknown category.
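If you prefer to pick these files on the command line instead of in a file browser, a random subset can be copied into a separate folder and uploaded from there. This is only a sketch: it assumes the unzipped archive contains a folder called unknown with the 1-second .wav files and that the file names contain no spaces, so adjust the paths to the actual layout of the archive:

# Create a folder for the subset we are going to upload
mkdir -p unknown_subset
# Copy 200 randomly chosen .wav files from the unknown samples into it
ls unknown/*.wav | shuf -n 200 | xargs -I{} cp {} unknown_subset/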
Tick the "Training" option under "Upload into category" (we will later rebalance our set ourselves). The label can be inferred from the filename.
Then, begin the upload.
Repeat the process for the noise samples.
You should now have about 7 minutes of audio in your database. Next we need to upload recordings of our keyword. This process can get a bit tedious, since you will need 3 to 4 minutes of a real person saying your keyword.
Let's start by recording some samples through your Raspberry Pi and the MEMS microphone.
While the edge-impulse client is running on your Raspberry Pi, you should see a "Record new data" section in the "Data acquisition" tab. Set the label to "helloworld", the length to 10000 ms and the frequency to 16000 Hz. When you press "Start sampling", the Raspberry Pi will start recording for 10 seconds and then upload the audio to Edge Impulse. Repeat "Hello World" into the microphone during this time, speaking like you would normally say these words.
After the recording is finished, you will see a new sample under collected data. Click the three dots next to it and select "Split sample". Now you can split the recording into 1-second chunks containing only your keyword. Make sure your samples are all exactly 1 second long; otherwise, errors might occur later.
The keywords should be easily recognizable from the silence in between. If Edge Impulse does not detect every instance you say, just add segments with the button in the top left corner.
Here you might encounter the problem that your recorded samples have very low volume (for me, the audio visualisation showed values around ±100, whereas the "unknown" samples are around ±20000). After several attempts to fix this in the audio settings on the Raspberry Pi, it looks like Edge Impulse overrides these settings for its recordings. However, it turns out that this is not a big problem, as you'll see at the end.
To improve your dataset, you should include as many different voices as possible, so ask friends and family to send you e.g. voice chats of them saying "Hello World". You can then upload these samples directly in the "Data acquisition" panel, the same way you did for the "noise" and "unknown" samples. You should also split these samples afterwards.
There is, however, a catch to uploading voice chats: the audio file settings, like sample rate, must be the same as for the other files. On Linux, you can use this command to convert them to the right settings:
ffmpeg -i <original_voice_chat_file> -map_channel 0.0.0 -b:a 256k -ar 16000 <file_to_upload.wav>
This converts <original_voice_chat_file> to <file_to_upload.wav>. The options -map_channel 0.0.0 -b:a 256k -ar 16000 set the new file to mono audio, the bitrate to 256k and the frequency to 16000 Hz, in that order. Make sure to create a .wav file by naming your output file that way.
On a Windows machine, you can use an online audio converter or install ffmpeg for Windows.
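If you received more than a handful of voice chats, converting them one by one gets tedious. The following loop applies the same ffmpeg settings as above to a whole folder at once; it is only a sketch and assumes the original files are in a folder called voice_chats and the converted .wav files should end up in converted:

mkdir -p converted
# Convert every file in voice_chats/ to a mono, 256k, 16 kHz .wav file
for f in voice_chats/*; do
    ffmpeg -i "$f" -map_channel 0.0.0 -b:a 256k -ar 16000 "converted/$(basename "${f%.*}").wav"
done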
When you have collected around 3 to 4 minutes of your keyword, go to the "Dashboard" and rebalance your dataset (all the way at the bottom). This will mark some samples for testing, which the AI won't see until it is fully trained. This is an important measure to detect overfitting to your dataset.
Impulse Design
Now we want to create our machine learning pipeline that takes in our data and outputs a model able to detect our keyword.
In Edge Impulse, this is very easy:
Go to "Impulse Design" in the browser panel. Here you can select multiple blocks that preprocess and learn from your data.
The first block "Time series data" is already selected and correctly set.
Next, we want to add the processing block "Audio (MFCC)", which extracts Mel-frequency cepstral coefficients, audio features well suited to human voice (Wikipedia provides some insights into what is happening here).
Lastly, we add the learning block "Classification (Keras)". This is the block that actually learns to recognize the keyword.
Click "save impulse" on the right.
Now, when you open the three-bar menu on the right, you should see "MFCC" and "NN Classifier" under "Impulse Design". Click "MFCC" to go to the settings of this block. Here you can play around with the different parameters; using the default settings works fine, however.
When you're done, click on "save parameters" and you will be directed to the "Generate Features" screen. Here, you can simply click the corresponding button. You might also want to take a look at the visualisation of your data, especially whether something changes after the feature generation.
This process might take a while.
Learning and Testing
Now go to the "NN Classifier" panel to start the machine learning process. Here you can also leave the default settings, except for the "Add noise" feature: set it from high to low. Click on "Start training".
When the process is finished, you'll see some statistics evaluating the quality of your algorithm. The overall accuracy is actually not that important. What we are interested in is the correct detection of the "HELLOWORLD" keyword in the confusion matrix. It should be colored deep green and higher than 85%.
Now that the AI has learned your keyword, we want to test it. Go to "Model testing" and click "Classify all". The accuracy as well as the confusion matrix should not be drastically different from the ones in the classification step. If they are, you might want to collect more data or shrink your "NN classifier".
Deploying to your Raspberry Pi
Now we want to port our model onto the Raspberry Pi. Go to "Deployment" and select "Linux boards" at the bottom. Select the default optimization for the classifier and click "build". This will create a model file ready for the Raspberry Pi.
When the build process is finished, go to your Raspberry Pi and execute the following command to use the built model in an example provided by Edge Impulse:
edge-impulse-linux-runner
This will run the model locally on the Raspberry Pi and show you the results of the live keyword detection. The numbers running down the screen show the certainty that the AI detected each of the three categories. Try it out by saying your keyword; the corresponding number should increase for about two to three consecutive outputs.
Congratulations, you created a complete keyword detection AI.
What's next?
To use your model in a real project, you will have to build some infrastructure around it.
Edge Impulse provides several tutorials on how to include AI models in, for example, Python projects.
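Those tutorials build on the .eim model file produced in the deployment step. As a starting point, you can download the model once and then run it without contacting the cloud again; note that the --download and --model-file options are an assumption based on the current edge-impulse-linux-runner, so check edge-impulse-linux-runner --help on your device:

# Download the compiled model once (requires an internet connection)
edge-impulse-linux-runner --download modelfile.eim
# Afterwards the model can be run completely offline from the local file
edge-impulse-linux-runner --model-file ./modelfile.eim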