Before we can use the MEMS microphone together with the Raspberry Pi, we need to set up some hardware and software components. Please have a look at this guide on hackster.io to get everything working. When you return, you should have a Raspberry Pi running Raspbian OS and a working microphone.
The hardware connection should be as follows: Either use the Infineon Shield 2 Go or make your own custom connection with wires:
From now on, we assume you are in a command prompt on your Raspberry Pi, either via ssh or on a monitor. The Raspberry Pi also needs an internet connection during the development phase of the AI. After deploying back to your device, you can disconnect it from the internet. We also assume you have basic knowledge of a command-line environment (changing directories, editing files, etc.).
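If you want to quickly confirm that the microphone works before continuing, a short test recording from the command line is enough. This is only a sketch; the card and device index plughw:1,0 is an assumption, so check the output of arecord -l for the actual index of the I2S microphone on your system:

# List the available capture devices and note the card number of the I2S microphone
arecord -l
# Record 3 seconds of 16 kHz mono audio (replace plughw:1,0 with your card,device index)
arecord -D plughw:1,0 -c 1 -r 16000 -f S16_LE -d 3 test.wav
# Play it back, or copy test.wav to another machine if no speaker is attached
aplay test.wav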
Creating an account and connecting your device
Since we are going to use Edge Impulse for this project, the first step is going to be registering an account on their webpage.
After that, we first need to install some dependencies on the Raspberry Pi:
curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -
sudo apt install -y \
gcc g++ make build-essential nodejs sox gstreamer1.0-tools \
gstreamer1.0-plugins-good gstreamer1.0-plugins-base \
gstreamer1.0-plugins-base-apps
npm config set user root && sudo npm install edge-impulse-linux -g --unsafe-perm
Before executing the first command, it is always a good idea to have a brief look at a script you have downloaded from the web and want to run. Make sure it is indeed what you expect it to be and does not compromise your system.
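One way to do this (just a sketch; the file name nodesource_setup.sh is arbitrary) is to download the script to a file first, read it, and only execute it once you are satisfied:

# Download the setup script to a file instead of piping it straight into bash
curl -sL https://deb.nodesource.com/setup_12.x -o nodesource_setup.sh
# Inspect the script before running it
less nodesource_setup.sh
# Execute it only if you are happy with its contents
sudo bash nodesource_setup.sh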
The next step is connecting your device to Edge Impulse:
edge-impulse-linux --disable-camera
Since we only want to use a microphone, we don't want to get prompted for a camera selection.
When asked for your Edge Impulse account credentials, enter them and then select the snd_rpi_i2s_card microphone.
If you ever want to reset your local configuration on the Raspberry Pi (for example if you selected the wrong microphone), the command is:
edge-impulse-linux --disable-camera --clean
This will start the same prompt as before, but reset everything you already specified. No ML data is lost however.
After setting up the connection on the Raspberry Pi, verify it in your Edge Impulse Page.
Log into your account and locate the Raspberry Pi under Devices.
Now that we have our device set up, we need to think about the data for our keyword detection AI. We should use a keyword that has at least three syllables, like "Hey Siri" or "OK Google". We selected "Hello World" as our phrase, since it is not that common in normal conversation and has three distinct syllables.
We want to distinguish our keyword from two other types of audio: noise and unknown other audio. Therefore we need three different types of samples.
For the noise and unknown audio we can use the dataset provided by Edge Impulse. It contains more than enough noise and unknown samples for our purpose. Download the file and unzip it. To import the data into Edge Impulse, go to "Data acquisition" in your project page and click on the upload button next to "Collect Data".
We noticed that 3 to 4 minutes of each sample type are enough for our basic model. Since each file contains one second of audio, we now want to select about 200 files from the unknown category.
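If you prefer to pick these files on the command line instead of in a file browser, a random subset can be copied into a separate folder and uploaded from there. This is only a sketch: it assumes the unzipped archive contains a folder called unknown with the 1-second .wav files and that the file names contain no spaces, so adjust the paths to the actual layout of the archive:

# Create a folder for the subset we are going to upload
mkdir -p unknown_subset
# Copy 200 randomly chosen .wav files from the unknown samples into it
ls unknown/*.wav | shuf -n 200 | xargs -I{} cp {} unknown_subset/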
Tick the "Training" option under "Upload into category" (we will later rebalance our set ourselves). The label can be inferred from the filename.
Then, begin the upload.
Repeat the process for the noise samples.
You should now have about 7 minutes of audio in your database. Next we need to upload recordings of our keyword. This process can get a bit tedious, since you will need 3 to 4 minutes of a real person saying your keyword.
Let's start by recording some samples through your Raspberry Pi and the MEMS microphone.
While the edge-impulse client is running on your Raspberry Pi, you should see a "Record new data" section in the "Data acquisition" tab. Set the label to "helloworld", the length to 10000 ms and the frequency to 16000 Hz. When you press "Start sampling", the Raspberry Pi will start recording for 10 seconds and then upload the audio to Edge Impulse. Repeat "Hello World" into the microphone during this time, speaking like you would normally say these words.
After the recording is finished, you will see a new sample under collected data. Click the three dots next to it and select "Split sample". Now you can split the recording into 1-second chunks containing only your keyword. Make sure your samples are all exactly 1 second long; otherwise, errors might occur later.
The keywords should be easily recognizable from the silence in between. If Edge Impulse does not detect every instance you say, just add segments with the button in the top left corner.
Here you might encounter the problem that your recorded samples have very low volume (for me, the audio visualisation showed values around ±100, whereas the "unknown" samples are around ±20000). After several attempts to fix this in the audio settings on the Raspberry Pi, it looks like Edge Impulse overrides these settings for its recordings. However, it turns out that this is not a big problem, as you'll see at the end.
To improve your dataset, you should include as many different voices as possible, so ask friends and family to send you e.g. voice chats of them saying "Hello World". You can then upload these samples directly in the "Data acquisition" panel, the same way you did for the "noise" and "unknown" samples. You should also split these samples afterwards.
There is, however, a catch to uploading voice chats: the audio file settings, like sample rate, must be the same as for the other files. On Linux, you can use this command to convert them to the right settings:
ffmpeg -i <original_voice_chat_file> -map_channel 0.0.0 -b:a 256k -ar 16000 <file_to_upload.wav>
This converts <original_voice_chat_file> to <file_to_upload.wav>. The options -map_channel 0.0.0 -b:a 256k -ar 16000 set the new file to mono audio, the bitrate to 256k and the frequency to 16000 Hz, in that order. Make sure to create a .wav file by naming your output file that way.
On a Windows machine, you can use an online audio converter or install ffmpeg for Windows.
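If you received more than a handful of voice chats, converting them one by one gets tedious. The following loop applies the same ffmpeg settings as above to a whole folder at once; it is only a sketch and assumes the original files are in a folder called voice_chats and the converted .wav files should end up in converted:

mkdir -p converted
# Convert every file in voice_chats/ to a mono, 256k, 16 kHz .wav file
for f in voice_chats/*; do
    ffmpeg -i "$f" -map_channel 0.0.0 -b:a 256k -ar 16000 "converted/$(basename "${f%.*}").wav"
done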
When you have collected around 3 to 4 minutes of your keyword, go to the "Dashboard" and rebalance your dataset (all the way at the bottom). This will mark some samples for testing, which the AI won't see until it is fully trained. This is an important measure to detect overfitting to your dataset.
Impulse Design
Now we want to create our machine learning pipeline that takes in our data and outputs a model able to detect our keyword.
In Edge Impulse, this is very easy:
Go to "Impulse Design" in the browser panel. Here you can select multiple blocks that preprocess and learn from your data.
The first block "Time series data" is already selected and correctly set.
Next, we want to add the processing block "Audio (MFCC)", which extracts Mel-frequency cepstral coefficients, audio features well suited to human voice (Wikipedia provides some insights into what is happening here).
Lastly, we add the learning block "Classification (Keras)". This is the block that actually learns to recognize the keyword.
Click "save impulse" on the right.
Now, when you open the three-bar menu on the right, you should see "MFCC" and "NN Classifier" under "Impulse Design". Click "MFCC" to go to the settings of this block. Here you can play around with the different parameters; using the default settings works fine, however.
When you're done, click on "save parameters" and you will be directed to the "Generate Features" screen. Here, you can simply click the corresponding button. You might also want to take a look at the visualisation of your data, especially whether something changes after the feature generation.
This process might take a while.
Learning and Testing
Now go to the "NN Classifier" panel to start the machine learning process. Here you can also leave the default settings, except for the "Add noise" feature: set it from high to low. Click on "Start training".
When the process is finished, you'll see some statistics evaluating the quality of your algorithm. The overall accuracy is actually not that important. What we are interested in is the correct detection of the "HELLOWORLD" keyword in the confusion matrix. It should be colored deep green and higher than 85%.
Now that the AI has learned your keyword, we want to test it. Go to "Model testing" and click "Classify all". The accuracy as well as the confusion matrix should not be drastically different from the ones in the classification step. If they are, you might want to collect more data or shrink your "NN classifier".
Deploying to your Raspberry Pi
Now we want to port our model onto the Raspberry Pi. Go to "Deployment" and select "Linux boards" at the bottom. Select the default optimization for the classifier and click "build". This will create a model file ready for the Raspberry Pi.
When the build process is finished, go to your Raspberry Pi and execute the following command to use the built model in an example provided by Edge Impulse:
edge-impulse-linux-runner
This will run the model locally on the Raspberry Pi and show you the results of the live keyword detection. The numbers running down the screen show the certainty that the AI detected each of the three categories. Try it out by saying your keyword; the corresponding number should increase for about two to three consecutive outputs.
Congratulations, you created a complete keyword detection AI.
What's next?
To use your model in a real project, you will have to build some infrastructure around it.
Edge Impulse provides several tutorials on how to include AI models in, for example, Python projects.
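Those tutorials build on the .eim model file produced in the deployment step. As a starting point, you can download the model once and then run it without contacting the cloud again; note that the --download and --model-file options are an assumption based on the current edge-impulse-linux-runner, so check edge-impulse-linux-runner --help on your device:

# Download the compiled model once (requires an internet connection)
edge-impulse-linux-runner --download modelfile.eim
# Afterwards the model can be run completely offline from the local file
edge-impulse-linux-runner --model-file ./modelfile.eim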