Humans can detect sounds in a frequency range from about 20 Hz to 20 kHz, and the upper limit tends to decrease with age. Many animals, on the other hand, communicate and detect sounds outside this range. Elephants, for example, communicate with infrasound, sound below the human hearing range, while animals such as bats and dolphins use ultrasound, sound above the human hearing range, for echolocation. By giving humans a way to detect infrasound and ultrasound, we could let them feel these sounds when animals communicate or echolocate. Not only could this expand the human senses, it could also make people feel more connected to the animals around them and to nature.
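These boundaries can be sketched in a few lines; the function name is illustrative, and the 20 Hz / 20 kHz cut-offs are the typical figures cited above:

```python
def classify_frequency(hz):
    """Label a frequency relative to the typical human hearing range (20 Hz - 20 kHz)."""
    if hz < 20:
        return "infrasound"    # e.g. elephant rumbles
    elif hz <= 20_000:
        return "audible"
    else:
        return "ultrasound"    # e.g. bat echolocation calls

print(classify_frequency(14))       # infrasound
print(classify_frequency(440))      # audible
print(classify_frequency(45_000))   # ultrasound
```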
Aim / Objective

By giving humans the ability to detect infrasound and ultrasound, we could let them feel the sounds animals use to communicate. Not only could this expand the human senses, it could also make people feel more connected to the animals around them and enjoy nature. For this idea, I will use a PDM microphone sound sensor to detect infrasound and ultrasound. I will train a machine learning model, using Edge Impulse, to classify different animal sounds and notify the user about them. The model will be trained with various types of sounds recorded in different situations; a larger dataset will increase its accuracy. The model will then be deployed to the Nordic Semiconductor nRF5340 DK development board to classify sounds from real-time audio, and the result will be shown on the Adafruit display. People wearing the device will be able to feel infrasound and ultrasound as vibrations produced by haptic motors. As far as I know, there is no existing solution to this problem, so I think my idea is unique and practical. The wearable can be updated, and the machine learning model can be retrained with more data later. This is helpful because it makes people feel more connected to nature and expands the human hearing range: they will be able to sense sounds outside their hearing range, including echolocation and infrasound communication.
Project description

The Nordic Semiconductor development board and the other components will be assembled into a wearable that can be worn by the user. The device will classify different animal sounds from real-time audio, and the result will be displayed on the Adafruit display. The user will receive haptic feedback from the motors as soon as the wearable detects a sound and starts classifying. The wearable will be powered by a LiPo battery and designed to be compact and easy to wear.
Interfacing the Nordic Semiconductor nRF5340 DK with Edge Impulse

I have followed the guide from Edge Impulse to interface the Nordic Semiconductor nRF5340 DK with Edge Impulse.
The guide can be found here: https://docs.edgeimpulse.com/docs/nordic-semi-nrf5340-dk
The Nordic Semiconductor nRF5340 DK is a development board with dual Cortex-M33 microcontrollers, QSPI flash, and an integrated BLE radio, and it's fully supported by Edge Impulse. You'll be able to sample raw data, build models, and deploy trained machine learning models directly from the studio. As the nRF5340 DK does not have any built-in sensors, it is recommended to pair this development board with the X-NUCLEO-IKS02A1 shield (with a MEMS accelerometer and a MEMS microphone). The nRF5340 DK is available for around 50 USD from a variety of distributors, including Digikey and Mouser.
If you don't have the X-NUCLEO-IKS02A1 shield you can use the Data forwarder to capture data from any other sensor, and then follow the Running your impulse locally: On your Zephyr-based Nordic Semiconductor development board tutorial to run your impulse. Or, you can modify the example firmware (based on nRF Connect) to interact with other accelerometers or PDM microphones that are supported by Zephyr. See the firmware repository for more information.
The Edge Impulse firmware for this development board is open source and hosted on GitHub: edgeimpulse/firmware-nrf52840-5340.
Installing dependencies
To set this device up in Edge Impulse, you will need to install the following software:
- Edge Impulse CLI
- On Linux: GNU Screen, which you can install for example via `sudo apt install screen`.
🚧 Problems installing the CLI

See the Installation and troubleshooting guide.

Connecting to Edge Impulse
With all the software in place it's time to connect the development board to Edge Impulse.
1. Plugging in the X-NUCLEO-IKS02A1 MEMS expansion shield
Remove the pin header protectors on the nRF5340 DK and plug the X-NUCLEO-IKS02A1 shield into the development board.
Note: Make sure that the shield does not touch any of the pins in the middle of the development board. This might cause issues when flashing the board or running applications.
2. Connect the development board to your computer
Use a micro-USB cable to connect the development board to your computer. There are two USB ports on the development board, use the one on the short side of the board. Then, set the power switch to 'on'.
3. Update the firmware
The development board does not come with the right firmware yet. To update the firmware:
- The development board is mounted as a USB mass-storage device (like a USB flash drive), with the name `JLINK`. Make sure you can see this drive.
- Download the latest Edge Impulse firmware.
- Drag the `nrf5340-dk.bin` file to the `JLINK` drive.
- Wait 20 seconds and press the BOOT/RESET button.
4. Setting keys
From a command prompt or terminal, run:
edge-impulse-daemon
This starts a wizard which asks you to log in and choose an Edge Impulse project. If you want to switch projects, run the command with `--clean`.
Alternatively, recent versions of Google Chrome and Microsoft Edge can collect data directly from your development board, without the need for the Edge Impulse CLI. See this blog post for more information.
5. Verifying that the device is connected
That's all! Your device is now connected to Edge Impulse. To verify this, go to your Edge Impulse project, and click Devices. The device will be listed here.
Next steps: building a machine learning model
With everything set up you can now build your first machine learning model with these tutorials:
Troubleshooting

Failed to flash

If your board fails to flash new firmware (a `FAIL.txt` file might appear on the `JLINK` drive), you can also flash using `nrfjprog`.
- Install the nRF Command Line Tools.
- Flash new firmware via:
nrfjprog --program path-to-your.bin -f NRF53 --sectoranduicrerase
Recognizing sounds from audio

I followed the guide from Edge Impulse to perform this task.
The guide can be found here: https://docs.edgeimpulse.com/docs/audio-classification
In this tutorial, you'll use machine learning to build a system that can recognize when a particular sound is happening—a task known as audio classification. The system you create will be able to recognize the sound of water running from a faucet, even in the presence of other background noise.
You'll learn how to collect audio data from microphones, use signal processing to extract the most important information, and train a deep neural network that can tell you whether the sound of running water can be heard in a given clip of audio. Finally, you'll deploy the system to an embedded device and evaluate how well it works.
At the end of this tutorial, you'll have a firm understanding of how to classify audio using Edge Impulse.
There is also a video version of this tutorial:
1. Prerequisites
Follow the steps to connect your development board to Edge Impulse. If your device is connected under Devices in the studio you can proceed.
2. Collecting your first data

To build this project, you'll need to collect some audio data that will be used to train the machine learning model. Since the goal is to detect animal sounds, you'll need to collect some examples of those. You'll also need some examples of typical background noise that doesn't contain animal sounds, so the model can learn to discriminate between the two. These two types of examples represent the two classes we'll be training our model to detect: background noise or animal.
You can use your device to collect some data. In the studio, go to the Data acquisition tab. This is the place where all your raw data is stored, and - if your device is connected to the remote management API - where you can start sampling new data.
Let's start by recording an example of background noise that doesn't contain any animal sounds. Under Record new data, select your device, set the label to `noise`, the sample length to `1000`, and the sensor to Built-in microphone. This indicates that you want to record 1 second of audio and label the recorded data as `noise`. You can later edit these labels if needed.
After you click Start sampling, the device will capture a second of audio and transmit it to Edge Impulse. The LED will light while recording is in progress, then light again during transmission.
When the data has been uploaded, you will see a new line appear under 'Collected data'. You will also see the waveform of the audio in the 'RAW DATA' box. You can use the controls underneath to listen to the audio that was captured.
3. Build a dataset

Since you now know how to capture audio with Edge Impulse, it's time to start building a dataset. For a simple audio classification model like this one, we should aim to capture around 10 minutes of data. We have two classes, and it's ideal if our data is balanced equally between each of them. This means we should aim to capture the following data:
- 5 minutes of background noise, with the label "noise"
- 5 minutes of animal sound, with the label "animal"
In the real world, there are usually additional sounds present alongside the sounds we care about. For example, an animal sound is often accompanied by the sound of nature. Background noise might also include the sounds of television, kids playing, or cars driving past outside.
It's important that your training data contains these types of real world sounds. If your model is not exposed to them during training, it will not learn to take them into account, and it will not perform well during real-world usage.
Data capture and transmission
The amount of audio that can be captured in one go varies depending on a device's memory. The ST B-L475E-IOT01A developer board has enough memory to capture 60 seconds of audio at a time, and the Arduino Nano 33 BLE Sense has enough memory for 16 seconds. To capture 60 seconds of audio, set the sample length to `60000`. Because the board transmits data quite slowly, it will take around 7 minutes before a 60 second sample appears in Edge Impulse.
Once you've captured around 10 minutes of data, it's time to start designing an Impulse.
4. Design an Impulse
With the training set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the "MFCC" signal processing block. MFCC stands for Mel Frequency Cepstral Coefficients. This sounds scary, but it's basically just a way of turning raw audio—which contains a large amount of redundant information—into a simplified form.
🚧Spectrogram blockEdge Impulse supports three different blocks for audio classification: MFCC, MFE and spectrogram blocks. If your accuracy is not great using the MFCC block you can switch to the spectrogram block, which typically does a better job at non-voice audio (but leads to slightly bigger models).
We'll then pass this simplified audio data into a Neural Network block, which will learn to distinguish between the two classes of audio (animal sound and noise).
In the studio, go to the Create impulse tab. You'll see a Raw data block.
As mentioned above, Edge Impulse slices up the raw samples into windows that are fed into the machine learning model during training. The Window size field controls how long, in milliseconds, each window of data should be. A one second audio sample will be enough to determine whether it is an animal sound or not, so you should make sure Window size is set to 1000 ms. You can either drag the slider or type a new value directly.
Each raw sample is sliced into multiple windows, and the Window increase field controls the offset of each subsequent window from the first. For example, a Window increase value of 1000 ms would result in each window starting 1 second after the start of the previous one.
By setting a Window increase that is smaller than the Window size, we can create windows that overlap. This is actually a great idea. Although they may contain similar data, each overlapping window is still a unique example of audio that represents the sample's label. By using overlapping windows, we can make the most of our training data. For example, with a Window size of 1000 ms and a Window increase of 100 ms, we can extract 10 unique windows from only 2 seconds of data.
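The windowing arithmetic above can be sketched as follows; this is a simplified model of the slicing, and exact window counts may differ slightly from Edge Impulse's implementation:

```python
def count_windows(sample_ms, window_ms, increase_ms):
    """Number of full windows that fit when each window starts increase_ms after the previous."""
    if sample_ms < window_ms:
        return 0
    return (sample_ms - window_ms) // increase_ms + 1

# 2 s sample, 1 s window, 500 ms increase -> 3 overlapping windows
print(count_windows(2000, 1000, 500))
# shrinking the increase to 100 ms extracts many more windows from the same audio
print(count_windows(2000, 1000, 100))
```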
Make sure the Window increase field is set to 300 ms. The Raw data block should match the screenshot above.
Next, click Add a processing block and choose the 'MFCC' block. Once you're done with that, click Add a learning block and select 'Neural Network (Keras)'. Finally, click Save impulse.
5. Configure the MFCC block

Now that we've assembled the building blocks of our Impulse, we can configure each individual part. Click on the MFCC tab in the left hand navigation menu.
This page allows you to configure the MFCC block, and lets you preview how the data will be transformed. The right of the page shows a visualization of the MFCC's output for a piece of audio, which is known as a spectrogram.
The MFCC block transforms a window of audio into a table of data where each row represents a range of frequencies and each column represents a span of time. The value contained within each cell reflects the amplitude of its associated range of frequencies during that span of time. The spectrogram shows each cell as a colored block, the intensity of which varies with the amplitude.
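The row/column structure described above can be illustrated with a plain magnitude spectrogram in NumPy; this is a simplified stand-in for the MFCC computation, not Edge Impulse's actual code:

```python
import numpy as np

def spectrogram(audio, frame_len=256, hop=128):
    """Magnitude spectrogram: rows = frequency bins, columns = time frames."""
    frames = [audio[i:i + frame_len]
              for i in range(0, len(audio) - frame_len + 1, hop)]
    # rfft of each (windowed) frame gives the amplitude per frequency bin
    spec = np.abs(np.fft.rfft(np.array(frames) * np.hanning(frame_len), axis=1))
    return spec.T  # shape: (frequency_bins, time_frames)

# one second of a 440 Hz tone sampled at 16 kHz
sr = 16_000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (129, 124)
```

The energy concentrates in the rows around 440 Hz, which is exactly the kind of pattern the neural network learns to recognize.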
The patterns visible in a spectrogram contain information about what type of sound it represents.
It's interesting to explore your data and look at the types of spectrograms it results in. You can use the dropdown box near the top right of the page to choose between different audio samples to visualize, and drag the white window on the audio waveform to select different windows of data.
Handily, Edge Impulse provides sensible defaults that will work well for many use cases, so we can leave these values unchanged.
The spectrograms generated by the MFCC block will be passed into a neural network architecture that is particularly good at learning to recognize patterns in this type of tabular data. Before training our neural network, we'll need to generate MFCC blocks for all of our windows of audio. To do this, click the Generate features button at the top of the page, then click the green Generate features button. If you have a full 10 minutes of data, the process will take a while to complete.
Once this process is complete the feature explorer shows a visualization of your dataset. Here dimensionality reduction is used to map your features onto a 3D space, and you can use the feature explorer to see if the different classes separate well, or find mislabeled data (if it shows in a different cluster). You can find more information in visualizing complex datasets.
Next, we'll configure the neural network and begin training.
6. Configure the neural network
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the MFCC as an input, and try to map this to one of two classes—noise or animal.
Click on NN Classifier in the left hand menu.
A neural network is composed of layers of virtual "neurons", which you can see represented on the left hand side of the NN Classifier page. An input—in our case, an MFCC spectrogram—is fed into the first layer of neurons, which filters and transforms it based on each neuron's unique internal state. The first layer's output is then fed into the second layer, and so on, gradually transforming the original input into something radically different. In this case, the spectrogram input is transformed over four intermediate layers into just two numbers: the probability that the input represents noise, and the probability that the input represents an animal sound.
During training, the internal state of the neurons is gradually tweaked and refined so that the network transforms its input in just the right ways to produce the correct output. This is done by feeding in a sample of training data, checking how far the network's output is from the correct answer, and adjusting the neurons' internal state to make it more likely that a correct answer is produced next time. When done thousands of times, this results in a trained network.
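The tweak-and-check loop described above can be reduced to a toy example: gradient descent on a single weight. Real training adjusts thousands of parameters at once, but the principle is the same:

```python
# Toy version of training: nudge one weight so that prediction = weight * x
# moves toward the correct answer. The target value here is invented.
weight = 0.0
target, x, lr = 2.0, 1.0, 0.1

for _ in range(100):
    prediction = weight * x
    error = prediction - target   # how far the output is from the correct answer
    weight -= lr * error * x      # adjust the internal state to shrink the error

print(round(weight, 3))  # close to 2.0
```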
A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. The default neural network architecture provided by Edge Impulse will work well for our current project, but you can also define your own architectures. You can even import custom neural network code from tools used by data scientists, such as TensorFlow and Keras.
Before you begin training, you should change some values in the configuration. First, set the Number of training cycles to 300. This means the full set of data will be run through the neural network 300 times during training. If too few cycles are run, the network won't have learned everything it can from the training data. However, if too many cycles are run, the network may start to memorize the training data and will no longer perform well on data it has not seen before. This is called overfitting.
Next, you should change the Minimum confidence rating to 0.7. This means that when the neural network makes a prediction (for example, that there is 0.8 probability that some audio contains animal sound) Edge Impulse will disregard it unless it is above the threshold of 0.7.
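On the application side, the same thresholding could look like the sketch below; the function and label names are illustrative, not part of the Edge Impulse API:

```python
def filter_prediction(probabilities, threshold=0.7):
    """Return the winning label only if its probability clears the confidence threshold."""
    label = max(probabilities, key=probabilities.get)
    if probabilities[label] >= threshold:
        return label
    return "uncertain"

print(filter_prediction({"animal": 0.8, "noise": 0.2}))    # animal
print(filter_prediction({"animal": 0.55, "noise": 0.45}))  # uncertain
```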
To begin training, click Start training. You'll see a lot of text flying past in the Training output panel, which you can ignore for now. Training will take a few minutes. When it's complete, you'll see the Last training performance panel appear at the bottom of the page.
Congratulations, you've trained a neural network with Edge Impulse! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
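The 20% hold-out can be sketched like this; the shuffling and split mechanics here are an assumption for illustration, not Edge Impulse's exact procedure:

```python
import random

def split_dataset(samples, validation_fraction=0.2, seed=42):
    """Shuffle and set aside a fraction of samples for validation."""
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * validation_fraction)
    return samples[n_val:], samples[:n_val]  # train, validation

train, val = split_dataset(list(range(100)))
print(len(train), len(val))  # 80 20
```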
On the left hand side of the panel, Accuracy refers to the percentage of windows of audio that were correctly classified. The higher the number the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 80% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row.
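A confusion matrix can be computed from a list of predictions as sketched below; the labels match this project, but the example values are invented:

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; 'diagonal' entries are correct classifications."""
    return Counter(zip(true_labels, predicted_labels))

truth     = ["animal", "animal", "noise", "noise", "noise"]
predicted = ["animal", "noise",  "noise", "noise", "animal"]
matrix = confusion_matrix(truth, predicted)

correct = sum(n for (t, p), n in matrix.items() if t == p)
print(matrix[("animal", "animal")], matrix[("noise", "animal")])  # 1 1
print(f"accuracy: {correct / len(truth):.0%}")  # accuracy: 60%
```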
The On-device performance region shows statistics about how the model is likely to run on-device. Inferencing time is an estimate of how long the model will take to analyze one second of data on a typical microcontroller (an Arm Cortex-M4F running at 80MHz). Peak memory usage gives an idea of how much RAM will be required to run the model on-device.
7. Classifying new data
The performance numbers in the previous step show that our model is working well on its training data, but it's extremely important that we test the model on new, unseen data before deploying it in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
Edge Impulse provides some helpful tools for testing our model, including a way to capture live data from your device and immediately attempt to classify it. To try it out, click on Live classification in the left hand menu. Your device should show up in the 'Classify new data' panel. Capture 5 seconds of background noise by clicking Start sampling.
The sample will be captured, uploaded, and classified. Once this has happened, you'll see a breakdown of the results.
Once the sample is uploaded, it is split into windows. These windows are then classified.
Of course, it's possible some of the windows may be classified incorrectly. If your model didn't perform perfectly, don't worry. We'll get to troubleshooting later.
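One way to handle such per-window errors is to average the class probabilities over several consecutive windows; the probability values below are invented for illustration:

```python
def average_windows(window_results):
    """Average per-class probabilities across several classified windows."""
    labels = window_results[0].keys()
    return {label: sum(w[label] for w in window_results) / len(window_results)
            for label in labels}

# three consecutive windows: one is a misclassification, averaging smooths it out
windows = [
    {"animal": 0.9, "noise": 0.1},
    {"animal": 0.3, "noise": 0.7},   # outlier window
    {"animal": 0.8, "noise": 0.2},
]
averaged = average_windows(windows)
print(max(averaged, key=averaged.get))  # animal
```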
🚧 Misclassifications and uncertain results

It's inevitable that even a well-trained machine learning model will sometimes misclassify its inputs. When you integrate a model into your application, you should take into account that it will not always give you the correct answer. For example, if you are classifying audio, you might want to classify several windows of data and average the results. This will give you better overall accuracy than assuming that every individual result is correct.

8. Model testing
Using the Live classification tab, you can easily try out your model and get an idea of how it performs. But to be really sure that it is working well, we need to do some more rigorous testing. That's where the Model testing tab comes in. If you open it up, you'll see the sample we just captured listed in the Test data panel.
In addition to its training data, every Edge Impulse project also has a test dataset. Samples captured in Live classification are automatically saved to the test dataset, and the Model testing tab lists all of the test data.
To use the sample we've just captured for testing, we should correctly set its expected outcome. Click the `⋮` icon and select Edit expected outcome, then enter `noise`. Now, select the sample using the checkbox to the left of the table and click Classify selected.
You'll see that the model's accuracy has been rated based on the test data. Right now, this doesn't give us much more information than just classifying the same sample in the Live classification tab. But if you build up a big, comprehensive set of test samples, you can use the Model testing tab to measure how your model is performing on real data.
Ideally, you'll want to collect a test set that contains at least 25% of the amount of data in your training set. So, if you've collected 10 minutes of training data, you should collect at least 2.5 minutes of test data. You should make sure this test data represents a wide range of possible conditions, so that it evaluates how the model performs with many different types of inputs. For example, collecting test audio for several different animals is a good idea.
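That rule of thumb is simple arithmetic; a hypothetical helper makes it explicit:

```python
def minimum_test_minutes(training_minutes, fraction=0.25):
    """Minimum recommended amount of test data, as a fraction of the training set."""
    return training_minutes * fraction

# 10 minutes of training data -> at least 2.5 minutes of test data
print(minimum_test_minutes(10))  # 2.5
```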
You can use the Data acquisition tab to manage your test data. Open the tab, and then click Test data at the top. Then, use the Record new data panel to capture a few minutes of test data, including audio for both background noise and animal sound. Make sure the samples are labelled correctly. Once you're done, head back to the Model testing tab, select all the samples, and click Classify selected.
Samples that contain a lot of misclassifications are valuable, since they have examples of types of audio that our model does not currently fit. It's often worth adding these to your training data, which you can do by clicking the ⋮
icon and selecting Move to training set. If you do this, you should add some new test data to make up for the loss!
Testing your model helps confirm that it works in real life, and it's something you should do after every change. However, if you often make tweaks to your model to try to improve its performance on the test dataset, your model may gradually start to overfit to the test dataset, and it will lose its value as a metric. To avoid this, continually add fresh data to your test dataset.
❗️ Data hygiene

It's extremely important that data is never duplicated between your training and test datasets. Your model will naturally perform well on the data that it was trained on, so if there are duplicate samples then your test results will indicate better performance than your model will achieve in the real world.

9. Model troubleshooting
If the network performed great, fantastic! But what if it performed poorly? There could be a variety of reasons, but the most common ones are:
- The data does not look like other data the network has seen before. This is common when someone uses the device in a way that you didn't add to the test set. You can add the current file to the test set by adding the correct label in the 'Expected outcome' field, clicking `⋮`, then selecting Copy to training set.
- The model has not been trained enough. Increase the number of epochs to `200` and see if performance increases (the classified file is stored, and you can load it through 'Classify existing validation sample').
- The model is overfitting and thus performs poorly on new data. Try reducing the number of epochs, reducing the learning rate, or adding more data.
- The neural network architecture is not a great fit for your data. Play with the number of layers and neurons and see if performance improves.
As you can see, there is still a lot of trial and error involved in building neural networks. Edge Impulse is continually adding features that will make it easier to train an effective model.
10. Deploying to your device

Impulses can be deployed as a C++ library. This packages all your signal processing blocks, configuration and learning blocks up into a single package. You can include this package in your own application to run the impulse locally.
I have followed this tutorial to deploy the machine learning model to my Nordic Semiconductor nRF5340 DK development board.
Future improvements

I was unable to purchase the haptic motors to provide feedback through the wearable, so I hope to add them when I upgrade it. I also plan to measure the wearable's power consumption and modify it to run as a low-power device.