In this project, I used the LAUNCHXL-CC1352P1 development board from Texas Instruments to perform inference on a keyword spotting TinyML model trained and quantized using EdgeImpulse's web interface. The model was trained on a very small, handmade dataset to identify the words "stop" and "go" and background noise. This model is integrated with the TI RSLK-MAX robot to create a simple audio-based controller. Watch the video below for a quick demonstration and overview.
Device Setup
Before creating and training a model, we must set up the hardware and software prerequisites. The following links are all that is required to set up the hardware and development environment:
- LAUNCHXL-CC1352 hardware setup and firmware installation - https://docs.edgeimpulse.com/docs/ti-launchxl.
- Install TI's UniFlash (to flash the program to the CC1352) - https://www.ti.com/tool/UNIFLASH.
- Install TI's Code Composer Studio (CCS) for some compiler dependencies (Note: I didn't actually use the CCS GUI for compiling and deployment, but this is possible to do) - https://www.ti.com/tool/CCSTUDIO.
- Install TI's SimpleLink CC13x2 SDK to make it easier to interface with the CC1352 and the CC3200AUDBOOST (a lot easier than pure register-level programming!) - https://www.ti.com/tool/download/SIMPLELINK-CC13X2-26X2-SDK. There is also an "i2secho" project that uses FreeRTOS included in the SDK examples that is useful for understanding how the CC1352 interfaces with the CC3200AUDBOOST.
EdgeImpulse (https://edgeimpulse.com) is a web-based service that provides simple dataset creation/segmentation, model creation and training, and TinyML model deployment for a variety of devices. The project I used can be found at https://studio.edgeimpulse.com/public/75538/latest.
To create our model, we first need to create the dataset. For the noise, I recorded one minute of background noise with quiet music to (ideally) simulate varied frequency noise in the background and another minute of pure background noise. For the "stop" keyword, I recorded one minute in two 30-second segments (due to memory limitations on the CC1352) while saying the word in varying tones and speeds. I followed a similar approach for recording one minute for the "go" keyword. All three classes of data were recorded directly on the CC1352 device to ensure that the data is representative of the quality that will be used for live inferencing on-device. I ended up with 3 minutes and 22 seconds worth of audio -- a really small amount compared to most datasets! This certainly limited the performance of the model, but I was too excited to deploy it to record any more!
Once the data is recorded, it needs to be segmented into small chunks around the keywords and split into a train and test dataset. EdgeImpulse does a pretty strong job of segmenting by default and, for the most part, manual intervention and selection is not necessary. I used the default segment length of 1000ms for all of my segments. However, in retrospect, a smaller, tighter bound may have allowed the model to learn slightly better. We will also see later, during model testing, that a tighter bound gives better results when using a small window: some of the windows within a keyword segment are truly noise, but their true label is the keyword, leading to "incorrect" classifications and an artificially lower accuracy.
After segmenting the data, I let EdgeImpulse automatically split the dataset into 80% train, 20% test for each of the three classes. Since the dataset is so small, the percentages won't be exact, but it'll be close enough.
The next step is to design an "impulse" (EdgeImpulse's term for the data pipeline through preprocessing and the neural network). I'll skip over the hours and hours I spent trying to design and find a model that I could get to fit on the CC1352 and go straight to the final model I settled on, which had good accuracy and fit on the device. One option for creating a model is to use the EON Tuner to run a fairly intensive search for models and then sort by RAM usage or latency to find a model that is fairly real-time (latency less than your window size) and fits in the CC1352's tiny 80KB SRAM. However, do note that these models will generally take windows of the full size of the signals that you segmented, and you very likely will not be able to record and analyze a full second worth of audio at a time.
For my model, I settled on a window size of 350ms and a window increase of 210ms. This small of a window size allowed it to easily fit on my device (and I probably could have increased up to 400ms after some final optimizations of the device program post-training). When you're loading the model onto the device, there are a few scenarios that will tell you that your model is too large:
- When compiling, the compiler will generate an error that you have overrun the SRAM by X KB.
- The CC1352 only runs as many inferences as you have input buffers and then needs to be reset to run more. If this happens, your model is taking too long to run an inference.
Remember that you will need to allocate twice as many bytes for each I2S input buffer as there are samples in one inference window, since the data from the onboard microphone is an int16 datatype (signed 16-bit integer, i.e., two bytes per sample). This means that each buffer must allocate 11.2 KB for a 350ms window sampled at 16 kHz (16 kHz * 0.35 s = 5,600 samples, and 5,600 samples * 2 bytes = 11.2 KB). Four buffers means you're already over 50% SRAM allocation without even considering the model or any of the overhead of reading from the mic! This is where inference latency comes into play: you want your model to run fast enough that you need as few buffers as possible, drastically reducing your SRAM usage.
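As a quick sanity check, here is a minimal sketch (not part of the device firmware) of that buffer-size arithmetic; the buffer count of four is just the example from above:

#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t sample_rate_hz = 16000;                                  // onboard mic sample rate
    const float    window_s       = 0.35f;                                  // 350 ms inference window
    const uint32_t samples        = (uint32_t)(sample_rate_hz * window_s);  // 5,600 samples per window
    const uint32_t bytes_per_buf  = samples * sizeof(int16_t);              // 11,200 bytes per I2S buffer
    const uint32_t num_buffers    = 4;                                      // example buffer count from above
    printf("%u bytes per buffer, %u bytes total of the 80 KB SRAM\n",
           (unsigned)bytes_per_buf, (unsigned)(num_buffers * bytes_per_buf));
    return 0;
}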
After running the EON Tuner, I selected a model that uses the MFE processing block and reduced the number of filters from 40 to 32 to cut the memory usage just a little bit more. We can then generate the features from the processing block and continue to the neural network.
Now we just need to set up the neural network before training and deployment. For a simple keyword-spotting model, we don't need too many layers, and we want mostly convolutional layers (except for the final layer) to keep the parameter count down. The model I ended up going with (one generated by the EON Tuner) is a simple four-layer model with three Conv1D layers and one Dense layer, plus a couple of dropouts to reduce overfitting. Training over 200 epochs with 20% validation and a 0.075 learning rate resulted in a 93.7% validation accuracy. We can also see from the confusion matrix that the model has a tendency to skew towards the noise and stop classes and is usually very accurate when selecting "go", with a precision of 0.97. This is actually the desired behavior (in the case where we can't have 100% across the board!): it's safer for the system to not understand a "go" or mishear a "stop" than to mishear a "go" and have the bot start driving away unintentionally. It's much safer to fail towards stopping than towards driving!
The model testing window gives us a little more information on the performance of the model using the testing dataset. Here we can see the results for each testing audio segment. Since our window is 350ms and the slide is 210ms, we have four windows per 1000ms sample. This shows us how good the model is at detecting noise correctly, while the "stop" and "go" classes are severely lagging behind. However, this is slightly misleading. Looking at some of the "stop" and "go" classifications, we can see that the model tends to recognize three out of the four windows in the segment as the correct label, with the remaining window classified as noise. If we revisit some of the audio samples, we can actually see that the "stop" and "go" keywords are very quick to say, leaving a significant portion of each audio sample as pure noise -- exactly what the model is predicting! So while the model testing accuracy leaves a lot to be desired, the real-world performance looks promising.
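If you want to double-check that window count for your own settings, a quick throwaway calculation (the 1000/350/210 values are just the ones used here) looks like this:

#include <cstdio>

int main() {
    const int clip_ms = 1000, window_ms = 350, slide_ms = 210;
    // Windows start at 0, 210, 420, and 630 ms, so four of them fit in a 1000 ms clip.
    const int windows = (clip_ms - window_ms) / slide_ms + 1;
    printf("%d windows per %d ms sample\n", windows, clip_ms);
    return 0;
}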
Now we have a TinyML model with pretty good performance and a small footprint. It's time to put it on our CC1352 and get inferencing!
Deployment
On the deployment page on the EdgeImpulse website, there are a number of different ways we can build our trained model for edge deployment. Under the "Build Firmware" section, we can select "TI LAUNCHXL-CC1352P" to build a simple executable that will flash a pre-built program onto the device for inference, with predictions transmitted back through UART to your computer. Since we want to have a little more control over what the device does with its inference results, we'll select the "C++ library" option instead. Once you select an option, you will see a new table open at the bottom of the page. This table provides the option to quantize the model to signed 8-bit integers or leave it in 32-bit floating point. The 8-bit model will generally take slightly less RAM and less flash than the 32-bit float model. 8-bit calculations are also much faster than floating-point operations, so the latency should be significantly better. With the model we created earlier, the accuracy is actually just slightly better with the quantized model, so there's really no reason not to build this one.
EdgeImpulse provides instructions on how to integrate your neural network with an example wrapper program they have written at https://docs.edgeimpulse.com/docs/running-your-impulse-ti-launchxl. The GitHub repository for the EdgeImpulse wrapper program can be found at https://github.com/edgeimpulse/example-standalone-inferencing-ti-launchxl/. This program performs basic continuous inference on the device, with communication over UART.
I found I had significant problems with my models not fitting in the SRAM when using this script, so I instead opted to merge it with the i2secho example from the SimpleLink SDK (https://dev.ti.com/tirex/explore/node?node=AHYRPoLbHfUcGCZrpKAw1w__pTTHBmu__LATEST) to create a slightly more optimized program. Of course, I still had plenty of problems getting this to work, but at least they were problems that I made!
EdgeImpulse also has another example project for the CC1352 at https://github.com/edgeimpulse/firmware-ti-launchxl. This program includes a menu over UART to control the device. However, it seems many of the functions are not implemented, and I was never able to get a model that didn't run out of memory during the audio processing/DSP phase. Still, this project is important because it lists all of the prerequisites and explains how to build the FreeRTOS kernel, which we will need for our project (make sure to follow the prerequisite steps here if you haven't before!).
Additionally, exploring the source of this project taught me that we need to edit the "raw_feature_get_data" function to use the ARM int16-to-float conversion function "arm_q15_to_float" instead of simply assigning the int16 buffer (or casting it to a float) to the features array. I also used this project to find the proper include files to add to the makefile (they were not included in either the example inference project or the i2secho project!). This finding was key to getting the project to work and to passing the correct data to the model.
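To illustrate, here is a rough sketch of that change, assuming the standard EdgeImpulse signal-callback signature and a sampleBuffer array (a name I've made up here) holding the raw int16 samples for the current window:

#include <cstddef>
#include <cstdint>
#include "arm_math.h"   // CMSIS-DSP: provides arm_q15_to_float()

extern int16_t sampleBuffer[];   // assumed to be filled by the I2S read callback

static int raw_feature_get_data(size_t offset, size_t length, float *out_ptr) {
    // Convert the signed 16-bit samples to properly scaled floats instead of
    // casting them directly, so the model receives the data it was trained on.
    arm_q15_to_float((q15_t *)&sampleBuffer[offset], out_ptr, length);
    return 0;
}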
The i2secho example uses FreeRTOS while the EdgeImpulse example does not use an RTOS. Since the i2secho is a little more complex (and I was most confused about the process of reading data from the microphone), I opted to build off of the i2secho source and integrate the source from the EdgeImpulse example. This required significant trial and error, especially concerning merging the makefiles to ensure everything was included properly. To make it easier for both of us and to spare pages of describing every difficulty I faced (also, I didn't take any notes), I instead provide the code in my repository at https://github.com/NickChiapputo/CC1352-EdgeImpulse-AudioML. You will need to change the following items to match your system setup:
- In cc1352/gcc/makefile, change SIMPLELINK_CC13X2_26X2_SDK_INSTALL_DIR, SYSCONFIG_TOOL, and GCC_ARMCOMPILER to match the install locations on your device.
- In the directory where your SimpleLink CC13x2 and CC26x2 SDK is installed, edit the imports.mak file and change the XDC_INSTALL_DIR, SYSCONFIG_TOOL, FREERTOS_INSTALL_DIR, CCS_ARMCOMPILER, TICLANG_ARMCOMPILER, and GCC_ARMCOMPILER paths to match the setup on your device.
Once these paths are edited, you should be able to use the following commands (from the cc1352/ directory) to compile and load the program onto your CC1352 (unfortunately, I only know how to do this on a Linux machine):
$ cd gcc/; make clean; make; cd -
$ dslite.sh -c tools/user_files/configs/cc1352p1f3.ccxml -l tools/user_files/settings/generated.ufsettings -e -f -v gcc/build/edge-impulse-standalone.out
Barring any errors, your code should now be running on the CC1352 and classifying your audio! Open up a serial connection with the device and you can see what it is classifying in real time. By default, it will only display "NOISE", "STOP", or "GO". This is to save on some latency, as UART communication can be costly when sending significant amounts of data. There are some comment blocks in the cc1352/ei_main.cpp file in the controlThread function that you can uncomment to get more information printed out. Try saying "go": the green LED (DIO7) should turn on and the red LED (DIO6) should turn off. Now say "stop" and the LEDs should swap again! Once you have your model running on the device, it's time to see how we can send our commands wirelessly!
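For reference, the LED handling boils down to something like the sketch below. This is not the exact code from the repository: CONFIG_GPIO_RLED and CONFIG_GPIO_GLED are placeholder SysConfig indices for the DIO6 and DIO7 pins (use the names generated in your own ti_drivers_config.h), and the result struct comes from the EdgeImpulse C++ SDK. Something like this would be called after each run_classifier() call in the control thread.

#include <cstring>
#include <cstddef>
#include <ti/drivers/GPIO.h>
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

// Placeholder pin indices -- replace with the values generated from your .syscfg file.
#define CONFIG_GPIO_RLED 0   // red LED, DIO6
#define CONFIG_GPIO_GLED 1   // green LED, DIO7

static void update_leds(const ei_impulse_result_t *result) {
    // Pick the label with the highest confidence for this window.
    size_t best = 0;
    for (size_t ix = 1; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
        if (result->classification[ix].value > result->classification[best].value) {
            best = ix;
        }
    }

    const char *label = result->classification[best].label;
    if (strcmp(label, "go") == 0) {
        GPIO_write(CONFIG_GPIO_GLED, 1);   // green on
        GPIO_write(CONFIG_GPIO_RLED, 0);   // red off
    } else if (strcmp(label, "stop") == 0) {
        GPIO_write(CONFIG_GPIO_GLED, 0);
        GPIO_write(CONFIG_GPIO_RLED, 1);
    }
    // A "noise" prediction leaves the LEDs -- and therefore the transmitted command -- unchanged.
}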
While the CC1352 does have wireless capabilities, we will be using a simple 433 MHz RF transceiver pair as it is easier to use this to communicate with the MSP432 on the TI-RSLK and it comes with very little processor time and memory overhead that could increase our inference latency. For the transmitter connected to the CC1352, we will need the following components:
- 1x HT12E
- 1x 433 MHz transmitter
- 1x 100 KΩ resistor
- 1x 1 MΩ resistor
- 6x MtM jumper wires
- 3x MtF jumper wires
- 1x Breadboard
Using these components, build the circuit shown in the schematic below (also included at the end of this project as a PDF). The six MtM jumper wires are used to connect VCC/GND on the TX and on the HT12E to the breadboard's power rails, the DATA pin on the TX to the DOUT pin on the HT12E, and the ~TE (transmission enable) pin to the negative power rail. The MtF jumper wires are used to connect the positive and negative power rails to the CC1352 (I recommend the 5V and GND pins just below the CC3200AUDBOOST instead of using MtM wires and connecting on the back) and one wire connecting AD11 to DIO6 on top of the CC3200AUDBOOST module. Since the red LED (connected on DIO6) is turned on when "STOP" is the active command, this lets us easily transmit that information without having to use another GPIO pin. The remaining data pins can be left empty for future commands you may want to issue! Make sure the address pins are left unconnected so the TX and RX ends can communicate, and put the 100 KΩ and 1 MΩ resistors in series so they act as a 1.1 MΩ resistor across the oscillator pins.
Once the transmitter circuit is set up on the CC1352, we need to build the receiver on the breadboard on top of the TI-RSLK. The process is very similar, except this time I've added an LED on the data output and on the valid transmission pin. These are good for debugging but are not required for the project to work. You may notice in my images that I actually use an LED on all four of the data output pins. You can always expand your dataset, train the model on more keywords, and create up to 16 commands to wirelessly transmit! For the receiver, we will need the following components:
- 1x 433 MHz receiver
- 1x HT12D
- 1x 47 KΩ resistor
- 1x 2.2 KΩ resistor
- 5x MtM jumper wires
- 4x MtF jumper wires
- (optional) 2x LED
- (optional) 2x 330 Ω resistor
Using these components, build the circuit shown in the schematic below (also included at the end of the project as a PDF). The five MtM jumper wires connect the RX and HT12D VCC/GND pins to the breadboard's power rails and connect the data pins on the RX and HT12D together. The four MtF jumper wires connect the breadboard power rails to the 5V and GND pins on the MSP432 and connect the valid transmission pin to P2.4 and D11 to P2.5 on the MSP432. These last two will send the signal received from the wireless transmission to the MSP432 to control the RSLK. We can use the valid transmission line to determine if we have a good connection with the transmitter, allowing us to stop or perform some other action if we lose connection. Without the antennas soldered onto the 433 MHz TX/RX modules, they have only a few feet of range and are strongly limited by line of sight.
At this point, our circuits should all be set up! While the RSLK won't listen to the commands just yet, if you placed the LEDs on the receiver you can power up both boards and use the "stop" and "go" commands to verify that the information is being correctly transmitted.
The implementation for the RSLK can be found in the repository for this project (https://github.com/NickChiapputo/CC1352-EdgeImpulse-AudioML) under the rslk/ directory. It can be built using the commands:
$ cd Debug/; make clean; make; cd -
The compiled code can be flashed to the RSLK using the command
$ dslite.sh -c targetConfigs/MSP432P401R.ccxml -e -f -v Debug/rslk.out
Before compiling, be sure to change the CG_TOOL_ROOT path in the makefile to the appropriate path for your system setup. The "rslk.out" and "rslk.hex" rules also have a few paths that need to be changed to match your system. This implementation makes use of the MSP432 Driver Library to make peripheral functionality easier to implement by avoiding register-level actions. It can be downloaded from https://software-dl.ti.com/msp430/msp430_public_sw/mcu/msp430/MSP432_Driver_Library/latest/index_FDS.html. The install directory path for the DriverLib must be changed in the ORDERED_OBJS variable in the makefile. Additionally, the link to the .lib file for the DriverLib must be changed in the subdir_vars.mk file, and a number of paths need to be changed in the subdir_rules.mk file. These files are all generated by Code Composer Studio, so it may be easier to use CCS to generate the appropriate files instead of manually checking and changing the paths.
The source for the RSLK includes a simple API for the bump sensors, tachometer, motors, and UART to make it easier to control and monitor the RSLK systems. The program reads from the valid transmission and D11 pins on the HT12D. If the valid transmission pin is high (the transmission is valid), then the program checks the control signal on D11. If the transmission is not valid, then the LED on the MSP432 blinks blue. This can, of course, be modified (and probably should be!) to either immediately stop the vehicle or stop it after some time (e.g., 2 seconds) without a connection. Because of the weak connection between the 433 MHz TX and RX, the connection is frequently broken, so I opted not to stop the vehicle when it loses connection. Fortunately, we can also use the bump sensor detection to stop the vehicle when it runs into an object.
If the D11 pin is high (equivalently, DIO6 on the CC1352 is high), then we are sending the stop signal and the vehicle will set the PWM for both motors to 0%. If the D11 pin is low (equivalently, DIO6 is low and DIO7 is high), then we are sending the go signal and the vehicle will set the PWM for both motors to 20%. This value can be changed by editing the MOTOR_GO_SPEED macro.
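Put together, the polling logic described above looks roughly like the sketch below. The pin assignments (P2.4 for valid transmission, P2.5 for D11) come from the wiring earlier; motor_set_duty and led_blink_blue are hypothetical stand-ins for the repository's motor and LED helpers, not its actual function names.

#include <ti/devices/msp432p4xx/driverlib/driverlib.h>

#define MOTOR_GO_SPEED 20   // percent duty cycle while the "go" command is active

extern void motor_set_duty(uint8_t percent);   // hypothetical wrapper around the motor PWM API
extern void led_blink_blue(void);              // hypothetical "no valid transmission" indicator

void control_loop_step(void) {
    // The HT12D drives VT (P2.4) high when it has received a valid frame.
    if (GPIO_getInputPinValue(GPIO_PORT_P2, GPIO_PIN4) == GPIO_INPUT_PIN_HIGH) {
        if (GPIO_getInputPinValue(GPIO_PORT_P2, GPIO_PIN5) == GPIO_INPUT_PIN_HIGH) {
            motor_set_duty(0);               // D11 high -> "stop": both motors at 0%
        } else {
            motor_set_duty(MOTOR_GO_SPEED);  // D11 low  -> "go": both motors at 20%
        }
    } else {
        led_blink_blue();                    // no connection; keep the last command
    }
}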
Once you get this program flashed onto the RSLK, your system should be all ready to go! Make sure to put some batteries in the RSLK and flip the power switch beforehand. Test out saying "go" in different pitches and at varying lengths. With my model, it was better to speak in a monotone voice and drag out "go", while "stop" needed to be higher pitched and spoken quickly. Depending on how diverse your dataset is, you may need to be careful about how you say the keywords!