After discovering the "2024 Winter Break Event" organized by eetree, I eagerly signed up. From the ten available platforms I chose the Seeed XIAO ESP32S3, drawn by my interest in machine learning and AI, since it supports embedded machine learning. I aimed to build a project centered on smart pet home automation.
With a dog at home, and considering the powerful capabilities of the XIAO ESP32S3 (it has both a microphone and a camera), I initially entertained the idea of packing numerous functions into the project. I thought about using it as a network camera to monitor my dog's activities remotely, recognizing its posture and performing object detection, and I also considered keyword recognition and analysis of my pet's facial expressions. However, the complexity of integrating all these features, along with the potential heat generation, posed significant challenges. Eventually, I decided to simplify and use machine learning only for keyword recognition.
- Utilize AI-powered sound recognition technology to identify and record pet sounds, such as barking or meowing. When the pet exhibits signs of stress, the system should automatically play calming music through Home Assistant to alleviate the pet's anxiety.
1. The Seeed XIAO-esp32s3 is programmed to recognize three types of sounds: dog barking, dog whining, and ambient noise.
2. Home Assistant is configured to respond differently based on the sounds identified by the Seeed XIAO-esp32s3:
- When the system detects dog whining, it triggers a notification through smart speakers to alert the owner and plays calming music to soothe the pet's emotions.
- When dog barking is detected, the system alerts the owner through smart speakers, serving as a form of alarm notification.
III、Implementation Process
1. Model Training
- Collect at least 10 minutes of audio samples of dog barking, ambient noise, and dog whining, and upload them to Edge Impulse.
- Automatically or manually segment the audio samples into one-second segments.
- Divide all types of samples into an 80% training set and a 20% testing set.
- Create an impulse in Edge Impulse to acquire the raw data and use signal-processing blocks to extract features.
- Use learning blocks to classify new data.
- Train the model using machine learning techniques in Edge Impulse.
- After testing, the model achieves an accuracy of 91.41%.
- Download the trained model and import it into Arduino.
- Modify the file names in the example program to match the names of the downloaded model files.
- Click compile and upload, and wait for the compilation and download to complete.
- Open the serial monitor and, in the bottom-right corner, set the line ending to "Both NL & CR" and the baud rate to 115200 (the default).
- If successful, the serial communication monitor should display as shown in the figure below:
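The one-second segmentation used in the training steps above can be sketched in plain C++. This is purely an illustration, not Edge Impulse's actual implementation: the 16 kHz sample rate and the helper name are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed capture rate: 16,000 samples per second (one-second window).
constexpr std::size_t kSampleRate = 16000;

// Split a raw audio buffer into consecutive one-second windows.
// A trailing partial window (shorter than one second) is dropped.
std::vector<std::vector<short>> segmentOneSecond(const std::vector<short>& audio) {
    std::vector<std::vector<short>> segments;
    for (std::size_t start = 0; start + kSampleRate <= audio.size(); start += kSampleRate) {
        segments.emplace_back(audio.begin() + start, audio.begin() + start + kSampleRate);
    }
    return segments;
}
```

Edge Impulse's automatic segmentation additionally centers each window on detected activity; the fixed windowing above is the simplest version of the idea.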
- Follow the tutorial video on Bilibili (BV1KS4y1F7HM), which requires a GitHub account. Special thanks to the content creator: "我叫小纪".
- Install Oracle VM VirtualBox and create a Linux virtual machine. (The installation package is provided in the video description by the content creator.)
- Unzip the compressed package to obtain the .vdi file, which serves as the disk image for Home Assistant. The version may need updating; if installation fails, download the latest image from the following address: [Home Assistant Installation for Linux]
- Create a virtual machine by clicking "New".
- Finally, click on "Start" to initiate the virtual machine and wait for Home Assistant to start. The initial startup process may take some time, so please be patient.
- Then, in your web browser, enter "homeassistant:8123" and press Enter to access it. Be prepared for long waits during this setup phase; in my experience, it can take several hours.
- Next, you'll need to register a username and password (make sure to remember them). Then, in the bottom left corner, click on your profile picture and enable advanced mode. Once this is done, the installation process is complete.
The steps involved here are relatively complex, and due to space constraints, it's recommended to continue learning by watching the video on Bilibili. Here's the link to the video: [BV1KS4y1F7HM].
5. MQTT Installation
Having prior experience with MQTT, which is widely used in the Internet of Things (IoT) thanks to its lightweight publish-subscribe mechanism, I added the MQTT add-on in Home Assistant and created an MQTT broker inside it. Then, in Arduino, I added the PubSubClient library. With just a few lines of code in the Keyword Spotting (KWS) program to connect to WiFi, connect to MQTT, and publish messages, communication between the ESP32 and Home Assistant became straightforward.
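The publish-subscribe mechanism mentioned above can be illustrated with a toy in-process broker in plain C++. This is purely illustrative; in the real setup the Mosquitto broker inside Home Assistant does this work over the network, and all names below are made up.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy broker: subscribers register a callback per topic; publishing a
// message on a topic invokes every callback registered for that topic.
// Publishers and subscribers never reference each other directly -- that
// decoupling is what makes MQTT so convenient for IoT.
class ToyBroker {
public:
    void subscribe(const std::string& topic,
                   std::function<void(const std::string&)> callback) {
        subs_[topic].push_back(std::move(callback));
    }
    void publish(const std::string& topic, const std::string& payload) {
        for (auto& cb : subs_[topic]) cb(payload);  // deliver to each subscriber
    }
private:
    std::map<std::string, std::vector<std::function<void(const std::string&)>>> subs_;
};
```

In the project, the ESP32 plays the publisher role and Home Assistant's automations play the subscriber role, with Mosquitto in between.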
To install MQTT:
1. Click on "Configuration."
2. Select "Add-On Store."
3. Search for "MQTT."
4. Click on "Mosquitto broker."
5. Install "Mosquitto broker."
- Set the MQTT server name and password according to the illustration.
1. Click on "Configuration".
2. Navigate to "Devices & Services".
3. Select "Add Integration".
4. Search for "MQTT".
5. Click on "MQTT".
6. Enter the server name and password you just set up.
- Then you can set the MQTT topics. The image below shows the MQTT settings and testing interface, where you can configure topics and test receiving by entering a topic in the listening box.
1. Click on "Configuration".
2. Navigate to "Automations & Scenes".
3. Select "Create Automation".
4. Follow the prompts to set up the automation.
bool m = microphone_inference_record(); // record audio
if (!m) {
ei_printf("ERR: Failed to record audio...\n"); // print an error message
return;
}
- `microphone_inference_record()` is called from the `void loop()` function and defined near the end of the sketch; the loop repeatedly calls it to record audio.
static bool microphone_inference_record(void)
{
bool ret = true;
while (inference.buf_ready == 0) {
delay(10);
}
inference.buf_ready = 0;
return ret;
}
- The function `microphone_inference_record()` is a static function that waits for new data. While `inference.buf_ready` is 0, it loops with a 10-millisecond delay on each iteration. Once `inference.buf_ready` becomes non-zero, it resets `inference.buf_ready` to 0 and returns `true`. `inference.buf_ready` serves as a flag indicating whether a freshly recorded buffer is available for sampling.
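The buf_ready handshake described above can be reduced to a minimal, testable sketch in plain C++. This is illustrative only: the struct and function names are invented, and the real code busy-waits with `delay(10)` inside the Arduino loop rather than polling once.

```cpp
#include <cassert>

// Minimal model of the flag protocol: the audio capture side sets the
// flag when a full buffer is ready; the inference side claims the buffer
// by clearing the flag before processing it.
struct InferenceState {
    volatile int buf_ready = 0;  // 1 = a full buffer is waiting
};

// Producer side (in the sketch this runs from the I2S capture path).
void onBufferFilled(InferenceState& st) { st.buf_ready = 1; }

// Consumer side: returns true once a fresh buffer has been claimed.
bool tryClaimBuffer(InferenceState& st) {
    if (st.buf_ready == 0) return false;  // nothing recorded yet
    st.buf_ready = 0;                     // claim the buffer for inference
    return true;
}
```

Clearing the flag before processing ensures the next capture is not mistaken for data that has already been classified.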
/******************************************************************************************************/
signal_t signal;
signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT; // total length of the signal data
signal.get_data = &microphone_audio_signal_get_data; // callback that fetches the audio signal data
ei_impulse_result_t result = { 0 };
- A `signal_t` structure variable `signal` is created.
- The `total_length` field of the `signal` structure is set to `EI_CLASSIFIER_RAW_SAMPLE_COUNT`, indicating the total length of the signal data.
- The `get_data` field is set to a pointer pointing to the function `microphone_audio_signal_get_data`, which is responsible for obtaining audio signal data.
- This code segment, located after the previous recording function in the `loop()`, continuously runs the classifier on the recorded samples.
// Call run_classifier with the signal structure, the result variable, and the debug_nn flag,
// then check whether running the classifier on the provided signal data produced an error.
EI_IMPULSE_ERROR r = run_classifier(&signal, &result, debug_nn); // run the classifier
if (r != EI_IMPULSE_OK) {
ei_printf("ERR: Failed to run classifier (%d)\n", r); // print an error message
return;
}
- The purpose of this code snippet is to perform inference using a classifier on the audio signal data obtained after recording. Subsequently, it checks whether the inference was successful.
int pred_index = 0; // index of the top prediction
float pred_value = 0; // score of the top prediction
// Print the prediction results
ei_printf("Predictions ");
ei_printf("(DSP: %d ms., Classification: %d ms., Anomaly: %d ms.)",
result.timing.dsp, result.timing.classification, result.timing.anomaly);
ei_printf(": \n");
// Iterate over the classification results
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
ei_printf(" %s: ", result.classification[ix].label); // print the class label
ei_printf_float(result.classification[ix].value); // print the predicted score
ei_printf("\n");
// Track the highest score and its index
if (result.classification[ix].value > pred_value){
pred_index = ix;
pred_value = result.classification[ix].value;
}
}
- Immediately following the previous code snippet, this segment handles and prints the classifier's prediction results.
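The selection loop above is simply an argmax over the per-label scores. A standalone restatement in plain C++ (an illustration, not the sketch's actual code):

```cpp
#include <cassert>
#include <cstddef>

// Return the index of the largest score, mirroring how pred_index and
// pred_value are updated in the Arduino loop. Assumes count >= 1.
std::size_t argmax(const float* scores, std::size_t count) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i) {
        if (scores[i] > scores[best]) best = i;
    }
    return best;
}
```

With the project's three labels, `argmax` picks whichever of whining, barking, or ambient noise the model scored highest.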
static int microphone_audio_signal_get_data(size_t offset, size_t length, float *out_ptr)
{
numpy::int16_to_float(&inference.buffer[offset], out_ptr, length);
return 0;
}
- This function is used to retrieve raw audio signal data.
- It calls the `numpy::int16_to_float` function to convert 16-bit integers extracted from the audio buffer into floating-point numbers. The results are then stored in the provided output pointer `out_ptr`.
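A plausible standalone equivalent of that conversion in plain C++. The divisor of 32768 is the conventional normalization for signed 16-bit PCM and is an assumption about the SDK's exact scaling:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Convert signed 16-bit PCM samples into floats in [-1.0, 1.0),
// as the classifier expects floating-point input.
void int16ToFloat(const int16_t* in, float* out, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        out[i] = static_cast<float>(in[i]) / 32768.0f;
    }
}
```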
The Keyword Spotting (KWS) code provided in the official example has undergone minimal modifications. My main task has been understanding the purpose of each code segment and adding comments to it.
2. MQTT message sending code
#include <WiFi.h>
#include <PubSubClient.h>
// WiFi credentials: replace with your own; a phone hotspot also works (use 2.4 GHz)
const char* ssid = "XXX";
const char* password = "*******";
// MQTT server information
const char* mqttServer = "192.168.230.250";
const int mqttPort = 1883;
const char* mqttUsername = "admin";
const char* mqttPassword = "hlj20020511";
// Create WiFi client and MQTT client instances
WiFiClient wifiClient;
PubSubClient mqttClient(wifiClient);
- This is the basic MQTT setup in the Arduino code. It uses the WiFi and PubSubClient libraries to prepare for connecting to WiFi and the MQTT server, configures the necessary connection information, and then creates the WiFi client and MQTT client instances.
// Configure the MQTT connection
void setupMQTT() {
mqttClient.setServer(mqttServer, mqttPort);
}
// Reconnect to the MQTT server
void reconnectMQTT() {
while (!mqttClient.connected()) {
Serial.println("Connecting to MQTT...");
if (mqttClient.connect("ESP32Client", mqttUsername, mqttPassword)) {
Serial.println("Connected to MQTT");
} else {
Serial.print("MQTT connection failed, rc=");
Serial.print(mqttClient.state());
Serial.println(" retrying in 5 seconds...");
delay(5000);
}
}
}
- MQTT connection function, with comments already provided for easy understanding.
if ((pred_index == 2) && (pred_value > 0.8)){
mqttClient.publish("dog bark", "环境音~"); // "Ambient noise~"
}
else if ((pred_index == 1) && (pred_value > 0.8)){
mqttClient.publish("dog bark", "狗狗在大叫呢!!!"); // "The dog is barking loudly!!!"
} else if ((pred_index == 0) && (pred_value > 0.8)){
mqttClient.publish("dog bark", "狗狗在撒娇呢~"); // "The dog is whining for attention~"
}
- The variable `pred_index` encodes the three classes recognized by the Keyword Spotting (KWS) model (0 - whining, 1 - barking, 2 - ambient noise). The snippet above takes the model's classification result, uses an if statement to select the appropriate ("topic", "message") pair, and calls the library function `mqttClient.publish` to send it to the MQTT broker (Home Assistant).
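The decision logic above can be factored into a small, testable function. This is a sketch: the function name and the English message strings are illustrative, not the actual payloads the project publishes.

```cpp
#include <cassert>
#include <string>

// Map the winning class index and its confidence to the message that
// would be published on the "dog bark" topic. Returns an empty string
// when confidence does not exceed the 0.8 threshold (nothing is sent).
// Labels: 0 = whining, 1 = barking, 2 = ambient noise.
std::string messageFor(int predIndex, float predValue) {
    if (predValue <= 0.8f) return "";  // not confident enough to publish
    switch (predIndex) {
        case 0: return "The dog is whining for attention~";
        case 1: return "The dog is barking loudly!!!";
        case 2: return "Ambient noise~";
        default: return "";  // unknown label: publish nothing
    }
}
```

Separating the decision from the `mqttClient.publish` call makes the threshold and label mapping easy to test without hardware.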
The main challenge was that this was my first encounter with platforms such as Home Assistant, ESPHome, HACS, and Xiaomi MIoT Auto. The tutorials are neither comprehensive nor smooth, installation methods are scattered across many sources, and network issues consumed a lot of effort in finding approaches that worked in my environment. In hindsight it was not especially difficult, but it demanded patience in searching for information, adapting, and working around network problems.
- Future Plans: Although these challenges were a one-time cost, the payoff is significant; as the saying goes, you learn from your mistakes. I now have a long-lasting smart home control center, which I am very satisfied with.
Audio samples are relatively easy to annotate compared with images, and Edge Impulse supports automatic annotation, which is excellent. However, after completing the project I found that recognition is not very sensitive in practice. Dog barks are not all the same "woof"; tones differ, and there are obvious errors, such as barking recognized as ambient noise, whining recognized as barking, or responses delayed until several seconds into continuous barking.
Future Plans: I plan to increase the number of training samples and add more sound types. I will collect barks by dog breed (barks within a breed should be similar) and also add cat meows. Ideally the samples would be collected with the on-board microphone, but this is very difficult because we cannot control when pets make noise.
3. Power Consumption and Convenience
Running Home Assistant in a Linux virtual machine on my computer is a temporary solution, because the machine must be kept on all the time, which consumes a lot of power and resources.
Future Plans: I want to use Home Assistant for the long term. There are many installation methods worth trying, such as installing it on cloud servers (Tencent Cloud, Alibaba Cloud, etc.), Raspberry Pi, NAS, etc.
VI、Reference Materials
- Tutorial on Keyword Spotting (KWS) using Edge Impulse
- Tutorial on Keyword Spotting (KWS) in the Wiki Documentation
- Installation Tutorial for Home Assistant (Bilibili Video)
- Installation Tutorial for ESPHome on Home Assistant (Bilibili Video)
VII、Demonstration
The ESP32 recognizes three types of sounds: ambient noise, dog barking, and dog whining, and sends the result to Home Assistant via MQTT. After recognition handling is set up for each type, Home Assistant triggers notifications, plays music, or performs other control actions accordingly. (Bilibili Video)
VIII、Learning Records
- Embedded Machine Learning
I used the MIC routine on the xiao_esp32s3 to record some dog-sound samples as training data for keyword machine learning. This turned out to be inefficient, so I searched online for sound samples and ultimately collected over ten minutes of audio. Following the original tutorial by MJRoBot, I worked through each step one by one, which took quite a long time (o(╥﹏╥)o). It was my first attempt, and I had not considered the need for a larger sample size or decided which sounds to recognize (I eventually settled on ambient noise, dog barking, and dog whining).
I am also amazed by my first encounter with embedded machine learning. Previously, my impression of machine learning was that it required powerful GPUs for computation. Now, I am surprised that I can use a tiny ESP32-S3, which is smaller than my thumb, for machine learning tasks.
- Home Assistant
After considering how to connect the xiao-esp32s3 to the internet, I found a smart home system called Home Assistant from the event website. Driven by curiosity, I started to explore: installing a virtual machine, setting up Home Assistant, HACS, ESPHome, MQTT, and so on.
I also encountered challenges when installing Home Assistant for the first time. Integrating Xiaomi Mi Home into Home Assistant was a long and arduous process o(╥﹏╥)o. I followed the Bilibili video tutorial recommended in the reference materials, but it required a lot of patience!
- ESPHome
While there are tutorials on the official wiki, I ran into difficulties because I installed Linux in a Windows virtual machine rather than using a Raspberry Pi as the tutorial assumed. After multiple failed attempts, I spent several days searching for tutorials and videos before finally succeeding. The most helpful resource was the Bilibili video tutorial BV1uV4y1H7e2 (link in the reference materials).
However, to meet the final task requirements MQTT was still necessary, and ESPHome went unused. In my view, ESPHome is best suited to attaching sensors to ESP chips, passing their data to the ESPHome integration in Home Assistant, and controlling them from the Home Assistant dashboard, which is indeed very convenient. But flashing firmware (.bin built from .yaml files) from ESPHome replaces whatever is running on the ESP32, which makes it impossible to keep running the keyword-recognition program. I suspected this beforehand and confirmed it by experiment. Overall, ESPHome remains very convenient for DIY home-sensor projects.
- MQTT
I have added screenshots of the automation deployment in Home Assistant for three types of voice recognition: dog barking, ambient noise, and dog whining. Each type of sound triggers corresponding actions.
- Overall, I learned a lot of IoT knowledge that I had wanted to learn before and explored many new areas that I had not previously encountered. I am very happy with my experience on this project and grateful to eetree and Seeed for giving us students the opportunity to experience such great products.
- o( ̄▽ ̄)d