After discovering the "2024 Winter Break Event" organized by eetree, I eagerly signed up. From the ten available platforms I chose the Seeed XIAO ESP32S3, drawn by my interest in machine learning and AI, since it supports embedded machine learning. I aimed to build a project centered on smart pet home automation.
With a dog at home, and considering the powerful capabilities of the XIAO ESP32S3 (it has both a microphone and a camera), I initially entertained the idea of packing numerous functions into the project. I thought about using it as a network camera to monitor my dog's activities remotely, recognizing its posture and performing object detection, and I also considered keyword recognition and analysis of my pet's facial expressions. However, the complexity of integrating all these features, along with the potential heat generation, posed significant challenges. Eventually, I decided to simplify and use machine learning only for keyword recognition.
- Utilize AI-powered sound recognition technology to identify and record pet sounds, such as barking or meowing. When the pet exhibits signs of stress, the system should automatically play calming music through Home Assistant to alleviate the pet's anxiety.
1. The Seeed XIAO-esp32s3 is programmed to recognize three types of sounds: dog barking, dog whining, and ambient noise.
2. Home Assistant is configured to respond differently based on the sounds identified by the Seeed XIAO-esp32s3:
- When the system detects dog whining, it triggers a notification through smart speakers to alert the owner and plays calming music to soothe the pet's emotions.
- When dog barking is detected, the system alerts the owner through smart speakers, serving as a form of alarm notification.
III、Implementation Process
1. Model Training
- Collect at least 10 minutes of audio samples of dog barking, ambient noise, and dog whining, and upload them to Edge Impulse.
- Automatically or manually segment the audio samples into one-second segments.
- Divide all types of samples into an 80% training set and a 20% testing set.
- Create an impulse in Edge Impulse to acquire the raw data and use signal-processing blocks to extract features.
- Use learning blocks to classify new data.
- Train the model using machine learning techniques in Edge Impulse.
- After testing, the model achieves an accuracy of 91.41%.
- Download the trained model and import it into Arduino.
- Modify the file names in the example program to match the names of the downloaded model files.
- Click compile and upload, and wait for the compilation and download to complete.
- Open the serial monitor and, in the bottom-right corner, set the line ending to "Both NL & CR" and the baud rate to 115200 (the default).
- If successful, the serial communication monitor should display as shown in the figure below:
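The one-second segmentation used in the training steps above can be sketched in plain C++. This is purely an illustration, not Edge Impulse's actual implementation: the 16 kHz sample rate and the helper name are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed capture rate: 16,000 samples per second (one-second window).
constexpr std::size_t kSampleRate = 16000;

// Split a raw audio buffer into consecutive one-second windows.
// A trailing partial window (shorter than one second) is dropped.
std::vector<std::vector<short>> segmentOneSecond(const std::vector<short>& audio) {
    std::vector<std::vector<short>> segments;
    for (std::size_t start = 0; start + kSampleRate <= audio.size(); start += kSampleRate) {
        segments.emplace_back(audio.begin() + start, audio.begin() + start + kSampleRate);
    }
    return segments;
}
```

Edge Impulse's automatic segmentation additionally centers each window on detected activity; the fixed windowing above is the simplest version of the idea.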
- Follow the tutorial video on Bilibili (BV1KS4y1F7HM), which requires a GitHub account. Special thanks to the content creator: "我叫小纪".
- Install Oracle VM VirtualBox and create a Linux virtual machine. (The installation package is provided in the video description by the content creator.)
- Unzip the compressed package to obtain the .vdi file, which serves as the disk image for Home Assistant. The version may need updating; if installation fails, download the latest image from the following address: [Home Assistant Installation for Linux]
- Create a virtual machine by clicking "New".
- Finally, click on "Start" to initiate the virtual machine and wait for Home Assistant to start. The initial startup process may take some time, so please be patient.
- Then, in your web browser, enter "homeassistant:8123" and press Enter to access it. Be prepared for long waits during this setup phase; in my experience, it can take several hours.
- Next, you'll need to register a username and password (make sure to remember them). Then, in the bottom left corner, click on your profile picture and enable advanced mode. Once this is done, the installation process is complete.
The steps involved here are relatively complex, and due to space constraints, it's recommended to continue learning by watching the video on Bilibili. Here's the link to the video: [BV1KS4y1F7HM].
5. MQTT Installation
Having prior experience with MQTT, which is widely used in the Internet of Things (IoT) thanks to its lightweight publish-subscribe mechanism, I added the MQTT add-on in Home Assistant and created an MQTT broker inside it. Then, in Arduino, I added the PubSubClient library. With just a few lines of code in the Keyword Spotting (KWS) program to connect to WiFi, connect to MQTT, and publish messages, communication between the ESP32 and Home Assistant became straightforward.
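The publish-subscribe mechanism mentioned above can be illustrated with a toy in-process broker in plain C++. This is purely illustrative; in the real setup the Mosquitto broker inside Home Assistant does this work over the network, and all names below are made up.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy broker: subscribers register a callback per topic; publishing a
// message on a topic invokes every callback registered for that topic.
// Publishers and subscribers never reference each other directly -- that
// decoupling is what makes MQTT so convenient for IoT.
class ToyBroker {
public:
    void subscribe(const std::string& topic,
                   std::function<void(const std::string&)> callback) {
        subs_[topic].push_back(std::move(callback));
    }
    void publish(const std::string& topic, const std::string& payload) {
        for (auto& cb : subs_[topic]) cb(payload);  // deliver to each subscriber
    }
private:
    std::map<std::string, std::vector<std::function<void(const std::string&)>>> subs_;
};
```

In the project, the ESP32 plays the publisher role and Home Assistant's automations play the subscriber role, with Mosquitto in between.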
To install MQTT:
1. Click on "Configuration."
2. Select "Add-On Store."
3. Search for "MQTT."
4. Click on "Mosquitto broker."
5. Install "Mosquitto broker."
- Set the MQTT server name and password according to the illustration.
1. Click on "Configuration".
2. Navigate to "Devices & Services".
3. Select "Add Integration".
4. Search for "MQTT".
5. Click on "MQTT".
6. Enter the server name and password you just set up.
- Then you can set the MQTT topics. The image below shows the MQTT settings and testing interface, where you can configure topics and test receiving by entering a topic in the listening box.
1. Click on "Configuration".
2. Navigate to "Automations & Scenes".
3. Select "Create Automation".
4. Follow the prompts to set up the automation.
bool m = microphone_inference_record(); // record audio
if (!m) {
ei_printf("ERR: Failed to record audio...\n"); // print an error message
return;
}
- `microphone_inference_record()` is called from the `void loop()` function and defined near the end of the sketch; the loop repeatedly calls it to record audio.
static bool microphone_inference_record(void)
{
bool ret = true;
while (inference.buf_ready == 0) {
delay(10);
}
inference.buf_ready = 0;
return ret;
}
- The function `microphone_inference_record()` is a static function that waits for new data. While `inference.buf_ready` is 0, it loops with a 10-millisecond delay on each iteration. Once `inference.buf_ready` becomes non-zero, it resets `inference.buf_ready` to 0 and returns `true`. `inference.buf_ready` serves as a flag indicating whether a freshly recorded buffer is available for sampling.
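The buf_ready handshake described above can be reduced to a minimal, testable sketch in plain C++. This is illustrative only: the struct and function names are invented, and the real code busy-waits with `delay(10)` inside the Arduino loop rather than polling once.

```cpp
#include <cassert>

// Minimal model of the flag protocol: the audio capture side sets the
// flag when a full buffer is ready; the inference side claims the buffer
// by clearing the flag before processing it.
struct InferenceState {
    volatile int buf_ready = 0;  // 1 = a full buffer is waiting
};

// Producer side (in the sketch this runs from the I2S capture path).
void onBufferFilled(InferenceState& st) { st.buf_ready = 1; }

// Consumer side: returns true once a fresh buffer has been claimed.
bool tryClaimBuffer(InferenceState& st) {
    if (st.buf_ready == 0) return false;  // nothing recorded yet
    st.buf_ready = 0;                     // claim the buffer for inference
    return true;
}
```

Clearing the flag before processing ensures the next capture is not mistaken for data that has already been classified.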
/******************************************************************************************************/
signal_t signal;
signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT; // total length of the signal data
signal.get_data = &microphone_audio_signal_get_data; // callback that fetches the audio signal data
ei_impulse_result_t result = { 0 };
- A `signal_t` structure variable `signal` is created.
- The `total_length` field of the `signal` structure is set to `EI_CLASSIFIER_RAW_SAMPLE_COUNT`, indicating the total length of the signal data.
- The `get_data` field is set to a pointer pointing to the function `microphone_audio_signal_get_data`, which is responsible for obtaining audio signal data.
- This code segment, located after the previous recording function in the `loop()`, continuously runs the classifier on the recorded samples.
// Call run_classifier with the signal structure, the result variable, and the debug_nn flag,
// then check whether running the classifier on the provided signal data produced an error.
EI_IMPULSE_ERROR r = run_classifier(&signal, &result, debug_nn); // run the classifier
if (r != EI_IMPULSE_OK) {
ei_printf("ERR: Failed to run classifier (%d)\n", r); // print an error message
return;
}
- The purpose of this code snippet is to perform inference using a classifier on the audio signal data obtained after recording. Subsequently, it checks whether the inference was successful.
int pred_index = 0; // index of the top prediction
float pred_value = 0; // score of the top prediction
// Print the prediction results
ei_printf("Predictions ");
ei_printf("(DSP: %d ms., Classification: %d ms., Anomaly: %d ms.)",
result.timing.dsp, result.timing.classification, result.timing.anomaly);
ei_printf(": \n");
// Iterate over the classification results
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
ei_printf(" %s: ", result.classification[ix].label); // print the class label
ei_printf_float(result.classification[ix].value); // print the predicted score
ei_printf("\n");
// Track the highest score and its index
if (result.classification[ix].value > pred_value){
pred_index = ix;
pred_value = result.classification[ix].value;
}
}
- Immediately following the previous code snippet, this segment handles and prints the classifier's prediction results.
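The selection loop above is simply an argmax over the per-label scores. A standalone restatement in plain C++ (an illustration, not the sketch's actual code):

```cpp
#include <cassert>
#include <cstddef>

// Return the index of the largest score, mirroring how pred_index and
// pred_value are updated in the Arduino loop. Assumes count >= 1.
std::size_t argmax(const float* scores, std::size_t count) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i) {
        if (scores[i] > scores[best]) best = i;
    }
    return best;
}
```

With the project's three labels, `argmax` picks whichever of whining, barking, or ambient noise the model scored highest.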
static int microphone_audio_signal_get_data(size_t offset, size_t length, float *out_ptr)
{
numpy::int16_to_float(&inference.buffer[offset], out_ptr, length);
return 0;
}
- This function is used to retrieve raw audio signal data.
- It calls the `numpy::int16_to_float` function to convert 16-bit integers extracted from the audio buffer into floating-point numbers. The results are then stored in the provided output pointer `out_ptr`.
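A plausible standalone equivalent of that conversion in plain C++. The divisor of 32768 is the conventional normalization for signed 16-bit PCM and is an assumption about the SDK's exact scaling:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Convert signed 16-bit PCM samples into floats in [-1.0, 1.0),
// as the classifier expects floating-point input.
void int16ToFloat(const int16_t* in, float* out, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        out[i] = static_cast<float>(in[i]) / 32768.0f;
    }
}
```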
The Keyword Spotting (KWS) code provided in the official example has undergone minimal modifications. My main task has been understanding the purpose of each code segment and adding comments to it.
2. MQTT message sending code
#include <WiFi.h>
#include <PubSubClient.h>
// WiFi credentials: replace with your own; a phone hotspot also works (use 2.4 GHz)
const char* ssid = "XXX";
const char* password = "*******";
// MQTT server information
const char* mqttServer = "192.168.230.250";
const int mqttPort = 1883;
const char* mqttUsername = "admin";
const char* mqttPassword = "hlj20020511";
// Create WiFi client and MQTT client instances
WiFiClient wifiClient;
PubSubClient mqttClient(wifiClient);
- This is the basic MQTT setup in the Arduino code. It uses the WiFi and PubSubClient libraries to prepare for connecting to WiFi and the MQTT server, configures the necessary connection information, and then creates the WiFi client and MQTT client instances.
// Configure the MQTT connection
void setupMQTT() {
mqttClient.setServer(mqttServer, mqttPort);
}
// Reconnect to the MQTT server
void reconnectMQTT() {
while (!mqttClient.connected()) {
Serial.println("Connecting to MQTT...");
if (mqttClient.connect("ESP32Client", mqttUsername, mqttPassword)) {
Serial.println("Connected to MQTT");
} else {
Serial.print("MQTT connection failed, rc=");
Serial.print(mqttClient.state());
Serial.println(" retrying in 5 seconds...");
delay(5000);
}
}
}
- MQTT connection function, with comments already provided for easy understanding.
if ((pred_index == 2) && (pred_value > 0.8)){
mqttClient.publish("dog bark", "环境音~"); // "Ambient noise~"
}
else if ((pred_index == 1) && (pred_value > 0.8)){
mqttClient.publish("dog bark", "狗狗在大叫呢!!!"); // "The dog is barking loudly!!!"
} else if ((pred_index == 0) && (pred_value > 0.8)){
mqttClient.publish("dog bark", "狗狗在撒娇呢~"); // "The dog is whining for attention~"
}
- The variable `pred_index` encodes the three classes recognized by the Keyword Spotting (KWS) model (0 - whining, 1 - barking, 2 - ambient noise). The snippet above takes the model's classification result, uses an if statement to select the appropriate ("topic", "message") pair, and calls the library function `mqttClient.publish` to send it to the MQTT broker (Home Assistant).
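The decision logic above can be factored into a small, testable function. This is a sketch: the function name and the English message strings are illustrative, not the actual payloads the project publishes.

```cpp
#include <cassert>
#include <string>

// Map the winning class index and its confidence to the message that
// would be published on the "dog bark" topic. Returns an empty string
// when confidence does not exceed the 0.8 threshold (nothing is sent).
// Labels: 0 = whining, 1 = barking, 2 = ambient noise.
std::string messageFor(int predIndex, float predValue) {
    if (predValue <= 0.8f) return "";  // not confident enough to publish
    switch (predIndex) {
        case 0: return "The dog is whining for attention~";
        case 1: return "The dog is barking loudly!!!";
        case 2: return "Ambient noise~";
        default: return "";  // unknown label: publish nothing
    }
}
```

Separating the decision from the `mqttClient.publish` call makes the threshold and label mapping easy to test without hardware.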
The main challenge was that this was my first encounter with platforms such as Home Assistant, ESPHome, HACS, and Xiaomi MIoT Auto. The tutorials are neither comprehensive nor smooth, installation methods are scattered across many sources, and network issues consumed a lot of effort in finding approaches that worked in my environment. In hindsight it was not especially difficult, but it demanded patience in searching for information, adapting, and working around network problems.
- Future Plans: Although these challenges were a one-time cost, the payoff is significant; as the saying goes, you learn from your mistakes. I now have a long-lasting smart home control center, which I am very satisfied with.
Audio samples are relatively easy to annotate compared with images, and Edge Impulse supports automatic annotation, which is excellent. However, after completing the project I found that recognition is not very sensitive in practice. Dog barks are not all the same "woof"; tones differ, and there are obvious errors, such as barking recognized as ambient noise, whining recognized as barking, or responses delayed until several seconds into continuous barking.
Future Plans: I plan to increase the number of training samples and add more sound types. I will collect barks by dog breed (barks within a breed should be similar) and also add cat meows. Ideally the samples would be collected with the on-board microphone, but this is very difficult because we cannot control when pets make noise.
3. Power Consumption and Convenience
Running Home Assistant in a Linux virtual machine on my computer is a temporary solution, because the machine must be kept on all the time, which consumes a lot of power and resources.
Future Plans: I want to use Home Assistant for the long term. There are many installation methods worth trying, such as installing it on cloud servers (Tencent Cloud, Alibaba Cloud, etc.), Raspberry Pi, NAS, etc.
VI、Reference Materials
- Tutorial on Keyword Spotting (KWS) using Edge Impulse
- Tutorial on Keyword Spotting (KWS) in the Wiki Documentation
- Installation Tutorial for Home Assistant (Bilibili Video)
- Installation Tutorial for ESPHome on Home Assistant (Bilibili Video)
VII、Demonstration
The ESP32 recognizes three types of sounds: ambient noise, dog barking, and dog whining, and sends the result to Home Assistant via MQTT. After recognition handling is set up for each type, Home Assistant triggers notifications, plays music, or performs other control actions accordingly. (Bilibili Video)
VIII、Learning Records
- Embedded Machine Learning
I used the MIC routine on the xiao_esp32s3 to record some dog-sound samples as training data for keyword machine learning. This turned out to be inefficient, so I searched online for sound samples and ultimately collected over ten minutes of audio. Following the original tutorial by MJRoBot, I worked through each step one by one, which took quite a long time (o(╥﹏╥)o). It was my first attempt, and I had not considered the need for a larger sample size or decided which sounds to recognize (I eventually settled on ambient noise, dog barking, and dog whining).
I am also amazed by my first encounter with embedded machine learning. Previously, my impression of machine learning was that it required powerful GPUs for computation. Now, I am surprised that I can use a tiny ESP32-S3, which is smaller than my thumb, for machine learning tasks.
- Home Assistant
After considering how to connect the xiao-esp32s3 to the internet, I found a smart home system called Home Assistant from the event website. Driven by curiosity, I started to explore: installing a virtual machine, setting up Home Assistant, HACS, ESPHome, MQTT, and so on.
I also encountered challenges when installing Home Assistant for the first time. Integrating Xiaomi Mi Home into Home Assistant was a long and arduous process o(╥﹏╥)o. I followed the Bilibili video tutorial recommended in the reference materials, but it required a lot of patience!
- ESPHome
While there are tutorials on the official wiki, I ran into difficulties because I installed Linux in a Windows virtual machine rather than using a Raspberry Pi as the tutorial assumed. After multiple failed attempts, I spent several days searching for tutorials and videos before finally succeeding. The most helpful resource was the Bilibili video tutorial BV1uV4y1H7e2 (link in the reference materials).
However, to meet the final task requirements MQTT was still necessary, and ESPHome went unused. In my view, ESPHome is best suited to attaching sensors to ESP chips, passing their data to the ESPHome integration in Home Assistant, and controlling them from the Home Assistant dashboard, which is indeed very convenient. But flashing firmware (.bin built from .yaml files) from ESPHome replaces whatever is running on the ESP32, which makes it impossible to keep running the keyword-recognition program. I suspected this beforehand and confirmed it by experiment. Overall, ESPHome remains very convenient for DIY home-sensor projects.
- MQTT
I have added screenshots of the automation deployment in Home Assistant for three types of voice recognition: dog barking, ambient noise, and dog whining. Each type of sound triggers corresponding actions.
- Overall, I learned a lot of IoT knowledge that I had wanted to learn before and explored many new areas that I had not previously encountered. I am very happy with my experience on this project and grateful to eetree and Seeed for giving us students the opportunity to experience such great products.
- o( ̄▽ ̄)d