This project uses the Seeed Studio XIAO ESP32S3 to build a machine-vision-based automatic detection system for a cat food bowl. Its main function is to detect the food level in the bowl using machine vision, without any machine learning models. When the food is running low, it sends an email notification as a reminder.
Design
There are already many projects online that achieve similar results, mostly by taking photos of bowls with and without cat food, labeling them, and training a neural network to perform the recognition. So although the end result is the same, I wanted to try a different approach this time.
I plan to use color recognition to make the judgment: we can fix the camera above the cat food bowl, and since there is a significant color difference between the bowl and the cat food, we can analyze the color distribution of the captured image. If the color range of the cat food occupies a large proportion of the frame, there is still enough food in the bowl. If the proportion of colors near white is too large, the cat food is almost gone, since most of the frame now shows the bottom of the bowl. Since I don't have cat food at home right now, I will use green beans as a substitute for testing.
This program doesn't seem to require much computing power, so it should be able to run directly on the ESP32-S3. However, when I tried running it under CircuitPython I ran into several issues, although it can still work with some effort. I therefore also tried using the XIAO ESP32S3 solely as an IP camera module, streaming video to a Raspberry Pi 5 that handles the data processing.
Board
The Seeed Studio XIAO series is a line of small development boards that share a similar hardware design, each about the size of a thumb. The name "XIAO" reflects its defining trait of being small, and can also mean "brave" or "vigorous". The Seeed Studio XIAO ESP32S3 Sense integrates a camera sensor, a digital microphone, and SD card support. With its embedded ML computing power and photography capability, this development board is an excellent tool for starting to explore intelligent voice and vision AI applications.
The Seeed Studio XIAO ESP32S3 Sense uses the highly integrated Xtensa-based ESP32-S3R8 SoC, supporting 2.4GHz WiFi and low-power Bluetooth® BLE 5.0 dual-mode wireless, which makes it suitable for a variety of wireless applications. It also features lithium battery charge management.
As an upgraded version of the Seeed Studio XIAO ESP32S3, this board ships with a detachable OV2640 camera sensor with a full resolution of 1600×1200. Its connector is even compatible with the OV5640, supporting resolutions up to 2592×1944. The board also includes a digital microphone for voice sensing and audio recognition. SenseCraft AI provides a variety of pre-trained AI models and no-code deployment for the XIAO ESP32S3 Sense.
This development board comes with a powerful SoC and built-in sensors, featuring 8MB of PSRAM and 8MB of flash on board. It also has an SD card slot that supports FAT cards of up to 32GB. Together these give the board more room for programs and data and open up more possibilities for embedded ML applications.
Implementation
First, I attempted to complete every step directly on the XIAO ESP32S3. Fortunately, CircuitPython supports ulab.numpy, which is a great convenience for data processing. I used the CircuitPython firmware build for the ESP32-S3-DevKitC-1-N8R8, which includes the ulab and espcamera libraries, and used them to initialize the camera. Note that the frame buffer captured by the camera needs dedicated memory, so we have to add CIRCUITPY_RESERVED_PSRAM=1048576 to the settings.toml file to reserve space for it. The exact size depends on the captured frame size and framebuffer_count and may need to be adjusted accordingly.
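For reference, the relevant settings.toml entry is just the line above (the 1 MB value here matches my SVGA setup; tune it for your frame size):

# settings.toml
# Reserve 1 MB of PSRAM for the espcamera frame buffer
CIRCUITPY_RESERVED_PSRAM=1048576

With the buffer reserved, the camera can be initialized: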
import board
import busio
import espcamera

def init_cam():
    # I2C bus for the camera's control interface
    _i2c = busio.I2C(scl=board.IO39, sda=board.IO40)
    # 8-bit parallel data pins of the OV2640 on the XIAO ESP32S3 Sense
    _data_pin = [
        board.IO15,
        board.IO17,
        board.IO18,
        board.IO16,
        board.IO14,
        board.IO12,
        board.IO11,
        board.IO48,
    ]
    _cam = espcamera.Camera(
        data_pins=_data_pin,
        pixel_clock_pin=board.IO13,
        vsync_pin=board.IO38,
        href_pin=board.IO47,
        i2c=_i2c,
        external_clock_pin=board.IO10,
        # external_clock_frequency=20_000_000,
        # powerdown_pin=None,
        # reset_pin=None,
        pixel_format=espcamera.PixelFormat.RGB565,
        frame_size=espcamera.FrameSize.SVGA,
        # jpeg_quality=5,
        framebuffer_count=1,
        # grab_mode=espcamera.GrabMode.WHEN_EMPTY,
    )
    _cam.vflip = False
    _cam.hmirror = True
    return _cam
The bitmap captured through the espcamera.Camera object can then be traversed pixel by pixel, using bitwise operations to recover the RGB color of each pixel. Unfortunately, arrays in ulab.numpy only support up to two dimensions, not three. So we compress the two-dimensional pixel coordinates into one dimension and keep the second dimension for the RGB channels. The bitmap captured by espcamera uses the 16-bit RGB565 color format, which we also need to convert to 24-bit RGB888 for easier color calculations later.
def color(rgb565):
    # The bitmap stores each RGB565 pixel with its two bytes swapped,
    # so swap them back first.
    high = rgb565 >> 8
    low = rgb565 & 0xFF
    rgb565 = (low << 8) | high
    # Unpack the 5-6-5 bit fields.
    R5 = rgb565 >> 11
    G6 = (rgb565 >> 5) & 0b111111
    B5 = rgb565 & 0b11111
    # Scale each channel to 8 bits (31 -> 255, 63 -> 255). Sanity check:
    # the byte-swapped pure-green pixel 0xE007 unpacks to [0, 255, 0].
    R8 = (R5 * 527 + 23) >> 6
    G8 = (G6 * 259 + 33) >> 6
    B8 = (B5 * 527 + 23) >> 6
    return [R8, G8, B8]
from ulab import numpy as np

def get_array(bitmap):
    # Flatten the 2-D pixel grid into an (N, 3) array of RGB888 values.
    image_array = np.zeros((bitmap.height * bitmap.width, 3), dtype=np.uint8)
    for y in range(bitmap.height):
        for x in range(bitmap.width):
            image_array[x * bitmap.height + y] = color(bitmap[x, y])
        print("\r" + "progress: " + str(int(y * 100 / (bitmap.height - 1))) + "%" + " " * 3, end="")
    print("")
    return image_array
Once we have a regular np array, we can calculate the difference between each pixel's color and a given target color. Here I use the Euclidean distance to measure the difference between colors: for a pixel (R, G, B) and a target (Rt, Gt, Bt), the distance is sqrt((R-Rt)^2 + (G-Gt)^2 + (B-Bt)^2). With the distances computed, a preset threshold decides whether each pixel counts as the same color or a different one: above the threshold it is a different color, below the threshold it is the same color.
def calculate(image_array, target_color, tolerance):
    # Euclidean distance between every pixel and the target color
    color_diff = np.linalg.norm(image_array - target_color, axis=1)
    # Count pixels closer than the tolerance
    similar_pixels = np.sum(color_diff < tolerance)
    # Proportion of similar pixels in the whole frame
    total_pixels = image_array.shape[0]
    proportion = similar_pixels / total_pixels
    return proportion
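As a quick sanity check, the function can be exercised with a few synthetic pixels (the values here are hypothetical, just to illustrate):

# Three pixels: two within tolerance of the target, one far away
test = np.array([[25, 29, 21], [27, 30, 20], [200, 200, 200]], dtype=np.float)
print(calculate(test, np.array([25, 29, 21], dtype=np.float), 30))  # ~0.667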
However, to use the method above, we still need to know the target color and the threshold. Because every camera varies and lighting strongly affects perceived color, we cannot rely on theoretical values; we should use colors captured by the camera under real working conditions. My approach is to capture an image containing only cat food, then compute its average color and the Euclidean norm of its per-channel standard deviation. The average color serves as the target color, and the norm of the standard deviation serves as a reference for the threshold.
import gc

def get_params(image_array):
    # Average color of the calibration frame -> target color
    avg_color = np.mean(image_array, axis=0)
    # Norm of the per-channel standard deviation -> threshold reference
    threshold = np.linalg.norm(np.std(image_array, axis=0))
    gc.collect()
    print(avg_color, threshold)
So next, all we need to do is run get_params() once to obtain the parameters, then feed them into calculate() and run it in a loop.
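Here is a minimal sketch of how these pieces fit together on the device. The target color and tolerance below are placeholder values; substitute the numbers printed by get_params() during calibration:

import time

cam = init_cam()
# Placeholder calibration values; replace with your get_params() output.
TARGET = np.array([25, 29, 21], dtype=np.float)  # float avoids uint8 wraparound
TOLERANCE = 30

while True:
    frame = cam.take(1)  # returns a displayio.Bitmap in RGB565 mode
    arr = get_array(frame)
    print("food proportion:", calculate(arr, TARGET, TOLERANCE))
    time.sleep(60)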
However, I ran into several problems in actual use. First, data processing in CircuitPython is extremely slow: at a resolution of 1280×1024 it takes more than five minutes just to convert a frame into an np.array, which makes writing and testing the code painful. Worse, the processing consumes a lot of memory. The XIAO ESP32S3 only has 8MB of PSRAM, which is not enough to compute the Euclidean distances even for an 800×600 image.
When the FrameSize is set to the default QQVGA (160×120), the program runs normally, but at that resolution the color difference between a full bowl and an empty one is not pronounced enough, which can lead to misjudgments.
So I had to take another route to finish the project: use the XIAO ESP32S3 solely as an IP camera and delegate the computation to a Raspberry Pi 5.
First, follow the official wiki tutorial to add Arduino support for the board. Then open the CameraWebServer example that ships with the Arduino ESP32 package.
A few places need to be modified. First, uncomment #define CAMERA_MODEL_XIAO_ESP32S3 // Has PSRAM and comment out all the other camera model defines, then fill in your WiFi credentials. Note that the code given in the official wiki includes a line while(!Serial); in the setup function; be sure to comment it out, because otherwise the program will not run without a serial connection. Once the modifications are done, you can upload the sketch. During the upload, note that the default board configuration does not enable PSRAM; you need to manually set the PSRAM option to "OPI PSRAM".
After uploading, watch the serial monitor. If everything is working correctly, it will print the camera's IP address. Enter this IP address in a browser on a computer on the same network to access the camera control panel and view the feed.
Next comes the Raspberry Pi part. It follows the same approach as the CircuitPython version above, with only minor differences: we use the requests library to fetch the latest frame, convert it to an image with Pillow, and process it with NumPy. I won't go into detail, since the process is so similar to what was described earlier.
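For reference, the snippets below assume these imports (the standard requests, Pillow, and NumPy packages mentioned above):

import io

import numpy as np
import requests
from PIL import Image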
def find_color_proportion(image_array, target_color, tolerance):
    # Euclidean distance per pixel; the image here is a full H x W x 3 array,
    # so the color channels sit on axis 2
    color_diff = np.linalg.norm(image_array - target_color, axis=2)
    # Count pixels closer than the tolerance
    similar_pixels = np.sum(color_diff < tolerance)
    # Proportion of similar pixels in the whole frame
    total_pixels = image_array.shape[0] * image_array.shape[1]
    proportion = similar_pixels / total_pixels
    return proportion
def get_params(image_array):
    # Average color and norm of the per-channel standard deviation,
    # computed over both image dimensions
    avg_color = np.mean(image_array, axis=(0, 1))
    std_color = np.std(image_array, axis=(0, 1))
    threshold = np.linalg.norm(std_color)
    print(avg_color, threshold)
# Fetch one frame from the XIAO's /capture endpoint and test the calculation
image_array = np.array(Image.open(io.BytesIO(requests.get("http://192.168.1.91/capture?_cb=0").content)))
# get_params(image_array)
proportion = find_color_proportion(image_array, [25, 29, 21], 30)
print("Color Proportion:", proportion)
After testing, we can now reliably distinguish an empty cat food bowl from a full one. Let's set the threshold on the proportion to 20%: if it falls below 20%, we send an email notification to the owner. Next, we implement the SMTP email sending.
host = "smtp.qq.com"
port = 465
sender = "xxxxxxxxxxxx@qq.com"
app_password = "xxxxxxxxxxx"
receiver = sender
def send_mail(msg):
content = MIMEMultipart()
content["subject"] = "cat food"
content["from"] = sender
content["to"] = receiver
content.attach(MIMEText(msg))
with smtplib.SMTP_SSL(host=host, port=port) as smtp:
try:
smtp.login(sender, app_password)
smtp.send_message(content)
print("Email Sent!")
except Exception as error:
print("Error: ", error)
Of course, any email provider that supports SMTP will work here. Now let's write the main loop, and the project is up and running.
import time

while True:
    image_array = np.array(Image.open(io.BytesIO(requests.get("http://192.168.1.91/capture?_cb=0").content)))
    # get_params(image_array)
    proportion = find_color_proportion(image_array, [25, 29, 21], 30)
    print("Color Proportion:", proportion)
    if proportion < 0.20:
        send_mail("Cat Food Bowl is EMPTY.")
    time.sleep(10)
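Note that as written, this loop emails every 10 seconds for as long as the bowl stays empty. If that is unwanted, one possible refinement (optional, not required for the project) is a simple debounce that notifies only once per empty event:

notified = False
while True:
    image_array = np.array(Image.open(io.BytesIO(requests.get("http://192.168.1.91/capture?_cb=0").content)))
    proportion = find_color_proportion(image_array, [25, 29, 21], 30)
    if proportion < 0.20:
        if not notified:
            send_mail("Cat Food Bowl is EMPTY.")
            notified = True  # suppress further emails until the bowl is refilled
    else:
        notified = False
    time.sleep(10)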
Results
We can see that when the cat food bowl is full, the output proportion is around 60%, whereas when the bowl is empty, it is 0%.
This project gave me the opportunity to approach a common internet project in a completely new way. Using numpy for data-science-style work on the ESP32-S3 was a fascinating experience.