Living in a student dorm means dealing with plenty of challenges. One big hassle is getting your laundry done: you pack up your smelly clothes and haul them down to the laundromat, only to find a mess and no washer free for the next hour.
As part of the Seeed Vision Challenge, we wanted to create something to help solve some of these issues. We designed a frame to hold the Vision AI Module v2 with a socketed Xiao ESP32S3 and deployed a custom model to look for washers. Along the way we learned a lot about the new and exciting Vision AI Module v2, the Seeed_Arduino_SSCMA camera_web_server example, and the SenseCraft no-code platform.
We aimed to create a system using the Seeed Vision AI Module v2 to detect and monitor washing machine availability in a dorm laundromat.
To achieve this, we designed a frame to securely hold the Vision AI Module, using a socketed Xiao ESP32S3 to deploy a custom model for washer detection. This setup was intended to make the monitoring process seamless and efficient.
What Worked Out of the Box
The Vision AI Module v2 proved user-friendly and straightforward to set up. Integrating it with the Xiao ESP32S3 was smooth, and the initial setup and deployment of the camera_web_server on the Xiao ESP32S3 worked without major issues. Additionally, the SenseCraft no-code platform was highly intuitive and provided an easy starting point for AI model deployment, which was crucial for our project's rapid development.
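To give a sense of how little code the integration needs, here is a minimal sketch modeled on the Seeed_Arduino_SSCMA library examples; it assumes the Grove Vision AI Module v2 is attached over the Xiao's default I2C pins and already has a model deployed:

```cpp
#include <Seeed_Arduino_SSCMA.h>

SSCMA AI;  // talks to the Vision AI Module v2 over the Xiao's default I2C pins

void setup() {
  AI.begin();            // initialise the link to the module
  Serial.begin(115200);
}

void loop() {
  // invoke() asks the module to run one inference; 0 (CMD_OK) means success
  if (!AI.invoke()) {
    for (size_t i = 0; i < AI.boxes().size(); i++) {
      Serial.print("box ");
      Serial.print(i);
      Serial.print(": target=");
      Serial.print(AI.boxes()[i].target);  // class index from the deployed model
      Serial.print(", score=");
      Serial.print(AI.boxes()[i].score);   // confidence reported by the module
      Serial.print(", x=");
      Serial.print(AI.boxes()[i].x);
      Serial.print(", y=");
      Serial.println(AI.boxes()[i].y);
    }
  }
  delay(100);
}
```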
Additional Tests
To better understand the capabilities of the ESP32S3, we explored various demos, including the Xiao ESP32S3 Sense Web Camera Demo. We also designed a 3D-printed frame to hold the Vision AI Module and Xiao ESP32S3 securely. The frame features adjustable camera angles, optional lens mounts, and a tidy layout, which made data collection easier and kept the components securely in place during operation.
Challenges We Faced
Despite the initial successes, we encountered several challenges:
- The camera_web_server had some initial bugs that required troubleshooting.
- The flashing interface on the SenseCraft website also posed issues, needing multiple attempts to resolve.
- Configuring the model to detect multiple classes effectively was particularly challenging, as it required fine-tuning and experimentation to achieve satisfactory results.
For our dataset, we used the camera_web_server running on the Xiao ESP32S3 with the Vision AI Module v2 attached. By connecting the setup to a mobile hotspot and browsing to it from a smartphone, we saved still images of washers. We aimed for a diverse dataset by capturing images from various angles and under different lighting conditions, ending up with a total of 55 washer images. This diversity was crucial for training a robust AI model.
We employed both Label Studio and Roboflow for dataset labeling and augmentation, and experimented with MobileNet v2 and Swift-YOLO 192 transfer learning to improve the detection capabilities of our system. Although we faced initial difficulties, such as relabeling the dataset multiple times to improve accuracy, we ultimately achieved a functional model capable of detecting washers.
Our Google Colab model-training notebook is based on the Gesture Detection Swift-YOLO 192 example provided by Seeed: our copy of Gesture_Detection_Swift-YOLO_192.ipynb.
After labelling & training on the data, we uploaded the model to sensecraft.seeed.cc, typed in the labels and deployed it using the fancy built-in web serial flasher (use a supported browser) to the Seeed Grove Vision v2 board to get some results at last.
The inference results showed detections only for the washer class, so further optimization is clearly needed.
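To go from raw detections to an availability count, something along these lines works on top of the same library. Note that kWasherTarget and kScoreThreshold are hypothetical placeholders, not our exact firmware; the class index has to match the order of the labels entered in SenseCraft, and the threshold needs tuning:

```cpp
#include <Seeed_Arduino_SSCMA.h>

SSCMA AI;

// Hypothetical values for illustration; adjust to your own deployment.
const uint8_t kWasherTarget   = 0;   // e.g. our "washer" label
const uint8_t kScoreThreshold = 60;  // drop low-confidence boxes

void setup() {
  AI.begin();            // link to the Grove Vision AI Module v2
  Serial.begin(115200);
}

void loop() {
  if (AI.invoke() != 0) {  // non-zero return means the inference failed
    delay(500);
    return;
  }
  int washers = 0;
  for (size_t i = 0; i < AI.boxes().size(); i++) {
    if (AI.boxes()[i].target == kWasherTarget &&
        AI.boxes()[i].score >= kScoreThreshold) {
      washers++;
    }
  }
  Serial.print("washers in view: ");
  Serial.println(washers);
  delay(1000);
}
```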
We also ran into issues flashing the Seeed_Arduino_SSCMA examples to the ESP32C3, but eventually succeeded with the ESP32S3 camera web server using PlatformIO.
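For anyone retracing the PlatformIO route, a minimal platformio.ini for the Xiao ESP32S3 might look like the following; the board identifier comes from the standard platform-espressif32 board list, and the exact library version is an assumption to verify against your setup:

```ini
; minimal sketch of a PlatformIO config for the Xiao ESP32S3 (values assumed)
[env:seeed_xiao_esp32s3]
platform = espressif32
board = seeed_xiao_esp32s3
framework = arduino
monitor_speed = 115200
lib_deps = https://github.com/Seeed-Studio/Seeed_Arduino_SSCMA
```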
While the model achieved fast inference speeds of around 30 fps, it is clear that a larger dataset, and probably a simpler labeling scheme, would be needed for better results; we had figured we would try with all the labels first.
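As a rough way to sanity-check a frame-rate figure like this from the Xiao side, one can time a burst of invocations with millis(); this is only a sketch, not our measurement setup:

```cpp
#include <Seeed_Arduino_SSCMA.h>

SSCMA AI;

void setup() {
  AI.begin();
  Serial.begin(115200);
}

void loop() {
  // Time a burst of back-to-back inferences and derive an FPS estimate.
  const int kFrames = 30;
  int ok = 0;
  unsigned long start = millis();
  for (int i = 0; i < kFrames; i++) {
    if (AI.invoke() == 0) ok++;  // 0 indicates a successful inference
  }
  unsigned long elapsed = millis() - start;
  if (elapsed > 0) {
    Serial.print("approx. fps: ");
    Serial.println(ok * 1000.0f / elapsed);
  }
}
```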
Further Exploration
We compared Label Studio and Roboflow for dataset management and augmentation:
- Label Studio was robust and fast but lacked some advanced features like mosaic augmentation.
- Roboflow offered powerful features and a better user interface but involved additional costs.
We also explored modifying the YOLO 192 person-detector Colab notebook for training and exporting models, but stopped short of committing to it fully.
When deployed, we tested the image delay and measured a wired latency of about 300 ms and a wireless latency of 400-600 ms, though jitter caused by antenna placement and orientation remained an issue.
The 3D-printed frame made data collection significantly easier, keeping the setup secure and tidy. Our project offers a practical approach to monitoring washer availability in dorm laundromats. With improvements in dataset size and model configuration, the system could become even more effective, helping students spend less time waiting for washers.
By digging into Seeed's Vision AI Module v2, we have taken a small step toward making dorm life more convenient, and shown that students can affordably monitor laundry availability on the edge in real time, saving time and frustration.
Links and References
@fb03 prepared a STEP model of the AI Vision v2, available on GrabCAD, which made the adapter kit design possible.
Model files also available on Printables here: printables.com/model/929372-seeed-grove-vision-module-v2-adapter-kit
Optional magnetic lens kit: de.aliexpress.com/item/1005007059986698.html
The adjustable lens mount we use in the adapter kit is a remix of the one RoverXR uses: github.com/mbz4/RoverXR