Himax recently released the WE2 AI Processor HX6538-A, a processor that embeds dual Arm Cortex-M55 cores and an Arm Ethos-U55 microNPU, with Helium vector and floating-point extensions to accelerate the convolution operations of neural network models.
In the datasheet, they've highlighted three distinct features:
1) Award-winning low power consumption
2) A boost in on-device ML performance
3) A design targeted at battery-powered endpoint AI applications
I'm really into tinyML, and it's cool that Seeed just used the HX6538-A to make their new Grove Vision AI Module V2. I've got my hands on a sample of this module, and I am eager to conduct a performance test to gauge its capabilities.
How am I going to test them? I plan to conduct a quick, straightforward evaluation: 1) select an image processing model, 2) run it on the Grove Vision AI Module V2 and several other popular vision AI boards on the market, and 3) compare their performance.
My test is divided into two parts, consisting of two comparisons:
1)Compare the mainstream MCU-based boards capable of running AI models and evaluate their performance.
2)Run the model on a Raspberry Pi 4B to get a sense of the difference between an MCU and a miniature computer.
I've selected the following 6 boards for comparison:
- Grove Vision AI V2: the new kid on the block with the HX6538-A, likely packing a punch for AI applications.
- Grove Vision AI V1: the predecessor to the V2, so it'll be interesting to see how much improvement the new version brings.
- ESP32-S3-EYE Development Board: based on the ESP32-S3, with integrated AI capabilities, often used for image recognition projects.
- Seeed Studio XIAO ESP32S3 Sense: a tinyML-ready dev board that includes on-board sensors.
- Arduino Nicla Vision: a versatile board designed with ML in mind, featuring low power consumption and specialized for edge AI tasks.
- Raspberry Pi 4 Model B: a mini-computer with a good amount of processing power.
I will be comparing across five dimensions:
● Power consumption: this metric indicates the potential for creating battery-powered products.
● Inference time: this reflects the processing speed of the MCU and its latency. Shorter is better.
Inference time is the duration required to complete a single inference task, such as classifying or detecting an object in an image, usually measured in milliseconds (ms).
● Frame rate: it assesses whether the product can capture instant changes, movements, or impulses. Higher is better.
Typically measured in Frames Per Second (FPS), it represents the number of frames the system can process each second. This is actually a calculated value derived from the inference time: Frame Rate (FPS) = 1000 / Inference Time (ms). (A short worked sketch follows this list.)
● Ease of use: this looks at how beginner-friendly the product is for those new to vision AI and how quickly it can run mainstream models. This will be a subjective score from me, with a higher score indicating greater user-friendliness.
● Price: Indicates whether the commercial barrier to entry is sufficiently low.
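To make the formula above concrete, here is a trivial Python sketch (the function name is mine) that converts a measured inference time into a frame rate:

def frame_rate_fps(inference_time_ms):
    # Frame Rate (FPS) = 1000 / Inference Time (ms)
    return 1000.0 / inference_time_ms

print(round(frame_rate_fps(33.0), 2))   # 30.3 FPS  -- Grove Vision AI V2, measured later in this article
print(round(frame_rate_fps(389.0), 2))  # 2.57 FPS  -- Grove Vision AI V1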
I am going to test these boards in this way:
1) I will flash the same AI model onto each of these boards,
2) then use these modules to capture and recognize an AI-generated human face displayed on my computer screen under the same environmental conditions,
3) and then I will record their performance across the five mentioned dimensions.
Note:
1. The Raspberry Pi 4B is included for perspective on CPU vs. MCU performance, despite its difference from the MCU boards.
2. Nicla Vision: due to compatibility issues with the test model, it is tested with an alternative method; its results are not directly comparable with the other boards.
3. The inclusion of these two boards is intended to offer a broader perspective on processing capabilities across different hardware platforms. Their results are color-highlighted in the results table for clear distinction.
Quick glance at the test results
If you're looking to directly see the results and conclusions, please refer to the chart below. But if you're interested in the detailed experimental process, continue scrolling down.
Each board has its own unique features and is suitable for specific scenarios. If we evaluate purely from computational power, CPUs are indeed much stronger than MCUs. From a power consumption standpoint, if you're developing a battery-powered product, you must opt for an MCU-based board.
After balancing computational power, cost, and power consumption, the Grove Vision AI V2, based on the Arm M55 and U55, indeed performs very well; it is even powerful enough to capture motion.
However, it is not the most comprehensive development board, as it is better suited to handling vector data. If more development resources are needed, I would suggest pairing it with a master controller and using the Grove Vision AI V2 as a smart expansion solely to handle the ML workload, or you could opt for a development board like the Nicla Vision or an ESP32.
Details of my testing process
Choosing a model
Firstly, I need a suitable model, and I have several ideas in mind:
1) I want a model that can give immediate and intuitive results; facial recognition would be perfect.
2) The model needs to be runnable on an MCU.
3) The model should be compatible with a common deep learning framework, such as TFLite.
I chose one based on YOLO that operates within the TFLite environment.
This model has predetermined input and output.
All input images will be resized to 96x96 pixels. This standardization helps reduce discrepancies that might arise from using different cameras on the various boards. The output is structured as a 567x6 tensor, where 567 represents the number of potential bounding boxes, and the 6 corresponds to the attributes [x, y, width, height, confidence score, class].

Having fixed input and output sizes ensures that the computational load remains consistent across all processors. This uniformity allows for a fair comparison of performance metrics, such as inference time and power consumption, since each board will be performing the same amount of computation. It is crucial for the test to ensure an "apples to apples" comparison.
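If you want to verify these shapes yourself on a PC, a minimal sketch along these lines (assuming the tflite_runtime package is installed; the model filename is the one used later in this article) prints the model's input and output tensor details:

from tflite_runtime.interpreter import Interpreter

# Load the test model and allocate its tensors
interpreter = Interpreter(model_path="swift_yolo_1xb16_300e_coco_300_int8_sha1_2287b951101007d4cd1d09c3da68e53e6f23a071.tflite")
interpreter.allocate_tensors()

# Per the description above, expect a 96x96 input (likely [1, 96, 96, 3] for RGB)
# and a [1, 567, 6] output
print(interpreter.get_input_details()[0]["shape"])
print(interpreter.get_output_details()[0]["shape"])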
Next, my goal is to flash it onto the different boards and record the performance.
Grove Vision AI V2
This product is the latest to be released among all the items tested this time. What draws me to it the most is that its core is built on a dual-core Arm Cortex-M55 and an integrated Arm Ethos-U55 NPU.
It claims that there is an impressive improvement in ML performance while still keeping a very low power consumption.
The design of the board includes a CSI interface, decoupling the camera from the board, which indicates Seeed's intention to support a wider range of CSI cameras, like the Raspberry Pi cameras.
The first camera supported is the OV5647. Given that, I suspect that many algorithms that perform well on the Raspberry Pi could potentially be ported to this board, possibly yielding excellent recognition results. This is something worth trying in the future.
Test results of Grove Vision AI V2
Product: Seeed Studio Grove Vision AI V2
Processor: Himax WiseEye2 HX6538 (Cortex®-M55 + Ethos-U55), 400 MHz (M55 big) + 150 MHz (M55 little) + 400 MHz (U55)
Power Consumption: 0.35W
Inference Time: 33ms
Frame Rate: 30.3FPS
Ease of Use: 9.0
Price: $23.89
It is really impressive that its frame rate has reached 30+ FPS, which means that even when I am singing, I can clearly see every movement of my mouth. This level of performance from an MCU-grade processor is something I am witnessing for the first time.
Typically, the human eye can perceive changes in images at about 10 to 12 frames per second; beyond this frame rate, most people perceive the motion as continuous. The standard frame rate for movies is usually 24 FPS. Therefore, a processing speed of 30+ FPS means that it is more than capable of capturing many subtle movements in human motion.
Moreover, a highlight is its power consumption. Despite such high performance, it doesn’t compromise on power consumption, making it perfectly viable for battery-powered devices.
Kudos to ARM, Himax, and Seeed for this product!
As easy as a breeze with SenseCraft AI
To utilize this module, all you need is the web-based SenseCraft AI. The entire process is incredibly beginner-friendly: it takes less than a minute to set up the board and see the results.
Connect the board to the computer via the Type-C interface, open Seeed's SenseCraft AI, select the board, choose the model, click 'Deploy', and you can immediately see the results on the interface.
I would rate this experience a 9 out of 10 because it's truly straightforward. There's even no need to follow a wiki's step-by-step guide to get everything up and running.
Quick sum for Grove Vision AI V2
I used serial port monitoring software to observe the reported inference times and calculate the frame rate. The inference time is only 33 milliseconds! It's sensitive enough to capture screen moiré patterns! It's undeniable that this improvement on an MCU is quite substantial.
So, for the Grove Vision AI V2, the results are as follows: the inference time is approximately 33 milliseconds and the frame rate is about 30.30 FPS; under a 5.06V power supply, the output current fluctuates between 68.6 and 71.1 milliamps, with an average power consumption of 0.35 watts. For "ease of use," I gave it 9 points!
Grove Vision AI V1
The Grove Vision AI V1 is the first generation of the Grove Vision AI line. I included it in this round of testing because I thought it would be insightful to compare the V1 and V2 side by side, to spot the differences in their performance and to see the extent of the upgrades in the V2.
Test results of Grove Vision AI V1
Product: Seeed Studio Grove Vision AI V1
Processor: Himax HX6537-A (ARC EM9D DSP) 400 MHz
Power Consumption: 0.40W
Inference Time: 389.0ms
Frame Rate: 2.57FPS
Ease of Use: 8.0
Price: $25.99
It is evident that the V2 has made substantial improvements: the inference time is nearly 12 times faster (33 ms vs. 389 ms), and the frame rate is correspondingly almost 12 times higher. Additionally, the power consumption has been reduced and the price is more affordable.
Running the TFLite model on Grove Vision AI V1
While Grove Vision AI V1 is also a product made by Seeed Studio, it doesn't offer the same user experience as the V2 version and lacks the one-click firmware flashing support in SenseCraft AI.
To run a TFLite model on this device, follow these specific steps:
1) Download the test model
The problem arose when I discovered that the Grove Vision AI V1 does not support the direct use of .tflite files; it only accepts .uf2 files. Therefore, a file format conversion is necessary.
2) Convert .tflite to .uf2
Before proceeding, ensure Python and pip are installed on your system. You will also need to install Numpy.
pip install numpy
Download the uf2conv.py conversion script, and then execute the following command in the terminal to convert the downloaded model file into a .uf2 file.
python3 uf2conv.py -t 1 'swift_yolo_1xb16_300e_coco_300_int8_sha1_2287b951101007d4cd1d09c3da68e53e6f23a071.tflite' -c
3) Flash the firmware and the converted test model to Grove Vision AI V1
Connect the module to the host PC with a Type-C cable and double-click the BOOT button on the module. A "GROVEAI" drive will pop up.
Copy both the grove_ai_v02-01.uf2 and swift_yolo-face-dectetion_model.uf2 files to the GROVEAI drive to complete the firmware flash.
4) Check the result
Open the web-based application, click 'Connect', select 'Algorithm', and you can see the recognition results in real time.
When viewing the real-time output through the serial port, it reports recognition results approximately every 389 milliseconds, which means it only achieves 2.57 FPS. It seems this product is only suitable for recognizing static images and cannot handle video feeds.
Next, I examined the power consumption of this board.
I had the Grove Vision AI V1 run the facial recognition model while recording the device's continuous operating voltage and current to calculate its average power consumption.
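For transparency, the arithmetic behind the average power figure is simply supply voltage times average current. Here is a quick Python sketch, approximating the average current as the midpoint of the measured range reported in the summary below:

voltage_v = 5.06                     # measured supply voltage
current_a = (0.0789 + 0.0818) / 2    # midpoint of the 78.9-81.8 mA range, in amps
power_w = voltage_v * current_a      # P = V * I
print(round(power_w, 2))             # ~0.41 W, consistent with the ~0.40 W reported below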
Quick sum for Grove Vision AI V1
For the Grove Vision AI V1, the results are: the inference time is approximately 389 milliseconds and the frame rate is about 2.57 FPS; under a 5.06V power supply, the output current fluctuates between 78.9 and 81.8 milliamps, with an average power consumption of 0.40 watts. For "ease of use," I gave it 6 points.
After testing the V2 and then looking back at the V1, I found that the V2's power consumption has not been significantly reduced compared to the V1. However, there is a substantial improvement in FPS and inference time. Also, when comparing the photos taken by the two products, there's a noticeable difference in photo quality, which indirectly confirms the enhancement in ML performance of the V2.
Furthermore, the overall user experience with the V2 has been greatly optimized by SenseCraft AI, not just in terms of hardware performance. When using the V1, I actually encountered some difficulties. For example, the V1 comes with a preloaded default human recognition model, and before flashing the test model, I wanted to try out the product experience with this built-in model. Even following the steps from the Seeed Wiki, getting such a simple demo running was not very smooth. As for the firmware flashing process, the V2 supports one-click flashing, whereas with the V1 I needed to drag and drop different firmware files into the drive to complete the flash.
Seeed Studio XIAO ESP32S3 Sense
This coin-sized tiny gadget comes integrated with a camera, microphone, and SD card slot, and is TinyML-ready.
An intriguing aspect of this product's design is its commitment to minimal size. The camera is first mounted onto an expansion board, which is then stacked onto the XIAO ESP32S3 mainboard. The connection between the expansion board and the XIAO ESP32S3 mainboard is accomplished through a dedicated B2B (board-to-board) connector.
In my opinion, it's quite a clever design. I wonder if Seeed will disclose the details of this interface in the future, allowing us to unleash our creativity to develop other expansion boards, for instance, an attachment for a miniaturized, automated pizza-making conveyor. (Just in case you're in the middle of debugging and find yourself craving a slice!)
Test results of XIAO ESP32S3 Sense
Product: Seeed Studio XIAO ESP32S3 Sense
Processor: Espressif ESP32-S3 (Dual-Core Xtensa LX7) 240 MHz
Power Consumption: 0.45W
Inference Time: 180ms
Frame Rate: 5.55FPS
Ease of Use: 9.0
Price: $13.99
The ML performance of the XIAO ESP32S3 Sense seems to be positioned between Grove Vision V1 and V2. Its performance is not as good as V2, but it is slightly better than V1. Considering the price, it also offers a higher cost-performance ratio and a very satisfactory user experience.
It is much easier with SenseCraft AI
Quick intro: SenseCraft AI is a web-based platform that simplifies the deployment of both pre-trained and custom AI models on Seeed products, providing immediate visualization of inference results for quick performance evaluation.
Within the platform, there are already some optimized models available—for instance, models tailored for MCUs as well as models suited for GPUs like Nvidia's. You can select from these ready-made models and, with just a few clicks, burn them into Seeed products, ensuring great compatibility. And you can also build your own model on it.
The test model we used is already in the SenseCraft AI model library, so configuring this board with SenseCraft AI was a very smooth process.
To deploy the model, all you need to do is open the web-based SenseCraft AI application and click a few buttons.
Quick sum for XIAO ESP32S3 Sense
Check the real-time information printed on the serial port to see the inference times and frame rates. To view the reported data from the XIAO, you can send the command AT+INVOKE=-1,1 through the serial port. (See the sketch below for one way to do this from a PC.)
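For reference, here is a minimal sketch of sending that command and reading the replies from a PC using the pyserial package. The port name and baud rate are assumptions for my setup, and the exact response format depends on the firmware:

import serial  # pyserial

# Open the board's serial port (port name and baud rate are assumptions; adjust for your system)
with serial.Serial("/dev/ttyACM0", 115200, timeout=1) as port:
    port.write(b"AT+INVOKE=-1,1\r\n")    # start continuous inference, as described above
    for _ in range(20):                  # read a handful of report lines
        line = port.readline().decode(errors="ignore").strip()
        if line:
            print(line)                  # inference results / timing reported by the firmware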
As for the power consumption, it is ~0.45W.
So, for the XIAO ESP32S3 Sense, the results are: the inference time is approximately 180 milliseconds and the frame rate is about 5.55 FPS; under a 5.06V power supply, the output current fluctuates between 87.1 and 93.4 milliamps, with an average power consumption of 0.45 watts. For "ease of use," I gave it 9 points.
Although the XIAO ESP32S3 Sense may not perform as well as the Grove Vision AI V2 in terms of computational power, it does have a significant advantage in price.
ESP32-S3-EYE Development Board
The main controller on this board is the same as the one on the XIAO ESP32S3 Sense, and theoretically, boards with the same main controller should have similar performance. However, I happen to have one on hand, so I included it in this round of testing as well. At the same time, I'm curious to see if, aside from price differences, there will be any significant performance discrepancies between development boards made by different manufacturers that use the same chip.
Test results of ESP32-S3-EYE Development Board
Product: ESP32-S3-EYE Development Board
Processor: Espressif ESP32-S3 (Dual-Core Xtensa LX7) 240 MHz
Power Consumption: 0.46W
Inference Time: 180.0ms
Frame Rate: 5.55FPS
Ease of Use: 4.0
Price: $45.00
As anticipated, the result of the ESP32-S3-EYE is similar to the XIAO ESP32S3 Sense from a performance standpoint: it shows a slight improvement over the Grove Vision AI V1 and is not quite up to par with the V2. But as for "ease of use," after experimenting with this board, my impression is that it's not particularly beginner-friendly, so I gave it 4 points out of 10.
Running the test model on ESP32-S3-EYE
1) Using ESP-IDF
The Espressif IDF is a powerful tool for IoT development, yet when it comes to deploying machine vision tasks, the environment is complex and not as accessible for beginners. The framework requires a deep understanding of its inner workings, and developers often struggle with a lack of streamlined setup for machine vision. Given these hurdles, particularly the steep learning curve and the intricate setup for machine vision, the ease of use for implementing such applications via the IDF is realistically a 4 out of 10.
2) Using SenseCraft AI
Given that the ESP32-S3-EYE shares the same core as the XIAO ESP32S3 Sense, I think it's possible that SenseCraft AI could be compatible with the ESP32-S3-EYE board, and that I can flash the XIAO's firmware and test models onto it using this platform.
I gave it a try!
Download the firmware, boot file, and the demo, and flash them with SenseCraft AI. The interface provided by Seeed simplifies the firmware flashing process.
And it worked!!!! Yay!
I then checked the real-time information printed on the serial port and measured the power consumption, as before.
Quick sum for ESP32-S3-EYE
For the ESP32-S3-EYE Development Board, the results are: the inference time is approximately 180 milliseconds and the frame rate is about 5.55 FPS; under a 5.06V power supply, the output current fluctuates between 88.2 and 94.3 milliamps, with an average power consumption of 0.46 watts. Its performance is essentially the same as the XIAO ESP32S3 Sense.
I rate the ease of use at 4 out of 10, as the design doesn't seem to be aimed at beginners. However, by employing SenseCraft AI as an alternative approach, I avoided the hassle of navigating between various sites. This allowed me to interact with the test model with relative ease.
Nicla Vision & OpenMV
The Nicla Vision is a versatile development board equipped with a camera capable of color and depth sensing, a microphone, a motion sensor, a distance sensor, and Wi-Fi/Bluetooth Low Energy connectivity, complemented by a comprehensive set of I/O ports.
However, before finalizing this article, I was unable to find guidelines or examples for running TensorFlow Lite on the Nicla Vision, which means our test model cannot be deployed on this board. Consequently, the testing approach I planned isn't applicable to Nicla Vision.
Despite this, I've heard many positive reviews about the Nicla Vision since its launch, so I still want to take this opportunity to evaluate its performance, focusing solely on a face detection task.
I've learned that Nicla Vision supports OpenMV—an IDE I've been eager to try, which I previously understood to be a tool for implementing machine vision with Python. This presents an excellent opportunity to test it out. So I plan to use OpenMV to explore the capabilities of the Nicla Vision.
While going through the official Arduino documentation, I came across a tutorial named "Blob Detection." The tutorial describes how to perform classification detection to differentiate objects such as bananas and apples. I intend to apply this algorithm to recognize human faces in photos.
Thus, in this test, I will focus exclusively on the face detection task to gauge the board's performance with this specific task. It's important to note that, since this testing approach differs from the ones used with other boards, the data derived from this test cannot be directly compared with other boards.
Therefore, the test data obtained will be specifically targeted at assessing the board's processing capabilities for this particular face detection model on OpenMV.
Test results of Nicla Vision with Blob detection model through OpenMV
Product: Arduino Nicla Vision
Processor: STM32H747AII6 (Arm® Cortex®-M7/M4) 480 MHz (M7) + 240 MHz (M4)
Power Consumption: 0.59W
Inference Time: 178.89ms
Frame Rate: 5.59FPS
Ease of Use: 6.0
Price: $115.00
How easy it is to run the model with OpenMV
Compared to what I encountered with the Pi, it is so much easier to use OpenMV with the Nicla Vision.
Here are the steps:
1) Environment setup
Download OpenMV, then connect Nicla Vision to the OpenMV IDE, click the "Connect" button in the lower left corner and update to the latest firmware. The firmware update took about two minutes, which was fast.
Then, import the necessary libraries and initialize the camera. After completing these lines of code, I just need to click the run button in the lower left corner, and the image is displayed in the upper right corner of the OpenMV IDE. The code doesn't even require waiting for an upload, which greatly improves efficiency; the whole flow is smooth and straightforward.
import pyb # Import module for board related functions
import sensor # Import the module for sensor related functions
import image # Import module containing machine vision algorithms
import time # Import module for tracking elapsed time
sensor.reset() # Resets the sensor
sensor.set_pixformat(sensor.RGB565) # Sets the sensor to RGB
sensor.set_framesize(sensor.QVGA) # Sets the resolution to 320x240 px
sensor.set_vflip(True) # Flips the image vertically
sensor.set_hmirror(True) # Mirrors the image horizontally
sensor.skip_frames(time = 2000) # Skip some frames to let the image stabilize
2) Define the LAB value
This is a feature I find particularly impressive in OpenMV.
Blob detection is an image processing technique that identifies and segments regions within an image that differ in properties such as brightness or color from the surrounding areas. Therefore, to effectively recognize objects in an image, you need to define the LAB (Lightness, A-channel, B-channel) color values that predominantly represent the object you intend to track.
LAB color space is a three-axis color system where 'L' stands for Lightness, and 'A' and 'B' are the color-opponent dimensions. LAB color space is designed to approximate human vision and is device-independent, meaning that the colors are consistent across different devices and lighting conditions.
● L (Lightness): Ranges from 0 to 100, representing a scale from complete black to complete white.
● A (Green-Red axis): Values on this axis represent the color spectrum from green to red, with negative values indicating green and positive values indicating red.
● B (Blue-Yellow axis): This axis represents the color spectrum from blue to yellow, with negative values indicating blue and positive values indicating yellow.
The LAB color space is particularly useful in scenarios where accurate color representation is important, such as in image processing and printing industries, because it is more consistent with how humans perceive colors than other color spaces like RGB (Red, Green, Blue) or CMYK (Cyan, Magenta, Yellow, Key/Black).
OpenMV provides a convenient visual-selection tool that helps in pinpointing the desired color ranges, enabling you to interactively choose the best LAB color values for accurate object tracking.
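If you prefer to sanity-check threshold values in code rather than with the visual tool, a minimal sketch along these lines should work on the board. I'm assuming the rgb_to_lab helper in OpenMV's image module, and the sampled pixel location is arbitrary:

import sensor
import image

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)

img = sensor.snapshot()
# Sample a pixel near the center of the frame (location chosen arbitrarily)
rgb = img.get_pixel(160, 120)
# Convert it to LAB to see where it falls relative to your thresholds
print(image.rgb_to_lab(rgb))  # prints an (L, A, B) tuple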
3) Feed the image to it
You can choose to load the image from the SD card or internal flash, but in this case, I get the image using the snapshot() function and then feed it to the algorithm using the find_blobs function. The Arduino doc has a clear guide on this.
Below is my complete code.
import pyb # Import module for board related functions
import sensor # Import the module for sensor related functions
import image # Import module containing machine vision algorithms
import time # Import module for tracking elapsed time
sensor.reset() # Resets the sensor
sensor.set_pixformat(sensor.RGB565) # Sets the sensor to RGB
sensor.set_framesize(sensor.QVGA) # Sets the resolution to 320x240 px
sensor.set_vflip(True) # Flips the image vertically
sensor.set_hmirror(True) # Mirrors the image horizontally
sensor.skip_frames(time = 2000) # Skip some frames to let the image stabilize
# Define the min/max LAB values we're looking for
thresholdsFace = (29, 71, 4, 127, 3, 127)
clock = time.clock() # Instantiates a clock object
while(True):
    clock.tick() # Advances the clock
    img = sensor.snapshot() # Takes a snapshot and saves it in memory
    # Find blobs with a minimal area of 50x50 = 2500 px
    # Overlapping blobs will be merged
    blobs = img.find_blobs([thresholdsFace], area_threshold=2500, merge=True)
    # Draw blobs
    for blob in blobs:
        # Draw a rectangle where the blob was found
        img.draw_rectangle(blob.rect(), color=(0,255,0))
        # Draw a cross in the middle of the blob
        img.draw_cross(blob.cx(), blob.cy(), color=(0,255,0))
    pyb.delay(50) # Pauses the execution for 50ms
    print(clock.fps()) # Prints the framerate to the serial console
The overall experience with OpenMV and the setup process for the product is very straightforward and intuitive. Additionally, my understanding of blob detection has been significantly enhanced; employing this algorithm for image recognition proves to be highly effective, and I no longer have to capture many photos to build a model. It gives the sense that by isolating color blocks from the background, I can "define" the specific outcome I want from machine vision recognition.
Quick sum for Nicla Vision
Using OpenMV's blob detection for face detection, the inference time is approximately 178.89 milliseconds and the frame rate is about 5.59 FPS; under a 5.05V power supply, the output current fluctuates between 123.4 and 128.5 milliamps, with an average power consumption of 0.59 watts. For "ease of use," I gave it 6 points, because I still needed to write code.
Let's check the performance of the mini-computer.
Test results of Raspberry Pi 4B
Product: Raspberry Pi 4B
Processor: Broadcom BCM2711 (Quad-Core Cortex®-A72, Arm v8) 1.5 GHz
Power Consumption: 3.79W
Inference Time: 8.83ms
Frame Rate: 113.21FPS
Ease of Use: 1.0
Price: $55
The Raspberry Pi, as a "miniature computer," achieved a frame rate of 113.21 FPS, which means it can handle video feeds and, theoretically, may even be capable of slow-motion playback, according to ChatGPT.
Running TFLite model on Raspberry Pi
I rated the ease of use for the Raspberry Pi 4B as only 1 out of 10. The deployment process I experienced was not very smooth. Here are the steps:
1) Installing the Operating System (OS):
The TensorFlow guideline didn't mention which OS version to use, so I picked one at random and ran into a bunch of errors. I wasted a lot of time debugging before I realized it was because of the OS version.
Please install this version of the OS: 2021-05-07-raspios-buster-armhf-full
2) Enabling the Raspberry Pi to run TensorFlow Lite models
- Clone this Git repo onto the Raspberry Pi using the following command:
git clone https://github.com/limengdu/grove-vision-we2-benchmark.git
- Then use the provided script to install necessary Python packages and download the EfficientDet-Lite model:
cd grove-vision-we2-benchmark/raspberrypi/tensorflowlite_example
sh setup.sh
All I need from the TensorFlow Lite API is the Interpreter class. So instead of installing the large tensorflow package, I use the much smaller tflite_runtime package. The setup scripts above will automatically install the TensorFlow Lite runtime.
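To give a concrete picture of what the Interpreter-based flow looks like, here's a minimal sketch of loading the test model with tflite_runtime and timing a single inference. The dummy input and the timing loop are mine for illustration, not part of the repo's detect.py:

import time
import numpy as np
from tflite_runtime.interpreter import Interpreter

# Load the test model and allocate tensors
interpreter = Interpreter(model_path="swift_yolo_1xb16_300e_coco_300_int8_sha1_2287b951101007d4cd1d09c3da68e53e6f23a071.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy frame with the model's expected shape and dtype (a real run feeds camera frames)
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)

# Time one invocation; ~8.83 ms is what I measured on the Pi 4B
start = time.perf_counter()
interpreter.invoke()
elapsed_ms = (time.perf_counter() - start) * 1000
print("inference time: %.2f ms" % elapsed_ms)

# The output is the 567x6 tensor described earlier: [x, y, w, h, confidence, class] per box
print(interpreter.get_tensor(out["index"]).shape)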
3) How to quickly verify a successful installation
To check if you've successfully installed everything, run the following command to execute the model:
python3 detect.py \
--model efficientdet_lite0.tflite
If successful, you should be able to run the model and see the camera feed on the monitor attached to your Raspberry Pi. When you place objects like a coffee mug or a keyboard in front of the camera, you should see bounding boxes drawn around those objects that the model recognizes.
4) Run our test model on Pi
Execute the following command on Pi,
cd ..
python3 detect.py --model swift_yolo_1xb16_300e_coco_300_int8_sha1_2287b951101007d4cd1d09c3da68e53e6f23a071.tflite
then we should be able to observe the following results.
Quick sum for Raspberry Pi 4B
The inference time is approximately 8.83 milliseconds and the frame rate is about 113.21 FPS; under a 5.07V power supply, the output current fluctuates between 750 and 830 milliamps, with an average power consumption of ~3.8 watts. For "ease of use," I gave it only 1 point.
The computational power of a CPU is still much greater than that of an MCU, but it also consumes more juice.
In Summary
Having conducted the side-by-side comparison, we are now in a position to draw a final conclusion.
Grove Vision AI (V1)
With a moderate inference time of 389 milliseconds and a frame rate of 2.57 FPS, the Grove Vision AI is a cost-effective option at $25.99. It is suitable for hobbyists and educators who require a balance between performance and affordability for simple vision projects. Its low power consumption of 0.40 watts makes it a good choice for battery-powered applications where efficiency is crucial.
XIAO ESP32S3 Sense
The XIAO ESP32S3 Sense offers a fast inference time of 180 milliseconds and an impressive frame rate of 5.55 FPS, all while maintaining a low price point of $13.99. Its power consumption is slightly higher at 0.45 watts. This device is ideal for makers and DIY enthusiasts who need a more responsive unit for intermediate projects without significantly increasing the cost.
ESP32-S3-EYE Development Board
Matching the XIAO ESP32S3 Sense in inference time and frame rate, the ESP32-S3-EYE differs with a higher cost of $45. The marginal increase in power consumption to 0.46 watts might be justified by additional features or better support. This board is well-suited for developers who require a robust platform for prototyping and developing sophisticated applications.
Grove Vision AI V2
With the quickest inference time of 33 milliseconds and the highest frame rate of 30.30 FPS among the products listed, at a moderate price of $23.89 and an efficient power consumption of 0.35 watts, the Grove Vision AI V2 is the premium choice for professional developers and businesses that need high-performance vision processing in their products.
Raspberry Pi 4B
The Raspberry Pi 4B stands out with the fastest inference time of 8.83 milliseconds and a remarkable frame rate of 113.21 FPS. However, its higher power consumption of 3.79 watts and cost of $55 make it the most powerful yet priciest option. It's best suited for advanced users and professionals who require significant computational power for complex projects and are less constrained by power efficiency or budget.
Nicla Vision
Nicla Vision, priced at $115, offers a decent inference time of 178.89 milliseconds and a frame rate of 5.59 FPS. With its power consumption at 0.59 watts, it is positioned as a specialized option for developers looking for integrated solutions with OpenMV's Blob Detection capabilities. This product is tailored for projects that necessitate specific vision processing features and are not as sensitive to price.
User Group Suitability
- Hobbyists and Educators: Grove Vision AI and XIAO ESP32S3 Sense are excellent for those on a budget, allowing for basic to intermediate project development.
- DIY Enthusiasts: XIAO ESP32S3 Sense and Grove Vision AI V2 offer a good balance between performance and cost.
- Professional Developers: The Grove Vision AI V2 and Raspberry Pi 4B provide high performance for more demanding applications, with the Pi 4B offering the highest specs at a higher cost.
- Business and Industrial Applications: Raspberry Pi 4B and Nicla Vision cater to users requiring high processing power and specialized capabilities, respectively, and are less sensitive to price.
- Integrated Solutions: Nicla Vision is suited for those looking for a product with specific built-in vision processing features like OpenMV's Blob Detection.
Performance and Price Gap
There's a clear performance and price gap between the products. For basic vision tasks, the Grove Vision AI and XIAO ESP32S3 Sense strike a good balance. The Grove Vision AI V2 steps up with higher performance for intermediate applications, while the Raspberry Pi 4B and Nicla Vision are positioned for high-end use cases. Users must weigh the trade-offs between cost, power consumption, and processing capabilities when choosing the right product for their needs.
As we come to a close on this exploration of Vision AI technologies, your insights and perspectives are invaluable to us. We strive for continuous improvement and understand that there is always room for enhancement. If you have suggestions on how we might refine our analysis, or if there are particular aspects of Vision AI that you feel warrant deeper investigation, please share your thoughts. Your feedback not only helps us to improve our content but also enriches the discussion for all readers interested in the cutting-edge world of Vision AI. Thank you for your support and engagement.