Arduinos That See, Hear, Think: A Tour of the Avnet Arduino Pro Edge AI/ML Vision and Speech Kit
Edge AI bundle includes the Arduino Portenta H7 with Vision Shield, Nicla Vision, and more, plus the software to get you started.
Artificial intelligence (AI) is a hot topic at the moment, and while there's been a lot of focus on high-powered cloud-based services driven by data centers filled with graphics cards and dedicated acceleration devices, a bigger revolution is happening at the edge — where resource-constrained microcontrollers are increasingly handling machine learning (ML) and computer vision (CV) tasks on-device, avoiding the environmental, privacy, and latency concerns that can dog centralized services.
With that in mind, Avnet has put together a bundle designed for those looking to experiment with computer vision and speech recognition at the edge: the Arduino Pro Edge AI/ML Vision and Speech Kit. Made up of an Arduino Portenta H7 with Vision Shield and an Arduino Nicla Vision with case, it provides almost everything you need for low-power edge AI projects, including licenses for a clever on-device speech recognition system targeting home assistants and other voice-activated systems.
Let's dive in to see how the bundle stacks up.
Hardware
Arduino Portenta H7
- Microcontroller: STMicroelectronics STM32H747XI (dual-core, Arm Cortex-M7 480MHz and M4 240MHz)
- Memory: 1MB on-chip SRAM, 2MB on-chip flash, 8MB SDRAM, 16MB QSPI flash, microSD card (via optional expansion board)
- GPU: Chrom-ART 2D accelerator
- Radio: Murata 1DX IEEE 802.11b/g/n Wi-Fi (65Mb/s) and Bluetooth Low Energy 5 (4.2 using Arduino software stack)
- Secure element: NXP Semiconductors SE050C2, Microchip ATECC608
- Display: DisplayPort over USB Type-C, MIPI Display Serial Interface (DSI) and MIPI D-PHY
- Ethernet: Fast Ethernet (via expansion port)
- Power supply: USB Type-C, 5V VIN, optional 3.7V Li-po (with built-in charger)
- GPIO: 84× digital (most available only on dual 80-pin high-density connectors), 10× PWM, 8× analog
- Size: 66×25mm (around 2.6×0.98")
Arduino Portenta Vision Shield
- Camera: Himax Technologies HM-01B0 low-power sensor (320×320, QVGA window, monochrome)
- Microphones: 2× STMicroelectronics MP34DT06JTR MEMS microphones
- Memory: MicroSD card slot (driven by Portenta H7)
- Ethernet: RJ45 for Fast Ethernet (driven by Portenta H7 PHY)
- Size: 66×25mm (around 2.6×0.98")
Arduino Nicla Vision
- Microcontroller: STMicroelectronics STM32H747AII6 (dual-core, Arm Cortex-M7 480MHz and M4 240MHz)
- Memory: 1MB on-chip SRAM, 2MB on-chip flash, 16MB QSPI flash
- Camera: GalaxyCore GC2145 sensor (1,600×1,200, color)
- Microphones: 1× STMicroelectronics MP34DT06JTR MEMS microphone
- Sensors: STMicroelectronics LSM6DSOX six-axis inertial measurement unit (IMU) with machine-learning core, VL53L1CBV0FY/1 time-of-flight (ToF) distance sensor
- Radio: Murata 1DX IEEE 802.11b/g/n Wi-Fi (65Mb/s) and Bluetooth Low Energy 5 (4.2 using Arduino software stack)
- Secure element: NXP Semiconductors SE050C2
- Power supply: Micro-USB Type-B, 5V VIN, optional 3.7V Li-po (with built-in charger)
- GPIO: 10× digital (two shared with I2C, four with SPI), 12× PWM, 2× analog
- Size: 23×23mm (around 0.9×0.9")
Extras
- Arduino Nicla Vision Enclosure
- Arduino Cloud for Business Voucher, 3 Months
- 2× Arduino Speech Recognition Engine for Arm Cortex-M4/M7 Vouchers
- 1 hour remote consultation for product selection
- 2 hours remote technical support
The two main boards in the bundle represent Arduino's two approaches to edge AI, and while they have a lot in common, there are plenty of differences too. The most obvious is size: the Portenta H7 is the bigger of the two, bringing out far more general-purpose input/output (GPIO) connectivity from its STMicro STM32 heart and offering no fewer than three options for driving an external display.
The Arduino Nicla Vision, by contrast, is a member of Arduino's ultra-compact Nicla family, offering considerably more limited GPIO connectivity than the larger Portenta H7 — yet boasting more on-device peripherals including a time-of-flight distance sensor, which runs independently of the camera, and a six-axis inertial measurement unit.
The Arduino Nicla Vision is also the only one of the two to include a camera, located on the top of the board. That doesn't mean the Portenta H7 is no good for computer vision at the edge, though: the bundle also includes the Portenta Vision Shield, a second board that snaps onto the high-density connectors on the Portenta H7 and adds a camera sensor, two microphones, a wired Fast Ethernet port, and a microSD card slot.
There's a big difference between the camera sensors on the two boards, however. The Nicla Vision gets a full-color two-megapixel GalaxyCore GC2145, while the Portenta Vision Shield uses a monochrome 0.1-megapixel Himax Technologies HM-01B0. That might suggest Portenta projects draw the short straw, but the Himax camera brings a couple of advantages to the table: it's lower-power than the GalaxyCore sensor, and it includes an integrated motion detection circuit.
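For a sense of scale, the megapixel figures follow directly from the resolutions in the spec lists. The sketch below also estimates the size of one raw, uncompressed frame, assuming 1 byte per pixel for the monochrome sensor and 2 bytes per pixel (RGB565) for the color one. Those pixel formats are assumptions for illustration only, not a description of how either board's firmware actually buffers images.

```python
# Sensor resolutions from the spec lists; bytes per pixel are illustrative assumptions
CAMERAS = {
    "Himax HM-01B0 (Portenta Vision Shield)": (320, 320, 1),  # monochrome, assumed 8-bit
    "GalaxyCore GC2145 (Nicla Vision)": (1600, 1200, 2),      # color, assumed RGB565
}


def megapixels(width: int, height: int) -> float:
    """Pixel count in millions."""
    return width * height / 1_000_000


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one raw, uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


for name, (w, h, bpp) in CAMERAS.items():
    kib = frame_bytes(w, h, bpp) / 1024
    print(f"{name}: {megapixels(w, h):.2f} MP, {kib:.0f} KiB per raw frame")

# Himax HM-01B0 (Portenta Vision Shield): 0.10 MP, 100 KiB per raw frame
# GalaxyCore GC2145 (Nicla Vision): 1.92 MP, 3750 KiB per raw frame
```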
Beyond that, you're back to similarities. Both boards are based on variants of the STMicro STM32H747 microcontroller, which has a heterogeneous dual-core architecture pairing a high-performance Arm Cortex-M7 core running at 480MHz with a lower-power Cortex-M4 core running at 240MHz. Both include NXP's SE050C2 secure element, both offer a Murata 1DX radio for Wi-Fi and Bluetooth Low Energy (BLE) connectivity, and both have connectors for optional lithium-polymer batteries, complete with charging circuits and fuel gauges, though no batteries are included in the bundle.
Finally, the bundle includes a few extras. The 3D-printed Nicla Vision Enclosure protects the Nicla Vision and offers mounting points for tripods, clamps, and more, though it's primarily designed for use with a battery fitted, since it is the battery that holds the Nicla Vision in place. The other bonuses are software-based: a voucher for a three-month subscription to the Arduino Cloud Enterprise Plan, and two vouchers for the Arduino Speech Recognition Engine, about which more later.
Getting started
For those new to the platform, Arduino has put together a getting-started guide that pulls in 24 tutorials, or "experiences," on topics ranging from Bluetooth Low Energy connectivity and secure boot to on-device computer vision and proximity detection.
Not every tutorial is applicable to both boards: a guide to using the Arduino Speech Recognition Engine, for example, can be used with both the Portenta H7 with Vision Shield and the Nicla Vision, but a guide to image classification using Edge Impulse is applicable only to the Nicla Vision.
Here the benefit of the full Arduino Pro Edge AI/ML Vision and Speech Kit becomes clear: it contains almost everything, bar USB cables, that you need to work through every tutorial in the collection. It also lets you experiment beyond those tutorials: blending motion tracking with camera input on the Nicla Vision, for example, or building on-device visualizations that use the DisplayPort over USB Type-C connectivity of the Portenta H7.
Sounds good
You can tackle the projects in any order, and given the kit's focus it makes sense to concentrate on the machine learning and edge AI aspects — so in our experiments we began with speech recognition. This makes use of the Arduino Speech Recognition Engine, developed in partnership with Cyberon. On the face of it, it's a fairly standard speech recognition system: a wake-word triggers the device to listen out for one of a number of commands, which can then be used to execute particular code.
Where most embedded speech recognition engines only run the wake-word detection locally and send the rest of the audio to a remote server for transcription, though, the Arduino Speech Recognition Engine works entirely on-device. Better still, it's entirely pre-trained: you don't have to bring your own dataset of recordings, and it can recognize speech in a variety of accents and over 40 languages. In use, it's smooth and flexible — despite using some unusual words in both triggers and commands, we weren't able to trip the engine up.
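The wake-word-then-command flow described above is essentially a two-state machine: idle until the wake word is heard, then listening for a command. The following is a minimal Python simulation of that pattern, assuming the recognizer has already turned audio into text phrases. `run_voice_loop`, the "hey arduino" wake phrase, and the light commands are all hypothetical, and nothing here reflects the actual API of the Arduino Speech Recognition Engine, which runs on the board itself.

```python
from typing import Callable, Dict, Iterable, List


def run_voice_loop(phrases: Iterable[str], wake_word: str,
                   commands: Dict[str, Callable[[], str]]) -> List[str]:
    """Simulate a wake-word-then-command loop over already-recognized phrases.

    Hypothetical illustration only: not the Arduino Speech Recognition Engine API.
    Returns the results of every command handler that was triggered.
    """
    wake = wake_word.strip().lower()
    table = {k.strip().lower(): fn for k, fn in commands.items()}
    results: List[str] = []
    awake = False  # start idle, waiting for the wake word

    for phrase in phrases:
        phrase = phrase.strip().lower()
        if not awake:
            # Idle: ignore everything except the wake word
            awake = phrase == wake
        else:
            # Listening: run the matching command (if any), then return to idle
            handler = table.get(phrase)
            if handler is not None:
                results.append(handler())
            awake = False
    return results


# Example: a command spoken without the wake word is ignored
commands = {
    "turn on the light": lambda: "light on",
    "turn off the light": lambda: "light off",
}
heard = ["turn on the light", "Hey Arduino", "turn on the light",
         "hey arduino", "turn off the light"]
print(run_voice_loop(heard, "hey arduino", commands))  # ['light on', 'light off']
```

In this sketch the device drops back to idle after a single phrase whether or not it matched a command; a real engine would more likely use a listening timeout, which is left out here for brevity.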
The engine does carry a cost: $9 per license. That's not much, and two licenses are included in the price of the bundle, but be aware that each license is tied to one specific board and is used up when you compile your program. That latter point needs some clarification: a paid-for Arduino Speech Recognition Engine license is a one-use consumable, which expires when you compile and flash your program to your board. If you need to change that program in the future, whether you're adding new commands or simply fixing a bug, you'll have to pay for another license, and if you want to flash it to a new board, you'll also need a new license.
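To make the licensing rules concrete, here is a small cost calculator, assuming (as described above) that every compile-and-flash of a paid-license sketch consumes one $9 license, with the bundle's two licenses used first. The function name and the flash-history format are hypothetical.

```python
from typing import List, Tuple

LICENSE_PRICE_USD = 9  # per-license price quoted above
BUNDLED_LICENSES = 2   # licenses included with the kit


def extra_license_cost(flashes: List[Tuple[str, str]]) -> int:
    """Dollar cost of licenses needed beyond the two bundled ones.

    Each (board_id, build_id) entry is one compile-and-flash of a paid-license
    sketch. Under the rules described above, every flash consumes a license,
    whether it targets a new board or ships an updated build to an old one.
    """
    used = len(flashes)
    return max(0, used - BUNDLED_LICENSES) * LICENSE_PRICE_USD


# Hypothetical history: v1 flashed to two boards, then a bug fix (v1.1) on both
history = [("board-a", "v1"), ("board-b", "v1"),
           ("board-a", "v1.1"), ("board-b", "v1.1")]
print(extra_license_cost(history))  # 18
```

The board and build labels are kept only as bookkeeping: because every flash consumes a license, the cost depends solely on how many flashes there are.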
For rapid iteration and ease of testing, there's a free demo license. Designed for use during development, it has all the same features as the paid-for license, but it enforces a 20-second pause between recognitions, and the sketch stops running after 50 uses, after which the board must be reset for the next 50. Both the free and paid-for licenses support a single dataset, one wake word or phrase, and up to 20 independent command phrases; for projects exceeding this, a "Pro License" is available, with pricing and more details available from Arduino upon request.
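The demo license's limits also translate into some simple arithmetic. This sketch assumes each "use" is one recognition and that the 20-second pause sits between consecutive recognitions, so 50 uses cannot take less than 49 × 20 = 980 seconds, a little over 16 minutes, before a reset is needed. The helper names are hypothetical, and the phrase-limit check covers only the wake-word and command counts quoted above, not the single-dataset rule.

```python
from typing import Sequence

DEMO_MAX_USES = 50    # sketch stops after this many uses until the board is reset
DEMO_PAUSE_S = 20     # enforced pause between recognitions, in seconds
MAX_WAKE_PHRASES = 1  # free and paid-for licenses
MAX_COMMANDS = 20     # free and paid-for licenses


def min_demo_session_seconds(uses: int = DEMO_MAX_USES,
                             pause_s: int = DEMO_PAUSE_S) -> int:
    """Lower bound on wall-clock time for `uses` recognitions (pauses only)."""
    return max(0, uses - 1) * pause_s


def fits_standard_license(wake_phrases: Sequence[str], commands: Sequence[str]) -> bool:
    """True if the phrase set fits the free/paid limits; otherwise the Pro tier is needed."""
    return len(set(wake_phrases)) <= MAX_WAKE_PHRASES and len(set(commands)) <= MAX_COMMANDS


print(min_demo_session_seconds())  # 980
print(fits_standard_license(["hey arduino"], [f"command {i}" for i in range(21)]))  # False
```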
Let there be sight
While the speech recognition tutorial uses the Arduino IDE, the computer vision tutorials switch to a different development environment: OpenMV, a MicroPython-based firmware and integrated development environment designed specifically for machine vision. Making the move is as easy as installing OpenMV and allowing it to replace the firmware on your Arduino Portenta H7 or Nicla Vision — and you can go back to the stock firmware and the Arduino IDE at any time.
OpenMV comes with a selection of example projects, one of which mimics the capabilities of the Arduino Speech Recognition Engine at no cost, though with narrower language support and only pre-configured wake and command phrases. Others, as you might expect, focus on using the camera with a selection of on-device computer vision models.
The OpenMV IDE is specifically built for computer vision projects, and it shows: to the right of a central code area is a live-view frame buffer, showing what the camera sees along with the overlay of any running vision model, and a real-time histogram. These models range from edge detection, running on either board at around 29 frames per second, to TensorFlow-driven face detection, which runs at a little under 10 frames per second, a pretty impressive figure for a low-power microcontroller. OpenMV's blob tracking model, meanwhile, ran fastest at nearly 58 frames per second.
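Frame rates are easier to weigh against a project's needs when converted into time per frame. The sketch below does that for the figures quoted above; because those figures are approximate ("around", "a little under", "nearly"), the results are ballpark numbers, with 10 frames per second used as a stand-in for "a little under 10."

```python
# Approximate frame rates reported for OpenMV's example models
REPORTED_FPS = {
    "edge detection": 29,               # "around 29"
    "face detection (TensorFlow)": 10,  # "a little under 10"
    "blob tracking": 58,                # "nearly 58"
}


def frame_time_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


for model, fps in REPORTED_FPS.items():
    print(f"{model}: ~{frame_time_ms(fps):.1f} ms per frame")

# edge detection: ~34.5 ms per frame
# face detection (TensorFlow): ~100.0 ms per frame
# blob tracking: ~17.2 ms per frame
```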
For those who want to train their own models, for speech or vision, both boards are fully compatible with Edge Impulse Studio, with Arduino offering a guide to creating a MobileNetV2-based image classification model for the Nicla Vision.
Conclusion
For anyone looking to experiment with low-power on-device artificial intelligence at the edge, there's a lot to recommend the Arduino Pro Edge AI/ML Vision and Speech Kit. Having almost everything you could need in one package is a real time-saver, though the lack of USB cables is an odd omission — and you'll need two, as the Arduino Portenta H7 and Nicla Vision use USB Type-C and micro-USB Type-B connections respectively.
Arduino's documentation is expansive enough to get you started, and provides a jumping-off point for more complex projects. Compatibility with third-party platforms and frameworks, including Edge Impulse Studio, makes it easy to grow beyond the tools Arduino itself provides, and OpenMV compatibility is a real boon for anyone with MicroPython experience.
The Arduino Pro Edge AI/ML Vision and Speech Kit is available to order from Avnet priced at $311.35; volume discounts start at 10 units.