Bird@Edge Turns Birdsong Into Species Recognition On-Device, Using an NVIDIA Jetson Nano

Central Jetson Nano acts as a host for a number of Espressif ESP32-powered remote microphones, all communicating over Wi-Fi.

Researchers from the University of Marburg have developed a networked system that aims to recognize different bird species from sound alone, using a machine learning model that runs on an NVIDIA Jetson Nano and is fed audio from Espressif ESP32-based remote microphone modules.

"Our experiments show that our deep neural network outperforms the state-of-the-art BirdNET neural network on several data sets," the research team writes of its work, "and achieves a recognition quality of up to 95.2 percent mean average precision on soundscape recordings in the Marburg Open Forest, a research and teaching forest of the University of Marburg, Germany." That's impressive enough, but doubly so considering the entire system runs on-device at the edge — hence its name, Bird@Edge.

The Bird@Edge system uses an NVIDIA Jetson Nano and multiple Espressif ESP32-based microphones to map bird species by song. (📹: Netys-2022)

Brought to our attention by NVIDIA, Bird@Edge is built from two distinct types of device: remote microphone sensor nodes and a central processing system. Each node is built from an Espressif ESP32 microcontroller connected to a Knowles SPH0645LM4H microphone and powered by a USB power bank or a single 18650 lithium-ion cell. The central processor is an NVIDIA Jetson Nano fitted with a USB Wi-Fi dongle, which it uses to create a wireless network for the sensor nodes; it also feeds recognition activity into a database used by a web-based user interface.

The reason for picking the NVIDIA Jetson Nano over something like a Raspberry Pi is simple: its suitability for on-device machine learning, thanks to a powerful graphics processor which can execute the team's TensorRT model. This model is based on the EfficientNet-B3 architecture and classifies audio from Mel spectrograms, a neat trick which lets a machine learning architecture designed for visual data work on sound instead.
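To make the spectrogram idea concrete, here is a minimal sketch of how a clip of audio can be turned into a Mel spectrogram, the image-like grid that an image classifier can then consume. It is not the team's preprocessing code: the frame size, hop, number of Mel bands, sample rate, and every function name are illustrative assumptions, and the deliberately slow pure-Python DFT stands in for the optimized FFT a real pipeline would use.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common (HTK-style) formula for the perceptual mel scale.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge points: each filter spans three consecutive edges.
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hann-windowed power spectrum via a naive DFT (illustration only)."""
    n = len(frame)
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=128, hop=64, n_mels=16):
    """Return log-mel energies indexed as result[time_frame][mel_band]."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


# Assumed test input: a pure 2 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(2048)]
spec = mel_spectrogram(tone, sample_rate)
peak_band = max(range(len(spec[0])), key=lambda m: spec[0][m])
print(len(spec), "frames x", len(spec[0]), "mel bands; peak band:", peak_band)
```

Each row of the result is one slice of time and each column one Mel band, so a steady tone shows up as a bright horizontal stripe and birdsong as a pattern of sweeps and trills. That two-dimensional grid is what makes an image architecture such as EfficientNet applicable to sound.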

In testing in the Marburg Open Forest, the team found that Bird@Edge achieved a recognition quality of up to 95.2 percent mean average precision while drawing just 3.18W for the NVIDIA Jetson Nano and an additional 0.5W for each microphone module.
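Those two figures are enough for a rough power budget. The sketch below simply adds them up for a given number of microphone nodes and estimates how long a battery might last; the battery capacity is an assumed, illustrative value (a nominal 3.6V, 3Ah 18650 cell), and the estimate ignores conversion losses and any variation in draw, so treat it as back-of-the-envelope arithmetic rather than a measurement.

```python
JETSON_WATTS = 3.18  # reported draw of the Jetson Nano
NODE_WATTS = 0.5     # reported draw of each microphone module


def total_power_watts(n_nodes):
    """Combined draw of the Jetson Nano plus n_nodes microphone modules."""
    return JETSON_WATTS + NODE_WATTS * n_nodes


def runtime_hours(capacity_wh, watts):
    """Idealized runtime: capacity divided by a constant draw, no losses."""
    return capacity_wh / watts


# Assumption: one 18650 cell at a nominal 3.6 V and 3.0 Ah, about 10.8 Wh.
ASSUMED_CELL_WH = 3.6 * 3.0

for n in (1, 4, 8):
    print(f"{n} node(s): {total_power_watts(n):.2f} W in total")

node_hours = runtime_hours(ASSUMED_CELL_WH, NODE_WATTS)
print(f"One node on the assumed cell: about {node_hours:.1f} hours")
```

On these numbers a four-microphone deployment would come to roughly 5.18W in total, with each node's battery life depending entirely on the capacity of the cell or power bank actually fitted.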

A copy of the team's paper is available as an open-access PDF download, while the project's software and firmware sources have been published to GitHub under an unspecified open source license.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.