An Early Quitting Time Means Big Efficiency Gains for TinyML Models Running on Smart Sensors
Researchers detail an "early-exit" system that halts inference once a minimum confidence level is reached, cutting power usage by 11 percent.
Researchers from Italy's Politecnico di Milano, working with the EssilorLuxottica Smart Eyewear Lab, have come up with a new approach to running tiny machine learning (tinyML) models on-sensor — and say it can deliver an 11 percent reduction in power draw without any adverse impact on the accuracy of its results.
"Despite their state-of-the-art performance in many tasks, none of the current solutions in the literature aims to optimize the implementation of Convolutional Neural Networks (CNNs) operating directly into sensors," the research team claims. "In this paper, we introduce for the first time in the literature the optimized design and implementation of Depth-First CNNs operating on the Intelligent Sensor Processing Unit (ISPU) within an Inertial Measurement Unit (IMU) by STMicroelectronics."
The focus of the team's efforts is STMicro's LSM6DSO16IS, a six-axis inertial measurement unit with a three-axis accelerometer and a three-axis gyroscope — and, more importantly, STMicro's ISPU, an "Intelligent Sensor Processing Unit" that allows the user to run "signal processing and AI algorithms" on the sensor itself rather than farming the job out to an external processor.
The smart IMU was connected to an STM32 Nucleo-64 development board's STM32F411RE microcontroller for the team's experiments, with the microcontroller programmed to remain asleep until interrupted. The approach doesn't rely wholly on the sensor's ISPU, however: instead, a convolutional neural network model is partitioned between the ISPU and the host microcontroller, with an "early-exit" mechanism that lets the ISPU cease computation once a minimum confidence level has been reached.
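The early-exit idea described above can be illustrated with a minimal sketch. Note that this is not the researchers' actual implementation: the stage functions, the two-class logits, and the 0.9 confidence threshold are all placeholders chosen purely for illustration. The structure — run the first layers on the sensor, check an early classifier's confidence, and only wake the host for the remaining layers when that confidence is too low — is the part that reflects the technique.

```python
import math

CONFIDENCE_THRESHOLD = 0.9  # hypothetical minimum confidence for an early exit


def softmax(logits):
    """Convert raw logits into probabilities."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ispu_stage(sample):
    # Stand-in for the first CNN layers running on the sensor's ISPU,
    # producing intermediate features plus early-exit classifier logits.
    features = [x * 0.5 for x in sample]
    early_logits = [sum(features), -sum(features)]
    return features, early_logits


def host_stage(features):
    # Stand-in for the remaining CNN layers on the host microcontroller.
    return [sum(features) * 2.0, -sum(features) * 2.0]


def classify(sample):
    """Partitioned inference with an early exit on the sensor side."""
    features, early_logits = ispu_stage(sample)
    probs = softmax(early_logits)
    if max(probs) >= CONFIDENCE_THRESHOLD:
        # Confident enough: stop here and never wake the microcontroller.
        return probs.index(max(probs)), "ispu"
    # Otherwise interrupt the sleeping MCU and finish inference there.
    probs = softmax(host_stage(features))
    return probs.index(max(probs)), "host"
```

In this sketch, easy samples are resolved entirely on the sensor while only ambiguous ones incur the cost of waking the microcontroller — which is where the reported power savings come from.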
During testing, the team found the new approach to have measurable benefits: the setup drew an average of just 4.8 mA while performing the model's inference, an 11 percent reduction compared to the standard microcontroller-focused approach. Despite this, the researchers report, accuracy was equivalent between the two approaches.
The team's work is detailed in a preprint on Cornell's arXiv server.