Getting tinyML Ready for Prime Time
Wake Vision, a new dataset for person detection, focuses on quality to squeeze the most performance possible out of tinyML models.
Moving artificial intelligence (AI) applications from the cloud to our pockets is a difficult job, and progress will need to be made on several fronts to get us over the finish line. Portable devices naturally have far less processing power and memory available to them than a massive cluster of servers equipped with beefy GPUs in the cloud. Shrinking AI models down to size so that they can run on these platforms requires heavy optimization, which is an area where a lot of progress has been made in recent years.
But slicing and dicing these models down to a bare minimum number of parameters has consequences. Traditional overparameterized models can tolerate a fair amount of poor-quality data in their training sets. Tiny models, on the other hand, have no parameters to spare, leaving them unable to absorb labeling errors. So throwing the text of the entire internet at a tiny model, as is frequently done with large language models, is not going to fly. These models need very high-quality datasets targeted at specific tasks.
Unfortunately, datasets of this sort can be very hard to come by. But for those focused on person detection, a very common task in tinyML, a new option has just been released by a team led by researchers at Harvard University and Useful Sensors. They call the dataset Wake Vision, and it contains nearly 100 times more images than the previous state-of-the-art dataset for person detection in tinyML.
Wake Vision comes in two varieties: Large and Quality. Wake Vision Large aims to be as big and diverse a dataset as possible, while Wake Vision Quality is smaller but prioritizes label accuracy above all else. With these options, developers can choose whichever is most appropriate for their use case. In some cases, both can be used together: experiments have shown that the best results often come from pretraining on Large, then fine-tuning on Quality, with accuracy improvements of up to 6.6 percent observed compared with existing datasets. A rough sketch of this two-stage recipe appears below.
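To make the two-stage recipe concrete, here is a minimal sketch in TensorFlow/Keras. Note that the dataset name ("wake_vision"), the split names, and the "person" label field are assumptions for illustration; check the Wake Vision documentation for the actual identifiers.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

IMG_SIZE = 96  # a typical input resolution for tinyML vision models

def preprocess(example):
    # Resize and scale images; "person" is assumed to be a binary label field.
    image = tf.image.resize(example["image"], (IMG_SIZE, IMG_SIZE)) / 255.0
    return image, example["person"]

# Hypothetical TFDS name and split names; verify against the Wake Vision docs.
train_large = tfds.load("wake_vision", split="train_large").map(preprocess).batch(128)
train_quality = tfds.load("wake_vision", split="train_quality").map(preprocess).batch(128)

# A small MobileNetV2 backbone (alpha=0.35) sized for microcontroller deployment.
model = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3), alpha=0.35, weights=None, classes=2)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stage 1: pretrain on the large, noisier split for broad coverage.
model.fit(train_large, epochs=10)

# Stage 2: fine-tune on the smaller, cleanly labeled split at a lower learning rate.
model.optimizer.learning_rate = 1e-4
model.fit(train_quality, epochs=5)
```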
Wake Vision also makes it easier to assess a newly trained model. The team provides fine-grained benchmarks for testing real-world performance, which help quantify the impact of factors like lighting conditions and a subject's distance from the camera.
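Continuing from the sketch above (reusing model and preprocess), evaluating against one of these fine-grained benchmarks might look something like the following. The "lighting" metadata field and its values are hypothetical stand-ins; the real benchmark splits and fields are described in the dataset's documentation.

```python
import tensorflow_datasets as tfds

# Hypothetical test split and metadata field; see the Wake Vision docs.
test = tfds.load("wake_vision", split="test")

for condition in ["dark", "normal", "bright"]:
    subset = (test
              .filter(lambda ex, c=condition: ex["lighting"] == c)
              .map(preprocess)  # preprocess and model come from the sketch above
              .batch(128))
    _, acc = model.evaluate(subset, verbose=0)
    print(f"lighting={condition}: accuracy={acc:.3f}")
```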
The dataset is freely available under a permissive CC-BY 4.0 license. It can be downloaded via services like TensorFlow Datasets, Hugging Face Datasets, and Edge AI Labs. Once you do get down to the business of building a model with Wake Vision, be sure to check out the leaderboard to see if your model is the best on the block!
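Getting started can be as simple as a few lines. Here is a minimal loading sketch using Hugging Face Datasets; the repository id and split name below are assumptions based on the project's name, so verify them on the Hub before running.

```python
from datasets import load_dataset

# Repository id and split name are assumptions -- confirm on the Hub.
ds = load_dataset("Harvard-Edge/Wake-Vision", split="train", streaming=True)

# Stream the first example to inspect the available fields.
example = next(iter(ds))
print(example.keys())
```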