Google Releases Objectron Dataset to Help Improve 3D Object Understanding in Computer Vision
15,000-strong dataset includes videos, bounding boxes, point cloud data, and even a model for recognizing four object types.
Google's AI division has announced the release of the Objectron dataset, a corpus of short video clips designed to capture common objects from various angles, with each clip accompanied by augmented reality session data including sparse point clouds and manually annotated 3D bounding boxes.
"Understanding objects in 3D remains a challenging task due to the lack of large real-world datasets compared to 2D tasks (e.g., ImageNet, COCO, and Open Images)," explain Google Research software engineers Adel Ahmadyan and Liangkai Zhang. "To empower the research community for continued advancement in 3D object understanding, there is a strong need for the release of object-centric video datasets, which capture more of the 3D structure of an object, while matching the data format used for many vision tasks (i.e., video or camera streams), to aid in the training and benchmarking of machine learning models."
"Today, we are excited to release the Objectron dataset, a collection of short, object-centric video clips capturing a larger set of common objects from different angles. Each video clip is accompanied by AR session metadata that includes camera poses and sparse point-clouds. The data also contain manually annotated 3D bounding boxes for each object, which describe the object’s position, orientation, and dimensions. The dataset consists of 15K annotated video clips supplemented with over 4M annotated images collected from a geo-diverse sample (covering 10 countries across five continents)."
To help developers get started with Objectron, the division is also releasing a 3D object-detection solution for four object classes (shoes, chairs, mugs, and cameras) through Google's MediaPipe framework. Building on an earlier Objectron model, the new solution uses a two-stage pipeline: TensorFlow first finds a 2D crop of the target object, and a second model then estimates a 3D bounding box from that crop, offering impressive performance of 83 frames per second on an Adreno 650 mobile GPU.
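As one illustration of how the detector can be used, the sketch below runs Objectron on a single image via MediaPipe's Python Solution API. The module and parameter names (mp.solutions.objectron, model_name, detected_objects) are assumptions based on the MediaPipe Python package and may vary between versions, so treat this as a sketch rather than a definitive integration.

```python
# Minimal sketch: 3D bounding-box detection for chairs with MediaPipe Objectron.
# API names assume the MediaPipe Python Solution interface; exact parameters
# may differ between package versions.
import cv2
import mediapipe as mp

mp_objectron = mp.solutions.objectron
mp_drawing = mp.solutions.drawing_utils

# model_name selects one of the four supported classes:
# 'Shoe', 'Chair', 'Cup', or 'Camera'.
with mp_objectron.Objectron(static_image_mode=True,
                            max_num_objects=5,
                            min_detection_confidence=0.5,
                            model_name='Chair') as objectron:
    image = cv2.imread('chair.jpg')  # hypothetical input image
    results = objectron.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.detected_objects:
        for detected_object in results.detected_objects:
            # Draw the projected 3D bounding box and object axes on the image.
            mp_drawing.draw_landmarks(image,
                                      detected_object.landmarks_2d,
                                      mp_objectron.BOX_CONNECTIONS)
            mp_drawing.draw_axis(image,
                                 detected_object.rotation,
                                 detected_object.translation)
        cv2.imwrite('chair_annotated.jpg', image)
```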
"By releasing this Objectron dataset," the engineers claim, "we hope to enable the research community to push the limits of 3D object geometry understanding. We also hope to foster new research and applications, such as view synthesis, improved 3D representation, and unsupervised learning."
The dataset and supporting software are now available on the Google Research GitHub repository.