TensorFlow 3D Brings Three-Dimensional Scene Understanding and Object Detection to TensorFlow
The library offers datasets, conversion tools, and fully working pipelines for handling 3D data in TensorFlow.
Researchers from Google's AI division have released a new library dubbed TensorFlow 3D, designed to bring 3D-aware deep learning capabilities into TensorFlow for scene understanding, object detection, and more.
"The growing ubiquity of 3D sensors (e.g., LiDAR, depth sensing cameras, and RADAR) over the last few years has created a need for scene understanding technology that can process the data these devices capture," the researchers explain. "Such technology can enable machine learning (ML) systems that use these sensors, like autonomous cars and robots, to navigate and operate in the real world, and can create an improved augmented reality experience on mobile devices."
"In order to further improve 3D scene understanding and reduce barriers to entry for interested researchers, we are releasing TensorFlow 3D (TF 3D), a highly modular and efficient library that is designed to bring 3D deep learning capabilities into TensorFlow. TF 3D provides a set of popular operations, loss functions, data processing tools, models and metrics that enables the broader research community to develop, train and deploy state-of-the-art 3D scene understanding models."
TensorFlow 3D includes training and evaluation pipelines for 3D semantic segmentation, object detection, and instance segmentation, though the researchers indicate it could be applied beyond these three tasks, for example to 3D object shape prediction, point cloud registration and densification, and more. The pipelines are accompanied by a unified dataset specification and configuration, with support for the Waymo Open, ScanNet, and Rio datasets and tools for converting other datasets into a supported format.
TensorFlow 3D has been released under the Apache License 2.0 on the Google Research GitHub, and two supporting papers have been published so far: "DOPS: Learning to Detect 3D Objects and Predict Their 3D Shapes" and "An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds."