Robots Get X-Ray Vision
You don't want to play hide-and-seek with a robot powered by THOR, which uses human-inspired reasoning to locate partially hidden objects.
Object recognition is one of the most crucial ingredients in robot visual perception, enabling robots to interact with their environment. This capability helps robots identify and categorize the objects in their surroundings, much as humans recognize familiar objects. Advancing the technology matters for applications across industries, from manufacturing and logistics to healthcare and household assistance.
A mature object recognition algorithm facilitates common tasks such as navigation, manipulation, and interaction. By accurately identifying objects, robots can make informed decisions and execute tasks more efficiently. For instance, in industrial settings, robots equipped with robust object recognition systems can precisely locate and grasp items on assembly lines, streamlining production processes and enhancing productivity.
Technological advancements have significantly bolstered object recognition capabilities in recent years. Machine learning algorithms, particularly deep learning models, have revolutionized this field by enabling robots to learn from vast amounts of data, thereby improving their accuracy and robustness in recognizing objects across diverse contexts. Convolutional neural networks have emerged as a particularly powerful tool in object recognition tasks, allowing robots to detect and classify objects with remarkable accuracy.
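To make that concrete, the sketch below shows how a small convolutional network might map a camera image to object class scores. It is a generic illustration, not the network behind any particular system discussed here; the architecture, layer sizes, and class count are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class TinyObjectClassifier(nn.Module):
    """A small CNN mapping an RGB image to per-class scores."""
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learn local edge/texture filters
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample for translation tolerance
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # global pooling -> fixed-size vector
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Usage: classify a batch of 64x64 RGB frames (random stand-ins here).
model = TinyObjectClassifier(num_classes=5)
images = torch.randn(8, 3, 64, 64)
predictions = model(images).argmax(dim=1)   # one class index per image
```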
Despite these advancements, many challenges persist, particularly in scenarios involving partially occluded objects. Existing systems often struggle to recognize objects that are only partially visible, a task humans can typically perform effortlessly. Research in this area has recently been given a boost by a team at the University of Washington, who have developed a system called Topological features of point cloud slices for Human-inspired Object Recognition (THOR) that reconstructs the three-dimensional shape of a partially visible object to determine what it is most likely to be.
THOR was modeled on a reasoning mechanism called object unity that humans use to recognize occluded objects. Using this mechanism, people mentally rotate objects to match representations stored in memory, then associate the visible portion of an object with the full, unoccluded object they have previously seen. To simulate that process, THOR first creates a three-dimensional representation of an object, in the form of a point cloud, using an image from a depth camera as input. The view of each point cloud is then normalized before a machine learning classifier predicts which object is most likely present.
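While the description above leaves out THOR's exact math, two of its early steps can be sketched in a few lines: back-projecting a depth image into a point cloud with the standard pinhole camera model, and normalizing the cloud's pose so that different views of the same object look alike. The function names and the PCA-based alignment below are illustrative assumptions, not THOR's published procedure.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into an (N, 3) point cloud
    using the pinhole camera model with focal lengths fx, fy and
    principal point (cx, cy)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                      # skip pixels with no depth reading
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def normalize_view(points):
    """Center the cloud and align its principal axes (via SVD/PCA) so
    that differently posed views of the same object roughly coincide."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    aligned = centered @ vt.T              # rotate into the principal-axis frame
    return aligned / np.abs(aligned).max() # scale to fit a unit box
```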
An interesting feature of this system is that it does not need to be trained on massive datasets of objects in cluttered environments. Such a dataset would be costly and time-consuming to collect, and it would be very challenging to produce a well-generalized model with such an approach. THOR only requires images of the unoccluded objects themselves for its training process. This cuts down on complexity and expense, and also enables THOR to work in a wide variety of situations.
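Continuing the sketch above, training on unoccluded objects alone could amount to storing one shape descriptor per object and matching new, possibly occluded views by nearest neighbor. The per-slice statistic below is a deliberately simplified stand-in for THOR's topological slice features, and `unoccluded_views` is a hypothetical dataset name.

```python
import numpy as np

def slice_features(points, n_slices=8):
    """Summarize a normalized point cloud by slicing it along one axis
    and recording a simple statistic per slice -- a simplified stand-in
    for THOR's topological slice descriptors."""
    edges = np.linspace(points[:, 2].min(), points[:, 2].max(), n_slices + 1)
    edges[-1] += 1e-9                      # keep the topmost point in the last slice
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sl = points[(points[:, 2] >= lo) & (points[:, 2] < hi)]
        # Footprint of each slice: mean radial distance from the slicing axis.
        feats.append(np.linalg.norm(sl[:, :2], axis=1).mean() if len(sl) else 0.0)
    return np.array(feats)

def build_library(unoccluded_views):
    """Training step: one descriptor per unoccluded object.
    `unoccluded_views` is a hypothetical dict of name -> (N, 3) cloud,
    already normalized with normalize_view() from the earlier sketch."""
    return {name: slice_features(cloud) for name, cloud in unoccluded_views.items()}

def recognize(partial_cloud, library):
    """Inference: match a (possibly occluded) view to the closest stored
    descriptor by Euclidean distance."""
    query = slice_features(partial_cloud)
    return min(library, key=lambda name: np.linalg.norm(library[name] - query))
```

Because the library is built from clean, isolated views, adding a new object is as cheap as capturing a few images of it, which is one reason avoiding cluttered-scene training data matters.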
In the future, the researchers envision their technique powering robots in manufacturing and warehouse environments, as well as household service robots. But there is still more work to be done before THOR reaches its full potential. As it presently stands, the system struggles when objects lack a distinctive, regular shape, or when they vary widely in size. Moreover, THOR only considers the shape of objects; it does not take other important cues, like color or text labels, into account. The team is now hard at work addressing these and other issues.