Let's Talk Tools
Incorporating natural language descriptions of tools into robot control algorithms helps robots use previously unseen tools more efficiently.
With the many successful applications of machine learning that have been realized in recent years, a number of researchers have now turned their attention to creating algorithms that teach robots how to use tools. The appeal of this work is clear: just as humans can greatly improve their efficiency in all manner of tasks by using the right tools, so can robots. But this is a very challenging problem to solve, and efforts to give robots a generalized tool-use skill set have so far yielded only mediocre results.
For humans, building this knowledge is relatively simple. After learning to use a few tools, a person starts to recognize common features that help them understand how to use new tools they have not worked with before. But for robots, it is not so straightforward. Typical control algorithms need to be trained for very specific scenarios, using only certain tools and for predefined purposes. Worse yet, for each new tool that needs to be supported, large datasets must be laboriously collected, and the model must be retrained and revalidated.
These are not methods that scale well. Teaching a robot to use a few tools is achievable, but teaching it to handle dozens of tools, or to use a tool it has never seen before, is not very practical. A team of researchers at Princeton University has taken a novel approach to solving this problem. Rather than teaching a model how to use each tool individually with specific examples, they devised a method that incorporates knowledge extracted from natural language descriptions of tools, helping robots grasp the commonalities between different tools in a more human-like way.
Working under the hypothesis that language information might help a robot learn to use tools more quickly, they leveraged OpenAI’s GPT-3 large language model to get detailed descriptions of tools. Questions were posed to the GPT-3 model in the form of: “describe the [feature] of [tool] in a detailed and scientific response.” Not only does this approach give a very detailed description of each item by leaning on a model that was trained with a massive dataset collected from the internet, but it also sidesteps the issues that come with collecting the data manually.
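To give a sense of how such a query might look in code, here is a minimal sketch using the OpenAI Python client. The model name, client version, and helper function are illustrative assumptions rather than the team's actual setup, which posed these questions to GPT-3.

```python
# Sketch of querying a large language model with the article's prompt template.
# The model name and client usage here are illustrative assumptions; the
# original work posed these questions to OpenAI's GPT-3.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_tool(tool: str, feature: str) -> str:
    """Ask the language model to describe one feature of one tool."""
    prompt = f"Describe the {feature} of {tool} in a detailed and scientific response."
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in model; not the one used in the research
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example: collect descriptions of a crowbar's shape, geometry, and typical uses
for feature in ("shape", "geometry", "common uses"):
    print(describe_tool("a crowbar", feature))
```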
From these detailed descriptions, the team extracted information about what a tool looks like and how it can be used. This was paired with visual data from a camera, which determines what physical features a new, previously unseen tool has. Those features can be matched with a known tool that shares them, after which knowledge about how the known tool is used can be leveraged to fine-tune the robot control algorithm to make the best use of the new tool. This helps a robot learn much more quickly, as it can start with a rough understanding of the probable functionality of a new tool.
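The matching step can be pictured as a nearest-neighbor comparison between feature vectors, although this is only a toy stand-in for the researchers' learned approach. The embeddings and tool names below are made up for illustration.

```python
# Toy illustration of the matching idea: compare a new tool's feature vector
# (combining visual and language-derived features) against known tools and
# start from the closest match. Not the researchers' actual algorithm.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical embeddings for a few known tools
known_tools = {
    "hammer":   np.array([0.9, 0.1, 0.3]),
    "crowbar":  np.array([0.2, 0.8, 0.7]),
    "squeegee": np.array([0.1, 0.9, 0.2]),
}


def closest_known_tool(new_tool_embedding: np.ndarray) -> str:
    """Return the known tool whose embedding is most similar to the new one."""
    return max(
        known_tools,
        key=lambda name: cosine_similarity(known_tools[name], new_tool_embedding),
    )


unseen_tool = np.array([0.15, 0.85, 0.6])  # e.g. an unseen pry-bar-like tool
print(closest_known_tool(unseen_tool))     # prints "crowbar" for this toy data
```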
To prove the concept, the researchers created a simulated environment with the PyBullet simulator, in which a 7-DOF Franka Panda robot arm was tasked with pushing, lifting, sweeping, or hammering using 36 different tools (27 were used for training, and 9 were held out as unseen), running the gamut from axes to squeegees. The experiments were all conducted both with and without the additional information afforded by the language model. By and large, adding the natural language-based information led to significantly better performance from the simulated robot.
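For readers who want to tinker with a similar setup, a comparable simulation can be stood up in a few lines of PyBullet. The sketch below only loads the 7-DOF Franka Panda model that ships with pybullet_data and steps the physics; it does not reproduce the researchers' tools, tasks, or training code.

```python
# Minimal PyBullet sketch of a comparable setup: a fixed-base Franka Panda arm
# in a physics simulation. The researchers' actual task environments (tools,
# target objects, rewards) are not reproduced here.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # use p.GUI for a visual window
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")
# The 7-DOF Franka Panda model ships with pybullet_data
panda = p.loadURDF("franka_panda/panda.urdf", basePosition=[0, 0, 0], useFixedBase=True)

# Step the simulation for one second (240 Hz is PyBullet's default timestep)
for _ in range(240):
    p.stepSimulation()

print("Panda joints:", p.getNumJoints(panda))
p.disconnect()
```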
In one case, the control algorithm learned to grab the long end of a crowbar to sweep a bottle across a table, which made for a steady grip and also helped to constrain the bottle with the crowbar's claw. Without the language information, the robot grabbed the bent end of the crowbar, which gave it a poor grasp and made it less efficient at moving the bottle. The additional information did not always help, however: the robot failed to optimally use paint rollers and hammers even when leveraging the language descriptions. In any case, this initial work is promising, and future efforts may bring us closer to the goal of a general-purpose robot.