Google's Roboticists Let Robots Write Their Own Code, Vastly Boosting Their Flexibility
Using a code-writing language model similar to GitHub's Copilot, these robots can take existing code and rewrite it for new tasks.
A team of researchers from Google's robotics arm is looking to create robots capable of writing their own code, instead of relying on others to tell them what to do and how to do it, and has come up with a way to use language models to automatically generate Python code for a range of tasks.
"We developed Code as Policies (CaP), a robot-centric formulation of language model-generated programs executed on physical systems," the team explains. "CaP extends our prior work, PaLM-SayCan, by enabling language models to complete even more complex robotic tasks with the full expression of general-purpose Python code. With CaP, we propose using language models to directly write robot code through few-shot prompting."
The basis of the project is a language model, similar to GitHub's Copilot, which can autonomously write program code without user interaction. In one example, a piece of hand-written code that tells a robot to move backwards if it sees an orange is automatically rewritten to tell it to move to the right until it sees an apple, altering the robot's behavior without the need for a human to rewrite any code.
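To make the orange-and-apple example concrete, here is a minimal Python sketch of the general idea: a hand-written example is placed in a prompt as a comment followed by code, a new instruction is appended as a comment, and the code a model returns is run against a robot. Everything named here is an assumption made for illustration, not the team's actual API: `ToyRobot`, `sees`, `move`, `build_prompt`, and the prompt layout are hypothetical, and the "generated" snippet is written by hand rather than produced by a real language model.

```python
# Hypothetical few-shot example: an instruction (as a comment) paired with
# hand-written code. The robot API used here is invented for illustration.
EXAMPLES = [
    (
        "if you see an orange, move backwards.",
        "if robot.sees('orange'):\n    robot.move(0, -1)",
    ),
]


def build_prompt(examples, instruction):
    """Format examples as comment-plus-code pairs, ending with the new instruction."""
    parts = [f"# {text}\n{code}\n" for text, code in examples]
    parts.append(f"# {instruction}\n")
    return "\n".join(parts)


class ToyRobot:
    """Stand-in robot on a 2-D grid; `objects` maps (x, y) to an object name."""

    def __init__(self, objects):
        self.x = 0
        self.y = 0
        self.objects = objects

    def sees(self, name):
        # The toy robot only "sees" whatever sits at its current position.
        return self.objects.get((self.x, self.y)) == name

    def move(self, dx, dy):
        self.x += dx
        self.y += dy


# What a code-writing model might plausibly return for the new instruction.
# No model is called in this sketch; the string is written by hand.
GENERATED = "while not robot.sees('apple'):\n    robot.move(1, 0)"

prompt = build_prompt(EXAMPLES, "move to the right until you see an apple.")
robot = ToyRobot({(3, 0): "apple"})

# Run the generated snippet with the robot in scope.
exec(GENERATED, {"robot": robot})
print(robot.x, robot.y)  # prints: 3 0
```

Note that the sketch passes the generated string straight to `exec`, which runs it without any checks; that is fine for a toy grid but is exactly the kind of step that needs care on real hardware.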
"Our experiments demonstrate that outputting code led to improved generalization and task performance over directly learning robot tasks and outputting natural language actions," the team explains. "CaP allows a single system to perform a variety of complex and varied robotic tasks without task-specific training."
In testing, CaP has proven highly successful, but it does come with some limitations, including the inability of visual-language models to describe trajectories in human-accessible terms and the fact that only a small number of named primitive parameters can be adjusted. A bigger issue is that the system, as it stands today, makes no attempt to figure out if the generated code is actually useful before attempting to execute it.
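As a rough illustration of what a check before execution could look like, the sketch below uses Python's standard `ast` module to confirm that a generated snippet parses and only calls functions from a known set. This is not part of CaP and not something the team describes; the function name `check_generated_code` and the primitive names in `ALLOWED_CALLS` are hypothetical, and a check like this only catches malformed or out-of-vocabulary code, not code that is valid but unhelpful.

```python
import ast

# Hypothetical set of control primitives the robot is assumed to expose.
ALLOWED_CALLS = {"detect_object", "move"}


def check_generated_code(source, allowed_calls=ALLOWED_CALLS):
    """Return a list of problems found in `source`; an empty list means none found."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                name = func.id          # plain call, e.g. move(...)
            elif isinstance(func, ast.Attribute):
                name = func.attr        # method-style call, e.g. robot.move(...)
            else:
                name = "<dynamic call>"  # anything else is treated as unknown
            if name not in allowed_calls:
                problems.append(f"call to unknown primitive: {name}")
    return problems


ok = check_generated_code("while not detect_object('apple'):\n    move(1, 0)")
bad = check_generated_code("fly_to_moon()")
print(ok)   # prints: []
print(bad)  # prints: ['call to unknown primitive: fly_to_moon']
```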
"CaPs also struggle to interpret instructions that are significantly more complex or operate at a different abstraction level than the few-shot examples provided to the language model prompts," the team adds. "Thus, for example, in the tabletop domain, it would be difficult for our specific instantiation of CaPs to 'build a house with the blocks' since there are no examples of building complex 3D structures.
"These limitations point to avenues for future work, including extending visual language models to describe low-level robot behaviors (e.g., trajectories) or combining CaPs with exploration algorithms that can autonomously add to the set of control primitives."
More information and demo videos are available on the project website, while a preprint of the paper introducing CaP is available on Cornell's arXiv server. Colab notebooks demonstrating the approach, meanwhile, are available on GitHub under the permissive Apache 2.0 license.