No More Chores?

Google DeepMind has unveiled a suite of tools designed to improve understanding, efficiency, and generalization in personal helper robots.

Nick Bild
11 months ago · Robotics
A robot learning to efficiently perform a wide range of tasks (📷: Google DeepMind)

The concept of personal helper robots that assist with tasks around the home has long been a futuristic vision. These robots would serve as versatile companions, responding to voice commands and performing household chores. The potential benefits are numerous: increased efficiency, time savings, and a better quality of life for their users. By integrating seamlessly into daily routines, personal helper robots could cater to a wide range of needs, from cooking and cleaning to organizing schedules and providing entertainment.

Despite the allure of this idea, several challenges have kept such robots from becoming a reality. First, giving them natural language processing capabilities sophisticated enough to understand and respond appropriately to diverse verbal commands is a considerable technical hurdle. Understanding context, resolving ambiguity, and adapting to individual user preferences all remain open problems.

Another major obstacle is that these robots must be adaptable enough to navigate the unpredictable and varied environments found in homes. Handling stairs, uneven surfaces, and tight spaces requires advanced sensors, robust hardware, and sophisticated algorithms for obstacle avoidance and path planning. These algorithms in turn demand substantial computational resources, which can push the cost of the systems that run them out of reach for most people.

Researchers on the Google DeepMind Robotics Team have long been working toward a practical, general-purpose personal helper robot. Last summer, they announced a vision-language-action model called Robotic Transformer 2 (RT-2), a big step toward a helper robot that understands how to interact with the world around it. Now they have released a trio of new tools, AutoRT, SARA-RT, and RT-Trajectory, that build on RT-2 to address some of the biggest problems that remain in the field.

The first piece of the puzzle, AutoRT, leverages large foundation models to help robots better understand the nuances of human requests and translate them into achievable goals. It does this by combining a foundation model, such as a large language model or a visual language model, with a robot control model like RT-2. With this combination, robots can be deployed to carry out a wide range of tasks in many different settings. As they work, the robots collect a diverse dataset that can then be used to train other models to perform many tasks.
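To make this orchestration loop more concrete, here is a minimal Python sketch of an AutoRT-style data collection cycle. Everything in it is hypothetical: the `Orchestrator` class, the function names, and the toy stand-ins are illustrative assumptions, not DeepMind's code. A scene description step plays the role of the visual language model, a task proposal step plays the role of the large language model, and a policy call plays the role of RT-2. The task filter is likewise an assumption added for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces: in an AutoRT-like system these roles would be filled
# by a visual language model (scene -> objects), a large language model
# (objects -> tasks), and a robot control model such as RT-2 (task -> outcome).
Describe = Callable[[str], List[str]]
Propose = Callable[[List[str]], List[str]]
Act = Callable[[str], bool]
Allow = Callable[[str], bool]


@dataclass
class Episode:
    scene: str
    task: str
    success: bool


@dataclass
class Orchestrator:
    describe: Describe
    propose: Propose
    act: Act
    allow: Allow
    dataset: List[Episode] = field(default_factory=list)

    def step(self, scene: str) -> None:
        """Run one collection round in a single scene and log every attempt."""
        objects = self.describe(scene)
        for task in self.propose(objects):
            if not self.allow(task):
                continue  # rejected tasks are never sent to the robot
            success = self.act(task)
            self.dataset.append(Episode(scene, task, success))


# Toy stand-ins so the sketch runs without any models.
def toy_describe(scene: str) -> List[str]:
    return scene.split(",")


def toy_propose(objects: List[str]) -> List[str]:
    return [f"pick up the {obj}" for obj in objects]


def toy_act(task: str) -> bool:
    return "sponge" in task  # pretend only the sponge can be grasped


def toy_allow(task: str) -> bool:
    return "knife" not in task  # simple rule-based safety filter


orchestrator = Orchestrator(toy_describe, toy_propose, toy_act, toy_allow)
orchestrator.step("sponge,cup,knife")
for episode in orchestrator.dataset:
    print(episode.task, episode.success)
```

In a real deployment, each logged episode would carry camera frames and robot actions rather than a bare success flag; the sketch only shows the overall shape of the loop: perceive, propose, filter, act, record.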

The purpose of SARA-RT is to make Robotics Transformer models more efficient, so that they run faster, consume less energy, and can execute on less expensive hardware. SARA-RT achieves this with a special type of fine-tuning, called “up-training,” that converts the quadratic complexity of the model's attention mechanism into linear complexity, greatly reducing its computational requirements. The best models using this technique ran 14% faster than comparable RT-2 models and, surprisingly, were also 10.6% more accurate.
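The switch from quadratic to linear complexity is easiest to see in the attention computation at the heart of a Transformer. The NumPy sketch below contrasts standard softmax attention, which builds an N-by-N score matrix for N input tokens, with kernelized linear attention, which reorders the same products so that only a small d-by-d summary is formed. The feature map `phi` (elu(x) + 1) is borrowed from the general linear-attention literature and is an assumption; SARA-RT's actual attention mechanism and up-training procedure are not reproduced here.

```python
import numpy as np


def softmax_attention(Q, K, V):
    """Standard attention: materializes an (N, N) score matrix, so cost ~ N^2."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice of kernel)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention_quadratic(Q, K, V):
    """Kernelized attention computed the slow way, via an (N, N) matrix."""
    A = phi(Q) @ phi(K).T
    return (A @ V) / A.sum(axis=1, keepdims=True)


def linear_attention(Q, K, V):
    """Same result, reassociated: (phi(Q) phi(K)^T) V = phi(Q) (phi(K)^T V).

    phi(K)^T V is only (d, d_v), so the cost grows linearly with N.
    """
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V       # (d, d_v) summary of all keys and values
    z = Kf.sum(axis=0)  # (d,) normalizer
    return (Qf @ kv) / (Qf @ z)[:, None]


rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))

fast = linear_attention(Q, K, V)
slow = linear_attention_quadratic(Q, K, V)
print(np.allclose(fast, slow))           # True: same outputs, different cost
print(softmax_attention(Q, K, V).shape)  # (8, 4)
```

The last two functions return the same values but do very different amounts of work: one scales with N squared, the other with N. Because linear attention does not exactly reproduce softmax attention, a converted model would still need fine-tuning to recover its accuracy, which is presumably the role up-training plays.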

Last, but certainly not least, is RT-Trajectory. This tool helps robots generalize by taking the demonstration videos in a training dataset and overlaying each one with a 2D sketch of the path the robot's gripper follows while performing the task. These visual hints give the control model a clear picture of the motions it needs to make to reproduce the demonstrated task. With the help of RT-Trajectory, task success rates more than doubled compared with earlier RT models, reaching a very respectable 63% on average.
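The overlay idea can be illustrated with a small, self-contained Python sketch that rasterizes a gripper path onto an image grid using Bresenham's line algorithm. The frame format (a 2D list of integers), the pixel coordinates, and the choice to label each cell with its segment number so the direction of motion is recoverable are all illustrative assumptions; RT-Trajectory's actual sketch format is not described here.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel position of the gripper in the frame
Frame = List[List[int]]  # hypothetical single-channel image, indexed [y][x]


def bresenham(p0: Point, p1: Point) -> List[Point]:
    """Return the integer pixels on the segment from p0 to p1, inclusive."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    points = []
    while True:
        points.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return points
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


def overlay_trajectory(frame: Frame, path: List[Point]) -> Frame:
    """Copy `frame` and draw the gripper path on it.

    Each segment's pixels are set to its 1-based index, so later segments
    overwrite shared endpoints and the direction of travel stays readable.
    """
    out = [row[:] for row in frame]
    height, width = len(out), len(out[0])
    for step, (start, end) in enumerate(zip(path, path[1:]), start=1):
        for x, y in bresenham(start, end):
            if 0 <= x < width and 0 <= y < height:  # clip to the frame
                out[y][x] = step
    return out


frame = [[0] * 5 for _ in range(4)]  # blank 5x4 "video frame"
path = [(0, 0), (3, 0), (3, 3)]      # move right, then down
sketched = overlay_trajectory(frame, path)
for row in sketched:
    print(row)
```

Encoding order along the path is the important design choice: a bare outline shows where the gripper goes, but not which end of the path it starts from.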

It is the team’s hope that this suite of tools will help developers to build more capable and helpful robots in the future.
