Making Sense of Common Sense in Robots
MIT CSAIL leveraged LLMs to give household robots some common sense, enabling them to adapt to disruptions and complete tasks reliably.
Household service robots have long been a subject of fascination, promising to change domestic life forever by relieving humans of mundane chores. However, despite significant advancements in robotics technology, a truly versatile, general-purpose robot capable of handling a variety of household chores remains elusive. There are many reasons for this, but one of the primary stumbling blocks on the journey toward the ultimate household assistant is the total lack of common sense found in today's robots.
To teach a robot a new task, it is commonly shown examples of humans performing that task. Robots excel at mimicking these actions and executing predefined tasks with precision and accuracy, but they struggle in scenarios that require adaptive reasoning and intuitive problem-solving, the hallmarks of human common sense. When faced with an unexpected situation, such as being bumped or encountering an obstacle, a robot often falters, requiring a complete reset and a restart of the task from scratch.
Outside of the carefully controlled conditions of a laboratory or manufacturing environment, unexpected occurrences happen all the time. Accordingly, household service robots fare quite poorly in the real world, which is why only a few types of robots, with very narrow scopes of responsibility, are found in homes today. A team at MIT CSAIL has been hard at work trying to change this state of affairs, however. They have leveraged the knowledge contained in Large Language Models (LLMs) to give robots a bit of common sense. With this knowledge, a robot can adapt when things are not working out exactly according to plan and still carry out its orders.
Generally speaking, when a robot learns a new task, it is represented as a single, continuous trajectory. The researchers recognized that by breaking that trajectory into a series of subtasks, a robot with some common sense would not need to start from the top when the unexpected occurred. Instead, it could adjust on the fly and then carry out only the remaining steps in the process.
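The core idea can be illustrated with a minimal sketch. The subtask names, and the assumption that a task is an ordered list of them, are purely illustrative; the article does not specify how the team represents subtasks internally.

```python
# Hypothetical sketch: a task decomposed into an ordered list of subtasks,
# so a disturbed robot replays only what remains instead of restarting.
# Subtask names are made up for illustration.
SUBTASKS = ["reach_bowl", "scoop", "transport", "pour"]

def resume_plan(current_subtask: str) -> list[str]:
    """Return only the subtasks that still need to run."""
    idx = SUBTASKS.index(current_subtask)
    return SUBTASKS[idx:]

# After a bump during "transport", only the remaining steps are replayed:
print(resume_plan("transport"))  # ['transport', 'pour']
```

A monolithic trajectory offers no such resume point, which is why traditional systems must restart from the beginning.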
An algorithm was developed that linked the present state of the robot, including its three-dimensional position in space, with its progress in completing a set of actions that need to be performed, as determined by an LLM. With a clear view of what it is presently doing, and how that relates to past and future steps in the process, the robot is able to stop and make adjustments when it is in some way disturbed. After reasoning out the best way to get back on track, it can then finish the job without skipping a beat.
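As a rough illustration of this grounding step, the toy sketch below maps the robot's 3D position to the nearest subtask "anchor" pose and resumes from there. The real system uses an LLM to determine which subtask the state corresponds to; the nearest-neighbor stand-in and all numeric values here are invented for illustration only.

```python
import math

# Toy grounding sketch: classify the robot's current 3D position against
# per-subtask anchor poses, then finish the remaining steps. The anchors
# and positions below are made up; the actual system relies on an LLM to
# label the current subtask.
ANCHORS = {
    "reach_bowl": (0.0, 0.0, 0.1),
    "scoop":      (0.0, 0.0, 0.0),
    "transport":  (0.3, 0.1, 0.2),
    "pour":       (0.6, 0.2, 0.1),
}
ORDER = ["reach_bowl", "scoop", "transport", "pour"]

def ground_state(position):
    """Map a 3D position to the subtask whose anchor pose is closest."""
    return min(ANCHORS, key=lambda s: math.dist(position, ANCHORS[s]))

def recover(position):
    """After a disturbance, figure out where we are and finish the job."""
    current = ground_state(position)
    return ORDER[ORDER.index(current):]

# A shove left the arm near the 'transport' anchor:
print(recover((0.28, 0.12, 0.18)))  # ['transport', 'pour']
```

The key point is the linkage: once the robot's physical state is tied to a position in the plan, recovery becomes a matter of relabeling rather than replanning from zero.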
These methods were tested out on a robotic arm that was programmed to scoop marbles from one bowl to another. This is a simple enough task on the surface, but the researchers made it much more challenging by bumping, pushing, and shoving the robot off course. This would have been enough to throw a traditional planning algorithm for a loop and cause it to attempt the task all over again. But in this case, the robot proved to be very resilient. It handled the interruptions without a problem, and was able to adjust and keep moving toward the goal.
It was noted that some prompting skill is required to get the LLM to suitably represent each state of the robot as an appropriate subtask. Looking ahead, the team hopes to automate this process to simplify the setup and training of the system.
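To give a flavor of what that manual step might involve, here is a hypothetical prompt and a helper for parsing the reply. The wording of the prompt and the parsing convention are assumptions for illustration; the team's actual prompts are not described in the article.

```python
# Hypothetical prompt asking an LLM to decompose a demonstrated task into
# labeled subtasks. This wording is illustrative, not the team's actual
# prompt.
PROMPT = """You are labeling the phases of a robot manipulation task.
Task: scoop marbles from bowl A and pour them into bowl B.
List the subtasks in order, one per line, as short verb phrases."""

def parse_subtasks(llm_response: str) -> list[str]:
    """Split a line-per-subtask LLM reply into a clean list of labels."""
    return [line.strip() for line in llm_response.splitlines() if line.strip()]

# Example of parsing a plausible reply:
print(parse_subtasks("reach bowl A\nscoop marbles\nmove to bowl B\npour"))
```

Automating this labeling step, as the team intends, would remove the need for prompt-crafting expertise during setup.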