So Simple a Robot Can Do It
PRoC3S blends LLMs, vision models, and simulations to help robots turn simple commands into safe, efficient plans for complex tasks.
Software engineers tend to be very detail-oriented people. This is less a personality quirk than a matter of necessity: source code must explicitly define exactly how the software operates. Add this value to variable x, call function y, loop over these instructions ten times, and so on. Get a few hundred or thousand lines of that sort together (and debugged!), and the magic starts to happen. But not everyone wants to put on their software engineering hat every time they need to tell a machine what to do.
In-home service robots — the kind that can do our chores for us — are still a way off. But when they do emerge from research labs, getting them to do what we want them to do could be a big challenge. We might want a robot to fold the laundry for us, for example. That may seem like a straightforward enough command, but actually making that happen could require dozens of subtasks (e.g., locate an article of clothing, identify its type, grasp it, etc.), with each subtask requiring thousands of lines of source code to implement.
Recently, large language models (LLMs) have been leveraged to translate high-level requests into a series of subtasks that are detailed enough for robots to carry out. However, LLMs are not aware of the robot’s physical capabilities, and they do not understand what is in the robot’s environment, either. Without this knowledge, the plan of action created by the model is likely to fail.
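To make that idea concrete, here is a minimal Python sketch of LLM-based task decomposition. The prompt, the decompose_task helper, and the llm callback are illustrative assumptions for this article, not code from the researchers.

```python
# A small sketch (not from the paper) of LLM-based task decomposition.
# `llm` stands in for any chat-completion call; the prompt and parsing
# are illustrative assumptions.

from typing import Callable, List

def decompose_task(request: str, llm: Callable[[str], str]) -> List[str]:
    """Ask a language model to break a chore into ordered subtasks."""
    prompt = (
        "You control a single-arm household robot.\n"
        f"Break this request into short, ordered subtasks: {request}\n"
        "Return one subtask per line."
    )
    reply = llm(prompt)
    # Keep non-empty lines, stripping any list markers the model adds.
    return [line.lstrip("-*0123456789. ").strip()
            for line in reply.splitlines() if line.strip()]

# decompose_task("fold the laundry", llm=my_chat_model) might return
# ["locate an article of clothing", "identify its type", "grasp it", ...]
# Nothing here knows the robot's reach or what is actually in the room,
# which is why such plans often fail in practice.
```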
To address these issues, researchers at MIT's CSAIL have designed a system they call Planning for Robots via Code for Continuous Constraint Satisfaction (PRoC3S). It enables robots to perform open-ended tasks in dynamic environments (like our homes) by integrating LLMs with physical constraints and vision-based modeling. This approach gives the planner an awareness of the robot's physical capabilities, such as its reach, and also allows for navigation and obstacle avoidance.
PRoC3S combines the strengths of LLMs for high-level planning with simulations to validate the feasibility of the robot's actions. The process begins with an LLM generating a plan for a given task, such as cleaning or organizing objects. This plan is then tested in a realistic digital simulation created using vision models, which capture the robot's physical environment and constraints. If the plan fails in the simulation, the LLM iteratively refines it until a viable solution is found.
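That generate-simulate-refine loop can be pictured with a short sketch like the one below. The callbacks build_scene, propose_plan, and simulate are hypothetical stand-ins for the vision-based scene model, the LLM planner, and the physics check; the actual PRoC3S implementation is more involved.

```python
# A minimal sketch (assumptions, not the authors' implementation) of the
# generate-simulate-refine loop described above.

from typing import Callable, List, Optional, Tuple

Plan = List[str]   # e.g., a list of parameterized robot actions

def plan_with_feedback(
    task: str,
    build_scene: Callable[[], object],                     # vision -> simulated scene
    propose_plan: Callable[[str, object, str], Plan],      # LLM drafts a candidate plan
    simulate: Callable[[Plan, object], Tuple[bool, str]],  # returns (success, error message)
    max_attempts: int = 5,
) -> Optional[Plan]:
    """Keep asking the LLM for plans until one survives simulation."""
    scene = build_scene()       # digital stand-in for the robot's real environment
    feedback = ""               # nothing to report on the first attempt
    for _ in range(max_attempts):
        plan = propose_plan(task, scene, feedback)   # high-level plan from the LLM
        ok, feedback = simulate(plan, scene)         # check reach, collisions, stability
        if ok:
            return plan                              # viable plan: run it on the robot
    return None                                      # no safe plan found within budget
```

Passing the failure message back into the next prompt is what lets the model refine its own plan rather than repeating the same mistake.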
In a series of experiments, the PRoC3S system demonstrated success in both digital simulations and real-world applications. For example, it enabled a robotic arm to draw shapes, arrange blocks, and perform object placement tasks with a high level of accuracy. By combining textual reasoning with real-world constraints, the system outperformed other popular approaches, such as LLM3 and Code as Policies, consistently generating safer and more practical plans.
The team envisions future applications where PRoC3S could enable household robots to handle complex chores, like preparing breakfast or delivering snacks, by simulating and refining their actions before execution. The next steps for the researchers include enhancing the system’s physics simulations and expanding its capabilities to mobile robots for tasks like walking and exploring their surroundings, paving the way for versatile, reliable robotic assistance in everyday life.