Do as I Do, Not as I Say
A new idea in machine learning teaches robots how to interact with everyday objects just by watching humans.
You may have heard it said that imitation is the sincerest form of flattery. Now it turns out that imitation may also be one of the best ways to program a robot. Although robots capable of manipulating real-world objects have advanced a great deal in recent years, most applications have been limited to computer simulations and small tabletop experiments in lab environments. These experiments are fine and all, but they are not exactly producing the Jetsons-esque robots we have long been waiting for, the kind that can help with all manner of household chores. Why exactly is this the case? Why have these experiments not panned out in real-world scenarios?
Researchers at Carnegie Mellon University have been pondering this state of affairs, and believe that teaching robots to learn in the wild has been hindered by the use of reinforcement learning and imitation learning techniques. While these methods have been fantastically successful when applied to appropriate problems, there are issues with using them to train robots in this way. Reinforcement learning tends to require a large amount of training data, and it can fall flat in real-world scenarios that lack the structured rewards it needs to learn. Imitation learning, on the other hand, requires a large number of kinesthetic or teleoperated demonstrations for each task to be learned, which is not exactly a plug-and-play solution.
The team proposed a potential solution to these difficulties in which robots visually observe humans in their daily activities. They believe that by simply watching how humans interact with the world, robots can learn how to imitate them. This method provides a rich data source without anyone having to explicitly prepare training data for a model. It also allows robots to learn novel and highly optimized ways to complete tasks in a variety of environments. After sufficient data has been passively collected, the robot uses that information to guide its own exploration and learning.
The method begins by analyzing video of a human interacting with objects, frame by frame. A recurrent convolutional neural network is used to estimate the position of the person’s hands. This includes estimates of wrist rotation and grip force applied to objects, which are critical for a robot to reproduce the feat. Next, a second neural network is used to determine which object is being interacted with, and the specific points of contact the hand makes with that object. At this point, all of the captured interaction trajectories are translated into robot movement trajectories.
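To make that last translation step a little more concrete, here is a minimal sketch in Python. It is not the researchers' code: the `HandWaypoint` structure, the `retarget` function, and all of the numbers are hypothetical, and the sketch assumes the simplest possible case, where the camera and the robot differ only by a rotation about the vertical axis and a fixed offset.

```python
import math
from dataclasses import dataclass


@dataclass
class HandWaypoint:
    """One per-frame hand estimate (hypothetical structure)."""
    x: float          # wrist position in the camera frame, metres
    y: float
    z: float
    wrist_yaw: float  # wrist rotation about the vertical axis, radians
    grasping: bool    # whether the hand is closed on the object


def camera_to_robot(wp, yaw_offset, translation):
    """Apply a rigid transform: rotate about z, then translate."""
    c, s = math.cos(yaw_offset), math.sin(yaw_offset)
    tx, ty, tz = translation
    return HandWaypoint(
        x=c * wp.x - s * wp.y + tx,
        y=s * wp.x + c * wp.y + ty,
        z=wp.z + tz,
        wrist_yaw=wp.wrist_yaw + yaw_offset,
        grasping=wp.grasping,
    )


def retarget(trajectory, yaw_offset, translation):
    """Convert a whole human hand trajectory into gripper waypoints."""
    return [camera_to_robot(wp, yaw_offset, translation) for wp in trajectory]


# A hand reaching forward and closing on a drawer handle (made-up numbers).
human = [
    HandWaypoint(0.10, 0.00, 0.50, 0.0, False),
    HandWaypoint(0.20, 0.00, 0.50, 0.0, False),
    HandWaypoint(0.30, 0.00, 0.50, 0.0, True),
]
robot = retarget(human, yaw_offset=math.pi / 2, translation=(0.5, 0.0, 0.1))
```

A real system would need a full 3D calibration between camera and robot, which is exactly where the calibration errors mentioned in the article creep in.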
Through observation alone, the robot will have a good but rough idea of how to go about interacting with the object of interest. However, due to differences between human and robot arms, as well as calibration errors, more information is needed for smooth, accurate interactions. This comes through exploration, and the researchers developed a policy to guide that exploration. The policy is designed above all to permit only safe interactions, while not being so restrictive that exploration becomes inefficient.
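The article does not spell out how the exploration policy works, so the following is only a generic illustration of the idea: a random search around the trajectory learned from video, with every candidate clamped to a safe workspace. The `explore` function, the box-shaped safety bounds, and the scoring function are all assumptions made for this sketch and are not the team's method. On a real robot, the score would come from actually attempting the motion.

```python
import math
import random


def clamp(value, low, high):
    """Restrict a value to the closed interval [low, high]."""
    return max(low, min(high, value))


def explore(prior, score, bounds, n_samples=50, noise=0.05, seed=0):
    """Random search around a prior trajectory, restricted to safe bounds.

    prior  : list of (x, y, z) waypoints derived from the human video
    score  : callable returning a higher value for a better trajectory
    bounds : ((x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi)) workspace limits
    """
    rng = random.Random(seed)
    # Clamp the prior itself so the fallback candidate is also safe.
    best = [tuple(clamp(v, lo, hi) for v, (lo, hi) in zip(p, bounds))
            for p in prior]
    best_score = score(best)
    for _ in range(n_samples):
        # Perturb every waypoint, then force it back inside the safe box.
        candidate = [
            tuple(clamp(v + rng.gauss(0.0, noise), lo, hi)
                  for v, (lo, hi) in zip(p, bounds))
            for p in prior
        ]
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Made-up example: the video suggests the handle is at the prior's end point,
# but it is really a few centimetres away.
prior = [(0.30, 0.00, 0.50), (0.45, 0.00, 0.50)]
handle = (0.50, 0.02, 0.48)
bounds = ((0.0, 0.6), (-0.3, 0.3), (0.1, 0.9))


def score(trajectory):
    """Stand-in reward: closer final waypoint to the handle is better."""
    return -math.dist(trajectory[-1], handle)


best, best_score = explore(prior, score, bounds)
```

Because the clamped prior is always kept as a fallback, the search can never return something worse than what the robot learned by watching, and no candidate ever leaves the safe box.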
To test out their methods, the team set up an experiment using the Stretch Robot with a 6-degree-of-freedom arm and gripper. Humans then interacted with a number of everyday objects, like drawers, dishwashers, fridges, and doors, for the robot to observe. The robot's results were then compared against those of several baseline methods, and they compared very favorably. The team’s technique had 92% and 83% success rates when manipulating doors and drawers, respectively. In contrast, the best baseline methods only achieved 30% and 53% success rates.
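To put those numbers side by side, the short snippet below works out the gap for each task using only the four figures reported above. The dictionary layout and key names are just for illustration.

```python
# Success rates as reported in the article, in percent.
reported = {
    "doors": {"team": 92, "best_baseline": 30},
    "drawers": {"team": 83, "best_baseline": 53},
}

# Absolute gap in percentage points and relative ratio for each task.
gaps = {task: r["team"] - r["best_baseline"] for task, r in reported.items()}
ratios = {task: r["team"] / r["best_baseline"] for task, r in reported.items()}

for task in reported:
    print(f"{task}: +{gaps[task]} points, {ratios[task]:.2f}x the best baseline")
```

That is a lead of 62 percentage points on doors and 30 on drawers.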
With some fresh thinking in robot learning, the team has developed novel ideas that may allow robots to interact more naturally in real-world scenarios. There is more work yet to be done, but this is something you will want to keep your eyes on.