We Can Sort Things Out
After gathering data from a variety of sources, these reinforcement learning-based robots were able to sort recyclables in a real office.
Reinforcement learning is a machine learning technique that involves teaching machines to learn from their own experiences by rewarding positive outcomes and penalizing negative ones. This technique has shown tremendous promise in the field of robotics, with the potential to revolutionize the way robots are designed and used in everyday life.
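The reward-and-penalty idea can be made concrete with a toy sketch. This is not the Google Research system, which learns from camera images on real robots. It is a minimal epsilon-greedy value learner in which a hypothetical agent is rewarded (+1) for dropping an item in the right bin and penalized (-1) otherwise. The item names, bin labels, and hyperparameters are all assumptions chosen for illustration.

```python
import random

# Hypothetical item types and their correct bins (illustrative only,
# not taken from the Google Research work).
CORRECT_BIN = {"can": "recycling", "banana_peel": "compost", "chip_bag": "trash"}
BINS = ["recycling", "compost", "trash"]


def train(episodes=2000, epsilon=0.1, learning_rate=0.1, seed=0):
    """Learn a value for each (item, bin) pair from +1 / -1 rewards."""
    rng = random.Random(seed)
    # Estimated value of placing each item type in each bin, initially zero.
    values = {(item, b): 0.0 for item in CORRECT_BIN for b in BINS}
    items = list(CORRECT_BIN)
    for _ in range(episodes):
        item = rng.choice(items)
        if rng.random() < epsilon:
            # Explore: occasionally try a random bin.
            chosen = rng.choice(BINS)
        else:
            # Exploit: pick the bin with the highest estimated value so far.
            chosen = max(BINS, key=lambda b: values[(item, b)])
        # Reward a correct placement, penalize an incorrect one.
        reward = 1.0 if chosen == CORRECT_BIN[item] else -1.0
        # Move the estimate a small step toward the observed reward.
        values[(item, chosen)] += learning_rate * (reward - values[(item, chosen)])
    return values


def greedy_policy(values):
    """Return the best-known bin for each item type."""
    return {item: max(BINS, key=lambda b: values[(item, b)]) for item in CORRECT_BIN}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

No one tells the agent which bin is correct. It discovers the mapping purely from the rewards it receives, which is the property that makes the technique attractive for robots that must keep learning in the field.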
One of the most significant advantages of reinforcement learning is its ability to adapt to changing environments and improve its performance through practice. This means that robots could be trained to navigate complex and dynamic environments such as factories, hospitals, and even homes. Reinforcement learning algorithms could enable robots to learn how to perform tasks efficiently and effectively, even in situations where the task or environment is constantly changing.
Despite these promising advantages, we rarely see reinforcement learning-enabled robots in everyday settings. In large part, this is because it has proven very difficult to develop an algorithm that can deal with the tremendous complexity and diversity of the real world.
In theory, however, these are exactly the types of situations where reinforcement learning should excel. So a team of engineers at Google Research set out to design a multipronged approach that leverages the unique characteristics of reinforcement learning while eliminating many of the pain points that have plagued past efforts.
The team designed a system in which a fleet of 23 mobile manipulation robots sorted waste and recycling at waste stations throughout Google office buildings. The robots roamed from station to station, and at each one they sorted objects between the recycling, compost, and trash bins to ensure that everything was in the correct location.
From the perspective of a human, this may sound like a simple task. But for a robot, it is a very complex problem to solve. The possible variety of objects that could be in the bins is virtually endless, and each of those items must be accurately sorted. Google Research’s solution to building a capable robot involved training the model on data from four sources: hand-designed control policies, simulation, dedicated "robot classrooms," and real-world deployment in the office.
Before a robot can start learning on the job, it needs some basic knowledge of what the job is. To fulfill this requirement, a small number of hand-designed control policies were developed for the robot. These policies had a very low success rate, but were sufficient to run experiments in a simulated environment, after which the knowledge gained could be transferred to the physical robots using sim-to-real transfer.
At this point, the robots had gained some skills, but still had much to learn before they would be ready for real-world use. This learning took place in “robot classrooms” where waste stations were set up with a variety of objects to learn from. After they finished their schooling, the robots were unleashed on the office buildings to practice their new trade. But their learning did not stop there: as they carried out their daily tasks, the reinforcement learning process continued to improve their accuracy over time.
The team evaluated the system after collecting data from 540,000 trials in the classrooms and 32,500 trials in real-world use. Performance improved as more data was collected. The final system reached an average sorting accuracy of 84%, which was sufficient to reduce contamination in the waste bins by up to 50% by weight.
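Sorting accuracy and contamination by weight measure different things, and a short sketch can show why the two numbers need not move together. The article does not say exactly how contamination was computed, so the definition used here (weight of misplaced items divided by total weight) is an assumption, and the item weights and bin contents are made up for illustration.

```python
def contamination_by_weight(items):
    """Fraction of total weight sitting in the wrong bin.

    items: list of (weight_grams, current_bin, correct_bin) tuples.
    This definition is an assumption, not the one used in the study.
    """
    total = sum(weight for weight, _, _ in items)
    if total == 0:
        return 0.0
    misplaced = sum(weight for weight, current, correct in items if current != correct)
    return misplaced / total


def placement_accuracy(items):
    """Fraction of items (by count) sitting in the correct bin."""
    if not items:
        return 0.0
    return sum(1 for _, current, correct in items if current == correct) / len(items)


# Hypothetical waste station before a robot visit.
before = [
    (15, "trash", "recycling"),      # light can in the wrong bin
    (120, "recycling", "compost"),   # heavy food waste in the wrong bin
    (5, "trash", "trash"),
    (60, "recycling", "recycling"),
]

# Same station after the robot moves the can but fails on the food waste.
after = [
    (15, "recycling", "recycling"),
    (120, "recycling", "compost"),
    (5, "trash", "trash"),
    (60, "recycling", "recycling"),
]

if __name__ == "__main__":
    for label, station in (("before", before), ("after", after)):
        print(label,
              f"accuracy={placement_accuracy(station):.2f}",
              f"contamination={contamination_by_weight(station):.3f}")
```

In this made-up case, fixing one light item raises the share of correctly placed items considerably while contamination by weight barely changes, because a single heavy item dominates the total.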
This work showed that, with the proper training, practical reinforcement learning-based robots can be developed that are capable of performing real-world tasks in real office environments. There is still work to be done, since the experiments showed that the robots did not succeed in every situation, but the techniques demonstrated by Google Research may one day serve as the basis for robots that can perform a broad range of tasks.