You’ll Pick It Up As You Go Along
OK-Robot uses existing open-source vision, navigation, and grasping models to create a versatile robotic assistant — no training needed.
Tremendous progress has been made on multiple fronts in machine learning in recent years. Many of these advances, in areas like computer vision, navigation, natural language understanding, and grasping, have important implications for ongoing development efforts in robotics. These are, after all, among the core competencies needed by the general-purpose robots we all dream of owning one day: machines that can clean our homes, cook us dinner, and handle all of the other mundane household tasks most of us loathe.
One cannot help but wonder why, with so many technological breakthroughs already achieved, we still seem to be so far away from true general-purpose robots. Even the very best robots available today are plagued by brittleness and fail at their tasks far more often than they succeed, especially when put to work outside of a carefully controlled laboratory environment.
Most people assume that this problem stems from the fact that training the massive machine learning models powering these robots' various systems is a laborious and expensive process, requiring deep pockets and expertise that few organizations have access to. There is certainly truth in this; however, the open-source community has been thriving, and the freely available models it has produced frequently match or exceed state-of-the-art closed systems in both accuracy and efficiency.
A team of engineers at New York University and AI at Meta recently spent some time exploring how open-source machine learning models can be combined to build a more capable robot that operates under a wide range of conditions. In the process, they created what they call OK-Robot (Open Knowledge Robot), a robot that can perform arbitrary pick-and-drop operations in previously unseen real-world environments. Through careful integration of these components, they built a robot with a high success rate and no need for data collection or model training: every component of the system was used off the shelf.
The robot itself is a Stretch, manufactured by Hello Robot. These versatile robots have a mobile, wheeled base with a vertical bar attached to it; a gripper arm slides along this bar to perform grasping actions at different heights. To get the robot working in a new environment, a lidar scan of the area is first performed using an iPhone and the Record3D app. This data is fed into the LangSam and CLIP models, which produce a set of vision-language representations that are stored in a semantic memory.
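To make the mapping step concrete, here is a minimal sketch of how a semantic memory of this kind could be assembled. It assumes that object crops and their 3D positions have already been extracted from the Record3D scan (the LangSam segmentation step is abstracted away), and it uses OpenAI's clip package; the SemanticMemory class and its method names are hypothetical illustrations, not OK-Robot's actual code.

```python
# A minimal sketch of a CLIP-backed semantic memory. Assumes OpenAI's
# "clip" package (pip install git+https://github.com/openai/CLIP);
# SemanticMemory is a hypothetical illustration, not OK-Robot's code.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class SemanticMemory:
    """One normalized CLIP embedding per detected object, plus its 3D position."""

    def __init__(self):
        self.embeddings = []  # (1, 512) tensors, L2-normalized
        self.positions = []   # matching (x, y, z) coordinates from the scan

    def add(self, crop: Image.Image, position: tuple):
        # Embed an object crop and store it alongside where it was seen.
        with torch.no_grad():
            emb = model.encode_image(preprocess(crop).unsqueeze(0).to(device))
        self.embeddings.append(emb / emb.norm(dim=-1, keepdim=True))
        self.positions.append(position)
```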
When a user asks the robot to pick up an object, the semantic memory is queried to find that object's location. A navigation algorithm then directs the robot to drive close enough to the object to pick it up, avoiding collisions and ensuring that the gripper's movement will not be obstructed during the operation. Finally, a pre-trained grasping model predicts the best approach for the robotic gripper, which follows the plan to grab the desired object.
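Continuing the sketch above, finding the requested object then amounts to embedding the user's query with CLIP's text encoder and returning the best-matching stored position, from which a navigation goal can be derived by stopping a short reach distance away. The helpers below are again illustrative assumptions rather than the published pipeline, which also plans around obstacles on a map built from the scan.

```python
import math

def query(memory: SemanticMemory, text: str):
    """Return the stored 3D position whose embedding best matches the text."""
    with torch.no_grad():
        q = model.encode_text(clip.tokenize([text]).to(device))
    q = q / q.norm(dim=-1, keepdim=True)
    sims = torch.cat(memory.embeddings) @ q.T  # cosine similarity per object
    return memory.positions[int(sims.argmax())]

def standoff_pose(robot_xy, target_xyz, reach=0.5):
    """A naive navigation goal: stop `reach` meters short of the target,
    facing it. A real planner would also check the obstacle map."""
    tx, ty, _ = target_xyz
    heading = math.atan2(ty - robot_xy[1], tx - robot_xy[0])
    return (tx - reach * math.cos(heading),
            ty - reach * math.sin(heading),
            heading)

# Example: goal = standoff_pose((0.0, 0.0), query(memory, "the blue mug"))
```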
OK-Robot was evaluated in ten different real-world home environments. Despite not being supplied with any new training data, the system achieved a respectable 58.5% pick-and-drop success rate on average, and in less cluttered environments that rate shot up to 82.4%.
The researchers' approach may still have a good deal of room for improvement, and it may be limited to pick-and-drop operations for now, but the fact that no costly data collection or model training is required makes OK-Robot very attractive. By leveraging free and open-source tools, approaches like this multiply the number of people who can participate in pushing the field forward, making future technological breakthroughs that much more likely.