This Robot Is All Ears

ManiWAV promises to make robots more capable in everyday tasks by combining audio with the visual information they gather about the world around them.

Nick Bild
2 months ago | Robotics
ManiWAV is teaching a robot to cook (📷: Z. Liu et al.)

Even the simplest of tasks that we do every day turn out to be quite complicated when we really dig into the details. Getting a glass of water, for example, involves locating a glass, moving near to it, reaching for it and grabbing it, carrying it to the sink, positioning the glass under the faucet, turning on the water, and so on. There may be literally hundreds of subtasks, each with its own set of challenges, that go into solving simple problems.

This may not seem especially important to you, but it is extremely important to roboticists. That is because robots that seek to emulate human activities must learn to perform the myriad skills that those activities require. As the tasks get more complex, the difficulty level goes through the roof. Furthermore, when working in an unstructured environment, like a typical home, the robot must learn to adapt to widely varying conditions. All of this complexity leaves engineers seeking to build this type of robot with a seemingly impossible task.

The vast majority of robots that are designed to work in dynamic environments lean very heavily on computer vision algorithms to collect information about their surroundings. This provides a very rich source of information, but it is not exactly how humans work. In addition to vision, we also use our other senses, such as touch and hearing, to gather more information about our surroundings. Seeking to more closely replicate the way we get things done, a team led by researchers at Stanford University integrated both audio and video into a robot control system. Their hope was that the audio data would provide more information about contacts between objects, leading to more precision in robotic interactions with the world.

This new approach, called ManiWAV, consists of two major components — a data collection device and a learning framework. To support data collection, the team created what they call an ear-in-hand manipulator. It consists of a robot gripper called the Universal Manipulation Interface that is outfitted with a piezoelectric contact microphone. The microphone is wired directly to the mic port on the GoPro camera that is used for capturing visual data to ensure that the two sources of information are completely synchronized.

The learning framework takes the audio and video data as inputs, and predicts the most appropriate 10-DoF robot action as an output. This goal was accomplished through the design of a custom transformer-based machine learning algorithm.

After the system was trained on a representative dataset, it was put through its paces in a number of experiments. In one case, the robot was asked to flip a bagel in a pan using a spatula. The other trials involved pouring a set of dice between different cups, erasing a whiteboard, and taping a wire to a plastic strip. The same robot arm, equipped with the team’s custom gripper and GoPro camera, was used for each experiment.

As you might expect, the results of the experiments were mixed: sometimes sounds give valuable clues, and sometimes they do not. The robot performed far better than vision-only solutions when pouring dice between cups or erasing the whiteboard, for example. Flipping a bagel, on the other hand, did not have anything to gain from the additional audio data.

The team believes that future updates to the learning algorithm could improve the performance of the system. In particular, they believe they will get better results by accounting for the fact that audio signals are received at a higher frequency than images. With refinements such as this, ManiWAV could help to usher in the era of more capable general-purpose robots.
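To make the rate mismatch concrete, here is a minimal sketch of how many audio samples arrive per video frame, and of how a window of audio could be paired with each frame. The 48 kHz sample rate, 60 fps frame rate, window length, and function names are all illustrative assumptions; none of them come from the ManiWAV work itself.

```python
# A hedged sketch, not the ManiWAV implementation: it only illustrates that
# audio arrives at a much higher rate than images. All numbers are assumed.

def samples_per_frame(sample_rate_hz: int, fps: int) -> float:
    """Average number of audio samples captured during one video frame."""
    return sample_rate_hz / fps


def audio_window_for_frame(
    frame_index: int, fps: int, sample_rate_hz: int, window_samples: int
) -> tuple[int, int]:
    """Return (start, stop) sample indices of the audio window ending at a frame.

    The window ends at the frame's timestamp and reaches back at most
    `window_samples` samples, clamped so it never starts before sample 0.
    """
    # Integer arithmetic avoids floating-point drift over long recordings.
    stop = (frame_index * sample_rate_hz) // fps
    start = max(0, stop - window_samples)
    return start, stop


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 48_000  # assumed audio sample rate
    FPS = 60                 # assumed video frame rate
    WINDOW = 1_600           # assumed window: two frames' worth of audio

    print(samples_per_frame(SAMPLE_RATE_HZ, FPS))
    for frame in range(4):
        print(frame, audio_window_for_frame(frame, FPS, SAMPLE_RATE_HZ, WINDOW))
```

Under these assumed rates, each image corresponds to hundreds of audio samples, which is the gap a learning algorithm would have to account for when it consumes both streams.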
