The Pressure’s On
3D-ViTac fuses tactile and visual sensing, giving robots the ability to overcome visual occlusions and handle delicate manipulation tasks.
Human senses rarely work in isolation. Take something as simple as picking up a ball. Even this requires the coordination of multiple senses working together. Your vision gauges the ball's position, size, and distance, while your sense of touch provides feedback about its texture and weight as your fingers make contact. These sensory inputs combine to inform your brain, allowing you to adjust your grip, pressure, and movement in real time.
Taking in all of this sensory information and making subtle muscle movements in response just comes naturally to us. But nothing comes naturally to robots — we have to literally teach them everything they know. And while tasks like picking up a ball may seem simple, when you get down to the nuts and bolts of it, there is a lot involved. As more sensing modalities are added, the job only grows more difficult. This is one of the reasons that most robots are very limited in how they can interact with the world around them.
In an effort to address this shortcoming, a team headed up by researchers at Columbia University has developed a system called 3D-ViTac that combines tactile and visual sensing to enable advanced robotic manipulation. Inspired by the human ability to integrate vision and touch, 3D-ViTac addresses two key challenges in robotic perception: designing effective tactile sensors and unifying distinct sensory data types.
The system features cost-effective, flexible tactile sensors composed of piezoresistive sensing matrices. Each matrix has a thickness of less than 1 mm, making it adaptable to a variety of robotic manipulators. These sensors are integrated onto a soft, 3D-printed gripper, creating a robust and inexpensive solution. Each sensor pad consists of a 16x16 array of sensing units, capable of detecting mechanical pressure changes and converting them into electrical signals, with a high spatial resolution of 3 mm² per sensing point. Signals are captured by an Arduino Nano, which transmits the data to a computer for further processing.
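To make the data path concrete, the sketch below shows one way a host computer might read a 16x16 pressure frame streamed from the Arduino Nano over USB serial. This is not the authors' code; the port name, baud rate, and framing (256 comma-separated ADC values per line) are assumptions made purely for illustration.

```python
# Hypothetical sketch: reading one 16x16 tactile frame from an Arduino Nano
# over USB serial. Port, baud rate, and line format are assumed, not taken
# from the 3D-ViTac implementation.
import numpy as np
import serial  # pyserial

PORT = "/dev/ttyUSB0"   # assumed serial port
BAUD = 115200           # assumed baud rate
ROWS, COLS = 16, 16     # one sensor pad is a 16x16 array of sensing units


def read_tactile_frame(conn: serial.Serial) -> np.ndarray:
    """Read one frame of 256 pressure readings and reshape it to 16x16."""
    line = conn.readline().decode("ascii", errors="ignore").strip()
    values = [int(v) for v in line.split(",") if v]
    if len(values) != ROWS * COLS:
        raise ValueError(f"expected {ROWS * COLS} readings, got {len(values)}")
    return np.array(values, dtype=np.float32).reshape(ROWS, COLS)


if __name__ == "__main__":
    with serial.Serial(PORT, BAUD, timeout=1.0) as conn:
        frame = read_tactile_frame(conn)
        print("peak raw reading in this frame:", frame.max())
```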
The tactile data from these sensors are integrated with multi-view visual data into a unified 3D visuo-tactile representation. This fusion preserves the spatial structure and relationships of the tactile and visual inputs, enabling imitation learning via diffusion policies. This approach allows robots to adapt to force changes, overcome visual occlusions, and perform delicate tasks such as handling fragile objects or manipulating tools in-hand.
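One simple way to picture this kind of fusion is to treat every tactile sensing unit as a 3D point: its position on the gripper finger is known, so each pressure reading can be placed in space and stacked alongside the camera point cloud. The sketch below illustrates that idea only; the frame transforms, array shapes, and zero-padding of the visual points are assumptions for the example, not the paper's actual representation.

```python
# Illustrative sketch (not the authors' implementation) of a unified 3D
# visuo-tactile point cloud: tactile sensing units are projected to their
# known 3D locations on the gripper, tagged with pressure, and concatenated
# with the visual point cloud.
import numpy as np


def tactile_to_points(frame: np.ndarray, unit_positions: np.ndarray,
                      finger_pose: np.ndarray) -> np.ndarray:
    """Convert a 16x16 pressure frame into (256, 4) points: xyz + pressure.

    frame:          (16, 16) raw pressure readings
    unit_positions: (256, 3) sensing-unit coordinates in the finger frame (assumed known)
    finger_pose:    (4, 4) homogeneous transform from finger frame to world frame
    """
    pressures = frame.reshape(-1, 1).astype(np.float32)
    homog = np.hstack([unit_positions, np.ones((unit_positions.shape[0], 1))])
    world_xyz = (finger_pose @ homog.T).T[:, :3]
    return np.hstack([world_xyz, pressures])


def fuse(visual_points: np.ndarray, tactile_points: np.ndarray) -> np.ndarray:
    """Stack visual points (xyz plus a zeroed pressure channel) with tactile points."""
    visual_padded = np.hstack(
        [visual_points, np.zeros((visual_points.shape[0], 1), dtype=np.float32)]
    )
    return np.vstack([visual_padded, tactile_points])
```

A combined point set like this keeps the spatial relationship between where the robot sees an object and where it feels contact, which is what a downstream imitation-learning policy can then consume.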
A variety of experiments were conducted to assess the performance of 3D-ViTac. First, the tactile sensors themselves were characterized, including signal consistency under various loads and the ability to estimate 6 DoF object poses from tactile data alone. Next, four challenging real-world tasks were designed to assess the importance of tactile feedback: egg steaming, fruit preparation, hex key collection, and sandwich serving. These tasks tested fine-grained force application, in-hand state adjustment, and task progression under visual occlusions.
A comparative analysis against vision-only and vision-tactile baselines revealed three key benefits of 3D-ViTac: (1) precise force feedback, preventing object damage or slippage, (2) overcoming visual occlusions using tactile contact patterns, and (3) enabling confident transitions between task stages in visually noisy environments. The results highlight how multimodal sensing significantly improves robot performance.