Teaching Robots Collaborative Pick-Place and Sortation Tasks
With recent advancements in Vision-Language-Action (VLA) models, there is a growing wave of interest in embodied AI. These models, trained on large-scale vision, language, and action datasets, enable generalized understanding, reasoning, and autonomy. They serve as a robust foundation for fine-tuning across various robotic embodiments.
In this project, we leverage two LeRobot SO-100 ARM robots from Seeed Studio to train them for collaborative pick-and-place tasks. Our ultimate goal is to enable natural language instructions to seamlessly guide the robotic arms in executing these tasks.
The Concept (Les Robots)
The project involves:
1. Building two leader-follower robot pairs
2. Collecting data via manual demonstrations of collaborative tasks
3. Experimenting with fine-tuning and deploying multiple vision-language-action (VLA) models to teach collaborative pick-and-place tasks. We experiment with the recently released NVIDIA GR00T N1, ACT, and Pi0 models.
The Setup
Each follower robot is equipped with an Arducam for visual processing, while an external third camera records the overall process. The captured movements become the training data for the NVIDIA Isaac-GR00T N1 foundation VLA model, so the robots learn from real-world demonstrations.
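Before recording anything, it is worth confirming that every camera stream is live and reporting the expected resolution and frame rate. Below is a minimal sanity-check sketch using OpenCV; the camera indices are assumptions and depend on how the two Arducams and the overview camera enumerate on your machine.

```python
import cv2

# Assumed device indices: adjust to match how the two Arducams and the
# external overview camera show up on your system.
CAMERA_INDICES = {"follower_left": 0, "follower_right": 2, "overview": 4}

for name, idx in CAMERA_INDICES.items():
    cap = cv2.VideoCapture(idx)
    if not cap.isOpened():
        print(f"{name}: could not open camera index {idx}")
        continue
    ok, frame = cap.read()
    fps = cap.get(cv2.CAP_PROP_FPS)
    if ok:
        h, w = frame.shape[:2]
        print(f"{name}: {w}x{h} @ {fps:.0f} fps")
    else:
        print(f"{name}: opened but failed to read a frame")
    cap.release()
```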
Steps to Implementation
Building the SO-100 Arms: The first step is assembling the LeRobot SO-100 robotic arms, ensuring they are mechanically and electrically functional.
Setting Up the Hardware for Teleoperation: Establishing the connection between the leader and follower robots, enabling real-time movement mirroring.
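Conceptually, teleoperation is a tight loop: read the leader arm's joint positions and write them to the follower at a fixed rate. The sketch below is only illustrative; read_joint_positions and write_joint_positions are hypothetical placeholders for whatever servo-bus interface you use (LeRobot wraps this for the SO-100's servos).

```python
import time

def read_joint_positions(bus):
    """Hypothetical placeholder: read the current position of every servo on the bus."""
    raise NotImplementedError

def write_joint_positions(bus, positions):
    """Hypothetical placeholder: command goal positions for every servo on the bus."""
    raise NotImplementedError

def teleoperate(leader_bus, follower_bus, hz=50):
    """Mirror the leader arm onto the follower arm at roughly `hz` updates per second."""
    period = 1.0 / hz
    while True:
        start = time.perf_counter()
        positions = read_joint_positions(leader_bus)    # one value per joint
        write_joint_positions(follower_bus, positions)  # follower copies the leader
        # Sleep off the remainder of the control period to hold the target rate.
        time.sleep(max(0.0, period - (time.perf_counter() - start)))
```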
Note: Calibration is Important!!!
We used a naive calibration approach based on the operator teleoperating the arms through their range of motion; a rough sketch of the idea follows.
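One simple way to do this is to have the operator sweep each joint through its full range once, log the raw servo positions, record the per-joint minimum and maximum, and then normalize all later readings into a common range. A minimal sketch of that idea (not LeRobot's built-in calibration routine):

```python
import numpy as np

def record_joint_limits(position_samples):
    """Given raw joint positions logged while the user sweeps each joint through
    its range (shape: [num_samples, num_joints]), return per-joint min/max limits."""
    samples = np.asarray(position_samples, dtype=float)
    return samples.min(axis=0), samples.max(axis=0)

def normalize(positions, limits):
    """Map raw positions into [-1, 1] using the recorded limits."""
    lo, hi = limits
    return 2.0 * (np.asarray(positions, dtype=float) - lo) / (hi - lo) - 1.0

# Example with fake raw servo readings for a 6-joint arm.
samples = np.random.randint(500, 3500, size=(200, 6))
limits = record_joint_limits(samples)
print(normalize(samples[0], limits))
```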
Data Collection in LeRobot Format: Recording and structuring demonstration data that serves as the training input for the AI model.
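At a high level, each recorded episode is a time-indexed table of joint states and actions (with the camera frames stored alongside), plus indexing metadata. The sketch below writes one illustrative episode to Parquet; the column names follow the LeRobot convention (observation.state, action, ...), but the exact on-disk layout depends on the LeRobot version, so treat this as an assumed schema rather than the canonical exporter.

```python
import numpy as np
import pandas as pd

FPS = 30
NUM_FRAMES = 90   # a 3-second demo, for illustration
NUM_JOINTS = 6    # per arm; a bimanual episode would double this

rows = []
for i in range(NUM_FRAMES):
    rows.append({
        "observation.state": np.zeros(NUM_JOINTS).tolist(),  # follower joint angles
        "action": np.zeros(NUM_JOINTS).tolist(),             # leader (commanded) angles
        "timestamp": i / FPS,
        "frame_index": i,
        "episode_index": 0,
        "task_index": 0,  # index into a task/instruction table, e.g. "pick up the red block"
    })

pd.DataFrame(rows).to_parquet("episode_000000.parquet")
```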
Generating modality.json for Training: Preparing a structured dataset compatible with GR00T N1's GPU-based training pipeline.
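The modality.json file tells Isaac-GR00T how the flat state/action vectors, video streams, and language annotations in the dataset map onto named modalities. The exact schema is defined by the Isaac-GR00T repository and may change between releases; the sketch below just generates an illustrative file for a single 6-DoF arm, and every key name should be checked against the version you are training with.

```python
import json
import os

# Illustrative only: key names and index ranges must match the Isaac-GR00T
# version in use. Indices slice the flat state/action vectors.
modality = {
    "state": {
        "single_arm": {"start": 0, "end": 5},
        "gripper": {"start": 5, "end": 6},
    },
    "action": {
        "single_arm": {"start": 0, "end": 5},
        "gripper": {"start": 5, "end": 6},
    },
    "video": {
        "webcam": {"original_key": "observation.images.webcam"},
    },
    "annotation": {
        "human.task_description": {"original_key": "task_index"},
    },
}

os.makedirs("meta", exist_ok=True)
with open("meta/modality.json", "w") as f:
    json.dump(modality, f, indent=2)
```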
Modifying GR00T N1 for Bimanual Manipulation: Tweaking the AI model to accommodate simultaneous control of two robotic arms, enhancing their cooperative capabilities. (Working with a codebase released only three days ago!)
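The core change for bimanual control is that the state and action vectors now cover both arms: the per-arm joint vectors are concatenated into one wider vector, and the modality configuration gains left/right entries. A tiny sketch of the idea, assuming 6 values per arm (5 joints plus gripper on the SO-100):

```python
import numpy as np

NUM_JOINTS = 6  # assumed per-arm dimensionality

def pack_bimanual(left_state: np.ndarray, right_state: np.ndarray) -> np.ndarray:
    """Concatenate per-arm vectors into the single state/action vector the model sees."""
    assert left_state.shape == (NUM_JOINTS,) and right_state.shape == (NUM_JOINTS,)
    return np.concatenate([left_state, right_state])  # shape (12,)

def unpack_bimanual(vector: np.ndarray):
    """Split a predicted 12-dim action back into per-arm commands."""
    return vector[:NUM_JOINTS], vector[NUM_JOINTS:]

state = pack_bimanual(np.zeros(NUM_JOINTS), np.ones(NUM_JOINTS))
left_cmd, right_cmd = unpack_bimanual(state)
```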
Deployment on a Robot with an 8 GB+ GPU: Ensuring the trained model runs efficiently in real-world conditions for inference and execution.
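At inference time the loop is: grab the latest camera frames and joint states, run the policy to get a short chunk of future actions, and stream those actions to the two arms at the control rate. The sketch below uses hypothetical policy, get_observation, and send_action placeholders, since the exact client API depends on which model (GR00T N1, ACT, or Pi0) you deploy.

```python
import time

CONTROL_HZ = 30
INSTRUCTION = "pick up the block and hand it to the other arm"

def get_observation():
    """Hypothetical placeholder: return current camera frames and joint states."""
    raise NotImplementedError

def send_action(action):
    """Hypothetical placeholder: command both arms with one packed action vector."""
    raise NotImplementedError

def run_policy(policy):
    """Closed-loop execution: predict a short action chunk, execute it, re-plan."""
    period = 1.0 / CONTROL_HZ
    while True:
        obs = get_observation()
        # The policy returns a short horizon (chunk) of actions conditioned on the
        # language instruction; we execute the chunk, then re-plan from a new observation.
        action_chunk = policy.predict(obs, instruction=INSTRUCTION)
        for action in action_chunk:
            start = time.perf_counter()
            send_action(action)
            time.sleep(max(0.0, period - (time.perf_counter() - start)))
```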
To encourage continuous development, the demonstration dataset has been uploaded to Hugging Face, and implementation details are available on GitHub. This initiative aims to inspire researchers, engineers, and enthusiasts to explore and expand upon the project, unlocking new possibilities in robotics and artificial intelligence.
By bridging human intuition with robotic precision, this project paves the way for the future of teleoperated and autonomous robotic manipulation, transforming industries from manufacturing to service automation. The journey of innovation continues—what will you build next?