Robotics and embodied intelligence are among the most promising avenues for humanity's continued development. Fast prototyping of new models and fine-tuning methods will be vital to accelerating progress, and ease of access for new developers is just as critical for adoption and onboarding. We wanted to examine the current state of open source projects such as LeRobot, and we are grateful to Seeed Studio for organizing a hackathon that gave us hands-on experience.
Problem Statement
We set out to get a robot arm to pick and place Lego blocks of different colors, planning to start with a simple task and move on to a more complicated one if we were successful. We achieved some success with pick and place, eventually training an ACT (Action Chunking with Transformers) model to pick and place red Lego pieces. Unfortunately, we did not have time to follow up further.
Initial Setup
A leader-follower robot arm setup was provided. We added a scene camera and modified the position of the wrist camera. Due to a lack of time and relevant tooling, we retained the original wrist camera, but we believe a 3D-printed fixture for an additional camera, giving a complete view of the gripper during arm operations, is essential for more robust operation.
With the motors already configured, we proceeded to the arm calibration step. During this process, we inadvertently bumped one of the arms, which shorted a servo board. This cost us several hours of downtime while the team worked to procure a replacement and put together an improvised setup.
First Attempt
The initial data collection involved picking white pieces out of a pile of Lego pieces and sorting them into a cardboard tray. The environments varied often, both in the relevant context (table and tray) and in the irrelevant context (people walking in the background). The model used for fine-tuning was LeRobot/Pi0. Our initial attempt to fine-tune on this dataset severely overfit: the trained model simply jittered and made no discernible progress towards its task of picking and placing Lego pieces. This was partly due to training for too many steps (100,000 was the default in LeRobot) and partly due to the poor quality of the initial dataset. An insufficient quantity of fine-tuning data may also have contributed.
Second Attempt
Our second attempt at data collection involved picking red pieces out of a pile of Lego. Part of the motivation for changing the color was to make the pick target more visible. We also changed the base model from Pi0 to ACT, which led to significantly improved performance. We still saw overfitting: the model strongly preferred to make pick attempts from the center of the environment. It was sometimes able to adjust, but it could not consistently pick and place the target. Our best model was trained on 30 episodes for 10,000 steps.
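To see why the step count matters, the rough calculation below estimates how many passes over the data a given number of training steps amounts to. Only the 30 episodes and the 10,000 and 100,000 step counts come from this post. The frames per episode and the batch size are assumed values chosen for illustration, and the 100,000-step figure is shown against the same dataset size purely for comparison, since the size of the first dataset is not stated here.

```python
def passes_over_data(steps: int, batch_size: int, episodes: int, frames_per_episode: int) -> float:
    """Approximate number of full passes (epochs) over the dataset."""
    total_frames = episodes * frames_per_episode
    return steps * batch_size / total_frames


EPISODES = 30              # from the post
FRAMES_PER_EPISODE = 600   # assumption: about 20 s per episode at 30 fps
BATCH_SIZE = 8             # assumption: not stated in the post

for steps in (10_000, 100_000):
    epochs = passes_over_data(steps, BATCH_SIZE, EPISODES, FRAMES_PER_EPISODE)
    print(f"{steps:>7} steps ~ {epochs:.1f} passes over the data")
```

Under these assumed values, 10,000 steps is roughly 4 passes over the data and 100,000 steps roughly 44, which is consistent with a small dataset being much easier to overfit at the default step count.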
Conclusion
This is an exciting time for end-to-end neural networks. There is a considerable learning curve, and working with hardware always introduces additional complexity. The open source ecosystem around LeRobot has some rough edges, but the core developer workflow mostly works.
Relevant Resources
Fine-Tune Dataset: https://huggingface.co/datasets/ChillyMango/sort-red-blocks-medium
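For readers who want to inspect the dataset locally, here is a minimal sketch of one way to fetch it. It assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function. The helper names `repo_id_from_url` and `download_dataset` are hypothetical, introduced only for this example, and this is not necessarily how we loaded the data during the hackathon.

```python
from urllib.parse import urlparse

DATASET_URL = "https://huggingface.co/datasets/ChillyMango/sort-red-blocks-medium"


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repo id from a Hugging Face dataset URL (hypothetical helper)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:])


def download_dataset(url: str, local_dir=None) -> str:
    """Download every file in the dataset repo and return the local path (hypothetical helper)."""
    # Imported lazily so the URL helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(url),
        repo_type="dataset",
        local_dir=local_dir,
    )


print(repo_id_from_url(DATASET_URL))
```

Calling `download_dataset(DATASET_URL)` requires network access and returns the folder that holds the downloaded files.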
Demo