The focus of today's article is to recreate a visual tracking case using the myCobot 320 robotic arm. This case is provided by Elephant Robotics as an official solution, enabling users to get started quickly and follow along step by step to see how to reproduce the results and identify any potential issues.
Equipment
myCobot 320 M5Stack
The myCobot 320 M5Stack is a six-degree-of-freedom robotic arm with a working radius of 350 mm and a maximum payload of 1 kg at its end effector. It supports various mainstream programming languages and operating systems. In this article, Python is primarily used to control the robotic arm.
As shown in the image, the setup consists of a myCobot robotic arm and a camera used to capture image data; the arm's key specifications were noted above.
You don't need to use the same camera as I did; the key is that it can be mounted on the end effector of the robotic arm and can deliver image data over a USB cable. For this setup, I used the myCobot Pro Camera Flange, an end-effector camera specially adapted for myCobot by Elephant Robotics. The development environment is as follows:
● Operating System: Windows 10
● Programming Language: Python
● IDE: PyCharm
● Libraries: NumPy, OpenCV, stag, pymycobot, plus the standard json and time modules (the latest versions of the third-party libraries are recommended)
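Before going further, it helps to confirm that the arm responds from Python. Below is a minimal sketch using standard pymycobot calls; the serial port "COM3" is an assumption, so adjust it to whatever your machine reports.

import time
from pymycobot.mycobot import MyCobot

mc = MyCobot("COM3", 115200)              # serial port and baud rate of the myCobot 320
print(mc.get_angles())                    # current joint angles in degrees
mc.send_angles([0, 0, 0, 0, 0, 0], 50)    # move every joint to zero at speed 50
time.sleep(3)
print(mc.get_coords())                    # end-effector coords [x, y, z, rx, ry, rz]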
Knowledge Prerequisites
OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source software library for computer vision and machine learning. It is widely used in image processing, video analysis, and object detection. OpenCV provides developers with a rich set of tools for handling image data and implementing complex vision algorithms. In the context of robotic arms, OpenCV can be used for visual tracking: a camera captures the target in real time, its position and motion trajectory are extracted from the images, and the robotic arm then adjusts its movements based on this information, enabling precise object grasping and manipulation. This visual tracking technology is extensively applied in automation, industrial robotics, and smart manufacturing.
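As a concrete illustration of the capture side, the short sketch below reads frames from the flange camera with OpenCV; the camera index 0 is an assumption and may differ on your system.

import cv2

cap = cv2.VideoCapture(0)                  # open the USB camera (index 0 assumed)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow("camera", frame)            # show the live image
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop
        break
cap.release()
cv2.destroyAllWindows()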
STag
STag markers are a type of 2D barcode system widely used in machine vision for marker detection and spatial positioning. These markers consist of black-and-white patterns, typically square in shape, with a unique binary pattern at the center, allowing them to be quickly and accurately recognized by computer vision systems.
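For reference, detecting STag markers in Python typically looks like the sketch below, using the stag package listed in the environment; the detectMarkers signature and the HD21 marker library are taken from that package's documentation, so verify them against the version you have installed.

import cv2
import stag

image = cv2.imread("frame.png")                          # a previously captured frame (hypothetical file)
corners, ids, rejected = stag.detectMarkers(image, 21)   # 21 selects the HD21 marker library
print("detected marker ids:", ids)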
Hand-Eye Calibration
Hand-eye calibration involves determining the precise spatial and orientation relationship between the camera (the "eye") and the robotic arm's end-effector (the "hand"). This project uses the "eye-in-hand" configuration, so that is the scenario discussed here: the camera is mounted on the robotic arm's end-effector, so the field of view and camera angle change as the robotic arm moves. The objective is to calculate the transformation between the camera's coordinate system and the robotic arm's end-effector coordinate system. This enables the robot to perceive its surroundings via the camera and execute tasks such as object tracking and precise grasping.
1. Camera Pose Changes: In the eye-in-hand configuration, the camera’s perspective changes with every robotic arm movement. By moving the robotic arm, multiple viewpoints of a calibration object can be captured, yielding data on the camera’s different poses.
2. Data Collection: The robotic arm is moved to several different positions, and each time, it captures an image of a calibration board or a specific object. The end-effector’s pose (provided by the encoders) and the object’s pose (calculated via image processing) are recorded.
3. Solving for the Relationship Between Camera and End-Effector: Using algorithms like least squares, the transformation matrix between the camera and the end-effector is computed, establishing the coordinate transformation relationship between them.
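For comparison with the custom solver used later in this article, OpenCV also ships a ready-made solver for this step. A minimal sketch, assuming the per-pose rotations and translations have already been collected as lists, is shown below.

import cv2

def solve_eye_in_hand(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam):
    # R_gripper2base / t_gripper2base: end-effector poses in the base frame (from the arm)
    # R_target2cam / t_target2cam: calibration-target poses in the camera frame (from vision)
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI,   # classic Tsai-Lenz least-squares formulation
    )
    return R_cam2gripper, t_cam2gripper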
Implementation with Code
The process is mainly divided into two parts: the calibration process and the tracking movement module.
Calibration Process
1. Coordinate Transformation
In the hand-eye calibration process, transformations between different coordinate systems are involved. The key coordinate systems are as follows:
● World Frame (W): A reference frame usually fixed in the environment.
● Base Frame (B): A frame fixed at the robotic arm's base, used to represent the arm's pose.
● End-Effector Frame (E): A frame attached to the robotic arm's end-effector, representing its pose; in this project the camera is mounted here.
● Camera Frame (C): A frame fixed to the camera, used to describe the pose of the objects seen by the camera.
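To make the chain of frames explicit, the object pose reported in the camera frame reaches the base frame by composing three homogeneous transforms. A small sketch, with identity placeholders purely to show the shapes involved:

import numpy as np

def object_in_base(T_base_ee, T_ee_cam, T_cam_obj):
    # T_base_ee: end-effector pose in the base frame (from the arm)
    # T_ee_cam:  the hand-eye matrix being calibrated
    # T_cam_obj: object pose seen by the camera
    return T_base_ee @ T_ee_cam @ T_cam_obj

print(object_in_base(np.eye(4), np.eye(4), np.eye(4)))   # identity placeholders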
In hand-eye calibration, the goal is to solve the transformation matrix between the camera frame and the robotic arm’s end-effector frame. This allows the pose of the object detected by the camera to be transformed into the robotic arm's end-effector coordinates, enabling precise manipulation of the target object.
import numpy as np
from numpy.linalg import svd

def eyes_in_hand_calculate(self, pose, tbe1, Mc1, tbe2, Mc2, tbe3, Mc3, Mr):
    # Three end-effector positions (tbe*) with their camera observations (Mc*),
    # plus the fixed marker position Mr, are used to solve the camera-to-end-effector transform
    tbe1, Mc1, tbe2, Mc2, tbe3, Mc3, Mr = map(np.array, [tbe1, Mc1, tbe2, Mc2, tbe3, Mc3, Mr])
    # Convert the end-effector orientation from degrees to radians
    euler = np.array(pose) * np.pi / 180
    Rbe = self.CvtEulerAngleToRotationMatrix(euler)   # base -> end-effector rotation
    Reb = Rbe.T
    # Pairwise differences of the camera observations ...
    A = np.hstack([(Mc2 - Mc1).reshape(-1, 1),
                   (Mc3 - Mc1).reshape(-1, 1),
                   (Mc3 - Mc2).reshape(-1, 1)])
    # ... and the matching end-effector displacements expressed in the end-effector frame
    b = Reb @ np.hstack([(tbe1 - tbe2).reshape(-1, 1),
                         (tbe1 - tbe3).reshape(-1, 1),
                         (tbe2 - tbe3).reshape(-1, 1)])
    # Orthogonal Procrustes step: rotation aligning the camera-frame differences with the arm's
    U, S, Vt = svd(A @ b.T)
    Rce = Vt.T @ U.T
    # Translation of the camera frame relative to the end-effector frame
    tce = Reb @ (Mr - (1/3)*(tbe1 + tbe2 + tbe3) - (1/3)*(Rbe @ Rce @ (Mc1 + Mc2 + Mc3)))
    # Assemble the 4x4 homogeneous hand-eye matrix
    eyes_in_hand_matrix = np.vstack([np.hstack([Rce, tce.reshape(-1, 1)]), np.array([0, 0, 0, 1])])
    return eyes_in_hand_matrix
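The routine above calls a helper, CvtEulerAngleToRotationMatrix, that is not listed in the article. A typical implementation for a ZYX (roll-pitch-yaw) convention is sketched below; the actual axis order must match the convention the arm firmware uses, so treat this as an assumption.

import numpy as np

def CvtEulerAngleToRotationMatrix(euler):
    rx, ry, rz = euler                                  # angles in radians
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx                                 # ZYX composition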
2. Data Collection
By moving the robotic arm to different positions, data on the various positions of the robotic arm's end-effector and the camera's observations are collected.
In the code, the robotic arm's pose is obtained by calling the `ml.get_coords()` method, while the camera's observation is collected via the `stag_identify()` function, which locates the marker.
def reg_get(self, ml):
    # Repeatedly sample the marker position seen by the camera and the arm's reported pose
    for i in range(30):
        Mc_all = self.stag_identify()   # marker position in the camera frame
        tbe_all = ml.get_coords()       # end-effector coordinates from the arm
        ...
    return Mc, tbe
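The body elided above is not reproduced in the article. One common way to condense the 30 samples into a single, less noisy reading (an assumption, not necessarily what the official code does) is simply to average them:

import numpy as np

def average_samples(Mc_samples, tbe_samples):
    # Each argument is a list of readings collected inside the loop above
    Mc = np.mean(np.array(Mc_samples, dtype=float), axis=0)
    tbe = np.mean(np.array(tbe_samples, dtype=float), axis=0)
    return Mc, tbe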
3. Coordinate Transformation Matrix
Based on the data from each position, two transformations can be derived:
● Ai is the transformation matrix of the robotic arm's end-effector at different positions, representing the motion of the end-effector.
● Bi is the transformation matrix of the object as observed by the camera in the camera’s coordinate system, representing the camera’s motion.
These transformation matrices are obtained through the vision system and the robotic arm system (using `get_coords`).
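For concreteness, a `get_coords()` reading of the form [x, y, z, rx, ry, rz] (millimetres and degrees) can be turned into such a 4x4 transformation matrix as sketched below; SciPy is used here only for brevity (it is not in the library list above), and the "xyz" Euler order is an assumption that must match the firmware's convention.

import numpy as np
from scipy.spatial.transform import Rotation

def coords_to_matrix(coords):
    x, y, z, rx, ry, rz = coords
    T = np.eye(4)
    T[:3, :3] = Rotation.from_euler("xyz", [rx, ry, rz], degrees=True).as_matrix()
    T[:3, 3] = [x, y, z]
    return T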
4. Solving the Calibration Matrix
According to the calibration model:
● Ai represents the movement of the robotic arm’s end-effector (from the world frame to the end-effector frame).
● Bi represents the movement of the camera (the motion of the object as seen in the camera’s coordinate system).
● Xce is the hand-eye calibration matrix to be solved, representing the rigid-body transformation between the camera and the robotic arm’s end-effector.
By collecting Ai and Bi at multiple positions, the least squares method can be used to solve for Xce; this is typically done with a singular value decomposition (SVD), which is what the svd call in eyes_in_hand_calculate performs.
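The core of that SVD step is the orthogonal Procrustes (Kabsch) problem: finding the rotation that best aligns two sets of corresponding vectors. A generic sketch of that step on its own:

import numpy as np

def best_rotation(A, B):
    # A and B are 3xN matrices of corresponding vectors; returns R such that R @ A ≈ B
    U, _, Vt = np.linalg.svd(A @ B.T)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R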
After saving the collected data and calculating the result, the subsequent tracking function can be implemented.
[[0.9825202432037423, 0.03775722308035847, 0.1822864882543945, -21.50838594386444],
 [-0.04022441808787263, 0.9991420672993772, 0.009855229181470597, -0.6545263884052905],
 [-0.1817579926285262, -0.017015330087522124, 0.9831960692850951, 59.71321654600654],
 [0.0, 0.0, 0.0, 1.0]]
The output of the hand-eye calibration is a rigid-body transformation matrix that describes the spatial relationship between the camera and the robotic arm's end-effector. This matrix forms the basis for the robotic arm's visual control and operations: using it, the robotic arm can convert the position of objects perceived by the vision system into its own coordinate system. The STag markers mentioned earlier are recognized with OpenCV-based algorithms, as in the following function.
def stag_robot_identify(self, ml):
    marker_pos_pack = self.stag_identify()      # marker position in the camera frame
    target_coords = ml.get_coords()
    while (target_coords is None):              # get_coords() can return None; retry until valid
        target_coords = ml.get_coords()
    # print("current_coords", target_coords)
    cur_coords = np.array(target_coords.copy())
    cur_coords[-3:] *= (np.pi / 180)            # rotation components from degrees to radians
    # Map the marker position into the base frame using the hand-eye matrix
    fact_bcl = self.Eyes_in_hand(cur_coords, marker_pos_pack, self.EyesInHand_matrix)
    for i in range(3):
        target_coords[i] = fact_bcl[i]          # overwrite x, y, z; keep the current orientation
    return target_coords
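The Eyes_in_hand() helper used above is not reproduced in the article. A plausible reading of what it does (an assumption, using SciPy for the Euler conversion) is to push the camera-frame marker position through the hand-eye matrix and the current end-effector pose:

import numpy as np
from scipy.spatial.transform import Rotation

def eyes_in_hand(cur_coords, marker_xyz_cam, eyes_in_hand_matrix):
    # cur_coords = [x, y, z, rx, ry, rz] with the angles already in radians
    T_base_ee = np.eye(4)
    T_base_ee[:3, :3] = Rotation.from_euler("xyz", cur_coords[3:]).as_matrix()
    T_base_ee[:3, 3] = cur_coords[:3]
    p_cam = np.append(np.asarray(marker_xyz_cam, dtype=float)[:3], 1.0)   # homogeneous point
    p_base = T_base_ee @ eyes_in_hand_matrix @ p_cam
    return p_base[:3]                                                     # marker in base coordinates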
Based on the coordinates returned for the recognized marker, the robotic arm then moves along the X, Y, and Z axes of its end-effector to keep tracking the target.
def vision_trace_loop(self, ml):
    ml.set_fresh_mode(1)                                 # always execute the latest command first
    time.sleep(1)
    ml.send_angles(self.origin_mycbot_horizontal, 50)    # move to the starting posture
    self.wait()
    time.sleep(1)
    origin = ml.get_coords()
    while 1:
        target_coords = self.stag_robot_identify(ml)     # marker position in base coordinates
        target_coords[0] -= 300                          # keep a fixed offset from the marker along X
        self.coord_limit(target_coords)                  # clamp the target to the safe workspace
        print(target_coords)
        for i in range(3):
            target_coords[i+3] = origin[i+3]             # keep the original orientation
        ml.send_coords(target_coords, 30)                # move toward the target at speed 30
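Finally, a hypothetical entry point tying it together; the class name VisionTracker and the serial port are assumptions standing in for whatever the official script actually defines.

if __name__ == "__main__":
    from pymycobot.mycobot import MyCobot

    ml = MyCobot("COM3", 115200)        # serial connection to the myCobot 320
    tracker = VisionTracker()           # hypothetical class holding the camera and EyesInHand_matrix
    tracker.vision_trace_loop(ml)       # follow the STag marker until interrupted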
Overall, there may still be some hiccups when running this code, and certain functions are not fully explained. Having worked with many hand-eye calibration methods, I find this to be one of the more straightforward automatic calibration approaches, though it is not the most precise; accuracy can be improved through further optimization. In general, this case is worth exploring, especially for those with some understanding of robotic arms and vision systems!