This project demonstrates how to use the myCobot 320 robotic arm, a USB camera, and the YOLOv10 object detection model to automatically recognize desktop trash and perform grab-and-discard operations. By integrating deep learning with robotic control, the system can automatically identify common trash on a desk and execute the appropriate grabbing and discarding actions.
1. Utilize the YOLOv10 object detection model to identify the coordinates of objects.
2. Compute the target image coordinates, and then use coordinate transformation to determine the target pose.
3. Implement grabbing and discarding of the detected target using the myCobot 320 and the myGripperF100.
Implementation Steps
1. Environment Setup and Camera Calibration
1.1 Environment Setup
● A fixed-height aluminum profile bracket to hold the camera
● A USB camera
● A calibration board
1.2 Hand-Eye Calibration
Hand-eye calibration involves determining the transformation between the coordinate frame of a robot's end-effector (the "hand") and the coordinate frame of a camera (the "eye"). The steps are:
1. Move the robot: Place the robot in different positions and orientations.
2. Capture images: At each position, take images from the camera.
3. Detect features: Identify corresponding features in the images (e.g., checkerboard patterns).
4. Compute transformation: Use the robot’s movements and the camera's feature data to calculate the transformation matrix between the robot and camera coordinate frames.
We recommend solving for the transformation matrix with OpenCV's calibrateHandEye function:
import cv2
...
R_camera2base, t_camera2base = cv2.calibrateHandEye(R_base2end, t_base2end, R_board2camera, t_board2camera)
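Below is a minimal sketch of how the input lists for calibrateHandEye can be assembled in this eye-to-hand setup. It is not part of the original project code: the function name run_hand_eye_calibration and the layout of the samples argument are illustrative assumptions; only the OpenCV calls themselves are standard.
import cv2
import numpy as np

def run_hand_eye_calibration(samples):
    """samples: list of (R_end2base, t_end2base, rvec_board, tvec_board) tuples,
    collected by moving the robot and photographing the board at each pose."""
    R_base2end, t_base2end = [], []          # base -> end-effector
    R_board2camera, t_board2camera = [], []  # board -> camera

    for R_end2base, t_end2base, rvec_board, tvec_board in samples:
        # Invert the end-effector pose so the eye-to-hand case matches calibrateHandEye's inputs.
        R_inv = R_end2base.T
        R_base2end.append(R_inv)
        t_base2end.append(-R_inv @ t_end2base)

        # Board pose in the camera frame (solvePnP gives a Rodrigues rotation vector).
        R_b2c, _ = cv2.Rodrigues(rvec_board)
        R_board2camera.append(R_b2c)
        t_board2camera.append(tvec_board)

    # With these inputs the returned transform is camera -> robot base.
    return cv2.calibrateHandEye(R_base2end, t_base2end, R_board2camera, t_board2camera)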
We can also obtain the relative position of the camera with respect to the robotic arm base through simple physical measurements. Assuming the camera is mounted 40 cm above the desktop and its Y-axis distance from the arm's origin is 45 cm, we can approximate the translation vector (in millimeters) as:

t ≈ [0, 450, 400]ᵀ
By considering the definitions of the camera and robot coordinate systems, we know that their X-axes and Z-axes point in opposite directions while their Y-axes are aligned (i.e., the camera frame is rotated 180° about the Y-axis relative to the base frame). Hence, the rotation matrix can be roughly defined as:

R ≈ [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]
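Combining this rough rotation and translation gives a camera-to-base transform that can be applied to any point expressed in the camera frame. The sketch below mirrors the numbers used later in get_robot_coordinate; the example point is made up purely to illustrate the arithmetic, and in practice the matrix should come from your own calibration.
import numpy as np

# Approximate camera-to-base transform built from the rough measurements above.
R_cam2base = np.array([[-1, 0, 0],
                       [ 0, 1, 0],
                       [ 0, 0, -1]])
t_cam2base = np.array([0, 450, 400])  # mm

T_cam2base = np.eye(4)
T_cam2base[:3, :3] = R_cam2base
T_cam2base[:3, 3] = t_cam2base

# Map a point expressed in the camera frame (mm) into the robot base frame.
point_camera = np.array([50, -30, 400, 1])  # homogeneous coordinates, example values
point_base = T_cam2base @ point_camera
print(point_base[:3])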
The camera's intrinsic matrix K is typically represented as follows:

K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
● fx, fy: focal lengths in the X and Y directions.
● cx, cy: optical center coordinates (principal point).
These parameters are fixed characteristics of the camera and only need to be calibrated once, regardless of the camera's placement. We need to take several images of the checkerboard from different angles, then use OpenCV functions to automatically calibrate and obtain the intrinsic matrix.
import cv2
import numpy as np
import glob

# number of inner corners per chessboard row and column: (width, height)
pattern_size = (12, 6)
square_size = 5  # side length of each chessboard square (mm)

# containers for the 3D object points and 2D image points
obj_points = []
img_points = []
objp = np.zeros((np.prod(pattern_size), 3), dtype=np.float32)
objp[:, :2] = np.indices(pattern_size).T.reshape(-1, 2)
objp *= square_size

# load the calibration images
images = glob.glob('Image/*.jpg')
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, pattern_size, None)
    if ret:
        img_points.append(corners)
        obj_points.append(objp)
        cv2.drawChessboardCorners(img, pattern_size, corners, ret)
        cv2.imshow('Corners', img)
        cv2.waitKey(500)
    else:
        print(f"cannot find corners: {fname}")
cv2.destroyAllWindows()

# run the calibration
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, gray.shape[::-1], None, None)

with open('camera_intrinsics.txt', 'w') as f:
    f.write('intrinsic matrix (K):\n')
    np.savetxt(f, mtx)
    f.write('\ndistortion coefficients (dist):\n')
    np.savetxt(f, dist)
with open('camera_extrinsics.txt', 'w') as f:
    f.write('extrinsic rotations (rvecs):\n')
    np.savetxt(f, np.array(rvecs).reshape(-1, 3))
    f.write('\nextrinsic translations (tvecs):\n')
    np.savetxt(f, np.array(tvecs).reshape(-1, 3))
print("saved camera_intrinsics.txt and camera_extrinsics.txt")
2. YOLOv10 Detection Preparation
2.1 Environment Setup
You can follow the tutorial at https://github.com/THU-MIG/yolov10.
conda create -n yolov10 python=3.9
conda activate yolov10
pip install -r requirements.txt
pip install -e .
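As an optional quick check of the environment (not part of the original tutorial), you can confirm that the package imports and that PyTorch sees your GPU. The checkpoint file name below is an assumption; use whatever YOLOv10 weights you have downloaded.
import torch
from ultralytics import YOLOv10

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
model = YOLOv10('yolov10n.pt')  # assumes a yolov10n checkpoint has been downloaded
print("model loaded:", type(model).__name__)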
2.2 Test the Detection Model
You can write a script to load your model and get the detection results.
import cv2
import torch
import matplotlib.pyplot as plt
from ultralytics import YOLOv10

model = YOLOv10('best.pt')
image_path = 'desk.jpg'
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# run inference
results = model(image)

boxes = []
labels = []
confidences = []
centers = []

# each result box is [x_min, y_min, x_max, y_max]
for result in results:
    for detection in result:
        box = detection.boxes.xyxy
        print(box)
        boxes.append(box.numpy())

# draw the detected bounding boxes
for box in boxes:
    x_min, y_min, x_max, y_max = box[0]
    x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
    color = (0, 255, 0)
    thickness = 2
    image = cv2.rectangle(image, (x_min, y_min), (x_max, y_max), color, thickness)

cv2.imshow("Detected Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

# save the result image
cv2.imwrite("output_image.jpg", image)
The detectable objects include broken glass, cigarette, plastic bag, bottle cap, tissue, etc.
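If you also want to see which class each detection belongs to, the results object exposes class indices and confidences. The short sketch below assumes the `results` variable from the script above and uses the standard ultralytics result attributes.
# Print the label and confidence for every detection in the results above.
for result in results:
    for cls_id, conf in zip(result.boxes.cls, result.boxes.conf):
        print(f"{result.names[int(cls_id)]}: {float(conf):.2f}")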
3. Coordinate Transformation
In Step 2, we obtain the position of the detected object in the image coordinate system as Point2D(x, y). The depth of the detected object can be roughly estimated from the camera's mounting height, giving Z ≈ 40 cm. Therefore, we can use the back-projection formula:

Z * [x, y, 1]ᵀ = K * [X, Y, Z]ᵀ

where K is the camera intrinsic matrix obtained in Step 1:

K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]

Given that Z is known (the estimated distance from the desktop to the camera, approximately 0.4 m), we can solve for the XYZ coordinates in the camera coordinate system:

X = (x - cx) * Z / fx,  Y = (y - cy) * Z / fy
def get_camera_coordinate(_x, _y):
    # camera intrinsic matrix K (from the calibration in Step 1)
    f_x = 933.33
    f_y = 933.33
    c_x = 960
    c_y = 540
    K = np.array([
        [f_x, 0, c_x],
        [0, f_y, c_y],
        [0, 0, 1]
    ])
    # depth Z: measured camera-to-desktop distance (the 0.40 m estimate above is refined to 0.36 m here)
    Z = 0.36  # m
    # back-project the image point (x, y) into the camera frame
    X = (_x - c_x) * Z / f_x
    Y = (_y - c_y) * Z / f_y
    camera_coordinates = [X * 1000, Y * 1000, Z * 1000]  # m to mm
    print(f"camera_coords: {camera_coordinates}")
    return camera_coordinates
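As a quick worked example, a detection centered at pixel (1200, 700) in the 1920x1080 frame maps to an offset of a few centimeters in the camera frame. The pixel values below are made up purely to illustrate the arithmetic.
# Example: back-project a detection center at pixel (1200, 700)
coords = get_camera_coordinate(1200, 700)
# (1200 - 960) * 0.36 / 933.33 ≈ 0.0926 m, (700 - 540) * 0.36 / 933.33 ≈ 0.0617 m
# -> approximately [92.6, 61.7, 360.0] (mm) in the camera frame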
4. Robot Control
● A status identifier is set to prevent command conflicts during the execution process.
● After completing the movement, the robot arm returns to its origin, and the status identifier is released to allow the next movement.
def move_one_object(_myCobot, _Coords_X, _Coords_Y, _Coords_Z):
    m = _myCobot
    coords_object = [float(_Coords_X), float(_Coords_Y), float(_Coords_Z), -175, 0, 130]
    print(f"now move to {coords_object}")
    m.send_coords(coords_object, 40, 1)
    # check_move() (defined in the full script in Step 5) compares the current pose with the goal
    while check_move(m, coords_object) != True:
        time.sleep(0.5)
        print("----- waiting -----")
    m.set_pro_gripper_close(14)
    sleep(3)
    return coords_object
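The status identifier mentioned above is used in the main loop in Step 5; a condensed sketch of that pattern is shown below. The variable names x_robot, y_robot, and z_robot are placeholders for the target coordinates computed in Step 3, and the other names follow the full script in Step 5.
robot_status = 0  # 0 = idle, 1 = executing a pick-and-place

if len(detect_boxes) != 0 and robot_status == 0:
    robot_status = 1                                 # lock out new commands
    move_one_object(mc, x_robot, y_robot, z_robot)   # move to the object and close the gripper
    base_coords = move_base(mc)                      # return home and open the gripper
    while not check_move(mc, base_coords):           # wait until the arm is back at the origin
        time.sleep(0.5)
    robot_status = 0                                 # release the lock for the next detection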
5. Function Integration and Testing
import time
from time import sleep
import math
import cv2
import torch
import matplotlib.pyplot as plt
from ultralytics import YOLOv10
import numpy as np
from pymycobot import MyCobot320, utils

# define the robot
mc = MyCobot320(utils.get_port_list()[0])
mc.set_pro_gripper_torque(gripper_id=14, torque_value=10)
# define the model
model = YOLOv10('best.pt')
# define the camera
cap = cv2.VideoCapture(0)
# define the robot's initial status (0 = idle, 1 = busy)
robot_status = 0
# check the camera status
if not cap.isOpened():
    print("could not open the camera")
    exit()
def get_yolov10_result(_model, _input_image):
    boxes = []
    confidences = []
    if _input_image is None:
        print("Error: Image is not loaded.")
        return boxes, confidences
    results = _model(_input_image)
    for result in results:
        for detection in result:
            box = detection.boxes.xyxy
            confidence = detection.boxes.conf
            # apply the confidence threshold
            if confidence[0] < 0.5:
                continue
            boxes.append(box.numpy())
            confidences.append(confidence.numpy())
    for box in boxes:
        x_min, y_min, x_max, y_max = box[0]
        # draw the detection rectangle
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        color = (0, 255, 0)
        thickness = 2
        _input_image = cv2.rectangle(_input_image, (x_min, y_min), (x_max, y_max), color, thickness)
        # draw the center point
        x_center = (x_min + x_max) // 2
        y_center = (y_min + y_max) // 2
        center_color = (0, 0, 255)  # red
        radius = 5  # radius of the point
        _input_image = cv2.circle(_input_image, (x_center, y_center), radius, center_color, -1)
    # show the annotated image
    image_show = cv2.resize(_input_image, (640, 640))
    cv2.imshow("Displayed Image", image_show)
    cv2.waitKey(1)
    return boxes, confidences
def get_best_object(_boxes, _confidences):
    best_index = _confidences.index(max(_confidences))
    best_box = _boxes[best_index]
    x_min, y_min, x_max, y_max = best_box[0]
    x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
    # map the center from the 640x640 detection image back to the 1920x1080 camera frame
    x_center = round((x_min + x_max) // 2 / 640 * 1920)
    y_center = round((y_min + y_max) // 2 / 640 * 1080)
    return x_center, y_center
def get_camera_coordinate(_x, _y):
    # camera intrinsic matrix K
    f_x = 933.33
    f_y = 933.33
    c_x = 960
    c_y = 540
    K = np.array([
        [f_x, 0, c_x],
        [0, f_y, c_y],
        [0, 0, 1]
    ])
    # depth Z (known, measured as 36 cm)
    Z = 0.36  # m
    # back-project the pixel into the camera frame
    X = (_x - c_x) * Z / f_x
    Y = (_y - c_y) * Z / f_y
    camera_coordinates = [X * 1000, Y * 1000, Z * 1000]  # m to mm
    print(f"camera_coords: {camera_coordinates}")
    return camera_coordinates
def get_robot_coordinate(_cam_point_xyz):
    # rough camera-to-base rotation: 180 degrees about the Y-axis
    R_y_180 = np.array([
        [-1, 0, 0],
        [0, 1, 0],
        [0, 0, -1]
    ])
    T = np.array([0, 450, 400])
    # full homogeneous transform (kept for reference)
    T_cam_to_arm = np.eye(4)
    T_cam_to_arm[:3, :3] = R_y_180
    T_cam_to_arm[:3, 3] = T
    camera_coordinates = np.array(
        [[_cam_point_xyz[0]], [_cam_point_xyz[1]], [_cam_point_xyz[2]]])
    # rotate into the base frame, then apply the measured translation offsets (mm)
    arm_coordinates = np.dot(R_y_180, camera_coordinates)
    arm_coordinates[1, 0] += 450
    arm_coordinates[2, 0] += 360
    print(arm_coordinates)
    return arm_coordinates
def check_move(_myCobot, _goal):
    m = _myCobot
    recent_coords = m.get_coords()
    # Euclidean distance between the current position and the goal (mm)
    dist = math.sqrt(pow(_goal[0] - recent_coords[0], 2) + pow(_goal[1] - recent_coords[1], 2) + pow(_goal[2] - recent_coords[2], 2))
    print(f"dist={dist}")
    if dist < 12:
        return True
    else:
        return False
def move_one_object(_myCobot, _Coords_X, _Coords_Y, _Coords_Z):
    m = _myCobot
    coords_object = [float(_Coords_X), float(_Coords_Y), float(_Coords_Z), -175, 0, 130]
    print(f"now move to {coords_object}")
    m.send_coords(coords_object, 40, 1)
    while check_move(m, coords_object) != True:
        time.sleep(0.5)
        print("----- waiting -----")
    m.set_pro_gripper_close(14)
    sleep(3)
    return coords_object
def move_trash(_myCobot):
    m = _myCobot
    coords_trash = [250, 150, 280, 160, 0, 70]
    m.send_coords(coords_trash, 50, 1)
    time.sleep(1)
    return coords_trash
def move_base(_myCobot):
    m = _myCobot
    base_coords = [250, 150, 280, 160, 0, 70]
    m.send_coords(base_coords, 50, 1)
    time.sleep(2)
    m.set_pro_gripper_open(14)
    sleep(2)
    return base_coords
if __name__ == "__main__":
    gripper_height = 185
    # move to the base pose and wait until the arm has reached it
    base_coords = move_base(mc)
    while check_move(mc, base_coords) != True:
        time.sleep(1)
        print("robot init")
    robot_status = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            print("No Image")
            break
        # resize the frame to 640x640 for detection
        image_resized = cv2.resize(frame, (640, 640))
        image_rgb = image_resized.copy()
        cv2.imshow("1", image_rgb)
        cv2.waitKey(1)
        detect_boxes, detect_confidences = get_yolov10_result(model, image_rgb)
        print(f"len boxes: {len(detect_boxes)}")
        if len(detect_boxes) != 0 and robot_status == 0:
            robot_status = 1
            x_image, y_image = get_best_object(detect_boxes, detect_confidences)
            xyz_camera = get_camera_coordinate(x_image, y_image)
            xyz_robot = get_robot_coordinate(xyz_camera)
            # limit the robot arm's reach along Y
            if xyz_robot[1] > 350.0:
                xyz_robot[1] = 340.5
            move_object = move_one_object(mc, xyz_robot[0], xyz_robot[1], xyz_robot[2] + gripper_height)
            # return to the base pose and wait until the arm has reached it
            base_coords = move_base(mc)
            while check_move(mc, base_coords) != True:
                print("----- waiting to move back to base -----")
                time.sleep(0.5)
            robot_status = 0
Future Improvements
1. Object Recognition Optimization: By training a more accurate YOLOv10 model, we aim to enhance the recognition capability for objects with different shapes and forms.
2. Increase Complexity of Detection Model: The system will be expanded to handle more complex tasks, such as multi-object recognition and grabbing, as well as dynamic target recognition.
Summary
This case demonstrates how combining the YOLOv10 object detection model with the 6-DOF myCobot 320 robotic arm enables automatic object identification and disposal for desktop cleaning tasks. By effectively configuring both hardware and software, the cobot is able to perform tasks efficiently in complex environments, providing strong support for automation applications. We're excited to see more users and makers using the myCobot series robots to create innovative projects and participate in our case collection activity: https://www.elephantrobotics.com/en/call-for-user-cases-en/.