This project demonstrates how to use the myCobot 320 robotic arm, a USB camera, and the YOLOv10 object detection model to automatically recognize desktop trash and perform grab-and-discard operations. By integrating deep learning with robotic control, the system can automatically identify common trash on a desk and execute the appropriate grabbing and discarding actions.
1. Utilize the YOLOv10 object detection model to identify the coordinates of objects.
2. Compute the target image coordinates, and then use coordinate transformation to determine the target pose.
3. Implement grabbing and discarding of the detected target using the myCobot 320 and the myGripperF100.
Implementation Steps
1. Environment Setup and Camera Calibration
1.1 Environment Setup
● A fixed-height aluminum profile bracket to hold the camera
● A USB camera
● A calibration board
1.2 Hand-Eye Calibration
Hand-eye calibration involves determining the transformation between the coordinate frame of a robot's end-effector (the "hand") and the coordinate frame of a camera (the "eye"). The steps are:
1. Move the robot: Place the robot in different positions and orientations.
2. Capture images: At each position, take images from the camera.
3. Detect features: Identify corresponding features in the images (e.g., checkerboard patterns).
4. Compute transformation: Use the robot’s movements and the camera's feature data to calculate the transformation matrix between the robot and camera coordinate frames.
We recommend solving for the transformation matrix with OpenCV's calibrateHandEye function:
import cv2
...
R_camera2base, t_camera2base = cv2.calibrateHandEye(R_base2end, t_base2end, R_board2camera, t_board2camera)
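Below is a minimal sketch of how the input lists for calibrateHandEye can be assembled in this eye-to-hand setup. It is not part of the original project code: the function name run_hand_eye_calibration and the layout of the samples argument are illustrative assumptions; only the OpenCV calls themselves are standard.
import cv2
import numpy as np

def run_hand_eye_calibration(samples):
    """samples: list of (R_end2base, t_end2base, rvec_board, tvec_board) tuples,
    collected by moving the robot and photographing the board at each pose."""
    R_base2end, t_base2end = [], []          # base -> end-effector
    R_board2camera, t_board2camera = [], []  # board -> camera

    for R_end2base, t_end2base, rvec_board, tvec_board in samples:
        # Invert the end-effector pose so the eye-to-hand case matches calibrateHandEye's inputs.
        R_inv = R_end2base.T
        R_base2end.append(R_inv)
        t_base2end.append(-R_inv @ t_end2base)

        # Board pose in the camera frame (solvePnP gives a Rodrigues rotation vector).
        R_b2c, _ = cv2.Rodrigues(rvec_board)
        R_board2camera.append(R_b2c)
        t_board2camera.append(tvec_board)

    # With these inputs the returned transform is camera -> robot base.
    return cv2.calibrateHandEye(R_base2end, t_base2end, R_board2camera, t_board2camera)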
We can also obtain the relative position of the camera with respect to the robotic arm base through simple physical measurements. Assuming the camera is mounted 40 cm above the desktop and its Y-axis distance from the arm's origin is 45 cm, we can approximate the translation vector (in millimeters) as:

t ≈ [0, 450, 400]ᵀ
By considering the definitions of the camera and robot coordinate systems, we know that their X-axes and Z-axes point in opposite directions while their Y-axes are aligned (i.e., the camera frame is rotated 180° about the Y-axis relative to the base frame). Hence, the rotation matrix can be roughly defined as:

R ≈ [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]
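Combining this rough rotation and translation gives a camera-to-base transform that can be applied to any point expressed in the camera frame. The sketch below mirrors the numbers used later in get_robot_coordinate; the example point is made up purely to illustrate the arithmetic, and in practice the matrix should come from your own calibration.
import numpy as np

# Approximate camera-to-base transform built from the rough measurements above.
R_cam2base = np.array([[-1, 0, 0],
                       [ 0, 1, 0],
                       [ 0, 0, -1]])
t_cam2base = np.array([0, 450, 400])  # mm

T_cam2base = np.eye(4)
T_cam2base[:3, :3] = R_cam2base
T_cam2base[:3, 3] = t_cam2base

# Map a point expressed in the camera frame (mm) into the robot base frame.
point_camera = np.array([50, -30, 400, 1])  # homogeneous coordinates, example values
point_base = T_cam2base @ point_camera
print(point_base[:3])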
The camera's intrinsic matrix K is typically represented as follows:

K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
● fx, fy: focal lengths in the X and Y directions.
● cx, cy: optical center coordinates (principal point).
These parameters are fixed characteristics of the camera and only need to be calibrated once, regardless of the camera's placement. We need to take several images of the checkerboard from different angles, then use OpenCV functions to automatically calibrate and obtain the intrinsic matrix.
import cv2
import numpy as np
import glob

# number of inner corners per chessboard row and column: (width, height)
pattern_size = (12, 6)
square_size = 5  # side length of each chessboard square (mm)

# containers for the 3D object points and 2D image points
obj_points = []
img_points = []
objp = np.zeros((np.prod(pattern_size), 3), dtype=np.float32)
objp[:, :2] = np.indices(pattern_size).T.reshape(-1, 2)
objp *= square_size

# load the calibration images
images = glob.glob('Image/*.jpg')
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, pattern_size, None)
    if ret:
        img_points.append(corners)
        obj_points.append(objp)
        cv2.drawChessboardCorners(img, pattern_size, corners, ret)
        cv2.imshow('Corners', img)
        cv2.waitKey(500)
    else:
        print(f"cannot find corners: {fname}")
cv2.destroyAllWindows()

# run the calibration
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, gray.shape[::-1], None, None)

with open('camera_intrinsics.txt', 'w') as f:
    f.write('intrinsic matrix (K):\n')
    np.savetxt(f, mtx)
    f.write('\ndistortion coefficients (dist):\n')
    np.savetxt(f, dist)
with open('camera_extrinsics.txt', 'w') as f:
    f.write('extrinsic rotations (rvecs):\n')
    np.savetxt(f, np.array(rvecs).reshape(-1, 3))
    f.write('\nextrinsic translations (tvecs):\n')
    np.savetxt(f, np.array(tvecs).reshape(-1, 3))
print("saved camera_intrinsics.txt and camera_extrinsics.txt")
2. YOLOv10 Detection Preparation
2.1 Environment Setup
You can follow the tutorial at https://github.com/THU-MIG/yolov10.
conda create -n yolov10 python=3.9
conda activate yolov10
pip install -r requirements.txt
pip install -e .
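As an optional quick check of the environment (not part of the original tutorial), you can confirm that the package imports and that PyTorch sees your GPU. The checkpoint file name below is an assumption; use whatever YOLOv10 weights you have downloaded.
import torch
from ultralytics import YOLOv10

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
model = YOLOv10('yolov10n.pt')  # assumes a yolov10n checkpoint has been downloaded
print("model loaded:", type(model).__name__)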
2.2 Test the Detection Model
You can write a script to load your model and get the detection results.
import cv2
import torch
import matplotlib.pyplot as plt
from ultralytics import YOLOv10

model = YOLOv10('best.pt')
image_path = 'desk.jpg'
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# run inference
results = model(image)

boxes = []
labels = []
confidences = []
centers = []

# each result box is [x_min, y_min, x_max, y_max]
for result in results:
    for detection in result:
        box = detection.boxes.xyxy
        print(box)
        boxes.append(box.numpy())

# draw the detected bounding boxes
for box in boxes:
    x_min, y_min, x_max, y_max = box[0]
    x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
    color = (0, 255, 0)
    thickness = 2
    image = cv2.rectangle(image, (x_min, y_min), (x_max, y_max), color, thickness)

cv2.imshow("Detected Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

# save the result image
cv2.imwrite("output_image.jpg", image)
The detectable objects include broken glass, cigarette, plastic bag, bottle cap, tissue, etc.
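If you also want to see which class each detection belongs to, the results object exposes class indices and confidences. The short sketch below assumes the `results` variable from the script above and uses the standard ultralytics result attributes.
# Print the label and confidence for every detection in the results above.
for result in results:
    for cls_id, conf in zip(result.boxes.cls, result.boxes.conf):
        print(f"{result.names[int(cls_id)]}: {float(conf):.2f}")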
3. Coordinate Transformation
In Step 2, we obtain the position of the detected object in the image coordinate system as Point2D(x, y). The depth of the detected object can be roughly estimated from the camera's mounting height, giving Z ≈ 40 cm. Therefore, we can use the back-projection formula:

Z * [x, y, 1]ᵀ = K * [X, Y, Z]ᵀ

where K is the camera intrinsic matrix obtained in Step 1:

K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]

Given that Z is known (the estimated distance from the desktop to the camera, approximately 0.4 m), we can solve for the XYZ coordinates in the camera coordinate system:

X = (x - cx) * Z / fx,  Y = (y - cy) * Z / fy
def get_camera_coordinate(_x, _y):
    # camera intrinsic matrix K (from the calibration in Step 1)
    f_x = 933.33
    f_y = 933.33
    c_x = 960
    c_y = 540
    K = np.array([
        [f_x, 0, c_x],
        [0, f_y, c_y],
        [0, 0, 1]
    ])
    # depth Z: measured camera-to-desktop distance (the 0.40 m estimate above is refined to 0.36 m here)
    Z = 0.36  # m
    # back-project the image point (x, y) into the camera frame
    X = (_x - c_x) * Z / f_x
    Y = (_y - c_y) * Z / f_y
    camera_coordinates = [X * 1000, Y * 1000, Z * 1000]  # m to mm
    print(f"camera_coords: {camera_coordinates}")
    return camera_coordinates
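As a quick worked example, a detection centered at pixel (1200, 700) in the 1920x1080 frame maps to an offset of a few centimeters in the camera frame. The pixel values below are made up purely to illustrate the arithmetic.
# Example: back-project a detection center at pixel (1200, 700)
coords = get_camera_coordinate(1200, 700)
# (1200 - 960) * 0.36 / 933.33 ≈ 0.0926 m, (700 - 540) * 0.36 / 933.33 ≈ 0.0617 m
# -> approximately [92.6, 61.7, 360.0] (mm) in the camera frame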
4. Robot Control
● A status identifier is set to prevent command conflicts during the execution process.
● After completing the movement, the robot arm returns to its origin, and the status identifier is released to allow the next movement.
def move_one_object(_myCobot, _Coords_X, _Coords_Y, _Coords_Z):
    m = _myCobot
    coords_object = [float(_Coords_X), float(_Coords_Y), float(_Coords_Z), -175, 0, 130]
    print(f"now move to {coords_object}")
    m.send_coords(coords_object, 40, 1)
    # check_move() (defined in the full script in Step 5) compares the current pose with the goal
    while check_move(m, coords_object) != True:
        time.sleep(0.5)
        print("----- waiting -----")
    m.set_pro_gripper_close(14)
    sleep(3)
    return coords_object
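The status identifier mentioned above is used in the main loop in Step 5; a condensed sketch of that pattern is shown below. The variable names x_robot, y_robot, and z_robot are placeholders for the target coordinates computed in Step 3, and the other names follow the full script in Step 5.
robot_status = 0  # 0 = idle, 1 = executing a pick-and-place

if len(detect_boxes) != 0 and robot_status == 0:
    robot_status = 1                                 # lock out new commands
    move_one_object(mc, x_robot, y_robot, z_robot)   # move to the object and close the gripper
    base_coords = move_base(mc)                      # return home and open the gripper
    while not check_move(mc, base_coords):           # wait until the arm is back at the origin
        time.sleep(0.5)
    robot_status = 0                                 # release the lock for the next detection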
5. Function Integration and Testing
import time
from time import sleep
import math
import cv2
import torch
import matplotlib.pyplot as plt
from ultralytics import YOLOv10
import numpy as np
from pymycobot import MyCobot320, utils

# define the robot
mc = MyCobot320(utils.get_port_list()[0])
mc.set_pro_gripper_torque(gripper_id=14, torque_value=10)
# define the model
model = YOLOv10('best.pt')
# define the camera
cap = cv2.VideoCapture(0)
# define the robot's initial status (0 = idle, 1 = busy)
robot_status = 0
# check the camera status
if not cap.isOpened():
    print("could not open the camera")
    exit()
def get_yolov10_result(_model, _input_image):
    boxes = []
    confidences = []
    if _input_image is None:
        print("Error: Image is not loaded.")
        return boxes, confidences
    results = _model(_input_image)
    for result in results:
        for detection in result:
            box = detection.boxes.xyxy
            confidence = detection.boxes.conf
            # apply the confidence threshold
            if confidence[0] < 0.5:
                continue
            boxes.append(box.numpy())
            confidences.append(confidence.numpy())
    for box in boxes:
        x_min, y_min, x_max, y_max = box[0]
        # draw the detection rectangle
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        color = (0, 255, 0)
        thickness = 2
        _input_image = cv2.rectangle(_input_image, (x_min, y_min), (x_max, y_max), color, thickness)
        # draw the center point
        x_center = (x_min + x_max) // 2
        y_center = (y_min + y_max) // 2
        center_color = (0, 0, 255)  # red
        radius = 5  # radius of the point
        _input_image = cv2.circle(_input_image, (x_center, y_center), radius, center_color, -1)
    # show the annotated image
    image_show = cv2.resize(_input_image, (640, 640))
    cv2.imshow("Displayed Image", image_show)
    cv2.waitKey(1)
    return boxes, confidences
def get_best_object(_boxes, _confidences):
    best_index = _confidences.index(max(_confidences))
    best_box = _boxes[best_index]
    x_min, y_min, x_max, y_max = best_box[0]
    x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
    # map the center from the 640x640 detection image back to the 1920x1080 camera frame
    x_center = round((x_min + x_max) // 2 / 640 * 1920)
    y_center = round((y_min + y_max) // 2 / 640 * 1080)
    return x_center, y_center
def get_camera_coordinate(_x, _y):
    # camera intrinsic matrix K
    f_x = 933.33
    f_y = 933.33
    c_x = 960
    c_y = 540
    K = np.array([
        [f_x, 0, c_x],
        [0, f_y, c_y],
        [0, 0, 1]
    ])
    # depth Z (known, measured as 36 cm)
    Z = 0.36  # m
    # back-project the pixel into the camera frame
    X = (_x - c_x) * Z / f_x
    Y = (_y - c_y) * Z / f_y
    camera_coordinates = [X * 1000, Y * 1000, Z * 1000]  # m to mm
    print(f"camera_coords: {camera_coordinates}")
    return camera_coordinates
def get_robot_coordinate(_cam_point_xyz):
    # rough camera-to-base rotation: 180 degrees about the Y-axis
    R_y_180 = np.array([
        [-1, 0, 0],
        [0, 1, 0],
        [0, 0, -1]
    ])
    T = np.array([0, 450, 400])
    # full homogeneous transform (kept for reference)
    T_cam_to_arm = np.eye(4)
    T_cam_to_arm[:3, :3] = R_y_180
    T_cam_to_arm[:3, 3] = T
    camera_coordinates = np.array(
        [[_cam_point_xyz[0]], [_cam_point_xyz[1]], [_cam_point_xyz[2]]])
    # rotate into the base frame, then apply the measured translation offsets (mm)
    arm_coordinates = np.dot(R_y_180, camera_coordinates)
    arm_coordinates[1, 0] += 450
    arm_coordinates[2, 0] += 360
    print(arm_coordinates)
    return arm_coordinates
def check_move(_myCobot, _goal):
    m = _myCobot
    recent_coords = m.get_coords()
    # Euclidean distance between the current position and the goal (mm)
    dist = math.sqrt(pow(_goal[0] - recent_coords[0], 2) + pow(_goal[1] - recent_coords[1], 2) + pow(_goal[2] - recent_coords[2], 2))
    print(f"dist={dist}")
    if dist < 12:
        return True
    else:
        return False
def move_one_object(_myCobot, _Coords_X, _Coords_Y, _Coords_Z):
    m = _myCobot
    coords_object = [float(_Coords_X), float(_Coords_Y), float(_Coords_Z), -175, 0, 130]
    print(f"now move to {coords_object}")
    m.send_coords(coords_object, 40, 1)
    while check_move(m, coords_object) != True:
        time.sleep(0.5)
        print("----- waiting -----")
    m.set_pro_gripper_close(14)
    sleep(3)
    return coords_object
def move_trash(_myCobot):
    m = _myCobot
    coords_trash = [250, 150, 280, 160, 0, 70]
    m.send_coords(coords_trash, 50, 1)
    time.sleep(1)
    return coords_trash
def move_base(_myCobot):
    m = _myCobot
    base_coords = [250, 150, 280, 160, 0, 70]
    m.send_coords(base_coords, 50, 1)
    time.sleep(2)
    m.set_pro_gripper_open(14)
    sleep(2)
    return base_coords
if __name__ == "__main__":
    gripper_height = 185
    # move to the base pose and wait until the arm has reached it
    base_coords = move_base(mc)
    while check_move(mc, base_coords) != True:
        time.sleep(1)
        print("robot init")
    robot_status = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            print("No Image")
            break
        # resize the frame to 640x640 for detection
        image_resized = cv2.resize(frame, (640, 640))
        image_rgb = image_resized.copy()
        cv2.imshow("1", image_rgb)
        cv2.waitKey(1)
        detect_boxes, detect_confidences = get_yolov10_result(model, image_rgb)
        print(f"len boxes: {len(detect_boxes)}")
        if len(detect_boxes) != 0 and robot_status == 0:
            robot_status = 1
            x_image, y_image = get_best_object(detect_boxes, detect_confidences)
            xyz_camera = get_camera_coordinate(x_image, y_image)
            xyz_robot = get_robot_coordinate(xyz_camera)
            # limit the robot arm's reach along Y
            if xyz_robot[1] > 350.0:
                xyz_robot[1] = 340.5
            move_object = move_one_object(mc, xyz_robot[0], xyz_robot[1], xyz_robot[2] + gripper_height)
            # return to the base pose and wait until the arm has reached it
            base_coords = move_base(mc)
            while check_move(mc, base_coords) != True:
                print("----- waiting to move back to base -----")
                time.sleep(0.5)
            robot_status = 0
Future Improvements
1. Object Recognition Optimization: By training a more accurate YOLOv10 model, we aim to enhance the recognition capability for objects with different shapes and forms.
2. Increase Complexity of Detection Model: The system will be expanded to handle more complex tasks, such as multi-object recognition and grabbing, as well as dynamic target recognition.
Summary
This case demonstrates how combining the YOLOv10 object detection model with the 6-DOF myCobot 320 robotic arm enables automatic object identification and disposal for desktop cleaning tasks. By effectively configuring both hardware and software, the cobot is able to perform tasks efficiently in complex environments, providing strong support for automation applications. We're excited to see more users and makers using the myCobot series robots to create innovative projects and participate in our case collection activity: https://www.elephantrobotics.com/en/call-for-user-cases-en/.