The project was created by Long Da and his partner Enshen Zhu, who have open-sourced it and published related content on GitHub and YouTube.
Here is a demonstration video of the project:
In this project, we'll demonstrate how to use the 6-DOF collaborative robot myCobot 320 M5 to grab objects at random positions within a camera-detected area. We will leverage the object-detection capability of a Large Language Model (LLM), Python scripts for controlling the robot's movements, and voice commands for more interactive control.
This project is split into multiple components:
● Object detection using Qwen LLM.
● Real-time robotic control and feedback with dual-threading.
● Voice command integration.
● Efficient random positioning for grabbing objects.
What You’ll Need
● Robot arm: myCobot 320 M5.
● Camera for real-time object detection.
● Python 3.x.
● Libraries: pymycobot, OpenCV, PyTorch, Numpy, OpenAI.
myCobot 320 M5
The myCobot 320 M5 is a collaborative robot designed for education, research, and light industrial automation. With a maximum arm span of 350 mm and a payload capacity of up to 1 kg, it is suitable for users of all skill levels. The cobot supports Python and features hardware interfaces like I/O and USB, making it easy to connect with various sensors and actuators. It also offers comprehensive open-source libraries and APIs, simplifying the development process, and is compatible with a variety of development environments, encouraging users to engage in extended development.
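To give a concrete sense of the Python API, here is a minimal, hedged sketch of moving the arm and reading back its joint angles over the Wi-Fi socket interface used later in this guide; the IP address and port are placeholders.

# Minimal pymycobot sketch (IP and port are placeholders; the full project
# reads them from a config file as shown in step 4)
import time
from pymycobot import MyCobot320Socket

mc = MyCobot320Socket("192.168.137.100", 9000)
mc.send_angles([0, 0, 0, 0, 0, 0], 30)   # move all six joints to 0 degrees at speed 30
time.sleep(3)
print("Joint angles:", mc.get_angles())  # read back the current joint angles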
1. Wi-Fi Setup: Ensure the robot is connected to a Wi-Fi hotspot.
2. Software: Install Python libraries and dependencies:
pip install pymycobot opencv-python torch numpy
Step-by-Step Instructions
1. Set up Your Wi-Fi Hotspot
Before starting the wireless robot control, create a Wi-Fi hotspot with the following details:
● SSID: MyCobotWiFi2.4G
● Password: mycobot123
● Make sure your Wi-Fi band is 2.4 GHz.
2. Install Required Python Libraries
Run the following command to install the necessary libraries:
pip install pymycobot opencv-python torch numpy
3. Build an AI Agent with the Qwen Model
Get your API key from the official Qwen website and set it as the 'qwen_apikey' value read by the code below.
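The scripts in this guide read their settings from env/configs.yaml. The original post does not show the file itself, but based on the keys the code accesses (qwen_apikey, ip, port), a placeholder config could be generated like this; replace the values with your own API key and the robot's address on your hotspot.

# Create env/configs.yaml with the keys the scripts below expect
# (all values here are placeholders)
import os
import yaml

config = {
    "qwen_apikey": "sk-your-qwen-api-key",  # from the Qwen / DashScope console
    "ip": "192.168.137.100",                # the robot's IP on your Wi-Fi hotspot
    "port": 9000,                           # port of the robot's socket server (9000 is the common default)
}

os.makedirs("env", exist_ok=True)
with open("env/configs.yaml", "w") as f:
    yaml.safe_dump(config, f)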
# Your Qwen API_KEY
import yaml
apikey = yaml.safe_load(open('env/configs.yaml', 'r'))["qwen_apikey"]

# 'command' is the user's natural-language instruction for the arm
# (in the full project it can come from voice input); an example:
command = "Please help me put the red square on the house sketch."

# the whole prompt in English
prompt = """I am about to give a command to the robot arm. Please help me extract the starting object and the ending object from this sentence, find the pixel coordinates of the upper left corner and the lower right corner of the two objects from this picture, and output the json data structure.
For example, if my command is: Please help me put the red square on the house sketch.
You output the following format:
{
    "start": "red square",
    "start_xyxy": [[102, 505], [324, 860]],
    "end": "house sketch",
    "end_xyxy": [[300, 150], [476, 310]]
}
Reply with only the json itself, do not reply with any other content.
My current command is: """ + command
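The helper inference_with_api used in the next snippet is not reproduced in the original post; it comes from the authors' repository and Qwen's vision examples. As a rough sketch of what it might look like, one option is to call Qwen through its OpenAI-compatible endpoint with the openai library; the base URL and model name below are assumptions that you should check against the Qwen documentation.

# Hedged sketch of inference_with_api(): send the camera frame and the prompt
# to a Qwen vision model via the OpenAI-compatible API. The endpoint URL and
# model name are assumptions; min_pixels/max_pixels are kept for signature
# compatibility (the original helper uses them to bound the image resizing).
import base64
import cv2
from openai import OpenAI

def inference_with_api(frame, prompt, min_pixels=None, max_pixels=None,
                       model="qwen2.5-vl-72b-instruct"):
    ok, buf = cv2.imencode(".jpg", frame)          # encode the BGR frame as JPEG
    image_b64 = base64.b64encode(buf.tobytes()).decode("utf-8")
    client = OpenAI(
        api_key=apikey,                            # loaded from env/configs.yaml above
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    )
    return completion.choices[0].message.content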
Use an LLM to process the input image and find the upper-left and lower-right corner coordinates of the target object, then obtain the object's position in the robot coordinate system via a coordinate-system transformation.
# take picture and send to Qwen model
import ast
import cv2
from PIL import Image

# turn on camera
cap = cv2.VideoCapture(1)
if not cap.isOpened():
    print("Unable to open camera")
    exit()

# while True:
ret, frame = cap.read()
if not ret:
    print("Unable to read frame")
    # break

image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
width, height = image.size
print(width, height)

# smart_resize comes from Qwen's vision utilities; min_pixels / max_pixels
# bound the resize and are set in the authors' full script
input_height, input_width = smart_resize(height, width, min_pixels=min_pixels, max_pixels=max_pixels)
response = inference_with_api(frame, prompt, min_pixels=min_pixels, max_pixels=max_pixels)
print(response)
im = image
bounding_boxes = response

# Parse out the markdown fencing around the model's JSON reply
bounding_boxes = parse_json(bounding_boxes)
try:
    json_output = ast.literal_eval(bounding_boxes)
except Exception as e:
    # if the reply was truncated, recover what we can and retry
    end_idx = bounding_boxes.rfind('"}') + len('"}')
    truncated_text = bounding_boxes[:end_idx] + "]"
    json_output = ast.literal_eval(truncated_text)
start, start_box = json_output["start"], json_output["start_xyxy"]
end, end_box = json_output["end"], json_output["end_xyxy"]

# Convert model-space coordinates (from the resized input) back to absolute pixel coordinates
# start object
start_abs_y1 = int(start_box[0][1] / input_height * height)
start_abs_x1 = int(start_box[0][0] / input_width * width)
start_abs_y2 = int(start_box[1][1] / input_height * height)
start_abs_x2 = int(start_box[1][0] / input_width * width)
start_x_center = int((start_abs_x1 + start_abs_x2) / 2)
start_y_center = int((start_abs_y1 + start_abs_y2) / 2)
# start_y_center = max(start_abs_y1, start_abs_y2)
if start_abs_x1 > start_abs_x2:
    start_abs_x1, start_abs_x2 = start_abs_x2, start_abs_x1
if start_abs_y1 > start_abs_y2:
    start_abs_y1, start_abs_y2 = start_abs_y2, start_abs_y1

# end object
end_abs_y1 = int(end_box[0][1] / input_height * height)
end_abs_x1 = int(end_box[0][0] / input_width * width)
end_abs_y2 = int(end_box[1][1] / input_height * height)
end_abs_x2 = int(end_box[1][0] / input_width * width)
end_x_center = int((end_abs_x1 + end_abs_x2) / 2)
end_y_center = int((end_abs_y1 + end_abs_y2) / 2)
# end_y_center = max(end_abs_y1, end_abs_y2)
if end_abs_x1 > end_abs_x2:
    end_abs_x1, end_abs_x2 = end_abs_x2, end_abs_x1
if end_abs_y1 > end_abs_y2:
    end_abs_y1, end_abs_y2 = end_abs_y2, end_abs_y1

print(f"start position: ({start_abs_x1}, {start_abs_y1}) ({start_abs_x2}, {start_abs_y2}), center coordinate: ({start_x_center}, {start_y_center})")
print(f"end position: ({end_abs_x1}, {end_abs_y1}) ({end_abs_x2}, {end_abs_y2}), center coordinate: ({end_x_center}, {end_y_center})")
4. Combine AI Instructions with Python API to Control Robot
# connect myCobot 320
from pymycobot import MyCobot320Socket
import time
def get_ip_config():
    # open YAML file
    with open('env/configs.yaml', 'r') as file:
        data = yaml.safe_load(file)
    # read ip and port
    ip_address = data['ip']
    netport = data['port']
    return ip_address, netport
ip_address, netport = get_ip_config()
mc = MyCobot320Socket(ip_address, netport)
time.sleep(1)
mc.focus_all_servos()
time.sleep(1)
print("\n---> set_gripper_mode(0) => pass-through")
ret_mode = mc.set_gripper_mode(0)
print(" Return code:", ret_mode)
time.sleep(1)
home_angles = [0, 0, 0, 0, 0, 0]
print("\n---> Move to home position:", home_angles)
mc.send_angles(home_angles, 30)
time.sleep(3)
speed = 30
print("\n---> Open gripper")
mc.set_gripper_state(0, 100)
time.sleep(2)
print(f"Captured camera start coordinate from vision: ({start_x_center},{start_y_center})" )
print(f"Captured camera end coordinate from vision: ({end_x_center},{end_y_center})" )
import numpy as np
# Homography calibrated for the authors' camera setup: maps camera pixel
# coordinates (u, v) to robot-base (x, y) coordinates in mm
H = np.array([
    [6.60782927e-04, 2.48469514e+00, -5.96091742e+02],
    [3.82506417e-01, 4.06164160e-01, -2.18163280e+02],
    [9.21284300e-05, -5.55189057e-03, 1.00000000e+00]
])
def convert_camera_to_robot(camera_coord, H):
    u, v = camera_coord
    point_h = np.array([u, v, 1.0])
    robot_h = H.dot(point_h)
    robot_h /= robot_h[2]
    return (robot_h[0], robot_h[1])
start_robot_xy = convert_camera_to_robot((start_x_center, start_y_center), H)
print("Converted robot start (x, y):", start_robot_xy)
end_robot_xy = convert_camera_to_robot((end_x_center, end_y_center), H)
print("Converted robot end (x, y):", end_robot_xy)
pick_z = 165
pick_orientation = [-179.46, -6.69, 95.57]
pick_coords = [start_robot_xy[0], start_robot_xy[1], pick_z] + pick_orientation
print("Pick coordinates:", pick_coords)
place_coords = [end_robot_xy[0], end_robot_xy[1], pick_z+50] + pick_orientation
print("\n---> Move to pick coordinates")
print(pick_coords)
mc.send_coords(pick_coords, speed, 1)
time.sleep(3)
print("\n---> Close gripper to grasp block")
mc.set_gripper_state(1, 100)
time.sleep(2)
pick_coords_ascend = [pick_coords[0], pick_coords[1], pick_coords[2] + 50] + pick_orientation
print("\n---> Ascend after grasping (z + 50)")
print(pick_coords_ascend)
mc.send_coords(pick_coords_ascend, speed, 1)
time.sleep(3)
print("\n---> Move to place coordinates")
print(place_coords)
mc.send_coords(place_coords, speed, 1)
time.sleep(3)
print("\n---> Open gripper to release block")
mc.set_gripper_state(0, 100)
time.sleep(2)
print("\n---> Return to home position")
mc.send_angles(home_angles, 30)
time.sleep(3)
print("\n---> Close gripper (final state)")
mc.set_gripper_state(1, 100)
time.sleep(2)
print("\nPick & Place sequence completed.\n")
Then you will see the output details of the pick & place program.
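A note on the homography matrix H used in step 4: it was calibrated for the authors' fixed camera and workspace, so for your own setup you would need to re-estimate it. A hedged sketch of one common approach is to collect at least four corresponding points (pixel centers seen by the camera and the matching robot (x, y) positions read from get_coords() or the teach interface) and fit the matrix with OpenCV; the sample points below are placeholders, not real calibration data.

# Estimate a camera-to-robot homography from point correspondences
# (the coordinates below are placeholders, not a real calibration)
import cv2
import numpy as np

camera_pts = np.array([[320, 240], [520, 240], [520, 420], [320, 420]], dtype=np.float32)      # pixels
robot_pts = np.array([[150.0, -60.0], [150.0, 60.0], [260.0, 60.0], [260.0, -60.0]], dtype=np.float32)  # mm

H, _ = cv2.findHomography(camera_pts, robot_pts)
print(H)  # paste this matrix into the pick & place script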
We would like to extend our sincere gratitude to the two creative makers behind this project. By building an AI Agent that understands input images, identifies target positions, and controls the myCobot 320 robotic arm accordingly, they have successfully integrated vision with LLMs to create an intelligent motion system. We hope more people will apply this technology to everyday life, bringing greater convenience to the world.
Developers are welcome to participate in our User Case Initiative and showcase their innovative projects: https://www.elephantrobotics.com/en/call-for-user-cases-en/.