The project was created by Long Da and his partner Enshen Zhu, who have open-sourced it and published related content on GitHub and YouTube.
Here is a demonstration video of the project:
In this project, we'll demonstrate how to use the 6-DOF collaborative robot myCobot 320 M5 to grab objects at random positions within a camera-detected area. We will leverage the object-detection capability of a Large Language Model (LLM), Python scripts for controlling the robot's movements, and voice commands for more interactive control.
This project is split into multiple components:
● Object detection using Qwen LLM.
● Real-time robotic control and feedback with dual-threading.
● Voice command integration.
● Efficient random positioning for grabbing objects.
What You’ll Need
● Robot arm: myCobot 320 M5.
● Camera for real-time object detection.
● Python 3.x.
● Libraries: pymycobot, OpenCV, PyTorch, Numpy, OpenAI.
myCobot 320 M5
The myCobot 320 M5 is a collaborative robot designed for education, research, and light industrial automation. With a maximum arm span of 350 mm and a payload capacity of up to 1 kg, it is suitable for users of all skill levels. The cobot supports Python and features hardware interfaces like I/O and USB, making it easy to connect with various sensors and actuators. It also offers comprehensive open-source libraries and APIs, simplifying the development process, and is compatible with a variety of development environments, encouraging users to engage in extended development.
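To give a concrete sense of the Python API, here is a minimal, hedged sketch of moving the arm and reading back its joint angles over the Wi-Fi socket interface used later in this guide; the IP address and port are placeholders.

# Minimal pymycobot sketch (IP and port are placeholders; the full project
# reads them from a config file as shown in step 4)
import time
from pymycobot import MyCobot320Socket

mc = MyCobot320Socket("192.168.137.100", 9000)
mc.send_angles([0, 0, 0, 0, 0, 0], 30)   # move all six joints to 0 degrees at speed 30
time.sleep(3)
print("Joint angles:", mc.get_angles())  # read back the current joint angles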
1. Wi-Fi Setup: Ensure the robot is connected to a Wi-Fi hotspot.
2. Software: Install Python libraries and dependencies:
pip install pymycobot opencv-python torch numpy
Step-by-Step Instructions
1. Set up Your Wi-Fi Hotspot
Before starting the wireless robot control, create a Wi-Fi hotspot with the following details:
● SSID: MyCobotWiFi2.4G
● Password: mycobot123
● Make sure your Wi-Fi band is 2.4 GHz.
2. Install Required Python Libraries
Run the following command to install the necessary libraries:
pip install pymycobot opencv-python torch numpy
3. Build an AI Agent with the Qwen Model
Get your API key from the official Qwen website and set it as the 'qwen_apikey' value read by the code below.
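The scripts in this guide read their settings from env/configs.yaml. The original post does not show the file itself, but based on the keys the code accesses (qwen_apikey, ip, port), a placeholder config could be generated like this; replace the values with your own API key and the robot's address on your hotspot.

# Create env/configs.yaml with the keys the scripts below expect
# (all values here are placeholders)
import os
import yaml

config = {
    "qwen_apikey": "sk-your-qwen-api-key",  # from the Qwen / DashScope console
    "ip": "192.168.137.100",                # the robot's IP on your Wi-Fi hotspot
    "port": 9000,                           # port of the robot's socket server (9000 is the common default)
}

os.makedirs("env", exist_ok=True)
with open("env/configs.yaml", "w") as f:
    yaml.safe_dump(config, f)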
# Your Qwen API_KEY
import yaml
apikey = yaml.safe_load(open('env/configs.yaml', 'r'))["qwen_apikey"]

# 'command' is the user's natural-language instruction for the arm
# (in the full project it can come from voice input); an example:
command = "Please help me put the red square on the house sketch."

# the whole prompt in English
prompt = """I am about to give a command to the robot arm. Please help me extract the starting object and the ending object from this sentence, find the pixel coordinates of the upper left corner and the lower right corner of the two objects from this picture, and output the json data structure.
For example, if my command is: Please help me put the red square on the house sketch.
You output the following format:
{
    "start": "red square",
    "start_xyxy": [[102, 505], [324, 860]],
    "end": "house sketch",
    "end_xyxy": [[300, 150], [476, 310]]
}
Reply with only the json itself, do not reply with any other content.
My current command is: """ + command
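The helper inference_with_api used in the next snippet is not reproduced in the original post; it comes from the authors' repository and Qwen's vision examples. As a rough sketch of what it might look like, one option is to call Qwen through its OpenAI-compatible endpoint with the openai library; the base URL and model name below are assumptions that you should check against the Qwen documentation.

# Hedged sketch of inference_with_api(): send the camera frame and the prompt
# to a Qwen vision model via the OpenAI-compatible API. The endpoint URL and
# model name are assumptions; min_pixels/max_pixels are kept for signature
# compatibility (the original helper uses them to bound the image resizing).
import base64
import cv2
from openai import OpenAI

def inference_with_api(frame, prompt, min_pixels=None, max_pixels=None,
                       model="qwen2.5-vl-72b-instruct"):
    ok, buf = cv2.imencode(".jpg", frame)          # encode the BGR frame as JPEG
    image_b64 = base64.b64encode(buf.tobytes()).decode("utf-8")
    client = OpenAI(
        api_key=apikey,                            # loaded from env/configs.yaml above
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    )
    return completion.choices[0].message.content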
Use an LLM to process the input image and find the upper-left and lower-right corner coordinates of the target object, then obtain the object's position in the robot coordinate system via a coordinate-system transformation.
# take picture and send to Qwen model
import ast
import cv2
from PIL import Image

# turn on camera
cap = cv2.VideoCapture(1)
if not cap.isOpened():
    print("Unable to open camera")
    exit()

# while True:
ret, frame = cap.read()
if not ret:
    print("Unable to read frame")
    # break

image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
width, height = image.size
print(width, height)

# smart_resize comes from Qwen's vision utilities; min_pixels / max_pixels
# bound the resize and are set in the authors' full script
input_height, input_width = smart_resize(height, width, min_pixels=min_pixels, max_pixels=max_pixels)
response = inference_with_api(frame, prompt, min_pixels=min_pixels, max_pixels=max_pixels)
print(response)
im = image
bounding_boxes = response

# Parse out the markdown fencing around the model's JSON reply
bounding_boxes = parse_json(bounding_boxes)
try:
    json_output = ast.literal_eval(bounding_boxes)
except Exception as e:
    # if the reply was truncated, recover what we can and retry
    end_idx = bounding_boxes.rfind('"}') + len('"}')
    truncated_text = bounding_boxes[:end_idx] + "]"
    json_output = ast.literal_eval(truncated_text)
start, start_box = json_output["start"], json_output["start_xyxy"]
end, end_box = json_output["end"], json_output["end_xyxy"]

# Convert model-space coordinates (from the resized input) back to absolute pixel coordinates
# start object
start_abs_y1 = int(start_box[0][1] / input_height * height)
start_abs_x1 = int(start_box[0][0] / input_width * width)
start_abs_y2 = int(start_box[1][1] / input_height * height)
start_abs_x2 = int(start_box[1][0] / input_width * width)
start_x_center = int((start_abs_x1 + start_abs_x2) / 2)
start_y_center = int((start_abs_y1 + start_abs_y2) / 2)
# start_y_center = max(start_abs_y1, start_abs_y2)
if start_abs_x1 > start_abs_x2:
    start_abs_x1, start_abs_x2 = start_abs_x2, start_abs_x1
if start_abs_y1 > start_abs_y2:
    start_abs_y1, start_abs_y2 = start_abs_y2, start_abs_y1

# end object
end_abs_y1 = int(end_box[0][1] / input_height * height)
end_abs_x1 = int(end_box[0][0] / input_width * width)
end_abs_y2 = int(end_box[1][1] / input_height * height)
end_abs_x2 = int(end_box[1][0] / input_width * width)
end_x_center = int((end_abs_x1 + end_abs_x2) / 2)
end_y_center = int((end_abs_y1 + end_abs_y2) / 2)
# end_y_center = max(end_abs_y1, end_abs_y2)
if end_abs_x1 > end_abs_x2:
    end_abs_x1, end_abs_x2 = end_abs_x2, end_abs_x1
if end_abs_y1 > end_abs_y2:
    end_abs_y1, end_abs_y2 = end_abs_y2, end_abs_y1

print(f"start position: ({start_abs_x1}, {start_abs_y1}) ({start_abs_x2}, {start_abs_y2}), center coordinate: ({start_x_center}, {start_y_center})")
print(f"end position: ({end_abs_x1}, {end_abs_y1}) ({end_abs_x2}, {end_abs_y2}), center coordinate: ({end_x_center}, {end_y_center})")
4. Combine AI Instructions with Python API to Control Robot
# connect myCobot 320
from pymycobot import MyCobot320Socket
import time
def get_ip_config():
    # open YAML file
    with open('env/configs.yaml', 'r') as file:
        data = yaml.safe_load(file)
    # read ip and port
    ip_address = data['ip']
    netport = data['port']
    return ip_address, netport
ip_address, netport = get_ip_config()
mc = MyCobot320Socket(ip_address, netport)
time.sleep(1)
mc.focus_all_servos()
time.sleep(1)
print("\n---> set_gripper_mode(0) => pass-through")
ret_mode = mc.set_gripper_mode(0)
print(" Return code:", ret_mode)
time.sleep(1)
home_angles = [0, 0, 0, 0, 0, 0]
print("\n---> Move to home position:", home_angles)
mc.send_angles(home_angles, 30)
time.sleep(3)
speed = 30
print("\n---> Open gripper")
mc.set_gripper_state(0, 100)
time.sleep(2)
print(f"Captured camera start coordinate from vision: ({start_x_center},{start_y_center})" )
print(f"Captured camera end coordinate from vision: ({end_x_center},{end_y_center})" )
import numpy as np
# Homography calibrated for the authors' camera setup: maps camera pixel
# coordinates (u, v) to robot-base (x, y) coordinates in mm
H = np.array([
    [6.60782927e-04, 2.48469514e+00, -5.96091742e+02],
    [3.82506417e-01, 4.06164160e-01, -2.18163280e+02],
    [9.21284300e-05, -5.55189057e-03, 1.00000000e+00]
])
def convert_camera_to_robot(camera_coord, H):
    u, v = camera_coord
    point_h = np.array([u, v, 1.0])
    robot_h = H.dot(point_h)
    robot_h /= robot_h[2]
    return (robot_h[0], robot_h[1])
start_robot_xy = convert_camera_to_robot((start_x_center, start_y_center), H)
print("Converted robot start (x, y):", start_robot_xy)
end_robot_xy = convert_camera_to_robot((end_x_center, end_y_center), H)
print("Converted robot end (x, y):", end_robot_xy)
pick_z = 165
pick_orientation = [-179.46, -6.69, 95.57]
pick_coords = [start_robot_xy[0], start_robot_xy[1], pick_z] + pick_orientation
print("Pick coordinates:", pick_coords)
place_coords = [end_robot_xy[0], end_robot_xy[1], pick_z+50] + pick_orientation
print("\n---> Move to pick coordinates")
print(pick_coords)
mc.send_coords(pick_coords, speed, 1)
time.sleep(3)
print("\n---> Close gripper to grasp block")
mc.set_gripper_state(1, 100)
time.sleep(2)
pick_coords_ascend = [pick_coords[0], pick_coords[1], pick_coords[2] + 50] + pick_orientation
print("\n---> Ascend after grasping (z + 50)")
print(pick_coords_ascend)
mc.send_coords(pick_coords_ascend, speed, 1)
time.sleep(3)
print("\n---> Move to place coordinates")
print(place_coords)
mc.send_coords(place_coords, speed, 1)
time.sleep(3)
print("\n---> Open gripper to release block")
mc.set_gripper_state(0, 100)
time.sleep(2)
print("\n---> Return to home position")
mc.send_angles(home_angles, 30)
time.sleep(3)
print("\n---> Close gripper (final state)")
mc.set_gripper_state(1, 100)
time.sleep(2)
print("\nPick & Place sequence completed.\n")
Then you will see the output details of the pick & place program.
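A note on the homography matrix H used in step 4: it was calibrated for the authors' fixed camera and workspace, so for your own setup you would need to re-estimate it. A hedged sketch of one common approach is to collect at least four corresponding points (pixel centers seen by the camera and the matching robot (x, y) positions read from get_coords() or the teach interface) and fit the matrix with OpenCV; the sample points below are placeholders, not real calibration data.

# Estimate a camera-to-robot homography from point correspondences
# (the coordinates below are placeholders, not a real calibration)
import cv2
import numpy as np

camera_pts = np.array([[320, 240], [520, 240], [520, 420], [320, 420]], dtype=np.float32)      # pixels
robot_pts = np.array([[150.0, -60.0], [150.0, 60.0], [260.0, 60.0], [260.0, -60.0]], dtype=np.float32)  # mm

H, _ = cv2.findHomography(camera_pts, robot_pts)
print(H)  # paste this matrix into the pick & place script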
We would like to extend our sincere gratitude to the two creative makers behind this project. By building an AI Agent that understands input images, identifies target positions, and controls the myCobot 320 robotic arm accordingly, they have successfully integrated vision with LLMs to create an intelligent motion system. We hope more people will apply this technology to everyday life, bringing greater convenience to the world.
Developers are welcome to participate in our User Case Initiative and showcase their innovative projects: https://www.elephantrobotics.com/en/call-for-user-cases-en/.