Ever wondered about a turret that moves both horizontally and vertically to track faces and shoots directly at its target? MechaMachine is the answer. It is a real-time face-tracking turret that detects faces, classifies each one as a friend or an enemy, and shoots at the target.
The main idea of the project is to build an autonomous, real-time face-tracking turret that tracks faces and shoots at them. This involves both the computational feasibility of detecting and locating faces and the mechanical structure needed to shoot projectiles at the target.
STEP 1: The MECHAnical part of the MechaMachine
The intention was to make a turret-like structure inspired by the Turret of the character/agent KillJoy from the video game Valorant, with the constraints of a mount to hold a camera for tracking, a mechanism to shoot bullets, and a place to install a laser to point out the target.
Based on this inspiration, we decided to make a 3D-printed build of the MechaMachine. The following constraints had to be adhered to:
- A place/mount to hold the camera.
- A space to fit a laser module.
- A shooting mechanism which holds a magazine of nerf bullets.
- A pan and tilt setup for the movement of the turret.
With these constraints in mind, the final design, modelled in Fusion 360 and AutoCAD, looked like this:
The attached STL parts were then 3D printed in ABS with the infill set to 40%. We chose ABS because PLA could not cope with the rapid movement of the mechanism and kept breaking down; the strength and durability of ABS at 40% infill was sufficient for our needs.
STEP 2: The Functioning of the MechaMachine
To understand how MechaMachine functions, we must look at its approach (a minimal Python sketch of this loop follows the list):
- Use the camera to capture frames.
- Detect if any faces are present in the frames.
- Classify the detected faces into friends and foes.
- Calculate the coordinates of the face and mark the target.
- Move the setup to make sure the face is being tracked.
- Shoot at the target using the shooting mechanism.
- Once a shot has been fired, stabilise the MechaMachine.
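Below is a minimal sketch of that loop in Python. It is only illustrative: it uses OpenCV's built-in Haar cascade as a stand-in for our quantised YOLOv5 model, and a print-based send_to_mcu() placeholder instead of the real serial link to the Arduino:

# Minimal pipeline sketch (assumptions: Haar cascade instead of the quantised
# YOLOv5 model, print-based placeholder instead of the serial link).
import cv2

def send_to_mcu(x, y, arm, fire):
    # Placeholder: the real project writes a packet to the Arduino over serial.
    print(x, y, arm, fire)

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)                       # mirror the view
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, 1.3, 5)  # detect faces
    for (x, y, w, h) in faces:
        cx, cy = x + w // 2, y + h // 2              # centre of the face
        send_to_mcu(cx, cy, 0, 0)                    # arm/fire flags decided later
    cv2.imshow("MechaMachine", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()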
To make the computation fast and the movement real-time, we decided to use two computation units: one for detecting and classifying the faces, and the other for controlling the movement and shooting.
The AMD Xilinx KRIA KR260 SOM was used as the computation unit (the CPU), which takes input from the camera, marks the target coordinates, and sends them to the MCU.
The MCU controls the mechanisms that move the turret, track the faces, and shoot the Nerf Darts at the target. The MCU used here is an Arduino UNO R3.
For the connections, the AMD Xilinx KRIA KR260 SOM is connected to the camera and to the Arduino UNO R3. The camera sends video frames to the AMD Xilinx KRIA KR260 SOM over a USB port, and serial communication is established between the AMD Xilinx KRIA KR260 SOM and the Arduino UNO R3 at a 9600 baud rate to transfer the data.
The Arduino UNO R3 controls the movement by driving the pan and tilt servos, which are connected to its digital PWM pins.
The Arduino UNO R3 also controls the shooting mechanism and the laser module.
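Opening the serial link from the KR260 side looks roughly like this (the device name /dev/ttyACM0 is an assumption; on Windows it would be a COM port such as the 'com4' used in the pseudo-code below). The short sleep gives the UNO time to finish its auto-reset after the port opens:

import time
import serial

# Assumed device name; check `ls /dev/tty*` on Linux or Device Manager on Windows.
arduino = serial.Serial('/dev/ttyACM0', 9600, timeout=1)
time.sleep(2)                             # wait out the UNO's auto-reset on connect
arduino.write(b'S999X320Y240A0F0E998')    # example packet (format described in STEP 4)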
STEP 4: The Code
Drafting the code involves the following parts:
- Code for the Arduino UNO R3 to control the MechaMachine.
- Code for the AMD Xilinx KRIA KR260 SOM to detect and map the face using OpenCV.
- Training a custom YOLOv5 model for classification on Google Colab, using RoboFlow as the annotation tool.
- Quantising the model to make it work on the AMD Xilinx KRIA KR260 SOM.
To understand the communication between the AMD Xilinx KRIA KR260 SOM and the Arduino UNO R3, we must look at the data structure that is transferred over serial communication (a small packet helper is sketched after the field descriptions): [startMarker, x_coordinate, y_coordinate, arm_motor, fire, endMarker]
startMarker => indicates the start of the data being sent by the Python code.
x_coordinate => the x coordinate of the face, sent by the Python code.
y_coordinate => the y coordinate of the face, sent by the Python code.
arm_motor => indicates whether the shooting mechanism should be armed or not.
fire => indicates whether MechaMachine should shoot or not.
endMarker => the end marker of the data being sent by the computer.
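As an illustration, the packet can be built and parsed with a couple of small helpers. The format string matches the one used in the pseudo-code below; parse_packet() is our own addition for clarity and is not part of the firmware (the Arduino parses the same fields in C):

import re

START_MARKER, END_MARKER = 999, 998

def build_packet(x, y, arm_motor, fire):
    # Same layout as the pseudo-code: S<start>X<x>Y<y>A<arm>F<fire>E<end>
    return 'S{0:d}X{1:d}Y{2:d}A{3:d}F{4:d}E{5:d}'.format(
        START_MARKER, x, y, arm_motor, fire, END_MARKER)

def parse_packet(packet):
    # Hypothetical helper showing how the fields can be recovered on the other end.
    m = re.fullmatch(r'S(\d+)X(\d+)Y(\d+)A(\d+)F(\d+)E(\d+)', packet)
    if m is None or int(m.group(1)) != START_MARKER or int(m.group(6)) != END_MARKER:
        return None
    return {'x': int(m.group(2)), 'y': int(m.group(3)),
            'arm_motor': int(m.group(4)), 'fire': int(m.group(5))}

# Example: a face centred at (320, 240), armed but not firing.
pkt = build_packet(320, 240, 1, 0)      # 'S999X320Y240A1F0E998'
print(parse_packet(pkt))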
The Python Code for the KRIA KR260 SOM:
/* Please refer to the given GitHub repository for the whole code; this sub-section has only the pseudo-code: */
Initialize:
    Import necessary libraries (cv2, serial, time, torch)
    Load quantized YOLOv5 model weights ('model.pt')
    Initialize Serial connection to Arduino ('com4', 9600)

Define Constants:
    tolerance_x = 640 // 2 - 30
    tolerance_y = 480 // 2 - 30
    tolerance_w = 640 // 2 + 30
    tolerance_h = 480 // 2 + 30
    arming_tolerance_x = tolerance_x - 25
    arming_tolerance_y = tolerance_y - 25
    arming_tolerance_w = tolerance_w + 25
    arming_tolerance_h = tolerance_h + 25
    startMarker = 999
    endMarker = 998

Main Execution Loop:
    Open VideoCapture device (0) as 'cap'
    While True:
        success, img = cap.read()
        Flip image horizontally (cv2.flip(img, 1))

        # YOLOv5 Face Detection
        img0 = Convert 'img' to RGB format (cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        Resize image to 'img_size' and perform letterboxing (letterbox function)
        Convert image to Torch tensor and move to device (img.to(device).float())
        Perform inference with YOLOv5 model (model(img)[0])
        Apply non-maximum suppression to detections (non_max_suppression function)

        Process Detections:
            If detections are found:
                For each detection:
                    Extract coordinates and confidence (xyxy, conf)
                    Calculate center (x, y) and dimensions (w, h) of bounding box

                    Determine arm_motor and fire flags:
                        If center (x, y) is within arming_tolerance region:
                            Set arm_motor = 1
                        Else:
                            Set arm_motor = 0
                        If center (x, y) is within tolerance region:
                            Update in_box_time
                            If in_box_time > 500ms:
                                Set fire = 1
                        Else:
                            Reset in_box_time and fire

                    Construct serial command string:
                        string = 'S{0:d}X{1:d}Y{2:d}A{3:d}F{4:d}E{5:d}'.format(startMarker, x, y, arm_motor, fire, endMarker)
                    Print string for debugging
                    Encode string as UTF-8 and send to Arduino (ArduinoSerial.write(string.encode('utf-8')))

                    Draw visualizations:
                        Draw circle at (x, y) (cv2.circle(img, (x, y), 2, (255, 255, 255), 2))
                        Draw bounding box around face (cv2.rectangle(img, (x - w // 2, y - h // 2), (x + w // 2, y + h // 2), (0, 0, 255), 3))

        Draw constraint rectangles on image:
            cv2.rectangle(img, (tolerance_x, tolerance_y), (tolerance_w, tolerance_h), (0, 0, 0), 3)
            cv2.rectangle(img, (arming_tolerance_x, arming_tolerance_y), (arming_tolerance_w, arming_tolerance_h), (255, 0, 0), 3)

        Display processed image with annotations (cv2.imshow("MechaMachine", img))
        Exit loop if 'q' key is pressed (cv2.waitKey(1) & 0xFF == ord('q'))

    Release VideoCapture device and close all windows (cap.release(), cv2.destroyAllWindows())
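The arming/firing decision above can be pulled out into a small, testable helper. This is a hedged reconstruction of that logic from the pseudo-code (the tolerance boxes and the 500 ms dwell follow the constants above; the exact implementation lives in the repository):

import time

# Tolerance boxes around the centre of a 640x480 frame (from the constants above).
TOL = (640 // 2 - 30, 480 // 2 - 30, 640 // 2 + 30, 480 // 2 + 30)
ARM_TOL = (TOL[0] - 25, TOL[1] - 25, TOL[2] + 25, TOL[3] + 25)

def in_box(x, y, box):
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

in_box_since = None          # time the face centre entered the firing box

def decide(x, y):
    """Return (arm_motor, fire) for a face centred at (x, y)."""
    global in_box_since
    arm_motor = 1 if in_box(x, y, ARM_TOL) else 0
    fire = 0
    if in_box(x, y, TOL):
        if in_box_since is None:
            in_box_since = time.time()
        elif (time.time() - in_box_since) * 1000 > 500:   # 500 ms dwell before firing
            fire = 1
    else:
        in_box_since = None    # face left the box: reset the timer and hold fire
    return arm_motor, fire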
The Arduino Code:
/* Please refer to the given GitHub repository for the whole code; this sub-section has only the pseudo-code: */
Function Setup
    InitializeServosAndPins()
    InitializeVariables()
    SetupSerialCommunication()

Function Loop
    TurnOnLaser()
    ReceiveData()
    ArmMechaMachine()
    If DataReceived Then
        TrackFace()
        SetTrigger()
        ArmMechaMachine()
        FireIfTriggered()

Function ReceiveData
    If SerialDataAvailable Then
        ParseSerialData()

Function TrackFace
    ReadFaceCoordinatesFromBuffer()
    AdjustServoPositions()
    UpdateServos()

Function SetTrigger
    If ShouldFireRequested Then
        EnableFiring()

Function FireIfTriggered
    If CanFire And IsArmed And NotFiring Then
        StartFiringSequence()

Function ArmMechaMachine
    If ArmRequested Then
        ActivateArmMotor()
    Else
        DeactivateArmMotor()

Function ParseSerialData
    ReadAndStoreDataFromSerial()

Function AdjustServoPositions
    CalculateNewServoPositions()

Function StartFiringSequence
    InitiateFiringMechanism()

Function EnableFiring
    AllowMechaMachineToFire()
To train the face classification model using YOLOv5, refer to the following documentation:
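For reference, a typical pair of Colab cells for training with the standard ultralytics/yolov5 repository looks roughly like this (the dataset YAML path, image size, batch size, epoch count, and run name are assumptions; in our case the dataset was annotated and exported with RoboFlow):

# Colab cell: clone YOLOv5 and install its requirements
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt

# Colab cell: train on the RoboFlow-exported dataset (paths and hyperparameters are examples)
!python train.py --img 640 --batch 16 --epochs 100 \
    --data ../dataset/data.yaml --weights yolov5s.pt --name mechamachine_faces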
Extract the trained model and further quantise it to use it with the AMD Xilinx KRIA KR260 SOM.
To quantise the model (.pt file) so that it is usable with the AMD Xilinx KRIA KR260 SOM, follow the documentation:
Quantising of the custom YOLOv5 model
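Below is a rough sketch of what the quantisation step looks like with the Vitis AI PyTorch flow (pytorch_nndct). The way the model is loaded, the input shape, and the output directory are assumptions; refer to the documentation above and the repository for the exact flow we used:

import torch
from pytorch_nndct.apis import torch_quantizer   # Vitis AI PyTorch quantizer

# Assumption: the trained YOLOv5 checkpoint stores the module under the 'model' key.
model = torch.load('model.pt', map_location='cpu')['model'].float().eval()
dummy_input = torch.randn(1, 3, 640, 640)         # assumed input shape

# Pass 1: 'calib' mode - run representative images through quant_model,
# then export the quantisation configuration.
quantizer = torch_quantizer('calib', model, (dummy_input,), output_dir='quant_out')
quant_model = quantizer.quant_model
quant_model(dummy_input)                          # replace with real calibration frames
quantizer.export_quant_config()

# Pass 2: 'test' mode - evaluate the quantised model and export the xmodel
# that is later compiled for the KR260's DPU.
quantizer = torch_quantizer('test', model, (dummy_input,), output_dir='quant_out')
quant_model = quantizer.quant_model
quant_model(dummy_input)
quantizer.export_xmodel(output_dir='quant_out', deploy_check=False)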
Result: MechaMachine takes 0.3 seconds to shoot without classifying the faces, which is very close to real-time, whereas with classification it takes around 1 second to make an inference and shoot.
MechaMachine is a real-life replica of KillJoy's Turret from the game Valorant, with a real-world implementation that could help law enforcement authorities and aid the military by giving a machine the autonomy to decide when to shoot, for instance to stop infiltrators in a no man's land.
We would like to thank the AMD Pervasive AI contest for providing the hardware needed for the computation. This was a great opportunity to build such a project.