The main objective of this project is to detect and track faces in a real-time video stream. Tracking is handled by an Arduino Uno microcontroller, which is connected to two servo motors and two drive motors. The servos are centered before tracking begins. The bounding-box coordinates computed on the ESP32 are used to track the face in subsequent frames. The servos pan and tilt the webcam mounted on them, so the camera's position follows the movement of the person.
Design and Building
The input video stream is obtained using an ESP32-CAM. The ESP-WHO framework takes QVGA (320×240) images as input. Face detection is implemented with MTCNN and MobileNet and returns the position of any faces present in the image. Each frame is examined for a face; detection works only on frontal faces. Once a face is spotted, a bounding box is drawn around it.
The coordinates of the box obtained after detecting a face in a frame are then processed on the ESP32 microcontroller.
An ESP32 that is also equipped with other sensors (ultrasonic, IR, etc.) does not have enough free pins to control two servos and two motors. In addition, the image processing puts a heavy load on the ESP32. In the end I decided on a combination of two processors: an Arduino platform driving the two motors and two servos, with the ESP32 as the video stream source, as the picture below shows.
Another reason for the additional processor was timing: face recognition alone takes at least 200 ms per frame, and any further delay would be undesirable. The second processor therefore takes full control of the servos as well as the vehicle movement.
The ESP32-CAM can be programmed using the Arduino IDE, which supports the ESP32 platform.
Those who have little or no experience with ESP32 camera development boards can start with the following tutorial: ESP32-CAM Video Streaming and Face Recognition with Arduino IDE.
My previous projects covered building a remote control for robot cars and a robot arm; a detailed description of how to develop that project is here. To control the movement of the robot car, it should be connected to a platform with a motor driver or another motor controller that can be driven by the ESP32-CAM or via Bluetooth. A detailed description can be found on the page here.
Code
Although tremendous strides have been made in face recognition, one remaining open challenge is achieving real-time speed on a CPU while maintaining high accuracy, since effective face recognition models tend to be computationally intensive. To meet this challenge, I propose a setup that performs well in terms of both speed and accuracy.
The code shown below is the minimum needed to detect faces in the camera images; the web server is not active because video streaming was not necessary.
When building the program in C++ for the ESP32 you can use many functions from the ESP-WHO library. The headers esp_camera.h, fd_forward.h and fb_gfx.h allow us to initialize and interact with the camera and to detect faces in the image. They help get the data into the right format and perform the operations you need (look at the header files for the names and descriptions of the available functions); once all that is done, you can run the inference.
If the program keeps rebooting because of brownout resets, the following includes can help:
#include "soc/soc.h" // disable brownout problems
#include "soc/rtc_cntl_reg.h" // disable brownout problems
How to find the center of the face
A function called draw_face_boxes is normally used to display a box around a detected face.
As a result of this function we get the coordinates of the top-left corner of the box. As can be seen in the picture, the center of the box differs from the center of the picture, so the camera should move up and to the right, as shown by the blue arrows. The X and Y coordinates of this corner, combined with the box's width and height, can be used to find the center of the box and therefore the center of the face.
Top-left corner of the box: x = 74 px, y = 82 px
Center of the face in the frame:
X = x + width/2 = 74 + 120/2 = 134 px
Y = y + height/2 = 82 + 140/2 = 152 px
To convert from pixels to degrees for QVGA (320 px × 240 px, diagonal 400 px), divide the diagonal by the camera's field of view. I am using an OV2640 camera module, which shipped with my ESP32 board and has a 60° FoV (with multiple lens options, FoVs of 100°/120°/140°/170° are available).
For my camera this gives the pixels per degree of rotation:
400 / 60 ≈ 6.7
The distance of the face center from the frame center, converted into degrees, is then given by the following formula:
posH =posH + (160 - face_center_pan)/7;
What remains is to send the servo angle via Serial2:
Serial2.printf("H%d \n", posH);
A code can be added so that the motors and servos only activate when the face is outside the frame.
Because the pan servo's range is limited to 10° and 170°, the horizontal movement must be constrained.
If the face is at the edge of the picture, the whole module can be turned using the drive motors. The code is below.
As noted, when the module turns left or right, the position of the pan servo is corrected by 20 degrees: instead of 10° we set it to 30°, which matches the corresponding turn.
No correction is needed for the servo that positions the camera vertically, so that code is correspondingly simpler.
We also define a function, initCamera(), to take care of the camera initialization, and call it in the Arduino setup(). Among the many initialization parameters, we set the frame size to QVGA, which is the recommended resolution for face detection.
In setup() we start by opening two serial connections, so we can output messages and commands when we detect faces in the captured image.
After that we call the mtmn_init_config function (we will use the default configuration); it takes no arguments and returns a set of default MTMN configurations we can use right away to start detecting faces in the camera images.
mtmn_config_t mtmn_config = {0};
We will write the rest of our code in the Arduino main loop. The first thing we need to do is to obtain a camera frame. So we start by declaring a variable that will hold a pointer to a struct of type camera_fb_t. This struct holds a pointer to the buffer containing the actual image, along with metadata such as the width and height of the image and the length of the buffer.
camera_fb_t * frame;
Then we call the esp_camera_fb_get function to get an image from the camera:
frame = esp_camera_fb_get();
This function takes no arguments and returns a pointer to a camera_fb_t struct, which we store in our previously declared variable.
Then we call the dl_matrix3du_alloc function (from the Deep Learning library) to allocate an image matrix; it returns a pointer to the allocated matrix struct.
In addition to working with this struct type, the detection function expects the image in RGB888 format. To convert the captured image we call the fmt2rgb888 function, which converts our original JPEG image to RGB888.
esp_camera_fb_return(frame);
This call allows the image buffer to be reused, which makes sense since we continuously grab new images and don't need to keep the old ones.
static void draw_face_boxes(dl_matrix3du_t *image_matrix, box_array_t *boxes)
Function called draw_face_boxes is used to display a box around a detected face.
box_array_t *boxes = face_detect(image_matrix, &mtmn_config);
A box_array_t value contains the face boxes, together with a score and landmarks for each box: the coordinates of the top-left and bottom-right corners, plus the landmark points.
if (boxes != NULL) {
We just want to know whether faces were detected in the image. We simply check whether this pointer is NULL; if it is, we increment the noDetection counter, which is of interest to the search function. If the pointer is not NULL, draw_face_boxes(image_matrix, boxes) sends the commands (as prints) to move, tilt or pan the camera.
Else
However, if no face is recognized for a certain time, a search function has to be activated. As a timer for the search we use the variable noDetection, which is incremented on each unsuccessful detection attempt. The action inside the else branch is split so that the search first sweeps the side where the face was last seen, then switches to the other side. Since noDetection serves as the timer, each step follows after about 200-240 ms. If a face is recognized in the meantime, the entire action inside the else branch is cancelled.
Part of "Else" can be seen in the picture below.
From the robot projects Multi-Functional 2WD driving Straight Robot Car and "Remote control and Video Monitoring with ESP32 for Robot" we reuse a table of commands.
Buttons to move the car in Left, Right, Forward and reverse directions
To control the camera we could change the commands: for example, instead of "2" for left and "4" for down, there would be a full command for the servo's horizontal or vertical position, each followed by a number indicating the angular position.
With such a command we can send the exact position to both servos (pan and tilt).
The main change is two more case rows in the switch(getstr) statement in the loop:
// V or H case string for Servo
case 'V': posV = Serial.parseInt();movToV();break; //now holds posV
case 'H': posH = Serial.parseInt();movToH();break; //now holds posH and mov()
default: break;
and of course there are two subroutines:
void movToV()
{
//++++++++++++++ Head vertical turn to posV
//stp();
if (posV<EndDown) posV=EndDown;
if (posV>EndUp) posV=EndUp;
myservoV.write(posV);
delay(t); // delay 40/5ms(used to adjust the servo speed)
}
void movToH()
{
// ++++++++++++ A Head Horizontal turn to posH
//stp();
if (posH<10) posH=10;
if (posH>170) posH=170;
myservoH.write(posH);
delay(t); // delay 20/5ms(used to adjust the servo speed)
}
The program targets the DRV8835 H-bridge. Since I had to replace the DRV8835 later, I made a few small changes for the new L298N module, after which the old program fits without any further changes. The schematics can be downloaded from the site as a Word document.
Testing the code
To test the code, compile it and upload it to your ESP32, making sure the board is correctly connected to the camera. Also connect pins 13 (RX) and 15 (TX) to the Arduino platform, and the normal serial port to an FT232RL FTDI USB adapter. Once the upload finishes, open the Arduino IDE serial monitor.
Then point the camera at your face. You should see something similar to the figure (right side) in the serial monitor: for each face detection a timestamp is sent, as well as the position in the frame, followed by commands for the H and V servos with the corresponding angles.
Look at the example in the picture: the ESP sends the horizontal (H91) and vertical (V89) servo commands, as well as the exact frame position (150) and the detection timestamp (2293 ms). If a face is detected in the captured images and the Arduino platform responds correctly to the commands, the camera tries to track the face.
If no face is detected, the variable noDetection is incremented. As noted, if the face is lost, the servo first tries to move the camera in the same direction; if there is still no detection, the camera sweeps to the other side. This movement is slower, about 1° per 40 ms. In this case, however, the ESP loses the information about the exact position, so a fixed position is set at certain points in time, which helps the ESP32 face recognition quickly re-establish the position. The variable noDetection is used as a timer, since it is incremented about every 200-240 ms. If a face is detected, none of the commands in the else branch are executed.
Conclusion
In the video, the robot looks like a "Big Brother", which doesn't sound sympathetic, since "Big Brother" stands for state surveillance and intrusion into the life of the individual. With appropriate make-up, though, our robot looks more likeable; in any case, that requires a bit of tinkering.
Unfortunately, the "Follow Me" feature is limited to face tracking, which is the only option in the ESP-WHO framework. For a real search, the OpenCV library would be more suitable; OpenCV.js runs in the browser, which allows rapid experimentation with OpenCV functions by anyone with a modest background in HTML and JavaScript.
This brings many advantages, but would be a completely different concept. That's why we leave that for the next projects.