The Edge AI Kit uses dual cameras to enable depth-perception and distance-measurement applications.
The Stereo Vision application detects a human face in both camera images and then calculates the disparity between the ROI (region of interest) in each camera’s image. This disparity is used to calculate the subject’s distance from the camera.
Setup

Set up the Edge AI Kit according to the instructions here.
You can also find more details in the Edge AI Kit Linux User Manual.
To use the StereoVision application, you will need to use an external display (either the included LVDS touch screen or an HDMI monitor). Attach a keyboard and mouse to the Edge AI Kit.
Image

The application has already been built for you and is included in the out-of-box image. If you would prefer to build your own version, please check out the project Building an ML-enabled Yocto image for the Edge AI Kit (coming soon).
Download the image with the application from the Avnet Boards page under the "Reference designs" tab. Unzip it (you should end up with a .wic image).
Use Balena Etcher or a similar software to flash the image to your SD card.
Camera Calibration

We will need to calibrate our cameras to ensure that the images they capture are aligned correctly. This process involves taking pictures of a checkerboard pattern from different angles and then using those images to calculate the camera parameters.
The checkerboard is here: https://github.com/Avnet/stereovision-app/blob/main/documentation/Checkerboard-A4-25mm-8x6.pdf
The image must be printed in A4 format. Attach it to a stiff backing, such as cardstock. It's easiest if the backing is slightly larger than the checkerboard so that there is space to hold on to it as you move it from side to side. The checkerboard size is critical to the distance measurement. If your printed board is not exactly this size, you must adjust the corresponding variables in the file "calibration.cpp" and rebuild the Qt application.
More details about camera calibration can be found in Camera Calibration using OpenCV and Camera Calibration OpenCV.
Run the application

Once your Edge AI Kit has been set up and booted, it will open the Wayland desktop.
Open a terminal on the Wayland desktop. Enter the command:
stereovision
This will open the stereovision Qt application.
1. Select “Start” to start the camera stream.
If this gives an error, run the command “systemctl stop ap1302-stream” to make sure the cameras are stopped, or go through the troubleshooting steps in section 3.3.5.
2. Select “Takephoto” to take a calibration photo. Position the chessboard so that it is in frame in both cameras. It is recommended to take at least 8 calibration photos with the chessboard in the corners of either camera frame (but always fully in frame in both cameras).
The intrinsic and extrinsic parameters of the cameras will be saved in the files "left.yml", "right.yml" and "stereo.yml" in the "camera_data" folder located at the root of the project. Note: if you edit the path or name of the images for the calibration, you need to adjust the corresponding variables in the file "calibration.cpp".
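The exact contents of these files depend on your cameras and calibration run, but OpenCV's FileStorage format typically stores the stereo extrinsics as named matrices. A purely illustrative fragment of what "stereo.yml" might contain (the values below are placeholders, not real calibration output):

```yaml
%YAML:1.0
---
# Illustrative only - real values come from your calibration run.
R: !!opencv-matrix      # rotation between the two cameras
   rows: 3
   cols: 3
   dt: d
   data: [ 1., 0., 0., 0., 1., 0., 0., 0., 1. ]
T: !!opencv-matrix      # translation between the cameras (the baseline)
   rows: 3
   cols: 1
   dt: d
   data: [ -6.0e-02, 0., 0. ]
```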
3. Select “Calibrate” to calibrate the app.
4. Select “Face Detection” to begin detecting faces. This also enables rectification (i.e. the images are rectified using the chessboard calibration). You can also run rectification without face detection to test the calibration.
The face detection model is a TensorFlow Lite model from MediaPipe, which is accelerated on the NPU using the NNAPI delegate.
The disparity calculation is done using OpenCV. Unfortunately, OpenCV cannot be accelerated by the Neural Processing Unit (NPU) integrated on the Avnet i.MX 8M Plus Edge AI Kit.
The process of measuring distance consists of two steps:
1. Human detection.
2. Disparity calculation for the ROI (the region containing the human)
The application detects human faces with an artificial neural network and selects the corresponding feature points of the face in the stereo images using the SURF method. The distance is computed from the disparity of the feature points.
Human detection

The tflite model file, which is required to run the network, is located in the "dnnData" folder of the application code. You can check out the latest face detection models from MediaPipe.
This application supports two models: "short range", which is designed for faces within 2 meters of the camera, and "full range", which is designed for faces within 5 meters of the camera. If you use other models, please adapt the parameters of FaceOptions in the code according to the corresponding description file, e.g. face_detection_full_range_common.pbtxt, or the Model card.
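The exact fields of FaceOptions are defined in the application source; as a rough illustration, the values that typically differ between the two MediaPipe models (input resolution and anchor count, per the published model descriptions) could be sketched as follows. The struct and field names here are hypothetical, not the application's actual code:

```cpp
// Hypothetical sketch of per-model detector options. The real definition is
// the FaceOptions type in the application source; the numbers reflect the
// published MediaPipe face detection model descriptions.
struct FaceOptionsSketch {
    int inputWidth;         // model input resolution
    int inputHeight;
    int numAnchors;         // number of anchor boxes the model outputs
    float minScoreThresh;   // minimum detection confidence
};

FaceOptionsSketch shortRangeOptions() { return {128, 128,  896, 0.5f}; }
FaceOptionsSketch fullRangeOptions()  { return {192, 192, 2304, 0.5f}; }
```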
The article Understanding Bounding Box Outputs from Neural Networks explains how to use the model and set the parameters for face detection.
Disparity calculation

Disparity refers to the distance between two corresponding points in the left and right images of a stereo pair. The disparity of a point in a rectified image (after camera calibration) can be used to calculate the coordinates of that point in the real world.
After calibration, the two feature points of a pair theoretically have the same v-coordinate in the image. In this case the disparity is the difference of u-coordinates between the two corresponding points in the left and right images. Once the disparity is computed, the real-world Z-coordinate can be calculated, since the other parameters are available in the file "camera_data/stereo.yml".
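For a rectified stereo pair this reduces to the standard relation Z = f * B / d. A minimal sketch (the focal length, baseline and disparity values below are illustrative; the real ones come from the calibration output):

```cpp
#include <cassert>
#include <cmath>

// Depth from disparity for a rectified stereo pair: Z = f * B / d,
// where f is the focal length in pixels, B is the baseline (distance
// between the two camera centers) and d is the disparity in pixels.
// f and B are obtained from the calibration data (e.g. stereo.yml).
double depthFromDisparity(double focalPx, double baselineM, double disparityPx) {
    return focalPx * baselineM / disparityPx;
}
```

For example, with f = 800 px, B = 0.06 m and d = 24 px, the subject is 800 * 0.06 / 24 = 2 m from the camera.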
Each detected face in the left image is matched with the corresponding face in the right image. Two face blocks are considered matched if their widths, heights and v-coordinates are the same. Once faces are matched, their disparity is computed to calculate the distance.
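The matching rule can be sketched as follows. This is an illustrative reimplementation, not the application's exact code; "same" is interpreted as "within the tuning thresholds described below" (Disparity_Y Thresh, Width_Rel_Thresh, Height_Rel_Thresh):

```cpp
#include <cmath>

// A detected face block: top-left corner (u, v) plus width and height, in pixels.
struct FaceBox { double u, v, w, h; };

// Two face blocks match if their v-coordinates, widths and heights agree
// within the given thresholds (absolute pixels for v, relative for w and h).
bool facesMatch(const FaceBox& left, const FaceBox& right,
                double vThreshPx, double widthRelThresh, double heightRelThresh) {
    if (std::fabs(left.v - right.v) > vThreshPx) return false;
    if (std::fabs(left.w - right.w) / left.w > widthRelThresh) return false;
    if (std::fabs(left.h - right.h) / left.h > heightRelThresh) return false;
    return true;
}

// Once matched, the disparity is the difference of u-coordinates.
double disparity(const FaceBox& left, const FaceBox& right) {
    return left.u - right.u;
}
```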
Tuning parameters

The following parameters can be tuned to adjust the model's accuracy under different conditions:
- ConfidenceThresh [0.01, 1.0]: Defines the minimum confidence for face detection. Only detected faces with confidence above the threshold will be processed.
- FaceOverlapThresh [0.01, 1.0]: Defines the maximum allowed overlap between detections. To increase accuracy, overlapping detected faces are merged: two detections that overlap by more than the threshold are treated as one.
- Disparity_Y Thresh [0, 100]: Defines the threshold for the disparity difference in the v-direction, which is used for face matching.
- Width_Rel_Thresh [0.01, 1.0]: Defines the threshold for the relative deviation of face block width. 0.1 means that face blocks in the left and right frames will not be matched if their widths differ by more than 10%.
- Height_Rel_Thresh [0, 10]: Defines the threshold for the relative deviation of face block height. 0.1 means that face blocks in the left and right frames will not be matched if their heights differ by more than 10%.
- SkipFrame [0, 20]: To accelerate the calculation, not every frame is processed. If the variable is set to 4, only one frame in every 4 frames is processed.
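The SkipFrame behaviour can be sketched as a simple modulo test (an illustrative sketch, not the application's actual implementation; here a value of 0 or 1 is assumed to process every frame):

```cpp
// Only one frame in every `skipFrame` frames is processed.
bool shouldProcess(int frameIndex, int skipFrame) {
    if (skipFrame <= 1) return true;        // process every frame
    return frameIndex % skipFrame == 0;     // one frame per skipFrame frames
}
```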