The Edge AI Kit uses dual cameras to enable depth-perception and distance-measurement applications.
The Stereo Vision application detects a human face in both camera images and then calculates the disparity between the ROI (region of interest) in each camera’s image. This disparity is used to calculate the subject’s distance from the camera.
Setup

Set up the Edge AI Kit according to the instructions here.
You can also find more details in the Edge AI Kit Linux User Manual.
To use the StereoVision application, you will need to use an external display (either the included LVDS touch screen or an HDMI monitor). Attach a keyboard and mouse to the Edge AI Kit.
Image

The application has already been built for you and is included in the out-of-box image. If you would prefer to build your own version, please check out the project Building an ML-enabled Yocto image for the Edge AI Kit (coming soon).
Download the image with the application from the Avnet Boards page under the "Reference designs" tab. Unzip it (you should end up with a .wic image).
Use Balena Etcher or a similar software to flash the image to your SD card.
Camera Calibration

We will need to calibrate our cameras to ensure that the images they capture are aligned correctly. This process involves taking pictures of a checkerboard pattern from different angles and then using those images to calculate the camera parameters.
The checkerboard is here: https://github.com/Avnet/stereovision-app/blob/main/documentation/Checkerboard-A4-25mm-8x6.pdf
The image must be printed in A4 format. Attach it to a stiff backing, such as cardstock. It's easiest if the backing is slightly larger than the checkerboard so that there is space to hold on to it as you move it from side to side. The checkerboard size is critical to the distance measurement. If your printed board is not exactly this size, you must adjust the corresponding variables in the file "calibration.cpp" and rebuild the Qt application.
More details about camera calibration can be found in Camera Calibration using OpenCV and Camera Calibration OpenCV.
Run the application

Once your Edge AI Kit has been set up and booted, it will open the Wayland desktop.
Open a terminal on the Wayland desktop. Enter the command:
stereovision
This will open the stereovision Qt application.
1. Select “Start” to start the camera stream.
If this gives an error, run the command “systemctl stop ap1302-stream” to make sure the cameras are stopped, or go through the troubleshooting steps in section 3.3.5.
2. Select “Takephoto” to take a calibration photo. Position the chessboard so that it is in frame in both cameras. It is recommended to take at least 8 calibration photos with the chessboard in the corners of either camera frame (but always fully in frame in both cameras).
The intrinsic and extrinsic parameters of the cameras will be saved in the files "left.yml", "right.yml" and "stereo.yml" in the "camera_data" folder located at the root of the project. Note: if you edit the path or name of the images for the calibration, you need to adjust the corresponding variables in the file "calibration.cpp".
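The exact contents of these files depend on your cameras and calibration run, but OpenCV's FileStorage format typically stores the stereo extrinsics as named matrices. A purely illustrative fragment of what "stereo.yml" might contain (the values below are placeholders, not real calibration output):

```yaml
%YAML:1.0
---
# Illustrative only - real values come from your calibration run.
R: !!opencv-matrix      # rotation between the two cameras
   rows: 3
   cols: 3
   dt: d
   data: [ 1., 0., 0., 0., 1., 0., 0., 0., 1. ]
T: !!opencv-matrix      # translation between the cameras (the baseline)
   rows: 3
   cols: 1
   dt: d
   data: [ -6.0e-02, 0., 0. ]
```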
3. Select “Calibrate” to calibrate the app.
4. Select “Face Detection” to begin detecting faces. This also enables rectification (i.e. the images are rectified using the chessboard calibration). You can also run rectification without face detection to test the calibration.
The face detection model is a TensorFlow Lite model from MediaPipe, which is accelerated on the NPU using the NNAPI delegate.
The disparity calculation is done using OpenCV. Unfortunately, OpenCV cannot be accelerated by the Neural Processing Unit (NPU) integrated on the Avnet i.MX 8M Plus Edge AI Kit.
The process of measuring distance consists of two steps:
1. Human detection.
2. Disparity calculation for the ROI (the region containing the human)
The application detects human faces with an artificial neural network and selects the corresponding feature points of the face in the stereo images using the SURF method. The distance is computed from the disparity of the feature points.
Human detection

The tflite model file, which is required to run the network, is located in the "dnnData" folder of the application code. You can check out the latest face detection models from MediaPipe.
This application supports two models: "short range", which is designed for faces within 2 meters of the camera, and "full range", which is designed for faces within 5 meters of the camera. If you use other models, please adapt the parameters of FaceOptions in the code according to the corresponding description file, e.g. face_detection_full_range_common.pbtxt, or the Model card.
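The exact fields of FaceOptions are defined in the application source; as a rough illustration, the values that typically differ between the two MediaPipe models (input resolution and anchor count, per the published model descriptions) could be sketched as follows. The struct and field names here are hypothetical, not the application's actual code:

```cpp
// Hypothetical sketch of per-model detector options. The real definition is
// the FaceOptions type in the application source; the numbers reflect the
// published MediaPipe face detection model descriptions.
struct FaceOptionsSketch {
    int inputWidth;         // model input resolution
    int inputHeight;
    int numAnchors;         // number of anchor boxes the model outputs
    float minScoreThresh;   // minimum detection confidence
};

FaceOptionsSketch shortRangeOptions() { return {128, 128,  896, 0.5f}; }
FaceOptionsSketch fullRangeOptions()  { return {192, 192, 2304, 0.5f}; }
```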
The article Understanding Bounding Box Outputs from Neural Networks explains how to use the model and set the parameters for face detection.
Disparity calculation

Disparity refers to the distance between two corresponding points in the left and right images of a stereo pair. The disparity of a point in a rectified image (after camera calibration) can be used to calculate the coordinates of that point in the real world.
After calibration, the two feature points of a pair theoretically have the same v-coordinate in the image. In this case the disparity is the difference of u-coordinates between the two corresponding points in the left and right images. Once the disparity is computed, the real-world Z-coordinate can be calculated, since the other parameters are available in the file "camera_data/stereo.yml".
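For a rectified stereo pair this reduces to the standard relation Z = f * B / d. A minimal sketch (the focal length, baseline and disparity values below are illustrative; the real ones come from the calibration output):

```cpp
#include <cassert>
#include <cmath>

// Depth from disparity for a rectified stereo pair: Z = f * B / d,
// where f is the focal length in pixels, B is the baseline (distance
// between the two camera centers) and d is the disparity in pixels.
// f and B are obtained from the calibration data (e.g. stereo.yml).
double depthFromDisparity(double focalPx, double baselineM, double disparityPx) {
    return focalPx * baselineM / disparityPx;
}
```

For example, with f = 800 px, B = 0.06 m and d = 24 px, the subject is 800 * 0.06 / 24 = 2 m from the camera.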
Each detected face in the left image is matched with the corresponding face in the right image. Two face blocks are considered matched if their widths, heights and v-coordinates are the same. Once faces are matched, their disparity is computed to calculate the distance.
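The matching rule can be sketched as follows. This is an illustrative reimplementation, not the application's exact code; "same" is interpreted as "within the tuning thresholds described below" (Disparity_Y Thresh, Width_Rel_Thresh, Height_Rel_Thresh):

```cpp
#include <cmath>

// A detected face block: top-left corner (u, v) plus width and height, in pixels.
struct FaceBox { double u, v, w, h; };

// Two face blocks match if their v-coordinates, widths and heights agree
// within the given thresholds (absolute pixels for v, relative for w and h).
bool facesMatch(const FaceBox& left, const FaceBox& right,
                double vThreshPx, double widthRelThresh, double heightRelThresh) {
    if (std::fabs(left.v - right.v) > vThreshPx) return false;
    if (std::fabs(left.w - right.w) / left.w > widthRelThresh) return false;
    if (std::fabs(left.h - right.h) / left.h > heightRelThresh) return false;
    return true;
}

// Once matched, the disparity is the difference of u-coordinates.
double disparity(const FaceBox& left, const FaceBox& right) {
    return left.u - right.u;
}
```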
Tuning parameters

The following parameters can be tuned to adjust the model's accuracy under different conditions:
- ConfidenceThresh [0.01, 1.0]: Defines the minimum confidence for face detection. Only detected faces with confidence above the threshold will be processed.
- FaceOverlapThresh [0.01, 1.0]: Defines the maximum allowed overlap between detections. To increase accuracy, overlapping detected faces are merged: two detections that overlap by more than the threshold are treated as one.
- Disparity_Y Thresh [0, 100]: Defines the threshold for the disparity difference in the v-direction, which is used for face matching.
- Width_Rel_Thresh [0.01, 1.0]: Defines the threshold for the relative deviation of face block width. 0.1 means that face blocks in the left and right frames will not be matched if their widths differ by more than 10%.
- Height_Rel_Thresh [0, 10]: Defines the threshold for the relative deviation of face block height. 0.1 means that face blocks in the left and right frames will not be matched if their heights differ by more than 10%.
- SkipFrame [0, 20]: To accelerate the calculation, not every frame is processed. If the variable is set to 4, only one frame in every 4 frames is processed.
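The SkipFrame behaviour can be sketched as a simple modulo test (an illustrative sketch, not the application's actual implementation; here a value of 0 or 1 is assumed to process every frame):

```cpp
// Only one frame in every `skipFrame` frames is processed.
bool shouldProcess(int frameIndex, int skipFrame) {
    if (skipFrame <= 1) return true;        // process every frame
    return frameIndex % skipFrame == 0;     // one frame per skipFrame frames
}
```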