As I was rummaging through my closet, I stumbled upon a relic from the past... The original Xbox 360 Kinect!
It was highly praised as one of the most sophisticated sensor arrays of its day, and people have been hacking it to make some pretty cool projects. I had always wanted to try it, and I finally decided it was time!
Plan
Here was my original 7-point plan...
- Research
- Setup local laptop with the Kinect
- Create Dockerfile to isolate the build environment
- Install OpenCV
- Create ARM based Docker image from Dockerfile
- Setup a Raspberry Pi with the Kinect
- Track a person in the video
Part of the reason I wanted to tackle this project is that I have no real experience with webcams or video rendering frameworks, and I wanted to pull back the curtain and see what's going on in this world.
To be honest, when I first looked at the example code I was overwhelmed...
Every example I could find starts off with:
using namespace cv;
using namespace Freenect;
using namespace std;
I hate this ^^^.
When you don't know about any of the frameworks or libraries being used, it turns any example code into API soup. Furthermore, it makes it hard to tell how multiple libraries are interacting with each other.
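To make the point concrete, here is a tiny illustration (the variable names are mine, not from the example code) of how explicit qualification keeps the library boundaries visible:
#include <cstdint>
#include <vector>
#include <opencv2/core.hpp>
#include <libfreenect.hpp>

// Illustrative only: with explicit namespaces, ownership is obvious at a glance
void example()
{
    cv::Mat frame;                // clearly OpenCV
    Freenect::Freenect context;   // clearly the libfreenect C++ wrapper
    std::vector<uint8_t> buffer;  // clearly the C++ standard library
}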
The first thing I did was go line-by-line through the example code in glview.c, researching and documenting each and every API call. It was extremely tedious, but it gave me a sense of how the logic flows, and I was finally able to distinguish the calls to libfreenect from those to OpenGL.
In case it helps, I added the documented code to the repository. It is functionally equivalent to its original counterpart but heavily documented.
Through the Eyes of the Kinect
I know the hardware is old now, but the Kinect is still an incredibly impressive piece of hardware; I was blown away!
The camera on the Kinect is capable of 1280 x 1024, which is nothing to write home about - but it's still a solid camera.
However, the real impressive piece of functionality is the depth sensor.
Given that it's a decade old, it is incredible that it can pick up facial features with the depth sensor (look closely at the image).
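For the curious, a common way to inspect the raw depth stream is to squash it down to an 8-bit grayscale image. Here is a minimal sketch, assuming the default 11-bit depth format at 640 x 480 (the function name and scaling are mine):
#include <cstdint>
#include <opencv2/core.hpp>

// Convert one raw Kinect depth frame into a viewable 8-bit grayscale image.
// Assumes `depth_data` points at 640 x 480 uint16_t samples (11 bits used),
// as delivered by the libfreenect DepthCallback.
cv::Mat depthToGray(const uint16_t *depth_data)
{
    cv::Mat depth16(cv::Size(640, 480), CV_16UC1, const_cast<uint16_t *>(depth_data));
    cv::Mat gray8;
    // Scale the 0..2047 range down to 0..255, so near/far reads as dark/light
    depth16.convertTo(gray8, CV_8UC1, 255.0 / 2048.0);
    return gray8;
}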
Stepping it up with OpenCV
One of my main objectives was to truly learn OpenCV, instead of just knowing about OpenCV.
I found an example at OpenKinect.org using the Kinect with OpenCV, but unfortunately it was desperately out of date. Luckily, I stumbled upon several amazing video tutorials on OpenCV. Coupling that with the fact that I had already taken the time to break down the libfreenect glview.c example, I was able to quickly get the Kinect feed rendering via OpenCV!
Look below for links to the OpenCV video tutorials!
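To give you a feel for the shape of it, here is a stripped-down sketch of a Kinect-to-OpenCV pipeline. It is a simplification of my example, not the repository code verbatim; the class and buffer handling are abbreviated:
#include <cstdint>
#include <mutex>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <libfreenect.hpp>

// Receive RGB frames from libfreenect and hand them to OpenCV for display
class OpenCVDevice : public Freenect::FreenectDevice
{
public:
    OpenCVDevice(freenect_context *ctx, int index)
        : Freenect::FreenectDevice(ctx, index),
          bgr_(cv::Size(640, 480), CV_8UC3, cv::Scalar(0)) {}

    // Called on the libfreenect thread; the Kinect delivers RGB,
    // but OpenCV conventionally works in BGR, so convert on arrival
    void VideoCallback(void *video, uint32_t /*timestamp*/) override
    {
        std::lock_guard<std::mutex> lock(mutex_);
        cv::Mat rgb(cv::Size(640, 480), CV_8UC3, video);
        cv::cvtColor(rgb, bgr_, cv::COLOR_RGB2BGR);
    }

    void getFrame(cv::Mat &out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        bgr_.copyTo(out);
    }

private:
    std::mutex mutex_;
    cv::Mat bgr_;
};

int main()
{
    Freenect::Freenect freenect;
    OpenCVDevice &device = freenect.createDevice<OpenCVDevice>(0);
    device.startVideo();
    cv::namedWindow("Kinect");
    cv::Mat frame;
    while (cv::waitKey(10) != 27) {  // loop until Esc
        device.getFrame(frame);
        cv::imshow("Kinect", frame);
    }
    device.stopVideo();
    return 0;
}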
Show Me Your Smile
Once you are processing/rendering video data via OpenCV, it is fairly trivial to add facial recognition.
// Facial recognition variables
cv::CascadeClassifier face_detection("/usr/local/share/opencv4/haarcascades/haarcascade_frontalface_alt2.xml");
float cascade_image_scale = 1.5f;
// Change BGR image to smaller, grayscale image
cv::Mat cascade_grayscale;
cv::resize(bgr_image, cascade_grayscale, cv::Size((bgr_image.size().width / cascade_image_scale), (bgr_image.size().height / cascade_image_scale)));
cv::cvtColor(cascade_grayscale, cascade_grayscale, cv::COLOR_BGR2GRAY);
// Detect faces
std::vector<cv::Rect> faces;
face_detection.detectMultiScale(cascade_grayscale, faces, 1.1, 3, 0, cv::Size(25, 25));
// Draw detection rectangles on original image
if (!faces.empty())
{
// Apply rectangles to BGR image
for (auto &face : faces)
{
cv::rectangle(
bgr_image,
cv::Point(cvRound(face.x * cascade_image_scale), cvRound(face.y * cascade_image_scale)), // Upper left point
cv::Point(cvRound((face.x + (face.width - 1)) * cascade_image_scale), cvRound((face.y + (face.height - 1)) * cascade_image_scale)), // Lower right point
cv::Scalar(0, 0, 255) // Red line
);
}
}
Head Hunter
Detecting a face isn't that impressive/creepy unless people realize it's happening. This is where the Kinect's tilt motor comes in handy. I decided to add a little code to make the Kinect reposition itself to ensure the face stays in the center of the frame.
// Calculate average y-axis value of faces
avg_face_y = (sum_face_y / faces.size());
// Track face (vertical only)
if (avg_face_y < ((cascade_grayscale.size().height / 2) - 25))
{
if (++tilt_degrees >= 30)
{
tilt_degrees = 30;
}
kinect.setTiltDegrees(tilt_degrees);
}
else if (avg_face_y > ((cascade_grayscale.size().height / 2) + 25))
{
if (--tilt_degrees <= -30)
{
tilt_degrees = -30;
}
kinect.setTiltDegrees(tilt_degrees);
}
NOTE: sum_face_y was calculated during the rectangle drawing.
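That accumulation might look something like this inside the rectangle loop above (a paraphrased sketch, not the verbatim repository code):
// Accumulate the vertical centers of the detected faces while drawing
int sum_face_y = 0;
for (auto &face : faces)
{
    // Center row of this face, in the scaled grayscale image's coordinates
    sum_face_y += (face.y + (face.height / 2));
    // ... cv::rectangle(...) drawing as shown above ...
}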
Seriously! Who wants to use someone else's project without changing it around a little? Don't worry, I had that in mind when I set out to tackle this project.
I've created a Dockerfile that sets up the build environment. Instead of spending an hour setting up an environment (which may or may not work), there is only one dependency, Docker CE.
Sanity Test
First, I would recommend a hardware/sanity test, to ensure your Kinect is working properly with your computer. Use the following commands to launch a container running my example application:
$ xhost +local:docker
$ docker run --device /dev/snd --env DISPLAY --interactive --net host --privileged --rm --tty zfields/kinect-opencv-face-detect
Those experienced with Docker will observe that the container requires a lot of privilege, specifically --net host and --privileged. These are required to interact with the USB controller and the X11 server.
It is true, the Docker image is large and bloated, assumes egregious permissions, and has not been tailored to the example application; that is by design. The container has been fashioned as an experimentation environment.
NOTE: If you are concerned about tweaking the xhost settings (as I first was), then you have nothing to worry about. The settings will revert once you reboot your machine.
Experiment (x86 or ARM)
First you will need to clone the Git repository...
$ git clone https://github.com/zfields/kinect-opencv-face-detect.git
Once you have the source, modifying the example application and recompiling is a breeze! Simply edit kinect_opencv_face_detect.cpp and run the following command:
$ docker build --tag kinect-opencv-face-detect .
Docker will use the image you downloaded for the sanity test as a build cache (which SIGNIFICANTLY reduces build time). Then Docker will notice the file kinect_opencv_face_detect.cpp has changed, and reissue only the COPY (26/28) and make (27/28) steps.
Step 26/28 : COPY kinect_opencv_face_detect.cpp .
---> 8b50b9e8dfb4
Step 27/28 : RUN ["make", "all"]
---> Running in 2cf308cce477
g++ -I/usr/local/include/libfreenect -I/usr/include/libusb-1.0 -I/usr/local/include/opencv4/ -fPIC -g -Wall -std=c++11 -Wall -Wextra -Wpedantic kinect_opencv_face_detect.cpp -o head_hunter -lfreenect -lpthread -L/build_opencv/lib -lopencv_core -lopencv_highgui -lopencv_imgcodecs -lopencv_imgproc -lopencv_objdetect
In file included from /usr/local/include/libfreenect/libfreenect.hpp:37,
from kinect_opencv_face_detect.cpp:11:
/usr/include/libusb-1.0/libusb.h:740:46: warning: ISO C++ forbids zero-size array 'dev_capability_data' [-Wpedantic]
uint8_t dev_capability_data[ZERO_SIZED_ARRAY];
^
/usr/include/libusb-1.0/libusb.h:765:78: warning: ISO C++ forbids zero-size array 'dev_capability' [-Wpedantic]
struct libusb_bos_dev_capability_descriptor *dev_capability[ZERO_SIZED_ARRAY];
^
/usr/include/libusb-1.0/libusb.h:1258:70: warning: ISO C++ forbids zero-size array 'iso_packet_desc' [-Wpedantic]
struct libusb_iso_packet_descriptor iso_packet_desc[ZERO_SIZED_ARRAY];
^
In file included from kinect_opencv_face_detect.cpp:11:
/usr/local/include/libfreenect/libfreenect.hpp: In member function 'virtual void Freenect::FreenectDevice::VideoCallback(void*, uint32_t)':
/usr/local/include/libfreenect/libfreenect.hpp:155:36: warning: unused parameter 'video' [-Wunused-parameter]
virtual void VideoCallback(void *video, uint32_t timestamp) { }
~~~~~~^~~~~
/usr/local/include/libfreenect/libfreenect.hpp:155:52: warning: unused parameter 'timestamp' [-Wunused-parameter]
virtual void VideoCallback(void *video, uint32_t timestamp) { }
~~~~~~~~~^~~~~~~~~
/usr/local/include/libfreenect/libfreenect.hpp: In member function 'virtual void Freenect::FreenectDevice::DepthCallback(void*, uint32_t)':
/usr/local/include/libfreenect/libfreenect.hpp:157:36: warning: unused parameter 'depth' [-Wunused-parameter]
virtual void DepthCallback(void *depth, uint32_t timestamp) { }
~~~~~~^~~~~
/usr/local/include/libfreenect/libfreenect.hpp:157:52: warning: unused parameter 'timestamp' [-Wunused-parameter]
virtual void DepthCallback(void *depth, uint32_t timestamp) { }
~~~~~~~~~^~~~~~~~~
Removing intermediate container 2cf308cce477
---> c723fdc35987
Step 28/28 : CMD ["/build/head_hunter", "0"]
---> Running in 25567783ebe1
Removing intermediate container 25567783ebe1
---> afaa7e804361
Successfully built afaa7e804361
Successfully tagged kinect-opencv-face-detect:latest
Now, your application is ready to run!
NOTE: You'll notice several warnings appear during compilation. These are existing warnings for libusb and libfreenect. I enabled these warnings in the Makefile by using the compile flags -Wall, -Wextra, and -Wpedantic.
Headless Mode
My original goal was to run on an embedded platform, so as to enable robotics scenarios. To that end, I added a headless mode of operation. If you pass a non-zero parameter to the application, it will boot into headless mode with facial recognition and head tracking enabled.
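The flag handling itself is trivial; here is a minimal sketch of the idea (paraphrased, not the verbatim source):
#include <cstdlib>

int main(int argc, char *argv[])
{
    // Any non-zero argument selects headless mode (no HighGUI windows)
    const bool headless = (argc > 1 && std::atoi(argv[1]) != 0);
    if (headless) {
        // Skip cv::imshow()/cv::waitKey() entirely; detection and tilt
        // tracking still run, and a console key poll handles shutdown
    }
    // ... start video, detect faces, drive the tilt motor ...
    return 0;
}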
To launch headless mode via the container, issue the following command:
$ docker run --device /dev/snd --env DISPLAY --interactive --net host --privileged --rm --tty kinect-opencv-face-detect /build/head_hunter 1
Summary
This project was a blast! Thanks to the great resources available on the internet (links below), I was armed with all the information required to get this project working. I was able to learn about OpenCV and create a useful platform for my personal experimentation. All in all, I hope this post serves as another resource for those trying to pull the worlds of OpenCV and libfreenect together!
Research
Kinect
- DuckDuckGo: Windows Kinect v1 on Linux
- Open Kinect
- GitHub - OpenKinect/libfreenect: Open source drivers for...
- Running GUI apps with Docker
- Docker Forum: GUI
- Start a GUI-Application as root in a Ubuntu Container
- Still not sure how to run GUI apps in docker containers
OpenCV
- DuckDuckGo: libfreenect opencv
- Experimenting with Kinect using opencv, python and open...
- Jay Rambhia: Kinect with OpenCV using Freenect
- C++ OpenCv Example
- Catatan Fahmi: Kinect and OpenCV
- Tisham Dhar: Getting Data from Kinect to OpenCV
- Tisham Dhar's Pastebin: C++ program which uses OpenCV with freenect
- OpenCV High-level GUI API
- OpenCV Tutorials
- OpenCV Basics - 03 - Windows
- OpenCV Basics - 04 - Accessing Pixels using at Method
- OpenCV Basics - 05 - Split and Merge
- OpenCV Basics - 12 - Webcam & Video Capture
- Using OpenCV and Haar Cascades to Detect Faces in a Video [C++]
zak::waitKey(millis)
- StackOverflow: How to avoid pressing enter with getchar()
- StackOverflow: How do you do non-blocking console I/O on Linux in C?
06 MAY 2020 - I was able to quickly and successfully set up the containerized build environment. However, I was unable to run the examples, because they require an X11 GUI for the video display. I found a couple of links to research possible paths forward, but in the interest of time I decided to install the example dependencies natively on my host machine in order to test the Kinect. The test was successful. The next step will be to get the Kinect providing input to OpenCV.
07 MAY 2020 - I was able to get the application to run via the CONTAINER! Previously there were complications providing Docker access to the X11 host. I have resolved this in a hacky, unsafe manner, and made notes in the Dockerfile. I found several good blog posts (linked above) and upgraded the container to include OpenCV. Installing OpenCV in the container took over an hour, and I fell asleep while it was running. The next step will be to test the OpenCV and Kinect sample software provided by the OpenKinect project.
09 MAY 2020 - I attempted to use the example code I found on the internet. Unfortunately, the examples were written for the Freenect API v1 and v2, and the project is currently at major version 4. Instead of installing an older version of the API, I am planning to understand the flow of the program, then remap the OpenCV usage to the newer Freenect examples.
12 MAY 2020 - I began walking through the glview.c example of the Kinect library. I spent time looking up the documentation for the OpenGL and OpenGLUT API calls. I made comments in the example code with my findings. I documented the main loop, which processes the OpenGL thread. I still have the libfreenect thread to review and document. For my next steps, there does not appear to be API documentation for libfreenect, so I will have to read the code to understand each API call.
13 MAY 2020 - I am disappointed about the lack of API documentation for libfreenect. My goal was to learn OpenCV, a library leaned upon heavily by the industry. However, I have instead gotten myself into the weeds of libfreenect. I am going to continue learning the Kinect, because I have been fascinated by the hardware. That being said, I thought it was pertinent to call out the loss of productivity as a talking point for a retrospective. I spent 5 hours researching/documenting the API calls, and felt like I accomplished nothing. I stopped with only the DrawGLScene function left to research/document.
18 MAY 2020 - Documenting DrawGLScene was by far the most difficult, as it contained trigonometric functions to calculate compensation for video rotation angles. I dug deep into the weeds - even researching and studying trigonometric functions (which I had long since forgotten). While doing it, documenting the example felt exhausting and fruitless, but I believe it helped me identify exactly which code is necessary versus which code can be discarded. Stepping back, I think going deep on trigonometry was unnecessary when considering the goal of identifying non-critical code; however, the subject matter remains fascinating. Having completed the research/documentation exercise, I feel as though I have a firm grasp of the flow of a Kinect program. The next steps will be to review the examples I found early on, and see if I can integrate OpenCV into the existing Kinect examples.
Later in the evening, I watched several YouTube video tutorials (linked above) describing how to get started with OpenCV. I now feel as though I have a basic enough understanding of OpenCV to decipher the original OpenCV examples I found early on in my research.
20 MAY 2020 - I rehydrated the OpenCV + Kinect example from OpenKinect.org. I updated the API calls in the example code as well as the Makefile, and got the example running in my Docker container.
21 MAY 2020 - While updating the example code, I recognized several sections of unused or misleading code, and I removed the cruft before adding in the new facial recognition feature.
I updated the windowing scheme and added new key controls to toggle depth information and, finally, facial recognition. Facial recognition was simple to add; it required nearly verbatim usage of the example in the video (linked above).
27 MAY 2020 - I've been fiddling with the API over the last few days, trying to upgrade the video resolution. Unfortunately, the API and wrappers are not very extensible; they would need to be completely rewritten to allow objects to be created with a parameterized resolution. Such a composition would be an interesting use case for the CRTP (Curiously Recurring Template Pattern); see the toy sketch below. However, I'll leave the refactor for another day. I plan to provide the examples I've created as-is, and I am electing to make a note of the shortcoming in the corresponding blog post.
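For anyone curious, here is the shape such a refactor could take (entirely hypothetical - the class names are mine, not code from this project). The base class pulls the resolution out of the derived type at compile time, so each resolution becomes its own concrete device type:
#include <cstddef>

// Toy CRTP sketch: the base reads compile-time constants from the derived type
template <typename Derived>
class KinectDeviceBase
{
public:
    std::size_t videoBufferSize() const
    {
        // Width and height come from the concrete device type
        return Derived::kWidth * Derived::kHeight * 3;  // RGB bytes per frame
    }
};

class MediumResDevice : public KinectDeviceBase<MediumResDevice>
{
public:
    static constexpr std::size_t kWidth = 640;
    static constexpr std::size_t kHeight = 480;
};

class HighResDevice : public KinectDeviceBase<HighResDevice>
{
public:
    static constexpr std::size_t kWidth = 1280;
    static constexpr std::size_t kHeight = 1024;
};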
28 MAY 2020 - As I was finalizing the source and preparing to share, I noticed in my notes that I had originally intended for this to run on the Raspberry Pi. Luckily, I had created Dockerfiles, so this really only amounted to rebuilding the image on ARM - or so I thought... It turns out I had configured my Raspberry Pi without a GUI. So I created a headless version of the program. This required rewriting cv::waitKey, because it has a dependency on the HighGUI library, the OpenCV windowing framework.
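The replacement boils down to non-blocking console I/O using termios and select() - the StackOverflow links above cover the trick. Here is a sketch of the approach on Linux; it mirrors the idea behind zak::waitKey, not the exact repository code:
#include <termios.h>
#include <unistd.h>
#include <sys/select.h>

// Poll stdin for up to `millis` milliseconds without requiring Enter;
// return the key pressed, or -1 on timeout. A rough console-only
// stand-in for cv::waitKey().
int consoleWaitKey(int millis)
{
    termios saved, raw;
    tcgetattr(STDIN_FILENO, &saved);
    raw = saved;
    raw.c_lflag &= ~(ICANON | ECHO);  // no line buffering, no echo
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(STDIN_FILENO, &fds);
    timeval timeout;
    timeout.tv_sec = millis / 1000;
    timeout.tv_usec = (millis % 1000) * 1000;

    int key = -1;
    if (select(STDIN_FILENO + 1, &fds, nullptr, nullptr, &timeout) > 0) {
        char c;
        if (read(STDIN_FILENO, &c, 1) == 1) { key = c; }
    }

    tcsetattr(STDIN_FILENO, TCSANOW, &saved);  // restore terminal settings
    return key;
}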