As I was rummaging through my closet, I stumbled upon a relic from the past... The original Xbox 360 Kinect!
It was highly praised as one of the most sophisticated sensor arrays of its day, and people have been hacking it to make some pretty cool projects. I had always wanted to try it, and I finally decided it was time!
Plan
Here was my original 7-point plan...
- Research
- Setup local laptop with the Kinect
- Create Dockerfile to isolate the build environment
- Install OpenCV
- Create ARM based Docker image from Dockerfile
- Setup a Raspberry Pi with the Kinect
- Track a person in the video
Part of the reason I wanted to tackle this project is that I have no real experience with webcams or video rendering frameworks, and I wanted to pull back the curtain and see what's going on in this world.
To be honest, when I first looked at the example code I was overwhelmed...
Every example I could find starts off with:
using namespace cv;
using namespace Freenect;
using namespace std;
I hate this ^^^.
When you don't know about any of the frameworks or libraries being used, it turns any example code into API soup. Furthermore, it makes it hard to tell how multiple libraries are interacting with each other.
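To make the point concrete, here is a tiny illustration (the variable names are mine, not from the example code) of how explicit qualification keeps the library boundaries visible:
#include <cstdint>
#include <vector>
#include <opencv2/core.hpp>
#include <libfreenect.hpp>

// Illustrative only: with explicit namespaces, ownership is obvious at a glance
void example()
{
    cv::Mat frame;                // clearly OpenCV
    Freenect::Freenect context;   // clearly the libfreenect C++ wrapper
    std::vector<uint8_t> buffer;  // clearly the C++ standard library
}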
The first thing I did was go line-by-line through the example code in glview.c, researching and documenting each and every API call. It was extremely tedious, but it gave me a sense of how the logic flows, and I was finally able to distinguish the calls to libfreenect from those to OpenGL.
In case it helps, I added the documented code to the repository. It is functionally equivalent to its original counterpart but heavily documented.
Through the Eyes of the Kinect
I know the hardware is old now, but the Kinect is still an incredibly impressive piece of hardware; I was blown away!
The camera on the Kinect is capable of 1280 x 1024, which is nothing to write home about - but it's still a solid camera.
However, the real impressive piece of functionality is the depth sensor.
Given that it's a decade old, it is incredible that it can pick up facial features with the depth sensor (look closely at the image).
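For the curious, a common way to inspect the raw depth stream is to squash it down to an 8-bit grayscale image. Here is a minimal sketch, assuming the default 11-bit depth format at 640 x 480 (the function name and scaling are mine):
#include <cstdint>
#include <opencv2/core.hpp>

// Convert one raw Kinect depth frame into a viewable 8-bit grayscale image.
// Assumes `depth_data` points at 640 x 480 uint16_t samples (11 bits used),
// as delivered by the libfreenect DepthCallback.
cv::Mat depthToGray(const uint16_t *depth_data)
{
    cv::Mat depth16(cv::Size(640, 480), CV_16UC1, const_cast<uint16_t *>(depth_data));
    cv::Mat gray8;
    // Scale the 0..2047 range down to 0..255, so near/far reads as dark/light
    depth16.convertTo(gray8, CV_8UC1, 255.0 / 2048.0);
    return gray8;
}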
Stepping it up with OpenCV
One of my main objectives was to truly learn OpenCV, instead of just knowing about OpenCV.
I found an example at OpenKinect.org using the Kinect with OpenCV, but unfortunately it was desperately out of date. Luckily, I stumbled upon several amazing video tutorials on OpenCV. Coupling that with the fact that I had already taken the time to break down the libfreenect glview.c example, I was able to quickly get the Kinect feed rendering via OpenCV!
Look below for links to the OpenCV video tutorials!
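To give you a feel for the shape of it, here is a stripped-down sketch of a Kinect-to-OpenCV pipeline. It is a simplification of my example, not the repository code verbatim; the class and buffer handling are abbreviated:
#include <cstdint>
#include <mutex>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <libfreenect.hpp>

// Receive RGB frames from libfreenect and hand them to OpenCV for display
class OpenCVDevice : public Freenect::FreenectDevice
{
public:
    OpenCVDevice(freenect_context *ctx, int index)
        : Freenect::FreenectDevice(ctx, index),
          bgr_(cv::Size(640, 480), CV_8UC3, cv::Scalar(0)) {}

    // Called on the libfreenect thread; the Kinect delivers RGB,
    // but OpenCV conventionally works in BGR, so convert on arrival
    void VideoCallback(void *video, uint32_t /*timestamp*/) override
    {
        std::lock_guard<std::mutex> lock(mutex_);
        cv::Mat rgb(cv::Size(640, 480), CV_8UC3, video);
        cv::cvtColor(rgb, bgr_, cv::COLOR_RGB2BGR);
    }

    void getFrame(cv::Mat &out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        bgr_.copyTo(out);
    }

private:
    std::mutex mutex_;
    cv::Mat bgr_;
};

int main()
{
    Freenect::Freenect freenect;
    OpenCVDevice &device = freenect.createDevice<OpenCVDevice>(0);
    device.startVideo();
    cv::namedWindow("Kinect");
    cv::Mat frame;
    while (cv::waitKey(10) != 27) {  // loop until Esc
        device.getFrame(frame);
        cv::imshow("Kinect", frame);
    }
    device.stopVideo();
    return 0;
}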
Show Me Your Smile
Once you are processing/rendering video data via OpenCV, it is fairly trivial to add facial recognition.
// Facial recognition variables
cv::CascadeClassifier face_detection("/usr/local/share/opencv4/haarcascades/haarcascade_frontalface_alt2.xml");
float cascade_image_scale = 1.5f;
// Change BGR image to smaller, grayscale image
cv::Mat cascade_grayscale;
cv::resize(bgr_image, cascade_grayscale, cv::Size((bgr_image.size().width / cascade_image_scale), (bgr_image.size().height / cascade_image_scale)));
cv::cvtColor(cascade_grayscale, cascade_grayscale, cv::COLOR_BGR2GRAY);
// Detect faces
std::vector<cv::Rect> faces;
face_detection.detectMultiScale(cascade_grayscale, faces, 1.1, 3, 0, cv::Size(25, 25));
// Draw detection rectangles on original image
if (!faces.empty())
{
// Apply rectangles to BGR image
for (auto &face : faces)
{
cv::rectangle(
bgr_image,
cv::Point(cvRound(face.x * cascade_image_scale), cvRound(face.y * cascade_image_scale)), // Upper left point
cv::Point(cvRound((face.x + (face.width - 1)) * cascade_image_scale), cvRound((face.y + (face.height - 1)) * cascade_image_scale)), // Lower right point
cv::Scalar(0, 0, 255) // Red line
);
}
}
Head Hunter
Detecting a face isn't that impressive/creepy unless people realize it's happening. This is where the Kinect's tilt motor comes in handy. I decided to add a little code to make the Kinect reposition itself to ensure the face stays in the center of the frame.
// Calculate average y-axis value of faces
avg_face_y = (sum_face_y / faces.size());
// Track face (vertical only)
if (avg_face_y < ((cascade_grayscale.size().height / 2) - 25))
{
if (++tilt_degrees >= 30)
{
tilt_degrees = 30;
}
kinect.setTiltDegrees(tilt_degrees);
}
else if (avg_face_y > ((cascade_grayscale.size().height / 2) + 25))
{
if (--tilt_degrees <= -30)
{
tilt_degrees = -30;
}
kinect.setTiltDegrees(tilt_degrees);
}
NOTE: sum_face_y was calculated during the rectangle drawing.
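That accumulation might look something like this inside the rectangle loop above (a paraphrased sketch, not the verbatim repository code):
// Accumulate the vertical centers of the detected faces while drawing
int sum_face_y = 0;
for (auto &face : faces)
{
    // Center row of this face, in the scaled grayscale image's coordinates
    sum_face_y += (face.y + (face.height / 2));
    // ... cv::rectangle(...) drawing as shown above ...
}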
Seriously! Who wants to use someone else's project without changing it around a little? Don't worry, I had that in mind when I set out to tackle this project.
I've created a Dockerfile that sets up the build environment. Instead of spending an hour setting up an environment (which may or may not work), there is only one dependency, Docker CE.
Sanity Test
First, I would recommend a hardware/sanity test, to ensure your Kinect is working properly with your computer. Use the following commands to launch a container running my example application:
$ xhost +local:docker
$ docker run --device /dev/snd --env DISPLAY --interactive --net host --privileged --rm --tty zfields/kinect-opencv-face-detect
Those experienced with Docker will observe that the container requires a lot of privilege, specifically --net host and --privileged. These are required to interact with the USB controller and the X11 server.
It is true, the Docker image is large and bloated, assumes egregious permissions, and has not been tailored to the example application; that is by design. The container has been fashioned as an experimentation environment.
NOTE: If you are concerned about tweaking the xhost settings (as I first was), then you have nothing to worry about. The settings will revert once you reboot your machine.
Experiment (x86 or ARM)
First you will need to clone the Git repository...
$ git clone https://github.com/zfields/kinect-opencv-face-detect.git
Once you have the source, modifying the example application and recompiling is a breeze! Simply edit kinect_opencv_face_detect.cpp and run the following command:
$ docker build --tag kinect-opencv-face-detect .
Docker will use the image you downloaded for the sanity test as a build cache (which SIGNIFICANTLY reduces build time). Then Docker will notice the file kinect_opencv_face_detect.cpp has changed, and reissue only the COPY (26/28) and make (27/28) steps.
Step 26/28 : COPY kinect_opencv_face_detect.cpp .
---> 8b50b9e8dfb4
Step 27/28 : RUN ["make", "all"]
---> Running in 2cf308cce477
g++ -I/usr/local/include/libfreenect -I/usr/include/libusb-1.0 -I/usr/local/include/opencv4/ -fPIC -g -Wall -std=c++11 -Wall -Wextra -Wpedantic kinect_opencv_face_detect.cpp -o head_hunter -lfreenect -lpthread -L/build_opencv/lib -lopencv_core -lopencv_highgui -lopencv_imgcodecs -lopencv_imgproc -lopencv_objdetect
In file included from /usr/local/include/libfreenect/libfreenect.hpp:37,
from kinect_opencv_face_detect.cpp:11:
/usr/include/libusb-1.0/libusb.h:740:46: warning: ISO C++ forbids zero-size array 'dev_capability_data' [-Wpedantic]
uint8_t dev_capability_data[ZERO_SIZED_ARRAY];
^
/usr/include/libusb-1.0/libusb.h:765:78: warning: ISO C++ forbids zero-size array 'dev_capability' [-Wpedantic]
struct libusb_bos_dev_capability_descriptor *dev_capability[ZERO_SIZED_ARRAY];
^
/usr/include/libusb-1.0/libusb.h:1258:70: warning: ISO C++ forbids zero-size array 'iso_packet_desc' [-Wpedantic]
struct libusb_iso_packet_descriptor iso_packet_desc[ZERO_SIZED_ARRAY];
^
In file included from kinect_opencv_face_detect.cpp:11:
/usr/local/include/libfreenect/libfreenect.hpp: In member function 'virtual void Freenect::FreenectDevice::VideoCallback(void*, uint32_t)':
/usr/local/include/libfreenect/libfreenect.hpp:155:36: warning: unused parameter 'video' [-Wunused-parameter]
virtual void VideoCallback(void *video, uint32_t timestamp) { }
~~~~~~^~~~~
/usr/local/include/libfreenect/libfreenect.hpp:155:52: warning: unused parameter 'timestamp' [-Wunused-parameter]
virtual void VideoCallback(void *video, uint32_t timestamp) { }
~~~~~~~~~^~~~~~~~~
/usr/local/include/libfreenect/libfreenect.hpp: In member function 'virtual void Freenect::FreenectDevice::DepthCallback(void*, uint32_t)':
/usr/local/include/libfreenect/libfreenect.hpp:157:36: warning: unused parameter 'depth' [-Wunused-parameter]
virtual void DepthCallback(void *depth, uint32_t timestamp) { }
~~~~~~^~~~~
/usr/local/include/libfreenect/libfreenect.hpp:157:52: warning: unused parameter 'timestamp' [-Wunused-parameter]
virtual void DepthCallback(void *depth, uint32_t timestamp) { }
~~~~~~~~~^~~~~~~~~
Removing intermediate container 2cf308cce477
---> c723fdc35987
Step 28/28 : CMD ["/build/head_hunter", "0"]
---> Running in 25567783ebe1
Removing intermediate container 25567783ebe1
---> afaa7e804361
Successfully built afaa7e804361
Successfully tagged kinect-opencv-face-detect:latest
Now, your application is ready to run!
NOTE: You'll notice several warnings appear during compilation. These are existing warnings for libusb and libfreenect. I enabled these warnings in the Makefile by using the compile flags -Wall, -Wextra, and -Wpedantic.
Headless Mode
My original goal was to run on an embedded platform, so as to enable robotics scenarios. To that end, I added a headless mode of operation. If you pass a non-zero parameter to the application, it will boot into headless mode with facial recognition and head tracking enabled.
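The flag handling itself is trivial; here is a minimal sketch of the idea (paraphrased, not the verbatim source):
#include <cstdlib>

int main(int argc, char *argv[])
{
    // Any non-zero argument selects headless mode (no HighGUI windows)
    const bool headless = (argc > 1 && std::atoi(argv[1]) != 0);
    if (headless) {
        // Skip cv::imshow()/cv::waitKey() entirely; detection and tilt
        // tracking still run, and a console key poll handles shutdown
    }
    // ... start video, detect faces, drive the tilt motor ...
    return 0;
}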
To launch headless mode via the container, issue the following command:
$ docker run --device /dev/snd --env DISPLAY --interactive --net host --privileged --rm --tty kinect-opencv-face-detect /build/head_hunter 1
Summary
This project was a blast! Thanks to the great resources available on the internet (links below), I was armed with all the information required to get this project working. I was able to learn about OpenCV and create a useful platform for my personal experimentation. All in all, I hope this post serves as another resource for those trying to pull the worlds of OpenCV and libfreenect together!
Research
Kinect
- DuckDuckGo: Windows Kinect v1 on Linux
- Open Kinect
- GitHub - OpenKinect/libfreenect: Open source drivers for...
- Running GUI apps with Docker
- Docker Forum: GUI
- Start a GUI-Application as root in a Ubuntu Container
- Still not sure how to run GUI apps in docker containers
OpenCV
- DuckDuckGo: libfreenect opencv
- Experimenting with Kinect using opencv, python and open...
- Jay Rambhia: Kinect with OpenCV using Freenect
- C++ OpenCv Example
- Catatan Fahmi: Kinect and OpenCV
- Tisham Dhar: Getting Data from Kinect to OpenCV
- Tisham Dhar's Pastebin: C++ program which uses OpenCV with freenect
- OpenCV High-level GUI API
- OpenCV Tutorials
- OpenCV Basics - 03 - Windows
- OpenCV Basics - 04 - Accessing Pixels using at Method
- OpenCV Basics - 05 - Split and Merge
- OpenCV Basics - 12 - Webcam & Video Capture
- Using OpenCV and Haar Cascades to Detect Faces in a Video [C++]
zak::waitKey(millis)
- StackOverflow: How to avoid pressing enter with getchar()
- StackOverflow: How do you do non-blocking console I/O on Linux in C?
06 MAY 2020 - I was able to quickly and successfully set up the containerized build environment. However, I was unable to run the examples, because they require an X11 GUI for the video display. I found a couple of links to research possible paths forward, but in the interest of time I decided to install the example dependencies natively on my host machine in order to test the Kinect. The test was successful. The next step will be to get the Kinect providing input to OpenCV.
07 MAY 2020 - I was able to get the application to run via the CONTAINER! Previously there were complications providing Docker access to the X11 host. I have resolved this in a hacky, unsafe manner, and made notes in the Dockerfile. I found several good blog posts (linked above) and upgraded the container to include OpenCV. Installing OpenCV in the container took over an hour, and I fell asleep while it was running. The next step will be to test the OpenCV and Kinect sample software provided by the OpenKinect project.
09 MAY 2020 - I attempted to use the example code I found on the internet. Unfortunately, the examples were written for the Freenect API v1 and v2, and the project is currently at major version 4. Instead of installing an older version of the API, I am planning to understand the flow of the program, then remap the OpenCV usage to the newer Freenect examples.
12 MAY 2020 - I began walking through the glview.c example of the Kinect library. I spent time looking up the documentation for the OpenGL and OpenGLUT API calls. I made comments in the example code with my findings. I documented the main loop, which processes the OpenGL thread. I still have the libfreenect thread to review and document. For my next steps, there does not appear to be API documentation for libfreenect, so I will have to read the code to understand each API call.
13 MAY 2020 - I am disappointed about the lack of API documentation for libfreenect. My goal was to learn OpenCV, a library leaned upon heavily by the industry. However, I have instead gotten myself into the weeds of libfreenect. I am going to continue learning the Kinect, because I have been fascinated by the hardware. That being said, I thought it was pertinent to call out the loss of productivity as a talking point for a retrospective. I spent 5 hours researching/documenting the API calls, and felt like I accomplished nothing. I stopped with only the DrawGLScene function left to research/document.
18 MAY 2020 - Documenting DrawGLScene was by far the most difficult, as it contained trigonometric functions to calculate compensation for video rotation angles. I dug deep into the weeds - even researching and studying trigonometric functions (which I had long since forgotten). While doing it, documenting the example felt exhausting and fruitless, but I believe it helped me identify exactly which code is necessary versus which code can be discarded. Stepping back, I think going deep on trigonometry was unnecessary when considering the goal of identifying non-critical code; however, the subject matter remains fascinating. Having completed the research/documentation exercise, I feel as though I have a firm grasp of the flow of a Kinect program. The next steps will be to review the examples I found early on, and see if I can integrate OpenCV into the existing Kinect examples.
Later in the evening, I watched several YouTube video tutorials (linked above) describing how to get started with OpenCV. I now feel as though I have a basic enough understanding of OpenCV to decipher the original OpenCV examples I found early on in my research.
20 MAY 2020 - I rehydrated the OpenCV + Kinect example from OpenKinect.org. I updated the API calls in the example code as well as the Makefile, and got the example running in my Docker container.
21 MAY 2020 - While updating the example code, I recognized several sections of unused or misleading code, and I removed the cruft before adding in the new facial recognition feature.
I updated the windowing scheme and added new key controls to toggle depth information and, finally, facial recognition. Facial recognition was simple to add; it required nearly verbatim usage of the example in the video (linked above).
27 MAY 2020 - I've been fiddling with the API over the last few days, trying to upgrade the video resolution. Unfortunately, the API and wrappers are not very extensible; they would need to be completely rewritten to allow objects to be created with a parameterized resolution. Such a composition would be an interesting use case for the CRTP (Curiously Recurring Template Pattern); see the toy sketch below. However, I'll leave the refactor for another day. I plan to provide the examples I've created as-is, and I am electing to make a note of the shortcoming in the corresponding blog post.
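For anyone curious, here is the shape such a refactor could take (entirely hypothetical - the class names are mine, not code from this project). The base class pulls the resolution out of the derived type at compile time, so each resolution becomes its own concrete device type:
#include <cstddef>

// Toy CRTP sketch: the base reads compile-time constants from the derived type
template <typename Derived>
class KinectDeviceBase
{
public:
    std::size_t videoBufferSize() const
    {
        // Width and height come from the concrete device type
        return Derived::kWidth * Derived::kHeight * 3;  // RGB bytes per frame
    }
};

class MediumResDevice : public KinectDeviceBase<MediumResDevice>
{
public:
    static constexpr std::size_t kWidth = 640;
    static constexpr std::size_t kHeight = 480;
};

class HighResDevice : public KinectDeviceBase<HighResDevice>
{
public:
    static constexpr std::size_t kWidth = 1280;
    static constexpr std::size_t kHeight = 1024;
};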
28 MAY 2020 - As I was finalizing the source and preparing to share, I noticed in my notes that I had originally intended for this to run on the Raspberry Pi. Luckily, I had created Dockerfiles, so this really only amounted to rebuilding the image on ARM - or so I thought... It turns out I had configured my Raspberry Pi without a GUI. So I created a headless version of the program. This required rewriting cv::waitKey, because it has a dependency on the HighGUI library, the OpenCV windowing framework.
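The replacement boils down to non-blocking console I/O using termios and select() - the StackOverflow links above cover the trick. Here is a sketch of the approach on Linux; it mirrors the idea behind zak::waitKey, not the exact repository code:
#include <termios.h>
#include <unistd.h>
#include <sys/select.h>

// Poll stdin for up to `millis` milliseconds without requiring Enter;
// return the key pressed, or -1 on timeout. A rough console-only
// stand-in for cv::waitKey().
int consoleWaitKey(int millis)
{
    termios saved, raw;
    tcgetattr(STDIN_FILENO, &saved);
    raw = saved;
    raw.c_lflag &= ~(ICANON | ECHO);  // no line buffering, no echo
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(STDIN_FILENO, &fds);
    timeval timeout;
    timeout.tv_sec = millis / 1000;
    timeout.tv_usec = (millis % 1000) * 1000;

    int key = -1;
    if (select(STDIN_FILENO + 1, &fds, nullptr, nullptr, &timeout) > 0) {
        char c;
        if (read(STDIN_FILENO, &c, 1) == 1) { key = c; }
    }

    tcsetattr(STDIN_FILENO, TCSANOW, &saved);  // restore terminal settings
    return key;
}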