Before we start, here is a cool promotional (and not really informative) video that presents the end result - a fun game that tracks the juggling balls, gives score for successful passes, recognizes the current routine, and helps fresh new jugglers practice with a one-ball mini game:
A bit about juggling
Juggling alone is boring. It's a great and fun hobby that can be practiced almost everywhere - all you need is a couple of balls, motivation, and some good company. I used to juggle with friends at juggling meetings and international conventions (check out the EJC - it's great), but most of the time it's a one-player game - you against Earth's gravity.
My friends and I thought it would be great to have a kind of virtual game that gives the juggler live feedback about their performance and teaches new routines and skills, all while enjoying a fun game with scores, combos, and enjoyable mini games. This is especially nice for beginners practicing alone, who need the feedback and encouragement, since juggling has a steep learning curve at the beginning. Some spoilers - in the end the game turned out to be a socializers' game: all our friends, especially the non-jugglers, competed with each other, passed the balls between them to see how the game would react, beat the previous high score again and again, and just had fun. Well, that's what juggling is all about.
System Overview
Before we dive into the details of each algorithm, I will briefly review the system's features and general workflow.
The interactive juggling game has three main components. The backend, written in Python using the OpenCV library, handles the detection, tracking, segmentation, and pattern recognition - all the heavy stuff. The second component is the gameplay manager (also Python), handling the game variables, score, combos, opening the mini-game window, and so on. The last component is completely independent of the other two: a Unity3D application written in C# that receives information via a socket connection to the Python program and renders all the UI, animations, and the game's visual appearance.
In this post I will mostly elaborate on the core algorithms - why and how they were implemented - and present the UI manager and game implementation in less detail, as they are more straightforward and intuitive. All the code is attached and can be examined and run out of the box. Due to the (minor) complexity of the system - two different applications that communicate via a socket - it is advisable to read the README before running and playing the interactive juggling game.
Game Features
So we have a laptop camera taking live video of a juggler and overlaying some cool animations on the balls - what else is going on there?
From the game's perspective, we have the following nice features:
- Adding score for each successful pass.
- Announcing a "combo" after a specific number of successful passes; the combo counter resets when a ball falls.
- When playing with one ball, blue circles show up around the screen and the goal is to hit them.
- The pattern the player performs is recognized using a classic AI method, and the game displays a video in a separate window demonstrating the correct execution of the routine.
- And of course, cool flames follow each ball the entire time - did I mention that already?
To achieve all of those, the balls must be recognized and tracked, and movement features must be extracted so the pattern can be recognized. So in general, here are the capabilities implemented behind the scenes:
- Balls recognition.
- Hands recognition.
- Balls tracking.
- Pattern recognition.
So let's dig into each one of them - why we had to implement them and how.
Algorithm #1 - Balls recognition
To detect the balls we combined two filters: color and motion.
Color filter - To detect the balls' color correctly, while staying robust to different lighting environments, we converted the image from the RGB color space to the LAB color space and looked for the red color of the balls there. By examining histograms of images containing the balls, we found a range of values in the A channel that represents the balls' color.
Motion filter - We used the OpenCV background subtraction algorithm on the grayscale image to detect the moving objects in the frame.
By combining the results of the two filters with an AND operation we managed to remove a lot of noise and detect only the red moving objects.
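Here is a minimal sketch of this two-filter pipeline, assuming a webcam feed; the A-channel bounds (A_MIN, A_MAX) are hypothetical placeholders for the values we read off the histograms:

```python
import cv2

A_MIN, A_MAX = 150, 200  # hypothetical red range in the LAB "a" channel
subtractor = cv2.createBackgroundSubtractorMOG2()

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Color filter: the LAB "a" channel isolates red fairly independently of lighting.
    lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
    color_mask = cv2.inRange(lab[:, :, 1], A_MIN, A_MAX)

    # Motion filter: background subtraction on the grayscale image.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    motion_mask = subtractor.apply(gray)
    _, motion_mask = cv2.threshold(motion_mask, 200, 255, cv2.THRESH_BINARY)

    # AND the two masks: only pixels that are red AND moving survive.
    ball_mask = cv2.bitwise_and(color_mask, motion_mask)

    cv2.imshow("ball mask", ball_mask)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```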
To clean up the remaining noise we passed the mask through another filter that clears areas with only a small number of white pixels, because we knew that the areas representing a red ball should be larger. A clustering algorithm was used to detect groups of neighboring white pixels that might be balls, while "lonely" pixels or small groups were eliminated.
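One way to implement this cleanup is with OpenCV's connected components as the clustering step, dropping blobs below a minimum area; MIN_AREA here is a hypothetical value tuned to the expected ball size on screen:

```python
import cv2

MIN_AREA = 80  # hypothetical minimum pixel count for a ball blob

def clean_and_find_centers(ball_mask):
    """Remove small blobs and return the centers of mass of the survivors."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(ball_mask)
    centers = []
    for i in range(1, n):                        # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= MIN_AREA:
            centers.append(tuple(centroids[i]))  # (x, y) center of mass
    return centers
```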
Sounds simple and straightforward - red objects in the background don't move, so they won't be recognized as balls, and moving body parts won't be recognized thanks to their different color. But in engineering, if it should work, it probably won't - and in our case the balls were occasionally hidden behind the player's hands, causing the ball filter to miss them. To overcome this challenge we thought: "why not use a hands filter?"
Algorithm #2 - Hands recognition
To address the issue of hands covering the balls, we integrated a hands filter based on skin color and movement detection. We used the YCrCb color space with a broad color range, together with the background subtractor feature of the OpenCV library, to detect moving skin.
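A minimal sketch of such a skin filter, assuming the commonly cited YCrCb skin range (the exact bounds used in the project may differ):

```python
import cv2
import numpy as np

SKIN_LOW = np.array([0, 133, 77], dtype=np.uint8)     # Y, Cr, Cb lower bound
SKIN_HIGH = np.array([255, 173, 127], dtype=np.uint8)  # Y, Cr, Cb upper bound

def hand_mask(frame_bgr, motion_mask):
    """Skin-colored AND moving pixels - candidate hand regions."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, SKIN_LOW, SKIN_HIGH)
    return cv2.bitwise_and(skin, motion_mask)
```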
A quick explanation about OpenCV's BackgroundSubtractorMOG2: it is a method for background subtraction in video processing. It models the background of a video stream and subtracts it from the current frame to identify moving objects, i.e., foreground pixels. The algorithm is robust to changes in illumination and adapts to the scene over time. It uses a mixture of Gaussian distributions to model the background and updates the model based on new observations. The model maintains a history of pixel values and their variances, which helps it differentiate between foreground and background pixels. A higher value for the history parameter results in a more stable background model, but also makes the algorithm slower to adapt to changes in the scene; a lower value makes the algorithm more responsive to changes, but may also produce more false positives.
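For illustration, this is how the history trade-off looks when constructing the subtractor; the numbers are illustrative, not the project's exact settings:

```python
import cv2

# Long history -> stable background model, slow to adapt to scene changes.
stable_subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=False)

# Short history -> adapts quickly, but produces more false positives.
reactive_subtractor = cv2.createBackgroundSubtractorMOG2(
    history=50, varThreshold=16, detectShadows=False)

# fg_mask = stable_subtractor.apply(gray_frame)  # per-frame usage
```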
Robust hand detection presented a significant challenge, since each frame contained a lot of skin besides the hands, such as the face and arms. At the same time, the broad color range made the filter pick up many irrelevant objects, including the red balls, so improving the detection accuracy was difficult. We improved the filter's performance by limiting hand detection to specific areas of the frame, and by only looking for skin where we had previously recognized a ball. Combining this filter with the ball recognition algorithm led to better ball tracking performance.
Algorithm #3 - Balls tracking
Well, this is an easy task, we thought. We were young, happy, and clueless. First, I will try to explain the challenges that arose in this specific task.
The input of the tracking algorithm we designed is the coordinates of the centers of mass detected by the balls recognition algorithm. The tracking algorithm has to follow the trajectory of each ball individually, and it has to know in real time when another ball enters or leaves the game. In addition, if noisy inputs arrive from the balls recognition algorithm, the tracking algorithm should recognize this and not treat them as balls. Moreover, balls tend to be fully hidden by the hands when they are grabbed, and sometimes they hide each other. It's important to mention that we sampled the MacBook Pro 16 camera at 15 FPS - not too fast, so we can respond in real time, and not too slow, so we don't lose temporal information. At these settings balls cross each other very quickly, and while on screen it may look like two balls didn't move, they have actually swapped places - and we had to detect that.
To address all the challenges above, we used the Kalman filter as the main tool of the tracking algorithm. It is a very basic and common tracking filter, so I highly and super recommend reading about it (there is a great explanation using interactive Python code here).
In our game, the Kalman filter estimates the position and velocity of a moving ball based on noisy measurements from the balls recognition algorithm.
Generally speaking, the Kalman filter works by modeling the motion of the ball as a system with a state vector that includes the ball's position and velocity. The filter uses a prediction step to estimate the current state of the ball based on the previous state and a motion model, and then uses a measurement step to update the state estimate based on the ball's observed position in the current frame.
In our application, the Kalman filter is used to smooth the ball's trajectory and make more accurate predictions of its future position. It is also used to handle noisy or missing measurements from the balls recognition algorithm.
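A constant-velocity Kalman filter for a single ball can be set up with OpenCV's cv2.KalmanFilter; this is a sketch with illustrative noise covariances, not the project's exact tuning:

```python
import cv2
import numpy as np

def make_ball_filter(dt=1 / 15.0):           # 15 FPS sampling, as above
    kf = cv2.KalmanFilter(4, 2)              # state: x, y, vx, vy; measurement: x, y
    # Constant-velocity motion model: position advances by velocity * dt.
    kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                    [0, 1, 0, dt],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], dtype=np.float32)
    # We only observe the (x, y) center of mass.
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], dtype=np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2      # motion model noise
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # detector noise
    return kf

kf = make_ball_filter()
predicted = kf.predict()                                         # prior for this frame
kf.correct(np.array([[320.0], [240.0]], dtype=np.float32))       # observed center
```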
How was it used? At each frame we get, for each ball, a new center of mass proposed by the ball recognition algorithm. Using the minimum (L2) distance between a detection and a Kalman filter prediction, we assign the new position of the ball. If there is no close match (within a max-distance parameter that we set), we check the possibility that the player's hand hides the ball by looking for a close match between the Kalman prediction and the hand recognition filter. If we get a match, we follow the hand (the hand's center is assumed to be the ball's position) until we get a detection from the ball recognition in the area of the hand.
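A sketch of this per-frame association logic, with hypothetical distance thresholds:

```python
import numpy as np

MAX_BALL_DIST = 60.0  # hypothetical max L2 distance for a detection match
MAX_HAND_DIST = 80.0  # hypothetical max L2 distance for a hand match

def associate(prediction, detections, hand_centers):
    """Return the measurement to feed the Kalman filter, or None."""
    def nearest(points):
        if not points:
            return None, np.inf
        dists = [np.linalg.norm(np.subtract(p, prediction)) for p in points]
        i = int(np.argmin(dists))
        return points[i], dists[i]

    det, d = nearest(detections)
    if d <= MAX_BALL_DIST:
        return det        # normal case: a ball detection matched the prediction
    hand, d = nearest(hand_centers)
    if d <= MAX_HAND_DIST:
        return hand       # ball assumed hidden inside the hand
    return None           # missed frame - rely on the prediction alone
```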
To make sure that we are following a ball and not noise, we keep a buffer of each ball's position history. Only after a certain number of frames is it determined whether the object really is a ball, based on its history size and motion. If the object does not move much during a certain number of frames, it is classified as noise.
The ball's history buffer also helps us decide whether a ball is out of play or a new ball has been introduced, and it helps us handle cases where the balls leave the camera's field of view for a couple of frames.
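A sketch of that sanity check, with hypothetical constants:

```python
from collections import deque

import numpy as np

HISTORY_LEN = 15   # frames to observe before trusting a track
MIN_TRAVEL = 25.0  # minimal span of movement (pixels) for a real ball

class Track:
    def __init__(self):
        self.history = deque(maxlen=HISTORY_LEN)

    def update(self, center):
        self.history.append(center)

    def is_noise(self):
        if len(self.history) < HISTORY_LEN:
            return False                # not enough evidence yet
        pts = np.array(self.history)
        # Span of the bounding box of recent positions.
        travel = np.linalg.norm(pts.max(axis=0) - pts.min(axis=0))
        return travel < MIN_TRAVEL      # a ball in play moves much more
```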
Algorithm #4 - Pattern recognition
So what is the best way to classify the balls' pattern out of 11 known possible routines? Is it machine learning? A complex deep learning architecture with dozens of transformers stacked on top of each other? Machine learning is fun and deep learning is even more fun, but data has to be gathered and processed, models have to be trained, and a lot of time must be invested to get good results - or any convergence at all. So first we decided to extract as many features as possible and choose later how to handle them. Some of those features (a small extraction sketch follows the list):
- Variance of position (x, y) over a specific time period.
- Ratio of the x and y variances.
- Balls' extremum points (x, y).
- Extremum timestamp for each ball.
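Here is how such features could be extracted over a sliding window of positions; the window length and exact feature set are assumptions:

```python
import numpy as np

def extract_features(track_xy):
    """track_xy: (N, 2) array of a ball's recent (x, y) positions."""
    pts = np.asarray(track_xy, dtype=float)
    var_x, var_y = pts.var(axis=0)
    return {
        "var_x": var_x,
        "var_y": var_y,
        "var_ratio": var_x / (var_y + 1e-9),      # avoid division by zero
        "x_extrema": (pts[:, 0].min(), pts[:, 0].max()),
        "y_extrema": (pts[:, 1].min(), pts[:, 1].max()),
        "y_peak_frame": int(pts[:, 1].argmin()),  # highest point (image y grows down)
    }
```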
By juggling and looking at the numbers, we could differentiate between the different patterns by setting thresholds and rules. This is a basic and classic AI method that uses features to make a decision. We thought about implementing more complex machine learning algorithms, but the simple thresholds did a decent job. They are not too complex to understand and were written in such a way that they can easily be upgraded by the crowd. If one wants, the main function that takes the features and outputs the pattern enum can be overridden with a more sophisticated algorithm; the brave people in the audience are invited to feed the raw frame data into trained DNN models. We love deep, nonlinear solutions, but creating awesome colorful trails and an epileptic UI got higher priority.
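To give the flavor of such rules, here is a toy classifier consuming the features dict from the sketch above; the pattern names and threshold values are hypothetical, not the project's actual rules:

```python
from enum import Enum

class Pattern(Enum):
    CASCADE = 1
    COLUMNS = 2
    UNKNOWN = 99

def classify(features):
    # Columns: balls move almost only vertically -> tiny x/y variance ratio.
    if features["var_ratio"] < 0.1:
        return Pattern.COLUMNS
    # Cascade: wide horizontal motion as balls cross between the hands.
    if features["var_ratio"] > 0.4:
        return Pattern.CASCADE
    return Pattern.UNKNOWN
```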
Balls' trail
After finding the balls using the tracking algorithm, we wanted to wrap each ball in a dynamic fire trail. We set the trail GIF's size proportional to the magnitude of the ball's movement vector, and we also take the direction of the movement vector into account - and that's it. Ten minutes of work and the whole screen is full of magical fireballs.
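A sketch of that overlay idea, assuming "flame" is a BGRA sprite taken from the GIF frames and that the speed-to-scale mapping is a hypothetical choice (bounds checks omitted for brevity):

```python
import cv2
import numpy as np

def draw_trail(frame, flame, center, velocity):
    speed = np.linalg.norm(velocity)
    scale = np.clip(0.5 + speed / 50.0, 0.5, 3.0)  # hypothetical speed-to-size mapping
    angle = np.degrees(np.arctan2(velocity[1], velocity[0]))

    # Scale the sprite with the speed and rotate it to the motion direction.
    h, w = flame.shape[:2]
    size = (int(w * scale), int(h * scale))
    sprite = cv2.resize(flame, size)
    M = cv2.getRotationMatrix2D((size[0] / 2, size[1] / 2), -angle, 1.0)
    sprite = cv2.warpAffine(sprite, M, size)

    # Alpha-blend the sprite around the ball's center.
    x = int(center[0] - size[0] / 2)
    y = int(center[1] - size[1] / 2)
    roi = frame[y:y + size[1], x:x + size[0]]
    alpha = sprite[:, :, 3:4] / 255.0
    roi[:] = (1 - alpha) * roi + alpha * sprite[:, :, :3]
```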
UI and gameplay
It is trendy to think that the heart of an application is some unique, special AI algorithm that magically does something and is the essence of the app, but the Chrome dinosaur game proves it's the other way around: a simple, black-and-white, one-button game that makes users ignore the fact that their internet connection has already been restored. Here it's the same - some crazy font, chimes in the background, and "combooo!" screamed every once in a while make the difference between a virtual juggling assistant and a great game for kids, jugglers, and a couple of tipsy friends.
The UI was built separately using the Unity3D game engine, and all of its functions were written in C#. A socket is created between the Python program and the Unity app, passing a packet to the UI app every frame with information such as the frame image, score, balls' positions, and so on. We decided to use a separate app with a socket connection for two reasons (a minimal sketch of the Python side follows the list):
- We gain the ability to have the computer with the camera in one place and show the game UI on another laptop's display.
- For the fast per-frame computation we use the OpenCV library in Python, which is well known and easy to use, while for the 3D animation we use Unity3D's features for 3D objects, visual effects, video playback, and so on.
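Here is a minimal sketch of the Python side of the socket link, assuming a simple length-prefixed protocol; the real packet layout and port live in the repository:

```python
import json
import socket
import struct

import cv2

def send_packet(sock, frame, score, ball_positions):
    _, jpg = cv2.imencode(".jpg", frame)  # compress the frame image
    meta = json.dumps({
        "score": score,
        "balls": ball_positions,          # list of (x, y) centers
    }).encode("utf-8")
    # Layout: [4-byte image length][image bytes][4-byte meta length][meta bytes]
    sock.sendall(struct.pack("!I", len(jpg)) + jpg.tobytes())
    sock.sendall(struct.pack("!I", len(meta)) + meta)

# sock = socket.create_connection(("127.0.0.1", 5065))  # hypothetical port
```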
The GIF below shows the results when playing the mini game - interaction, colors, music - almost a proper indie game.
A few explanations regarding all the windows: the pattern window shows the currently detected pattern and demonstrates the correct way to perform it. The bottom-right window shows the altitude of each ball over time. When performing the "classic" juggling routine, we can see three lines chasing each other, like three sinusoids with a phase offset of 2π/3. When playing with two balls we see two sinusoids in opposite phase - this can help the juggler practice the perfect routine, not just in terms of keeping the balls in the air, but also in making the passes as smooth and rounded as possible.
All the code is in the repository, ready to run in one or two clicks.
The Python code should be executed first. Setting the variable "online" to true makes the app wait for the UI app, after which they connect automatically; "online" = false allows running the Python program without the complex UI, enjoying just the cool trail overlays and accessing all the debugging masks.
The code is not long or complex, and it is a good introduction to practicing image processing and signal processing in general. Basic concepts such as converting from one color space to another, using a Kalman filter, and calculating variance and mean are all used in the project and are a good intro to practical processing with OpenCV. The project could also be a good base for other algorithms - developing better tracking, recognizing more balls (in that case the FPS must be higher), or recognizing patterns with newer techniques. And of course, you can just call some friends and play juggling. Enjoy the game!