Created January 23, 2024

Hands Free Chess Clock

A modern chess clock which you can configure by voice commands and which can track moves automatically via computer vision.

Things used in this project

Hardware components

NVIDIA Jetson Orin Nano Developer Kit

SparkFun Qwiic Alphanumeric Display - Purple

SparkFun Qwiic Cable - 100mm

Logitech C920 PRO HD WebCam

SparkFun Mini External USB Stereo Speaker

Story

Introduction

Hands Free Chess Clock is a clock which doesn't need to be touched. It listens for voice commands to start a game and, by watching the game through the camera, automatically switches which clock is running and which is stopped. Here is a quick demonstration:

demonstration using 2 different chess boards

Mechanical design

clock components

Since the focus of this project is AI at the Edge Applications, I didn't try to make the mechanical design reproducible. I built it from parts I had lying around. For example, the metal frame is mostly made of eitech construction set parts. Rubber rollers from an old disassembled printer serve as feet so that the clock doesn't slip and get dragged by the power cable. All the easily purchasable items are listed in the components section, and I am sure creative makers would have no problem inventing their own clock frame. Actually, they would probably prefer to make their own :) A high-level connection diagram is provided in the schematic section. The only requirement is to put the camera high enough to see the chess board from a not too extreme angle. A 45 degree angle from the camera to the center of the board is a good start; this is the angle used in my build. The higher the better, because the ideal camera position is looking straight down.

Software

To obtain the software, clone the git repository:

git clone https://github.com/kikaitachi/hands-free-chess-clock.git

Then install the required dependencies:

sudo apt-get install g++ cmake ninja-build libasound2-dev libopencv-dev stockfish

To build, run:

./build.sh

The first build takes a long time because it also downloads all the AI models, some of which are hundreds of megabytes.

To run with default settings:

./run.sh

This uses the default microphone and speakers. Depending on your setup, you might need to specify particular devices. For example, on my setup I run it with:

./run.sh plughw:DEV=0,CARD=C920 plughw:CARD=UACDemoV10,DEV=0

The first parameter is the ALSA device for the microphone and the second one is for the speaker. When you run the program, it lists all available ALSA devices, so you can try them and find one which works.

For more details, see the README.md file in the git repository.

Voice commands

The clock is controlled by voice commands. Audio is captured using the Advanced Linux Sound Architecture (ALSA) library at a 16 kHz sampling rate. Silero Voice Activity Detector (VAD) is used to find the start and end of speech; the VAD runs on Open Neural Network Exchange (ONNX) Runtime. Detected speech is recognised using the whisper.cpp library, a dependency-free plain C/C++ implementation of the OpenAI Whisper model. whisper.cpp is compiled with cuBLAS for CUDA support, and the small English-only Whisper model is used. Inference time varies from under a second to several seconds depending on the input. This is real-time enough for a clock application, given that it receives commands quite infrequently. Transcribed text is pattern matched (using regular expressions) against the expected commands. If a match is found, the relevant command is executed. Supported commands:

start x minute(s) game [with y second increment] - starts a new game. All chess pieces must be on their initial squares. After the game starts, the video camera observes the board and automatically switches the relevant clock after each move.
stop the game - stops the clock.
continue game - resumes the clock.
shutdown - halts the computer running the clock. Handy when you don't have a keyboard or remote terminal for a safe shutdown. The /etc/sudoers file must be modified accordingly.
please tell best move - uses an external chess engine to evaluate the current position and tells what it thinks the best move is.
what is worst move - uses an external chess engine to evaluate the current position and tells what it thinks the worst move is.
who is winning - uses an external chess engine to evaluate the current position and tells who is winning.

Slight variations of command wording will work too.
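As a rough illustration of the pattern matching step, here is a minimal Python sketch. The project itself is written in C++, and the expressions, the parse_command name and the returned tuples below are all hypothetical; the real patterns live in the repository. The sketch also assumes the transcription contains numbers as digits ("5", not "five").

```python
import re

# Hypothetical patterns; the project's real expressions may differ.
START = re.compile(
    r"start\s+(?:an?\s+)?(\d+)[\s-]*minutes?\s+game"
    r"(?:\s+with\s+(?:an?\s+)?(\d+)[\s-]*seconds?\s+increment)?"
)
SIMPLE = [
    (re.compile(r"stop\s+(?:the\s+)?game"), "stop"),
    (re.compile(r"continue\s+(?:the\s+)?game"), "continue"),
    (re.compile(r"shut\s*down"), "shutdown"),
    (re.compile(r"best\s+move"), "best_move"),
    (re.compile(r"worst\s+move"), "worst_move"),
    (re.compile(r"who\s+is\s+winning"), "who_is_winning"),
]


def parse_command(text):
    """Return (command, minutes, increment_seconds), or None if nothing matches."""
    # Transcriptions carry capitalisation and punctuation, so normalise first.
    text = re.sub(r"[^a-z0-9\s-]", " ", text.lower())
    m = START.search(text)
    if m:
        # A missing increment is treated as 0 seconds.
        return ("start", int(m.group(1)), int(m.group(2) or 0))
    for pattern, name in SIMPLE:
        if pattern.search(text):
            return (name, None, None)
    return None
```

Using search rather than a full match is what lets slight variations such as "please stop the game now" through, at the cost of occasionally reacting to unrelated speech that happens to contain a command.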

Voice output

Hands Free Chess Clock provides voice feedback after commands, after chess moves and when the game ends. It can also tell the opening name from the Lichess opening database. Voice audio is generated on the fly using the Piper text to speech (TTS) engine.

Chess board detection

Open Source Computer Vision Library (OpenCV) is used to detect the chess board. The software doesn't attempt to detect a board in an arbitrary orientation or with an arbitrary initial position. It assumes that when the start new game command is issued, the chess pieces are at their initial positions and the board is in view of the camera and aligned with the chess clock.

The wooden chess board shown in the video above is the first chess board I ever played on. It is over 30 years old and quite worn. This makes the detection problem more challenging because of extra visual artefacts on the board. On the other hand, it is good for tuning the vision algorithms not only for clean tournament-grade boards but also for a regular board used in an amateur setting. Tests so far were made on 2 different boards, see the images below.

From this point on, the algorithms described below are correct as of the time this project was submitted to the AI Innovation Challenge with SparkFun and NVIDIA contest. This page will be preserved to represent the state during the contest. If you are reading this page after the contest, you might want to check the GitHub repository for the latest developments.

Video capture runs all the time, regardless of whether a game is being played. This keeps the camera focused and adjusted to the light conditions by the time the user issues the start game command. When a frame is captured, its blurriness is determined by calculating the Laplacian, and frames scoring below a certain threshold are ignored. A refocusing camera might severely decrease the chance of correctly detecting the board or a move.
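One common way to turn the Laplacian into a single sharpness score is to take its variance over the frame. The text above does not say which statistic the project uses, so the variance, the function names and the threshold value in this pure-Python sketch are assumptions (the project itself relies on OpenCV).

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray is a 2D list of intensities; a low result suggests a blurry frame.
    """
    h, w = len(gray), len(gray[0])
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            values.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def is_sharp(gray, threshold=100.0):
    """Keep the frame only if its sharpness score reaches the (assumed) threshold."""
    return laplacian_variance(gray) >= threshold
```

A flat grey frame scores 0, while a crisp checker pattern scores very high, so any threshold between the two separates them; a real threshold would have to be tuned on the actual camera.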

The board detection algorithm is executed at the start of every game; after that the board is assumed to be static with minimal disturbances.

Visual processing starts with the classical steps of converting the RGB image to grayscale and applying some blurring:

grayscale with 5x5 aperture median filter

Next, Otsu's threshold is applied to aid contour extraction:

Otsu threshold
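For readers unfamiliar with Otsu's method: it picks the threshold that maximises the variance between the two resulting classes of pixels. The following pure-Python sketch shows the idea on a flat list of 8-bit intensities; the function names are illustrative only, and the project uses OpenCV's implementation rather than anything like this.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 maximising between-class variance.

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, t):
    """Map pixels above the threshold to 1 and the rest to 0."""
    return [1 if p > t else 0 for p in pixels]
```

Because the threshold is derived from the histogram of each image, no fixed brightness value has to be tuned for a particular board or lighting.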

As you can see, some of the squares are joined, which would result in contours spanning multiple squares, but we want to detect individual squares rather than groups. To fix that, an erosion operation is applied:

3 pixel size rectangular erosion
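Erosion with a rectangular kernel replaces every pixel with the minimum of its neighbourhood, so thin white bridges between squares disappear. A minimal pure-Python sketch of the operation follows, assuming a binary image stored as a 2D list and pixels outside the image being ignored; the erode name is illustrative and the project uses OpenCV instead.

```python
def erode(binary, size=3):
    """Erode a 2D list of 0/1 values with a size x size rectangular kernel."""
    r = size // 2
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel survives only if its whole neighbourhood is white.
            out[y][x] = min(
                binary[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
    return out
```

Applying the same function to the inverted image (1 - pixel) is one way to separate the dark squares.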

Now the white squares are separated but the black ones are joined. To rectify this, an inverted threshold is calculated and erosion is applied to it:

erosion of inverted threshold

Both erosion results are used to find contours:

contours in red overlaid on the original color image

Detected contours are approximated as polygons, and all polygons which are not convex or don't have exactly 4 vertices are filtered out. Additional filters remove polygons which don't have 2 horizontal sides, don't have 2 almost vertical sides, or are too big (for example, the polygon for the whole board):

"good" polygons shown in green, rejected ones shown in red

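The polygon filters described above could look roughly like the sketch below. The tolerances (10 degrees for "horizontal", 30 degrees for "almost vertical"), the max_area parameter and all function names are assumptions for illustration; the values used in the project are not stated here.

```python
import math


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_convex(poly):
    """True if all turns along the polygon have the same direction."""
    n = len(poly)
    turns = [cross(poly[i], poly[(i + 1) % n], poly[(i + 2) % n]) for i in range(n)]
    return all(t > 0 for t in turns) or all(t < 0 for t in turns)


def area(poly):
    """Polygon area by the shoelace formula."""
    n = len(poly)
    s = sum(
        poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
        for i in range(n)
    )
    return abs(s) / 2


def side_angle(a, b):
    """Angle of side a-b from the horizontal, in degrees (0..90)."""
    return math.degrees(math.atan2(abs(b[1] - a[1]), abs(b[0] - a[0])))


def looks_like_square(poly, max_area, horiz_tol=10.0, vert_tol=30.0):
    """Convex quad with 2 horizontal and 2 almost vertical sides, not too big."""
    if len(poly) != 4 or not is_convex(poly) or area(poly) > max_area:
        return False
    angles = [side_angle(poly[i], poly[(i + 1) % 4]) for i in range(4)]
    horizontal = sum(a <= horiz_tol for a in angles)
    vertical = sum(a >= 90.0 - vert_tol for a in angles)
    return horizontal == 2 and vertical == 2
```

The looser tolerance on the vertical sides reflects the fact that, seen from the side, perspective tilts those edges while the horizontal ones stay roughly level.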
The expectation is that the remaining polygons will match some or all squares of the chess board. On standard chess boards, all non-central squares are obstructed by chess pieces, which are tall and therefore span multiple squares in the image, preventing these squares from being recognised as 4-sided polygons. On some less common boards, non-central squares will be detected too:

magnetic board with pieces which are not tall and don't obscure adjacent squares

As you can see, non-central squares are detected too, but not on all ranks. Therefore some more heuristics need to be applied to identify which 4 ranks are central. The assumption is made that at least one square is detected per file. Then the k-means algorithm is applied to the polygons' centers of mass to group them into 8 files.

Grouping into ranks is much harder because, due to perspective distortion, squares of the first file and the last file will overlap. Therefore a different approach is taken to find the ranks. Starting with the h file, the leftmost square is assumed to have row 0, column 0 (as seen in the image, not from the player's perspective). Then, by moving in steps of its top side's length, other polygons are found and numbered. In the image, 0, 2 is detected next, at a distance of 2 sizes of the initial square. Moving to the next file down, we first need to find a square matching any column of the previous row. Moving from the left, that would be 1, 0. Then the other squares in the same row are detected by moving left and right, and the process continues until the last row. Row and column indices are relative and don't need to correspond to chess board coordinates.

The contiguous group of columns with the largest number of rows is assumed to be the central squares. They have green coordinates in the picture. Yellow ones participated in detection, and therefore have indices, but were not assumed to be in the center. Red ones were rejected for not being square enough.
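To make the k-means grouping step more concrete, here is a small one-dimensional sketch. It assumes that files appear as horizontal bands in the camera image, so only the vertical coordinate of each center of mass is clustered; that assumption, the quantile-based initialisation and the kmeans_1d name are all mine, not taken from the project.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster numbers into k groups with Lloyd's algorithm.

    Returns (labels, centroids); labels[i] is the cluster index of values[i].
    Initial centroids are taken at evenly spaced positions of the sorted
    input, which keeps the result deterministic.
    """
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[min(n - 1, int((i + 0.5) * n / k))] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid wins.
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids
```

Called with the y coordinates of all accepted polygons and k=8, the labels give a file index per polygon. Like any k-means, it can settle in a poor local optimum when some files have many more detected squares than others.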

From the detected central squares we can extract the side lines (blue) bounding the unoccupied center of the board (red dots). Scaling the center rectangle (it is exactly half of the board) gives a rectangle spanning the whole board (purple dots).

not the best but usable detection

Using the locations of the purple points, we can calculate and apply the perspective correction needed to get a square board:

perspective corrected board
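Perspective correction from four corner points amounts to solving for a 3x3 homography with 8 unknowns. The pure-Python sketch below shows the underlying linear system; the function names are illustrative and the corner coordinates in the usage note are made up. With OpenCV the same result would normally come from its built-in perspective transform functions.

```python
def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_transform(src, dst):
    """Return h0..h7 of the homography mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve(a, b)


def warp_point(h, x, y):
    """Apply the homography to a single image point."""
    w = h[6] * x + h[7] * y + 1
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)
```

For example, mapping hypothetical board corners (100, 50), (300, 50), (380, 250), (20, 250) onto (0, 0), (8, 0), (8, 8), (0, 8) yields coordinates measured in squares, so the integer parts of a warped point give its row and column directly.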

Given how move detection (see below) works, the perspective correction doesn't need to be very accurate.

Move detection

Once the game starts, the MOG2 background subtraction algorithm is used to detect the start and end of a move. This algorithm is adaptive, so when movement finishes, even if the scene has changed, it will eventually be recognised as the new background. The speed of updating the background can be configured. Any movement causes big changes relative to the original background, especially because a human hand is relatively big compared to chess pieces, and shadows plus lighting changes cause even more disturbance. When the disturbance exceeds a certain threshold, we can assume the start of a move. Once the disturbance falls below another threshold, we can assume the end of the move. Choosing a falling threshold significantly smaller than the rising threshold avoids or minimises oscillations. Once the end of a move is detected, we can calculate the difference between the background before and after the move. Applying blurring and an adaptive threshold helps to get a more meaningful difference:
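The two-threshold start/end logic described above is a classic hysteresis. A minimal sketch follows, assuming the disturbance is expressed as the fraction of foreground pixels reported by the background subtractor; the class name and the threshold values are made up for illustration.

```python
class MoveDetector:
    """Turn a stream of disturbance levels into "start" and "end" events."""

    def __init__(self, rising=0.05, falling=0.01):
        if falling >= rising:
            raise ValueError("falling threshold must be below rising threshold")
        self.rising = rising
        self.falling = falling
        self.moving = False

    def update(self, disturbance):
        """Feed one measurement; return "start", "end" or None."""
        if not self.moving and disturbance > self.rising:
            self.moving = True
            return "start"
        if self.moving and disturbance < self.falling:
            self.moving = False
            return "end"
        return None
```

With a single threshold, a level hovering around it would produce a burst of start and end events; the gap between the two thresholds is what suppresses that.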

There is a lot of noise in the difference image because of the table moving, shadows changing and so on. The 6 squares with the most changes are shown in red. These squares are scored using various rules and matched against the legal moves in the position. To get legal moves and end-of-game conditions, a simple chess engine is implemented. The scoring rules, for example, favour a square which has another disturbed square above it, since every chess piece viewed from this angle also obscures at least one square above it. Obviously this rule can't be applied to the topmost h file because it has no squares above it. If no match is found, it is assumed that either the user started a move and changed their mind, or it was just a disturbance caused by table motion or changing lighting conditions.
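A heavily simplified sketch of matching disturbed squares against legal moves is shown below. It implements only one rule, the bonus for a disturbed square "above", and assumes that "above" means the next file towards h; the scoring formula, the bonus weight and the function names are invented for illustration and do not reproduce the project's actual rules.

```python
def square_above(square):
    """Square one file closer to the h file, or None on the h file itself."""
    file, rank = square[0], square[1:]
    if file == "h":
        return None
    return chr(ord(file) + 1) + rank


def match_move(change, legal_moves, top_n=6, bonus=0.5):
    """Pick the legal move best explained by the most changed squares.

    change maps square names to an amount of change; legal_moves is a list
    of (from_square, to_square) pairs. Returns a pair or None.
    """
    top = set(sorted(change, key=change.get, reverse=True)[:top_n])

    def score(square):
        s = change[square]
        above = square_above(square)
        if above in top:
            # A tall piece also disturbs the square behind it in the image.
            s += bonus * change[above]
        return s

    best, best_score = None, 0.0
    for src, dst in legal_moves:
        if src in top and dst in top:
            s = score(src) + score(dst)
            if s > best_score:
                best, best_score = (src, dst), s
    return best
```

Returning None corresponds to the "no match found" case, where the disturbance is ignored and the clock keeps running for the same player.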

If no legal move matches, previous moves are matched as well. If the change matches moving a piece back, and moving back is not a legal move, it is assumed to be a "take back". The chess position and clocks are updated accordingly. Consequently, it is currently only possible to take back moves whose reversal is not a legal normal/forward move, for example pawn moves. In the future, an additional voice command might be added to allow taking back any move.

Future work

Move detection is far from perfect, especially in rapid games when the player's hand is not fully retracted and still casts a shadow on the board. On some boards the black pieces are very similar in darkness to the black squares, so their presence can easily be confused with a shadow.

Pawn promotion is a complicated topic: most amateur chess sets don't even have spare queens, so additional queens may be represented by some randomly found token whose dimensions, shape and color differ from normal pieces, which complicates detection. Under-promotions (promoting to something other than a queen) are rare but still need to be supported. More voice commands could solve this. For example, the clock could ask what you have promoted to and the player could tell it.

Adding more training functions. Currently the clock can only tell who is winning, with the level of advantage, and what the best move is. For learning purposes it would be helpful to list alternative good moves, provide insights into why a move is good or bad, and so on.

In the current mechanical design the chess board fills almost the entire camera view. Therefore the board needs to be positioned relatively carefully, otherwise part of it won't be visible. A solution could be either a longer or telescoping mast to move the camera further from the board, or potentially a small servo motor to let the camera find the board automatically.

Connect the clock to the outside world. Fully offline operation is often an advantage, but it would also be great to use voice commands to share a played game online, whether for bragging or for further analysis.

Code

Credits

KIKAItachi
