This is my entry in the Gaming category for the Build2gether Inclusive Innovation Challenge.
"What were the needs or pain points that you attended to and identified when you were solving problems faced by the Contest Masters?"
Over time I have built many interactive displays for public exhibitions and shows. The way the public interacts with them varies, but needing the dexterity to reach up to and touch the installation is the most common form of interaction. This works for many people, but it can be limiting for anyone who is not "average", including visitors with limited physical mobility.
The challenge I set myself was to create an input controller for use in public spaces, so the general public can use small head movements to interact with an installation.
I am pleased to report that my solution actually works. Yes, it came as a surprise to me as well.
Requirements breakdown
Throughout this build there were a number of decisions to make and paths to follow. To help with the decision making process I listed my key requirements to judge each decision by. These were…
- Produce a simple-to-install, low-cost module for new public displays that can also be retrofitted onto existing installations
- Make the interactions available to [almost] everyone, including those with and without limited physical mobility
- Give everyone the same interactive experience
- Be fun
For this project to be a success it needs to be easy to get up and running. Equally, it needs to be a project whose internals can be understood and built upon. To do this I am splitting the build into two sets of instructions. First are the quick build instructions, which will be enough for most people. If you simply want to duplicate the core module and have a working system, just follow these step by step. If you need to, or just want to, evolve or customise the system, then I have added the advanced sections afterwards for you.
Also note that I am going to include a lot of technical detail for those who are interested. Yes, there is a lot of advanced technology under the hood and inside the modules, but if you are not a technical person then you don’t need to worry about it. Please don’t let the extra information put you off. It is easier to install than you may think.
Quick and easy build - Setting up the Google Coral Dev Board Mini
In this section I will give a simple set of instructions for configuring the heart of the system, the Google Coral Dev Board Mini. First we follow the instructions on the Google Coral web site to get the operating system set up. Once you have followed those instructions, including updating the board, and have the example working, we are ready to install our code.
We will need to be able to log in to the board over our local network in order to use a USB camera. This is because the right hand USB-C port that we use to set the device up is the same one we need to connect the webcam to. If we followed all the setup instructions we should be able to use the command “mdt shell” from our computer to connect to the board over WiFi, or we can use a serial adaptor if that is not working. We will not need to do this if we are using a native Coral Camera; I have made some notes in the camera selection section below about why we are not currently using one.
We now need to power the Google Coral Dev Board Mini using the left hand USB-C port and press the power button to boot it. The right hand USB-C port is now used to plug in the OTG adapter, which in turn has the USB webcam plugged into it.
Now we can jump ahead a few steps by installing the project-posenet example and letting it set up some prerequisites. Running the command “git clone https://github.com/google-coral/project-posenet.git” on the Google Coral will download it locally for us, and then “cd project-posenet” followed by “sh install_requirements.sh” will finish installing all the requirements for you. Running “python3 simple_pose.py” will test the code and show some example data on the screen.
Now we can copy our Python code from here to the Google Coral. Just copying it into the project-posenet folder is the easiest option and will work fine. If we put the code in a file called “dbsmile.py” then the command “python3 dbsmile.py” will run it, and if all is good it should run without any error messages.
Quick and easy build - The Keyboard Bridge
Now we should be detecting faces, but that data is being fed as serial data through one of the GPIO pins to another microcontroller. We need to translate that into something that most computers can use easily.
This bridge is a small microcontroller board built around an ATmega32U4. That chip can emulate most USB devices, and for testing, and for most installations, it will be emulating a keyboard.
For the prototype I have used an Arduino Pro Micro, but it shares the same base design as the Arduino Leonardo, which can be used instead. The only soldering that has to be done in this project is on this board, and it is easy. You will need a female 6-pin header; feed it through the bottom of the board from GND to D6 as photographed. Then we need to solder a short wire between D6 and D9 on top of the board. That is it.
Then upload the Arduino code from this page using the Arduino IDE (full instructions on how to do this are on the SparkFun web site) and it is ready. Plug it onto the end of the Google Coral GPIO pins between pin 29 and pin 39 as shown in the photograph, connect a USB cable between the USB port and a computer, and it is now ready to run.
For testing I have been using some online games, and those tests are documented in a section below. To connect to the computer just plug a USB lead between the Pro Micro and the computer and it will show up as a keyboard.
Selecting and setting up the webcam
It should be noted that the selection of the webcam was more complicated than it should have been. The Google Coral boards support a low cost Coral Camera, but unfortunately, because of chip shortages, they are not currently available to buy. If you are reading this in the future then using this camera module is the way to go. Sadly it was not available to us while we were working on this project.
Another issue we faced was that Mendel Linux, the Linux distribution used by the Google Coral board, does not support many webcams out of the box. We could compile the drivers for a chosen camera, but this is not easy and turns the project into something that is not accessible to the average maker. As a result I set out, with the help of others on the Build2gether Discord, to test the compatibility of all the cameras we could lay our hands on.
The results are recorded in this document.
If you have an untested webcam and a Google Coral board and would like to help, then please add your camera to the list. The command “lsusb” will give the USB details, “v4l2-ctl --list-formats-ext --device /dev/video1” will show whether the camera is supported by the kernel, and “v4l2-ctl --device /dev/video1 --set-fmt-video=width=640,height=480,pixelformat=MJPG --stream-mmap --stream-to=frame.jpg --stream-count=1” will test whether we can capture an image from the camera.
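If you prefer to check from Python rather than the command line, a minimal OpenCV sketch along the same lines is below. It assumes the USB webcam appears as video device 1 (as it does later in this guide) and simply tries to grab a single frame and save it to a file.
# Quick webcam sanity check (assumes the USB camera is /dev/video1, i.e. index 1)
import cv2

cap = cv2.VideoCapture(1, cv2.CAP_V4L2)
if not cap.isOpened():
    raise SystemExit('Could not open video device 1')
ret, frame = cap.read()
if not ret:
    raise SystemExit('Camera opened but no frame was captured')
cv2.imwrite('frame.jpg', frame)  # open this file to confirm the image looks right
print('Captured a frame of size', frame.shape)
cap.release()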
After all this testing we found two cameras that worked. The EyonMe W6 worked best and is available second hand at a low cost around the world, so this is what I am recommending until the Coral Camera is available again.
There have been reports that this camera is the same as some Logitech models, but all the Logitech cameras we have tested so far are not compatible. If you have a modern day “Logi” then we would be grateful if you could give it a test and report back.
Advanced - Head Tracking
The core of the project is the head tracking. We need to locate key points on a face and translate those into various gestures. In this section I will do a deep dive into why some decisions were made and talk through the code. These advanced sections are here to give more information to those wanting to understand the technical aspects of the solution and use elements of it in their own solutions.
For the tracking I am using a simple webcam, for the reasons documented above. We use the Video4Linux2 libraries to access the camera through the OpenCV library. When we installed the PoseNet example project these libraries were installed for us, along with the TensorFlow Lite libraries and trained models that are required. You can do a manual install of these, but it is strongly recommended you use the code that is already out there to do the heavy lifting.
Now on to our code. At the start we include all the libraries we need and start to initialise things. We also open a serial connection for sending data to the bridge module.
import cv2
from pose_engine import PoseEngine, KeypointType
from PIL import Image
from PIL import ImageDraw
import numpy as np
import sys
from periphery import Serial
uart1 = Serial("/dev/ttyS1", 115200)
Now we can start the video capture. We do this using OpenCV. The USB camera will appear as video devices 1 and 2, with the Coral Camera being 0 and 2. If we want to use the Coral Camera we change the first parameter of cap.open() to 0.
cap = cv2.VideoCapture()
cap.open(1, apiPreference=cv2.CAP_V4L2)
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('Y', 'U', 'Y', '2'))
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1024)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 768)
cap.set(cv2.CAP_PROP_FPS, 20.0)
if not cap.isOpened():
    sys.exit('Could not open video device')
We are capturing at a frame rate of 20fps, which is more than we can process; the reason why is documented below.
Now we initialise the Pose Engine that will take our images and work out the body positions.
engine = PoseEngine('models/mobilenet/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite')
Loading the model takes a little while, but it only happens once at start-up, so it is short enough that we will barely notice it.
Once we are all set up we can start the main loop.
while True:
The first thing we need to do in the loop is grab an image from the webcam to process. After these commands a copy of the image frame buffer is stored in frame, and a success flag is stored in ret.
    cap.grab()
    ret, frame = cap.read()
Please note that the seemingly obvious error here is not actually an error. It is a hack to reduce lag. The call to cap.read() first calls the grab() method to move an image from the USB webcam’s pipeline to an internal frame buffer, and then calls retrieve() to make the contents of that buffer available to the Python code. Calling cap.grab() followed by cap.read() therefore internally calls grab() twice and then retrieve(): we are capturing an image from the webcam twice and only using the second one.
The reason we do this is that we can only process around 10 frames a second. Tests have shown we can process almost 12 frames a second, but it is better to spend a few milliseconds waiting for a new image than to process a stale one. If we set the webcam to capture 10 frames a second we get a new image every 100ms. We could be lucky and the image could be new when we come to use it, but equally we could be unlucky and it could be up to 100ms old. If we capture 20 frames a second we get one every 50ms. Now we have reduced the potential lag to 50ms, with an average of 25ms. There are other factors to take into account, and capturing more unused images will in practice slow things down, but testing has shown this rate reduces lag.
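As a rough sketch of that arithmetic (the numbers are simply the ones from the paragraph above, and the frame_age_ms() helper is just for illustration):
# Worst-case and average age of the newest available frame for a given capture rate
def frame_age_ms(fps):
    interval = 1000.0 / fps          # a fresh frame arrives every 1000/fps milliseconds
    return interval, interval / 2.0  # (worst case, average)

print(frame_age_ms(10))  # (100.0, 50.0)
print(frame_age_ms(20))  # (50.0, 25.0)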
If we successfully captured an image, ret will be True and we can continue.
We then take the frame buffer, convert it to a Python Imaging Library (PIL) image object using the fromarray() method, and then pass it to the tensor processor using the pose engine we created earlier.
    if ret:
        pil_image = Image.fromarray(frame)
        poses, inference_time = engine.DetectPosesInImage(pil_image)
        for pose in poses:
Note that we need to reuse the same engine, as it takes a little longer the first time it is run. In practice this introduces a small lag at start-up, but we catch up within a few seconds. If we created a new engine every time, that delay would occur on every frame and everything would run very slowly.
After this we are given a number of “poses”. These are data structures containing 2D coordinates for all of the body keypoints that could be identified.
Most of the time we will be given just one pose, but if multiple people are detected we will be given one pose for each person visible. By default I am having all the visible poses processed and acted upon. For example, if two people are gesturing to the left and one to the right, a combination of two presses to the left and one to the right will be passed to the computer. Two people gesturing left will overcome one gesturing to the right.
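If you wanted to change that default and send a single combined command per frame instead, here is a rough sketch of one way it could be done. This is not part of my code; the combine() helper is my own illustration, and it works on the per-pose command bytes that the tilt detection code further down produces.
# Hypothetical variation: tally the per-pose commands and send one majority
# decision to the bridge, instead of one byte per pose.
from collections import Counter

def combine(commands):
    # commands is a list of the bytes produced for each pose further down:
    # b"l", b"r", b"." or b"?"
    if not commands:
        return b"?"                       # nobody detected at all
    votes = Counter(commands)
    left, right = votes[b"l"], votes[b"r"]
    if left > right:
        return b"l"
    if right > left:
        return b"r"
    return b"." if votes[b"."] else b"?"  # tie, or nobody tilting

# uart1.write(combine(per_pose_commands)) would then replace the per-pose writes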
So on to the code to communicate with the computer. First we check how certain we are about the data we are getting. To communicate with the computer we use a bridge module. We give commands to this module over a UART and it then emulates whatever device is appropriate for what we are driving. This device could be a virtual keyboard, mouse, serial port, joystick or other games controller. For testing I have been using a virtual keyboard, with that code documented below.
If we are less than 40% certain of the accuracy then we cannot trust the data, but we still send a command to the bridge module in case it is useful to know that there is someone there, even though we don’t know what they are doing.
            if pose.score < 0.4:
                uart1.write(b"?")
                continue
The accuracy score threshold of 40% (0.4) was found through practical experimentation in two locations. This may need to be adapted for different locations or lighting conditions, but 40% has worked well for us.
If we are happy that the values we have in the pose are accurate then we can work out if the person has their head tilted.
To do this we get the positions of their left and right eyes. We work out how far apart they are horizontally and divide that by 10. Then we check whether one eye is higher than the other by more than that amount. By the magic of mathematics this triggers when the user’s head is tilted by more than about 5.7 degrees.
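For anyone wondering where the 5.7 degrees comes from: the trigger condition is equivalent to the vertical offset between the eyes being more than a tenth of the horizontal distance between them, and arctan(0.1) is roughly 5.7 degrees. A quick check in Python:
import math

# The gesture triggers when the vertical eye offset exceeds a tenth of the
# horizontal eye distance, i.e. when the head is tilted past arctan(0.1)
print(math.degrees(math.atan(0.1)))  # roughly 5.71 degrees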
Again, this value has been found through practical testing. A smaller threshold results in false gestures being detected, while a larger one requires the user to make a bigger movement, which makes the system less pleasant to interact with and less accessible to those with limited mobility.
When the user tilts their head the relevant command is sent to the bridge, and if their head is level a fallback is sent to let the bridge know that there is someone there who is not gesturing.
            left_eye = pose.keypoints[KeypointType.LEFT_EYE]
            right_eye = pose.keypoints[KeypointType.RIGHT_EYE]
            eye_gap = ( left_eye.point[0] - right_eye.point[0] ) / 10
            if left_eye.point[1] > right_eye.point[1] + eye_gap:
                uart1.write(b"l")
            elif right_eye.point[1] > left_eye.point[1] + eye_gap:
                uart1.write(b"r")
            else:
                uart1.write(b".")
Advanced - The “keyboard” bridge
The bridge is the microcontroller that connects the Google Coral board doing the gesture detection to the computer running the interactive display.
I have been using an Arduino Pro Micro for this. It has the same functionality as an Arduino Leonardo but in a far smaller form factor. It has an ATmega32U4 processor that can emulate most USB devices, and I have been testing by emulating a keyboard.
My original hope was that we could simply solder a header under the Arduino and have it plug onto the end of the Google Coral header pins. We got close by mounting the header four pins back on the Arduino, but it did not work. The issue is that we are using a software serial library on the Arduino so that we could use the pins sitting over the Google Coral’s UART1 pins, but D6, which sits above UART1, does not have the interrupt capability we needed. As a result we now have a “botch wire” between D6 and D9. We use D9 for receiving the data from the Google Coral, and D6 is left floating so it does not interfere. The ground on the Arduino’s pin 4 lines up with the Google Coral’s ground on pin 39.
It is worth noting that for this implementation we only need a common ground and one data line between the Google Coral and the Arduino. The Google Coral is powered by its own supply and the Arduino is powered by the computer it is plugged into. We are not currently using the link from the TX of the Arduino (D5) to the RX of the Google Coral (UART1_RX), but it is available should we want it in the future.
So on to the code. First let's include the libraries we need to talk to the Google Coral over serial, and to emulate a keyboard.
#include <Keyboard.h>
#include <SoftwareSerial.h>
SoftwareSerial mySerial(9, 5); // RX, TX
And initialise the libraries.
void setup() {
  Keyboard.begin();
  mySerial.begin(115200);
}
We only need to send virtual key presses to the computer when things have changed. If we are still doing what we were doing before then we simply do not release the virtual key and keep it pressed. In order to know when things change we need to keep track of what was pressed before, which is why we declare a global variable here to store it in. If we did not do this we would need to send the same keystrokes every 100ms as we go through the loop.
int currentKey = 0;
Now we need to create the main loop. This runs continually. We start off by checking whether we have received a new instruction from the Google Coral board; if we have not, then we skip the rest of this pass and try again the next time round.
If there is data then we read a byte of it and check that it is a valid command and not the same as the last command. We check that it is valid simply to help with debugging. When using the Arduino IDE it is very easy to accidentally send a carriage return or line feed character by mistake, so we ignore them, and there is no reason to remove that code.
If the command has changed then we first release all the virtual keys so it is like we are not pressing anything.
void loop() {
  int newKey;
  if ( mySerial.available() > 0 ) {
    newKey = mySerial.read();
    if ( ( newKey >= ' ' ) && ( newKey != currentKey ) ) {
      Keyboard.releaseAll();
Now we need to press the new keys. This code may vary depending on the software we are driving. I have been testing with a car driving game that uses the keyboard arrow keys to move. Up is the accelerator, and left and right are steering.
When we receive an “l” (lower case L) or an “r” we press the virtual up arrow plus the left or right arrow. If we receive a “.” then the user is looking at the camera but not tilting their head, so we just press the virtual up button to drive straight. If it is not clear what the person is doing, or if there is no person, then we don't press any virtual keys and the car will come to a stop.
      switch (newKey) {
        case '?':
          break;
        case 'r':
          Keyboard.press(KEY_UP_ARROW);
          Keyboard.press(KEY_RIGHT_ARROW);
          break;
        case 'l':
          Keyboard.press(KEY_UP_ARROW);
          Keyboard.press(KEY_LEFT_ARROW);
          break;
        case '.':
          Keyboard.press(KEY_UP_ARROW);
          break;
      }
At the end we remember the key we have just sent so we can check if things have changed again next time round the loop.
Finally there is a delay. If for some reason we have a backlog of commands being sent from the Google Coral, we may end up sending keystrokes faster than the receiving computer can handle. This delay limits the number of key press changes we emulate to 10 a second. As this is the rate at which we are able to process frames, we should never need to send more than this.
      currentKey = newKey;
      delay(100);
    }
  }
}
Case Design
I have included a design for a case for the modules. It is nothing amazing, but it will protect the modules from accidental damage. Please feel free to improve the design or make it more appropriate for your installation.
In the photograph above the white lead goes to a 5V USB power supply. The silver lead is an OTG USB adapter that has the webcam plugged into it. The red USB lead is plugged into the computer running the installation.
Update: I have updated the design slightly since these photographs were taken. The hole for the connector to the computer has changed slightly in the downloadable files.
Demonstration
This is a video of the setup working with the driving game. Sadly this is not my huge screen and they won't let me take it home.
There is one oddity in this test. There is more lag in this video than in my testing at home, and I can’t work out whether it is the Chromebook running the driving game or the screen that has introduced the delay. This is something for further investigation.
My position was fixed for the video so that I stayed in frame, but the system also worked when I moved around, stood up and sat down. I found it very usable, and the bad driving in the car game was more down to the user than the user interface.
I found it interesting that I instinctively leaned my body into some turns, and I wonder whether this could be tracked and used to further enhance the experience. Another small irritation is that the AI occasionally spotted a body in the camera stand I was using to film with. I need to do some more testing to find a better configuration to stop this from happening. It is not a hard thing to change; it just requires tweaking and retesting.
All in all I am very happy with the outcome and intend to use this controller on a future installation.