This is my entry in the Swimming category for the Build2gether Inclusive Innovation Challenge.
"What were the needs or pain points that you attended to and identified when you were solving problems faced by the Contest Masters?"
I have been pondering what it is that I actually need to see in the swimming pool, and what I would be disadvantaged by if I could not see it. Obviously there is where you are going, but I still can’t bring myself to open my eyes under water and I have not crashed into the wall yet (there is still time though ;-) ). The things I do look at are the clock and the matrix information board.
Granted, things like the water temperature are not very important, and I can normally remember the session times, but knowing the time of day is genuinely useful.
This project is to build a device that scans across a pool and looks for a gesture. When it sees this gesture it will make a verbal announcement through the PA system.
Initially there will be one gesture and that will read out the time, but other gestures can be added in the future.
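To give a feel for how further gestures could be added later, here is a minimal sketch of a gesture table that pairs a check function with the announcement it should trigger. This is purely my own illustration of the idea; the function names and the second gesture are hypothetical placeholders, and the actual check used in this build is shown in the code breakdown further down.
def both_wrists_raised(pose):
    # Placeholder for the real check used later (both wrists above the nose).
    return False

def one_wrist_raised(pose):
    # Hypothetical second gesture that could be added in the future.
    return False

# Each entry pairs a gesture check with the announcement it should trigger.
GESTURES = [
    (both_wrists_raised, "read out the time"),
    (one_wrist_raised, "read out the session times"),
]

def handle(pose, announce):
    # Run the first matching gesture's announcement, if any.
    for check, action in GESTURES:
        if check(pose):
            announce(action)
            break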
Build
This project is mostly software, but you will need a device with a Tensor Processing Unit (TPU) in order to do the required body tracking in real time. This project uses the Google Coral Dev Board Mini with an external webcam.
First, we must follow the setup instructions on the Google Coral website so that we can log in to the board across the network.
Second, we need to install project-posenet. Running the command “git clone https://github.com/google-coral/project-posenet.git” on the Google Coral will download it locally for us. We then run “cd project-posenet” and “sh install_requirements.sh” to finish installing all the requirements. Running “python3 simple_pose.py” will test the code.
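If you would rather test from your own script, something like this minimal sketch should confirm that the Edge TPU and the model load correctly. It assumes the same pose_engine API and model file used in the code breakdown below; the blank test image is just a placeholder.
from PIL import Image
from pose_engine import PoseEngine

# Load the same PoseNet model used later in swimming.py.
engine = PoseEngine('models/mobilenet/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite')
# Run one inference on a blank image just to prove everything loads.
blank = Image.new('RGB', (641, 481))
poses, inference_time = engine.DetectPosesInImage(blank)
print('Inference time: %.0f ms, poses found: %d' % (inference_time, len(poses)))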
Third, copy the code from this project page into a file called swimming.py in the project-posenet directory.
Fourth, we need to install espeak to do the text to speech part of the project. Run the command “sudo apt install espeak python3-espeak” to install all the required parts.
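Before wiring everything together it is worth checking the speech output on its own. A quick sketch like this, using the python3-espeak bindings we just installed, should speak a test phrase through the headphone socket:
from espeak import espeak
from time import sleep

espeak.set_voice("en")
espeak.synth("Testing, one two three")
# synth() returns straight away, so wait for playback to finish before exiting.
while espeak.is_playing():
    sleep(0.1)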
Fifth, we need to plug some speakers or a PA system into the headphone socket.
That is us all set up. Just run the code with the command “python3 swimming.py” and the device will be up and running.
Code breakdown
First we import all the required libraries and initialise the espeak library ready for when we need it.
import cv2
from pose_engine import PoseEngine, KeypointType
from PIL import Image
from PIL import ImageDraw
from espeak import espeak
from time import sleep
import datetime
import sys
import os
espeak.set_voice("en")
Now we initialise the video capture using OpenCV2 and the pose detection engine.
cap = cv2.VideoCapture()
cap.open(1, apiPreference=cv2.CAP_V4L2)
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('Y', 'U', 'Y', '2'))
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1024)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 768)
cap.set(cv2.CAP_PROP_FPS, 10.0)
if not cap.isOpened():
    sys.exit('Could not open video device')
engine = PoseEngine('models/mobilenet/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite')
Now we start the main loop.
while True:
First thing in the loop, we capture a video image.
    ret, frame = cap.read()
If we got an image, we convert it into a usable format and detect any people in it.
    if ret:
        pil_image = Image.fromarray(frame)
        poses, inference_time = engine.DetectPosesInImage(pil_image)
Next we loop through each of the people that were detected, as we may have detected several and want to activate if any of them makes the pose.
        for pose in poses:
If we are not certain about this pose we skip it. We don’t want to randomly announce things all the time when no one asked.
            if pose.score < 0.4:
                continue
Now we gather all the positions we are interested in and check if both wrists are above the nose. I was originally using the shoulders but found that less reliable, which is why the variable names below still say shoulder. I will have to tidy this code up when I get a moment.
            left_hand = pose.keypoints[KeypointType.LEFT_WRIST]
            right_hand = pose.keypoints[KeypointType.RIGHT_WRIST]
            left_shoulder = pose.keypoints[KeypointType.NOSE]
            right_shoulder = pose.keypoints[KeypointType.NOSE]
            if ( left_hand.point[1] < left_shoulder.point[1] ) and ( right_hand.point[1] < right_shoulder.point[1] ):
If we have detected the gesture then we simply work out the time and say it.
                now = datetime.datetime.now()
                espeak.synth("The time is " + now.strftime("%-I %M %p and %-S seconds"))
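As an aside, here is roughly what that format string produces, using a made-up timestamp for illustration (the %-I and %-S forms drop the leading zero; they work with the Linux strftime on the Coral but are not portable everywhere).
import datetime
example = datetime.datetime(2024, 1, 1, 15, 4, 5)  # hypothetical time: 3:04:05 pm
print("The time is " + example.strftime("%-I %M %p and %-S seconds"))
# prints: The time is 3 04 PM and 5 seconds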
We don’t want to detect the gesture a second time while we are reading out the current time, so we wait here until that has finished. While this is happening we also grab and discard video frames. If we don't do this it will trigger several times.
                while espeak.is_playing():
                    cap.grab()
                    pass
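As a pointer to how a second gesture could be added in the future, here is a rough, untested sketch of a fragment that would sit inside the same for loop, next to the existing check. The gesture itself (only the left wrist above the nose, announcing just the hour) is a hypothetical example; the keypoint names and calls are the same ones used above.
# Hypothetical extra check, to go inside the "for pose in poses:" loop above.
nose = pose.keypoints[KeypointType.NOSE]
left_hand = pose.keypoints[KeypointType.LEFT_WRIST]
right_hand = pose.keypoints[KeypointType.RIGHT_WRIST]
if (left_hand.point[1] < nose.point[1]) and (right_hand.point[1] >= nose.point[1]):
    now = datetime.datetime.now()
    espeak.synth("It is " + now.strftime("%-I %p"))
    # Discard frames while speaking, just like the main gesture does.
    while espeak.is_playing():
        cap.grab()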
Testing
As mentioned before, the example code here identifies the gesture of both arms being held up above a person's head. In this video you can see me walking around the lab and making the gesture, and in the background you will hear the time being read out when I do.
Here is a photograph of the test setup that is running this. The Google Coral board is hanging in front of the white cupboard door, the webcam is on the tripod, and the sound comes from the sound system on top of the white cupboard.
If you were wondering how I filmed this when I was already using the tripod, well, duct tape is a wonderful thing and I taped my phone to the cupboard door. :-)