This project aims at developing a low-cost, universal remote control system originally based on two embedded platforms -- C.H.I.P. and Arduino -- configured together as a centralized hub in a home environment to control electronic devices. Since C.H.I.P.'s manufacturer, Next Thing Co., went out of business and the product was discontinued, we now refer to the BeagleBone Black instead, which is also a Linux-based, open-hardware SBC -- although in fact any SBC would do. The two boards communicate via Bluetooth: the SBC uses the BlueZ library and the Arduino uses the SoftwareSerial library.
The system processes video in real time through computer vision techniques implemented with the OpenCV library, which form the basis for recognizing head movements that act as an alternative control interface, providing autonomy for the elderly and for people with disabilities. The commands are sent to electronic devices via infrared light modulated with the IRremote library on the Arduino. An overall schematic of the system is shown below as a "flowchart". Notice that the main components are open-source and open-hardware.
The Bluetooth communication between the two hardware platforms also makes it easy to move the SBC (where the head pose recognition module runs), since it does not need to be in the line of sight of the infrared commands sent to the electronic device (which the Arduino takes care of). Therefore, if the user needs to switch places inside the environment, the SBC can be moved to stay in front of the user rather than in front of the electronic device being controlled.
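Just to give an idea of what that link looks like on the SBC side, here is a minimal sketch (not the project's exact code) that opens an RFCOMM connection with BlueZ's C API and pushes a one-byte gesture code to the Arduino's Bluetooth module; the MAC address and the RFCOMM channel are placeholders, and the program must be linked against -lbluetooth:

#include <cstdio>
#include <unistd.h>
#include <sys/socket.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/rfcomm.h>

int main() {
    struct sockaddr_rc addr = { 0 };
    int s = socket(AF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);

    addr.rc_family  = AF_BLUETOOTH;
    addr.rc_channel = (uint8_t) 1;                 // RFCOMM channel (assumption)
    str2ba("00:11:22:33:44:55", &addr.rc_bdaddr);  // Bluetooth module MAC (placeholder)

    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    char gesture = '1';          // e.g. '1' could mean "volume up"
    write(s, &gesture, 1);       // the Arduino reads this byte via SoftwareSerial
    close(s);
    return 0;
}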
C.H.I.P. (or BeagleBone Black) and Arduino, both open hardware, follow the Creative Commons Attribution-ShareAlike (CC BY-SA) license. The SBCs run a Debian Jessie OS (Linux-based, open-source), which is GPL licensed; the BlueZ Bluetooth stack, its C/C++ API and the IRremote library are also open-source. Finally, OpenCV follows the Berkeley Software Distribution (BSD 3-clause) license. The hardware modules are not free, but their schematics are, in case you're willing to build your own :) A demonstration of the system is shown in the video below:
For demonstration purposes, the system is running on a desktop computer with Ubuntu 14.04 LTS. To be clear, the SBC (which runs a Linux-based OS) was replaced by a full laptop computer. An Arduino UNO, placed in front of the Dracula mug, runs code that receives a signal from the laptop via Bluetooth and transmits the corresponding command to the TV through an infrared LED connected to an amplifier circuit.
A question that must be asked: "what if the user just needs to move their head or look in another direction? Would the system keep recognizing their head gestures relentlessly? If so, the TV would receive crazy commands all the time!" Well, for that we need adaptive switches, which are peripheral devices that act like an on/off button. They will be discussed in detail in the "How does it work?" section later. First, let's talk about the importance of the project.
Why is this project important anyway?
According to the last census by the Brazilian Institute of Geography and Statistics (IBGE), 23.9% of Brazilians declared having some kind of disability in 2010. This is a really huge number, since it means that 47 million people from a single country have some kind of impairment. IBGE also states that the number of elderly people (60 years old or more) is about 22.8 million (approximately 11.6% of Brazilians in 2010). These rates are expected to be higher in the next census, which will take place in 2020, since, according to the World Health Organization, the world's population aged 60 years and older is expected to total 2 billion by 2050.
As life expectancy increases, all areas of knowledge should adapt to face the consequences of an aging population. With hardware manufacturing and software development, the situation is no different; in fact both might be considered rather fundamental, since technology is becoming (if it hasn't already) pervasive all around us. Elderly people, as well as people with disabilities, will always need automation systems that provide convenience and comfort in the task of manipulating and controlling electronic devices. The technology sector must therefore ensure that assistive solutions that prevent such people from being excluded from the digital world remain, by all means, accessible and tangible to them, since that might be the only way for some of them to keep their independence when interacting with appliances.
The main idea behind this project is that people who cannot frequently move around to fetch the remote control, and/or whose upper limbs are compromised enough to prevent pressing buttons, can use six head movements to transmit commands such as turn on/off, increase/decrease, and forward/backward to many home appliances that can be remotely controlled. Of course such people are required to have their neck movements preserved, as well as some of their cognitive abilities, in order to understand the proper mapping between head gesture and command.
With a module that acts as a centralized hub in the home environment, there is no need to search for the remote around the house (which we can all agree happens *all* the time). This is particularly important for those with mobility issues or even sight impairments, since most of the time one needs to look around in order to find the remote. Furthermore, a barrier naturally raised by conventional devices of that kind is the requirement of pressing physical buttons, a task rather trivial for most but definitely overwhelming for those with impaired fine motor skills.
Another positive point of this project is the focus on both accessibility and affordability. Since there is no true meaning in the word "accessibility" if many people cannot bear the expense of assistive devices, we dream that one day those who need technology the most can benefit from its tangible availability. Endorsing the open-source movement may encourage more DIY-fond hackers to reproduce the project, something that is not possible with commercial products. Hence, in order to help mitigate the unfair economic barriers that may force people to choose between feeding themselves and acquiring equipment, designing a low-cost device has been essential.
How does it work?
Well, this section is quite technical and describes in almost full detail how the system works from the implementation point of view. All code can be found at https://github.com/cassiobatista/hpe-remote.
0. Summary
- 1. Adaptive Switch Based on Proximity
- 1.1 Transmitter Side of the Switch
- 1.2 Receiver Side of the Switch
- 1.3 Soldering things up
- 2. Head Gesture Recognition
- 2.1 Face Detection
- 2.2 Head Pose Estimation
- 3. IR Commands sent from Arduino to the TV
- 3.1 Hacking the Samsung Remote Control
- 3.2 Emulating the Samsung Remote Control
- 4. Multimodal System with Speech Recognition
1. Adaptive Switch Based on Proximity
As previously discussed in the introduction, the system needs to be explicitly activated so that it does not keep listening for head gestures all the time, which would produce a lot of undesired commands at the TV. One of the solutions was to develop a proximity switch, which works by simply bringing a physical object close to the circuit (as the name suggests).
In order to turn the remote control system on, an external switch was used. You can think of it as a button, except you don't need to press it; you just hover something above it. It also works apart from the circuit it triggers, which means it acts as a wireless trigger. To achieve that, an infrared-based proximity switch was combined with radio-frequency modules (encoders, decoders and antennas). You can think of this switch as a low-cost button that you can "press" while you're in the kitchen to turn something on in the living room. The details of construction and implementation are given below.
1.1 Transmitter side of the switch
The proximity switch is based on the principle of a simple line-follower circuit. An infrared LED (IR LED) is placed beside an infrared sensor (IR sensor), as depicted in the image below, with both pointing in the same direction (up). This circuit is kept close to the user to be used as his/her "hover button". Once an object is placed over the IR components, the infrared light emitted by the LED is expected to be reflected by the object onto the IR sensor, which "detects" the approach of the object through the change in voltage on its anode.
Since that voltage change is really small, an operational amplifier is used to (guess what) amplify the signal. The op amp (IC LM358) is also combined with a 10 kΩ potentiometer to be used as a voltage comparator: when the voltage on the IR sensor's anode is greater than the voltage on the potentiometer, the 5 V from the battery goes to a red LED, which provides visual feedback to the user about the approach of the object. In addition, a resistor was placed in series with the status LED in order to divide the voltage that goes to the pin of the radio-frequency encoder module (HT12E), as can be seen in the image below.
The HT12E integrated circuit is used to transmit encoded 12-bit parallel data via an RF transmitter module at 433 MHz. The same voltage used to turn the red status LED on is applied to the encoder's pin 10 (AD8, an input data pin; there are three other AD pins, since the IC has four channels). The data pin of the RF module (transmitter antenna, Tx), on the other hand, is connected to pin 17 of the HT12E (DOUT, data output pin). The DOUT pin serializes the data to the antenna if, and only if, there is a high voltage on the AD8 input pin. Otherwise, no RF signal is generated by the transmitter circuit.
1.2 Receiver side of the switch
The receiver part of the wireless switch stays close to the microcomputer in order to (guess again) receive the encoded signal from the RF-Tx modules, decode the information and turn the remote control system on. The RF module's (receiver antenna, Rx) output pin is connected to the decoder's pin 14 (DIN, input pin). The HT12D integrated circuit reads the serial data captured by the RF-Rx module on its pin 14 and decodes the 12 bits, trying to match the address. If the data is successfully decoded, the output is generated on one of its data pins; pin 10 (D8) was used here. The schematic is shown below.
The HT12D's D8 output pin is connected to an input pin (GPI) of the microcomputer. Since both C.H.I.P. and BeagleBone Black work with 3.3 V input levels, a voltage divider circuit was built with an LED and a resistor to limit the voltage at the output of the decoder, which avoids burning the microcomputer's pin.
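Just to illustrate how the microcomputer could watch that pin, here is a rough sketch using the Linux sysfs GPIO interface; the GPIO number is a placeholder that depends on the board and header pin you actually wire the decoder to, and it assumes the pin has already been exported (e.g. echo 60 > /sys/class/gpio/export):

#include <fstream>
#include <string>
#include <thread>
#include <chrono>
#include <iostream>

// Polls the decoder output and returns when the adaptive switch is activated.
bool wait_for_switch(const std::string& gpio = "60") {   // "60" is a placeholder
    const std::string path = "/sys/class/gpio/gpio" + gpio + "/value";
    while (true) {
        std::ifstream value(path);
        if (!value.is_open())
            return false;                  // pin not exported / wrong GPIO number
        char level = '0';
        value >> level;
        if (level == '1')                  // HT12D drove D8 high: switch activated
            return true;
        std::this_thread::sleep_for(std::chrono::milliseconds(100)); // poll ~10 Hz
    }
}

int main() {
    if (wait_for_switch()) {
        std::cout << "adaptive switch activated, starting head gesture recognition" << std::endl;
        // ... face detection / head pose estimation would start here ...
    }
    return 0;
}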
1.3 Soldering things up
Everything should work fine on a breadboard, but if you want to solder things up, the toner transfer method should do. You just need glossy photo paper to print the circuit with a laser printer, then place it on the copper board and press it with a hot iron. Then you put the board in an etching solution (typically ferric chloride) for a couple of hours and voilà! That is what you will get if you follow the schematics attached to this project, which were drawn in KiCad:
2. Head Gesture Recognition
Once the system is turned on by an adaptive switch, the camera starts capturing frames of the user's face/head and the image processing takes place through computer vision techniques. The head gesture recognition task is performed by the combination of two key techniques: face detection and head pose estimation, as depicted in the flowchart below. OpenCV, an open-source computer vision library, takes care of the algorithms used in our remote control. This procedure was proposed in the paper [Real-Time Head Pose Estimation for Mobile Devices].
2.1 Face Detection
The first step of head gesture recognition is to detect a face. This is done by OpenCV's detectMultiScale() method, which implements the Viola-Jones algorithm, published in the papers [Rapid Object Detection using a Boosted Cascade of Simple Features] and [Robust Real-Time Face Detection], and basically applies a cascade of weak classifiers over each frame in order to detect a face. If a face is successfully detected, three points that represent the location of the eyes and the nose are calculated on a 2D plane. An example of the algorithm applied to my face is shown in the picture below:
Pretty face, isn't it? The algorithm also needs to detect a face in 10 consecutive frames in order to ensure reliability. If this condition is met, the face detection routine stops and the tracking of the three points starts, trying to detect a change in the pose of the head. If Viola-Jones fails to detect a face in a single frame, the counter is reset. You can see below a chunk of code with the main parts of our implementation.
/* reset face counter */
face_count = 0;
/* while a face is not found */
while(!is_face) {
/* get a frame from device */
camera >> frame;
/* Viola-Jones face detector (VJ) */
rosto.detectMultiScale(frame, face, 2.1, 3,
0|CV_HAAR_SCALE_IMAGE, Size(100, 100));
/* draw a green rectangle (square) around the detected face */
for(int i=0; i < face.size(); i++) {
rectangle(frame, face[i], CV_RGB(0, 255, 0), 1);
face_dim = frame(face[i]).size().height;
}
/* anthropometric initial coordinates of eyes and nose
 * (only computed if at least one face was detected) */
if((int) face.size() > 0) {
re_x = face[0].x + face_dim*0.3; // right eye
le_x = face[0].x + face_dim*0.7; // left eye
eye_y = face[0].y + face_dim*0.38; // height of the eyes
/* define points on a 2D plane */
ponto[0] = cvPoint(re_x, eye_y); // right eye
ponto[1] = cvPoint(le_x, eye_y); // left eye
ponto[2] = cvPoint((re_x+le_x)/2, // nose
eye_y+int((le_x-re_x)*0.45));
/* draw a red circle (dot) around eyes and nose coordinates */
for(int i=0; i<3; i++)
circle(frame, ponto[i], 2, Scalar(10,10,255), 4, 8, 0);
}
/* increase frame counter if a face is found by the VJ */
/* reset frame counter otherwise
* because you need 10 *consecutive* frames */
if((int) face.size() > 0) // frame
face_count++;
else
face_count = 0;
/* if a face is detected for 10 consecutive frames, the Viola-Jones (VJ)
 * algorithm stops and the Optical Flow algorithm starts */
if(face_count >= 10) {
is_face = true; // ensure breaking Viola-Jones loop
frame.copyTo(Prev); // keep the current frame to be the 'previous frame'
cvtColor(Prev, Prev_Gray, CV_BGR2GRAY);
}
}// close while not face
Also, we provide a block diagram on how the face detection routine works in our implementation.
2.2 Head Pose Estimation
The gesture recognition is in fact achieved through the estimation of the head pose. Here, the tracking of the face points is performed by calculating the optical flow with the Lucas-Kanade algorithm, implemented in OpenCV's calcOpticalFlowPyrLK() method. This algorithm, proposed in the paper [An iterative image registration technique with an application to stereo vision], processes only two frames and basically tries to determine where, within a certain neighborhood, a pixel of a specific intensity from the immediately previous frame will appear in the current frame.
Defining the three face points (right eye, left eye and nose) on the current frame as (x_re, y_re), (x_le, y_le), (x_n, y_n), and the same points on the previous frame as (x'_re, y'_re), (x'_le, y'_le), (x'_n, y'_n), we can calculate the head rotation around the three axes (also called Tait-Bryan or Euler angles: roll, yaw and pitch) through a set of three equations, which match the code below:

roll  = (180/π) · arctan( (y_le − y_re) / (x_le − x_re) )
yaw   = x'_n − x_n
pitch = y_n − y'_n
If the same pose holds for five consecutive frames, the program sends a value to the Arduino referring to the recognized gesture and stops. To ensure reliability, some filters were created based on the anthropometric dimensions of the face. In general terms, three relations between the distances among the three points were considered: (1) the ratio between the eye-to-eye distance and the eye-to-nose distance; (2) the distance between the eyes; and (3) the distance between either eye and the nose, as demonstrated below:
We also provide below a chunk of code with the main parts of our implementation in C++. Looking at the code, you can notice that some filters were also applied to the angle values, once again to ensure reliability:
- roll right = [-60, -15]
- roll left = [+15, +60]
- yaw right = [0, +20]
- yaw left = [-20, 0]
- pitch down = [0, +20]
- pitch up = [-20, 0]
/* (re)set pose counters (to zero) */
yaw_count = 0;
pitch_count = 0;
roll_count = 0;
noise_count = 0; // defines error
/* while there's a face on the frame */
while(is_face) {
/* get captured frame from device */
camera >> frame;
/* convert current frame to gray scale */
frame.copyTo(Current);
cvtColor(Current, Current_Gray, CV_BGR2GRAY);
/* Lucas-Kanade calculates the optical flow */
/* the points are stored in the variable 'saida' */
cv::calcOpticalFlowPyrLK(Prev_Gray, Current_Gray,
ponto, saida, status, err, Size(15,15), 1);
/* Get the three optical flow points
* right eye = face_triang[0]
* left eye = face_triang[1]
* nose = face_triang[2] */
for(int i=0; i<3; i++)
face_triang[i] = saida[i];
/* calculate the distance between eyes */
float d_e2e = sqrt(
pow((face_triang[0].x-face_triang[1].x),2) +
pow((face_triang[0].y-face_triang[1].y),2));
/* calculate the distance between right eye and nose */
float d_re2n = sqrt(
pow((face_triang[0].x-face_triang[2].x),2) +
pow((face_triang[0].y-face_triang[2].y),2));
/* calculate the distance between left eye and nose */
float d_le2n = sqrt(
pow((face_triang[1].x-face_triang[2].x),2) +
pow((face_triang[1].y-face_triang[2].y),2));
/* error conditions for the optical flow algorithm to stop */
/* ratio: distance of the eyes / distance from right eye to nose */
if(d_e2e/d_re2n < 0.5 || d_e2e/d_re2n > 3.5) {
cout << "too much noise 0." << endl;
is_face = false;
break;
}
/* ratio: distance of the eyes / distance from left eye to nose */
if(d_e2e/d_le2n < 0.5 || d_e2e/d_le2n > 3.5) {
cout << "too much noise 1." << endl;
is_face = false;
break;
}
/* distance between the eyes */
if(d_e2e > 160.0 || d_e2e < 20.0) {
cout << "too much noise 2." << endl;
is_face = false;
break;
}
/* distance from the right eye to nose */
if(d_re2n > 140.0 || d_re2n < 10.0) {
cout << "too much noise 3." << endl;
is_face = false;
break;
}
/* distance from the left eye to nose */
if(d_le2n > 140.0 || d_le2n < 10.0) {
cout << "too much noise 4." << endl;
is_face = false;
break;
}
/* draw a cyan circle (dot) around the points calculated by optical flow */
for(int i=0; i<3; i++)
circle(frame, face_triang[i], 2, Scalar(255,255,5), 4, 8, 0);
/* head rotation axes */
float param = (face_triang[1].y-face_triang[0].y) / (float)(face_triang[1].x-face_triang[0].x);
roll = 180*atan(param)/M_PI; // eq. 1
yaw = ponto[2].x - face_triang[2].x; // eq. 2
pitch = face_triang[2].y - ponto[2].y; // eq. 3
/* Estimate yaw left and right intervals */
if((yaw > -20 && yaw < 0) || (yaw > 0 && yaw < +20)) {
yaw_count += yaw;
} else {
yaw_count = 0;
if(yaw < -40 || yaw > +40) {
noise_count++;
}
}
/* Estimate pitch up and down intervals */
if((pitch > -20 && pitch < 0) || (pitch > 0 && pitch < +20)) {
pitch_count += pitch;
} else {
pitch_count = 0;
if(pitch < -40 || pitch > +40) {
noise_count++;
}
}
/* Estimate roll left and right intervals */
if((roll > -60 && roll < -15) || (roll > +15 && roll < +60)) {
roll_count += roll;
} else {
roll_count = 0;
if(roll < -60 || roll > +60) {
noise_count++;
}
}
/* check for noisy measurements. Stop if more than 2 were found */
if(noise_count > 2) {
cout << "error." << endl;
is_face = false;
break;
}
/* roll and yaw are easily confused by noise, */
/* so only accept a YAW/PITCH if almost no ROLL has accumulated */
if(roll_count > -1 && roll_count < +1) {
if(yaw_count <= -20) { // cumulative: 5 frames
cout << "yaw left\tprevious channel" << endl;
is_face = false;
break;
} else if(yaw_count >= +20) { // cumulative: 5 frames
cout << "yaw right\tnext channel" << endl;
is_face = false;
break;
}
/* Check if it is PITCH */
if(pitch_count <= -10) { // cumulative: 5 frames
cout << "pitch up\tincrease volume" << endl;
is_face = false;
break;
} else if(pitch_count >= +10) { // cumulative: 5 frames
cout << "pitch down\tdecrease volume" << endl;
is_face = false;
break;
}
}
/* Check if it is ROLL */
if(roll_count < -150) { // cumulative: 5 frames
cout << "roll right\tturn tv on" << endl;
is_face = false;
break;
} else if(roll_count > +150) { // cumulative: 5 frames
cout << "roll left\tturn tv off" << endl;
is_face = false;
break;
}
/* store the found points (there are only 3 of them) */
for(int j=0; j<3; j++)
ponto[j] = saida[j];
/* current frame now becomes the previous frame */
Current_Gray.copyTo(Prev_Gray);
}//close while isface
A flowchart of the head pose estimation routine is shown below. The full program is available on our GitHub, inside the folder 'HPE/desktop/original'.
3. IR Commands Sent from Arduino to the TV
Once the signal about the recognized gesture arrives from C.H.I.P. on the Arduino via Bluetooth (by means of the SHD 18 shield), it's time to send the appropriate command to the TV. If you're curious about how infrared-based remote controls actually work, I recommend this nice article from How Stuff Works: [How Remote Controls Work]. It is worth mentioning that the system is universal-like, and has been tested with a Samsung TV set only as a proof of concept: knowing the proper IR communication protocol allows one to control all kinds of electronic devices.
Remotes usually follow a manufacturer-specific protocol. Since we're using a Samsung TV, we just followed the S3F80KB MCU Application Notes document, which describes the IC embedded in Samsung's remote control. Take a look if you're really, really curious :) In general terms, the Samsung protocol defines a sequence of 34 bits, in which the values "0" and "1" are represented by a flip in the state of the PWM pulses, whose carrier frequency is 37.9 kHz.
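As a concrete (and slightly simplified) reading of one of those frames: the volume-up code 0xE0E0E01F that we capture in section 3.1 appears to break down into a 16-bit address (0xE0E0), a command byte (0xE0) and its bitwise complement (0x1F), which lets the receiver sanity-check each frame much like the NEC protocol does; the start and stop bits around this 32-bit payload complete the 34 bits.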
Since we didn't want to code the IR communication protocol from scratch, we just used a library called IRremote, which provides functions to emulate several IR protocols from a variety of remotes. First of all, we had to "hack" the Samsung remote codes, because the library doesn't provide the 34 bits we are interested in. Once we have the bits for each of the six commands we want (turn on/off, increase/decrease volume and switch to next/previous channel), we can emulate our remote control with the Arduino.
3.1 Hacking the Samsung remote control
The first step was to turn the Arduino into the TV's receiver circuit. In other words, we recreated the circuit placed on the front panel of the TV that receives the infrared light from the remote's IR LED. The hardware required for this task is just a simple TSOP VS 18388 infrared sensor/receiver attached to Arduino digital pin 11. Then, we run the IRrecvDump.ino example file to "listen to" any IR communication and print the respective hexadecimal value that represents our 34 bits.
#include <IRremote.h>
IRrecv irrecv(11); // Default is Arduino pin D11.
decode_results results;
void setup() {
Serial.begin(9600); // baud rate for serial monitor
irrecv.enableIRIn(); // Start the receiver
}//close setup
void dump(decode_results *results) {
Serial.print(results->value, HEX); // data in hexadecimal
Serial.print(" (");
Serial.print(results->bits, DEC); // number of bits
Serial.println(" bits)");
}//close dump
void loop() {
if (irrecv.decode(&results)) {
Serial.println(results.value, HEX);
dump(&results);
irrecv.resume(); // Receive the next value
}
}//close loop
This code outputs the information to the Serial Monitor. The final step is just pointing the remote control at the IR receiver, pressing the desired buttons to capture their hexadecimal values, copying the values from the serial monitor and, finally, pasting them somewhere safe. Easy-peasy.
3.2 Emulating the Samsung remote control
Once we have the information about the protocol, the task of sending it to the TV is straightforward: pass the hexadecimal value as an argument to a specific function. The IRremote lib has a function called sendSAMSUNG() that emulates the Samsung protocol over a hexadecimal value passed as an argument, together with the number of bits of information. The number of bits is set to 32 because the first and the last bits are the same for every command of the protocol.
#include <IRremote.h>
IRsend irsend; // pin digital D2 default as IROut
void setup() {
// nothing required by IRremote
}//close setup
void sendIR(unsigned long hex, int nbits) {
for (int i=0; i<3; i++) {
irsend.sendSAMSUNG(hex, nbits); // 32 bits + start bit + end bit = 34
delay(40);
}
delay(3000); // 3 second delay between each signal burst
}
void loop() {
sendIR(0xE0E0E01F, 32); // increase volume (pitch up)
}
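Putting the pieces together on the Arduino side, a dispatcher along the following lines could map the Bluetooth byte coming from the SBC to the corresponding IR burst. This is a rough sketch rather than the repo's exact code: the SoftwareSerial pins and the baud rate are assumptions for the SHD 18 shield, and every hex code except the volume-up one is a placeholder to be replaced by the values captured in section 3.1.

#include <SoftwareSerial.h>
#include <IRremote.h>

SoftwareSerial bt(10, 9);   // RX, TX wired to the Bluetooth shield (assumption)
IRsend irsend;              // IR LED on the library's default send pin

void setup() {
  bt.begin(9600);           // same baud rate configured on the Bluetooth module
}

void sendIR(unsigned long hex) {
  for (int i = 0; i < 3; i++) {          // repeat the burst, as in the example above
    irsend.sendSAMSUNG(hex, 32);
    delay(40);
  }
}

void loop() {
  if (bt.available()) {
    char gesture = bt.read();            // one byte per recognized head gesture
    switch (gesture) {
      case '1': sendIR(0xE0E0E01F); break; // pitch up   -> volume up (code from section 3.2)
      case '2': sendIR(0x00000000); break; // pitch down -> volume down (placeholder)
      case '3': sendIR(0x00000000); break; // yaw right  -> next channel (placeholder)
      case '4': sendIR(0x00000000); break; // yaw left   -> previous channel (placeholder)
      case '5': sendIR(0x00000000); break; // roll right -> power on (placeholder)
      case '6': sendIR(0x00000000); break; // roll left  -> power off (placeholder)
    }
  }
}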
The hardware required to perform the task of sending information from the Arduino to the TV is just an infrared LED (IR LED) attached to an amplifier circuit, which was built to increase the range of the IR signal. The schematic of the circuit is depicted below.
The PWM of the IR communication protocol turns the transistor on and off, which makes the IR LED "blink" very quickly according to the PWM's pulse width. A resistor limits the current passing through the IR LED from the 5 V Vcc that comes from the Arduino. If you want to understand how we calculated the resistor values, take a look at Kirchhoff's voltage law (KVL).
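As a rough, back-of-the-envelope illustration (the numbers here are assumptions, not the exact values from our schematic): with a 5 V supply, an IR LED forward voltage of about 1.4 V and a saturated transistor dropping roughly 0.2 V, KVL around the LED loop gives R = (5 − 1.4 − 0.2) V / 0.1 A ≈ 34 Ω for a target current of about 100 mA, so a standard 33 Ω resistor would be in the right ballpark.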
Here's a nice, simplified overview of the circuit I just found on Google:
4. Multimodal System with Speech Recognition
I've been spending some time on speech recognition for quite a while, so I tried to implement a multimodal system that accepts both speech and head gestures as input. Of course they didn't work together on C.H.I.P., because it is a single-core system (it almost fries with OpenCV alone, even shutting down sometimes). But I think on quad-core systems, such as the Raspberry Pi 3, it might work fine.
Some of my progress with CMU Sphinx is being maintained in a separate repo on my GitHub: https://github.com/cassiobatista/asr-remote. There is also a demo video on my old Ubuntu laptop:
As you can see, I need to say either "acordar sistema" or "wake up system" to activate the engine that accepts the control commands. This is called keyword spotting (KWS), and it can also be used as an adaptive switch to activate the remote control system. Besides, the speech recognition core can obviously also deliver remote control commands to the electronic appliances, as an alternative modality working alongside the head pose estimation techniques.
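For the curious, a keyword spotter along these lines can be built with the PocketSphinx C API. This is only a sketch of the idea: the model directory, the raw-audio input file and the threshold are assumptions, and my asr-remote repo may organize things differently.

#include <pocketsphinx.h>
#include <stdio.h>

int main(void) {
    /* configure the decoder for keyword spotting on a single keyphrase */
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm", MODELDIR "/en-us/en-us",               // acoustic model (placeholder path)
        "-dict", MODELDIR "/en-us/cmudict-en-us.dict", // pronunciation dictionary (placeholder)
        "-keyphrase", "wake up system",                // the wake-up phrase
        "-kws_threshold", "1e-20",                     // sensitivity, tuned empirically
        NULL);
    ps_decoder_t *ps = ps_init(config);

    FILE *fh = fopen("mic_capture.raw", "rb");  // 16 kHz, 16-bit mono raw audio (placeholder)
    int16 buf[512];
    size_t nsamp;

    ps_start_utt(ps);
    while ((nsamp = fread(buf, 2, 512, fh)) > 0) {
        ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
        if (ps_get_hyp(ps, NULL) != NULL) {     // keyphrase spotted
            printf("wake-up keyword detected, enabling command recognition\n");
            ps_end_utt(ps);
            ps_start_utt(ps);                   // keep listening
        }
    }
    ps_end_utt(ps);
    fclose(fh);
    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}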