I created this as part of a school project on Home Automation, and it is my own take on the concept. In general, I think some Home Automation devices take away from the everyday human interactions that make us human. Without getting too sidetracked by my philosophical beliefs about automating your life, I decided to make something practical that does not take away from the actions I already do in my day-to-day life. With mental health being such a prominent topic during this pandemic, and with a little brainstorming help from my teacher, I decided it would be a cool idea to create an automatic emotion journal that keeps track of my emotions throughout the day.
(As a short disclaimer, this project is just a proof of concept. I think that with the contents of this tutorial, it could be turned into something much more practical. Right now it can only detect very intense smiles and frowns, though there is room to expand the algorithm to recognize more emotions and to make the detections more accurate on subtle expressions.)
Hardware Setup
First off, I have used my Raspberry Pi 3B+ before and already have it all set up. If you are starting from scratch and want to follow along with the rest of this tutorial, I suggest following Raspberry Pi's official setup guide, linked here and in the sources below. I would explain all of the steps here, but I am a little rusty on them now, so the linked tutorial will explain them better than I can.
Once your Raspberry Pi is set up, you will need to install the camera and configure some settings. First, make sure your Raspberry Pi is turned off. The next step is to find the camera connector, which is a slot located between the audio jack and the HDMI port (shown below).
Once you have identified where the camera's ribbon cable is supposed to be inserted, gently pull up on the black plastic clip and insert the ribbon cable with the blue side facing the USB ports. The final step is to push the black clip back into place, securing the ribbon cable in the port. If any of this is confusing, feel free to follow the official Raspberry Pi camera guide, also linked here.
With the camera plugged in, I decided to build a stand out of cardboard to hold it in the correct position. To make this, I took a strip of cardboard and folded the ends in to create legs. I then drew small triangles at the bottom of each leg so the stand would angle the camera up towards my face when it sits on my desk. My last step was taping the camera to the top of the stand. Here are some photos of the stand creation process:
Now that the camera has been successfully installed, some settings need to be configured to get it working. Head to the start menu, click on Preferences, and then open the Raspberry Pi Configuration menu. From here, enable the camera and then reboot your Raspberry Pi.
(Start —> Preferences —> Raspberry Pi Configuration)
To check that the camera has been installed correctly and the settings properly configured, open the terminal and type the following command:
raspistill -o Desktop/test.jpg
This should open a window displaying a preview of what the camera sees for five seconds before saving the photo to the desktop. That is all the hardware setup required for this project. With the hardware ready, we can move on to the software and library installations.
Library Installations
There is a lot that needs to be installed to get this project up and running. This part of the project will probably take a bit of time just because of the sheer quantity of libraries required to make it work.
I really suggest using pip for these installations because they would be a nightmare without it. If you don't have pip installed and/or don't know how to use it, you can follow the tutorial linked here: https://pip.pypa.io/en/stable/installing/. For some reason, I was able to get everything installed using pip except for cmake. I used Homebrew to install cmake, but since I did not already have Homebrew installed on my computer, I will cover that installation below.
To recreate this project, you will need a computer as well. The computer is used to train the emotion recognition model that the Raspberry Pi uses. First, I will cover the installations required to get the code I have below running on your computer.
Do the following library installations in order from top to bottom. Some of the lower libraries are dependent on the first few installations.
Homebrew Setup:
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
$ brew update
$ echo -e "\n# Homebrew" >> ~/.bash_profile
$ echo "export PATH=/usr/local/bin:$PATH" >> ~/.bash_profile
$ source ~/.bash_profile
Cmake Installation Using Homebrew:
$ brew install cmake
PIP Installations:
$ pip install numpy
$ pip install dlib
$ pip install face-recognition
$ pip install --upgrade tensorflow
$ pip install Pillow
If you run into any trouble with the above installations, Google tutorials on how to install them. There are much more detailed explanations out there of how to get these libraries working and what your issue might be. I hit plenty of system-specific and circumstantial issues during these installations. I am sharing what worked for me above, but that doesn't necessarily mean it will work for you. If you type one of these commands into your console/terminal and get a wall of red error text, don't get discouraged. Luckily, all of these libraries have been around for a while, which means you are not the first person to hit these errors. If you run into any serious problems during the installation process, try Googling the error message along with the name of the library you are trying to install.
Now that the installations required for your computer are out of the way, we can move on to installing the libraries required on the Raspberry Pi. Between the cmake and dlib installations, this will take roughly an hour to complete. These installations take a very long time, so give them time and don't worry.
TensorFlow Lite Runtime:
I actually have instructions for this installation later in the tutorial as well, but the commands go as follows:
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install python3-tflite-runtime
Cmake Installation:
$ sudo apt-get update
$ sudo apt-get install build-essential cmake
$ sudo apt-get install libgtk-3-dev
$ sudo apt-get install libboost-all-dev
Dlib and Face_recognition Installation:
$ pip install numpy
$ pip install dlib
$ pip install face-recognition
Now that all of these are out of the way, we can get to the really exciting software contents of this project.
Building The Emotion Recognition Model
I chose to build my emotion recognition model using TensorFlow Keras, but before I dive into the details, I think it's important to briefly explain what TensorFlow is to the best of my knowledge. TensorFlow is a machine learning framework that allows for the easier creation and training of machine learning models. Keras is just one of the tools that TensorFlow offers to build those models. TensorFlow's fancy, mysterious machine learning models are really just data-flow graphs, which sounds just as confusing but is actually much easier to break down.
Graphs in this sense are mathematical structures of interconnected nodes/vertices and edges, and they are part of a broader area of math called graph theory. It is actually a lot easier to explain visually, so below is my best attempt at explaining graphs in a picture.
So with this in mind, when we think of data-flow graphs, we can think of a series of structured nodes and edges through which we can send information not too dissimilar to the way neurons work in brains. Unlike the graph I drew above, the graphs created by TensorFlow have very specific, user-controlled structures that allow the data to flow and emerge in meaningful ways. The machine learning model I built and trained was set up with the following structure:
The model I trained and used throughout this project has essentially the same structure as the graph shown above. The only difference between the drawing and the model I used is that the model has weights and biases between each layer. These weights and biases help determine what value goes to which node depending on the input, and the activation functions work in conjunction with them. The weights and biases are the only trainable variables in the entire model, so they are what gets adjusted to train the model to make accurate predictions. In other words, the weights and biases between each layer are nudged in directions that maximize the accuracy of the model. This is done through feedforward and backpropagation. This is where my knowledge of neural networks and machine learning models breaks down, but luckily for us, we don't need to know much more about how feedforward and backpropagation work to effectively create and train a machine learning model. TensorFlow is more than capable of handling all the hard, complicated math for us, so all that is important is recognizing the general structure of the data-flow graph we are building.
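To make the idea of weights and biases a bit more concrete, here is a toy example (not taken from the project) of what a single dense layer computes: each node takes a weighted sum of the previous layer's values, adds a bias, and passes the result through an activation function.
# Toy illustration of one dense layer: weighted sum + bias, then activation.
# The numbers here are made up purely for demonstration.
import numpy as np

x = np.array([0.2, 0.7, 0.1])        # values coming out of the previous layer
W = np.random.rand(4, 3)             # weights: one row per node in this layer
b = np.random.rand(4)                # one bias per node in this layer
output = np.maximum(0, W @ x + b)    # ReLU activation, like the hidden layer used below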
Translating the model drawn above into code, we get this:
from tensorflow import keras

model = keras.Sequential([
    # Flatten the 20 x 41 input grid into a single vector
    keras.layers.Flatten(input_shape=(20, 41)),
    # Hidden layer of 160 nodes with ReLU activation
    keras.layers.Dense(160, activation="relu"),
    # Output layer: one node per detectable emotion
    keras.layers.Dense(2, activation="softmax")
])
That's all it takes to create the model. The hard part comes from training it and aggregating enough properly standardized data for the model to be accurate under broader use cases.
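For completeness, compiling and fitting a model like this in Keras only takes a few more lines. This is just a minimal sketch; training_data and training_labels are assumed to be NumPy arrays of the standardized data and numeric labels discussed in the next section.
# Minimal training sketch; training_data and training_labels are assumed to be
# NumPy arrays built from the standardized landmark data described below.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(training_data, training_labels, epochs=10)
model.save("saved_model")  # the saved model folder is converted to .tflite later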
Training The Model
This was one of the hardest parts of this entire project. It turns out to be very hard to get large amounts of data that fit both my parameters for use and the shape of the model. To avoid giving my model a massive input layer that fits an entire image, I chose to use the pre-trained face-recognition library to find facial landmarks, which I could then pass to my own model. This allowed me to greatly reduce the size of the input layer and, though I didn't test this, increase the speed at which the model can run. Fewer inputs mean fewer nodes, edges, weights, and biases, which ultimately means faster training and running times. In addition, it is much easier to work with a constant, small number of points than an entire photo. I quickly realized that to get a proof of concept up and running, I would only really need the landmarks of the top and bottom lips. This way I had even less data to deal with, and I could store it in a .csv file so I would only have to extract it from the photos once.
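As a rough illustration of that extraction step, here is a minimal sketch of pulling just the lip landmarks out of a photo with the face-recognition library and appending them to a .csv file; the file names here are placeholders.
# Minimal sketch of extracting lip landmarks and saving them to a CSV.
# "training_photo.jpg" and "lip_landmarks.csv" are placeholder names.
import csv
import face_recognition

image = face_recognition.load_image_file("training_photo.jpg")
landmarks = face_recognition.face_landmarks(image)  # one dict of features per detected face

if landmarks:
    lip_points = landmarks[0]["top_lip"] + landmarks[0]["bottom_lip"]
    # Flatten the (x, y) pixel pairs into one CSV row so extraction only happens once
    with open("lip_landmarks.csv", "a", newline="") as f:
        csv.writer(f).writerow([value for point in lip_points for value in point])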
Each facial landmark was an x, y coordinate pair, with the x and y measured in pixels. While very useful for locating the landmarks, in this format they could not be sent to the emotion recognition model I was training; it would be impossible to pass the values held in any given coordinate pair to a single node. To work around this, I plotted the facial landmarks of the upper and lower lip into a set box 40 pixels wide and 20 pixels tall. Using some simple geometry shown below, I was able to standardize all of my training data so that it would fit the shape of my model's input layer.
In action the landmark plots looked like the following:
An issue I encountered while standardizing the data was that a' and b' were often not whole numbers. This meant that I had to round the a' and b' values to fit the plotted facial landmarks into the indexes of a 2D array. This led to some errors where I would get an index of 40 when the array only had indexes 0-39. Naturally, I made the array one index longer instead of finding a way to fix the rounding that was causing the error. If I have more time in the future, I might revisit this project and properly fix this issue.
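To make the geometry concrete, here is a rough sketch of the standardization step, assuming the 40 x 20 box described above and the extra column that absorbs a rounded index of 40; the exact scaling formula here is a reconstruction, not copied from my original script.
# Rough sketch of the standardization step: scale the lip points into a
# 40-wide by 20-tall box and plot them into a 20 x 41 grid (the extra
# column absorbs a rounded x index of 40).
import numpy as np

def standardize(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    grid = np.zeros((20, 41))
    for x, y in points:
        # Scale each coordinate proportionally into the box, then round it
        # so it can be used as an array index
        a = round((x - min(xs)) / (max(xs) - min(xs)) * 40)
        b = round((y - min(ys)) / (max(ys) - min(ys)) * 19)
        grid[b][a] = 1
    return grid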
When it actually came to training the model, I made minor tweaks and edits that left me with four different versions. With each version, I was able to increase the model's accuracy and decrease its bias toward recognizing only the limited training data I was able to accumulate. Here are some photos of the best model version predicting photos it had never seen before:
Something to take note of is that the model's certainty is scarily high for almost every picture. This is most likely an indication that there is not enough breadth in the data I used to train the model and in the data the model has never seen before. In short, I just do not have enough varied data: all my training and testing data looks practically identical. This is something I have struggled to fix because it is challenging to accumulate and label enough data, but I am at least aware of the issue.
There is plenty of room for improvement in the training of the model, but using the approach I explained above gets the model roughly working which is fine for now. If you are interested in using this emotion recognition model in an upcoming project, I highly suggest spending the bulk of your time collecting diverse data and corresponding labels to greatly improve how well the model will work.
Converting to and Using a TensorFlow Lite Model
The TensorFlow model I defined and trained above was built using TensorFlow 2 on my MacBook Pro, but I want to use the model on my Raspberry Pi. Since my laptop has plenty of extra disk space, 8 gigabytes of memory, and a decent processor, it had no problem whatsoever running the TensorFlow 2 model I created. However, a Raspberry Pi has comparatively limited storage, memory, and processing capacity, which means it might run into some serious performance issues trying to run the full TensorFlow model. By converting the trained model to a TensorFlow Lite model, it should be possible to circumvent these performance issues.
Converting the Model:
Before transferring the trained model over to the Raspberry Pi, it should be converted into a .tflite file. By following the guide on TensorFlow's website (linked here), I was able to convert my saved model with the following script (also from TensorFlow's website linked above):
import tensorflow as tf

# Convert the saved Keras model (saved_model_path points to its folder) to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
tflite_model = converter.convert()
# Write the converted model out as a .tflite file
with open("model.tflite", "wb") as file:
    file.write(tflite_model)
Using the Model:
Before the converted model can be run on the Raspberry Pi, TensorFlow Lite needs to be installed. TensorFlow Lite can be installed on the Raspberry Pi with the following terminal commands:
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install python3-tflite-runtime
Once tflite_runtime has been installed, the .tflite model can be used to make predictions by creating an Interpreter object. Shown below is an example from TensorFlow's website (linked here) of how to use the Interpreter.
import numpy as np
import tensorflow as tf
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
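One thing to note: the example above imports the full tensorflow package, which we did not install on the Raspberry Pi. With only tflite_runtime installed, the same idea should look roughly like this sketch:
import numpy as np
from tflite_runtime.interpreter import Interpreter  # lighter-weight stand-in for tf.lite

# Load the converted model with the standalone TFLite runtime
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test with random input data, just like the example above
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)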
Automatic Emotion Journal
The final part of this project was bringing all these pieces together to create an automatic emotion journal. The PiCamera and the TensorFlow model would ideally work in unison to document my mood throughout the day.
The emotion journal script starts by taking a photo using the PiCamera module, and the photo is saved as unknown.jpg for the time being. The next step is loading the recently taken photo as a numpy array using the face-recognition module. This numpy array can then be passed to the face-recognition module's face_landmarks() function, which identifies the important facial features in the photo using 68 points. The facial landmarks defining the top and bottom lips are then sent to the standardization function I discussed earlier, which scales the facial landmarks into set bounds. These standardized points are then sent to the format_data() function, which simply reformats the array of standardized points into a format the TensorFlow model will accept. The properly formatted data is then sent to the emotion recognition model, and the model's prediction is extracted. A time stamp is created just after the model makes its prediction. The prediction and the time stamp are then appended to a running csv file that keeps track of all the model's predictions. This csv file is essentially the emotion journal, and each entry reads something like this: "happy; 23-49-31". The time stamp is the military time in hours, minutes, and then seconds.
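Pulling the whole flow together, a rough sketch of the journal script looks like this. The standardize() call refers to the standardization sketch from the training section, and the label order, model path, and journal file name are placeholder choices.
# Rough sketch of the emotion journal flow; standardize() is the sketch from
# the training section, and the label order, model path, and journal file
# name are placeholders.
import time
import numpy as np
import face_recognition
from picamera import PiCamera
from tflite_runtime.interpreter import Interpreter

LABELS = ["happy", "sad"]  # assumed order of the model's two output nodes

camera = PiCamera()
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# 1. Take a photo and temporarily save it as unknown.jpg
camera.capture("unknown.jpg")

# 2. Load the photo as a numpy array and find the facial landmark points
image = face_recognition.load_image_file("unknown.jpg")
landmarks = face_recognition.face_landmarks(image)

if landmarks:
    lips = landmarks[0]["top_lip"] + landmarks[0]["bottom_lip"]

    # 3. Standardize the lip points and reshape them into the batch format
    #    the model expects (this stands in for the format_data() step)
    grid = standardize(lips)
    input_data = np.array([grid], dtype=np.float32)

    # 4. Run the TensorFlow Lite model and take the most likely emotion
    interpreter.set_tensor(input_details[0]["index"], input_data)
    interpreter.invoke()
    prediction = interpreter.get_tensor(output_details[0]["index"])
    emotion = LABELS[int(np.argmax(prediction))]

    # 5. Append the prediction and a time stamp to the journal csv
    with open("emotion_journal.csv", "a") as journal:
        journal.write(f"{emotion}; {time.strftime('%H-%M-%S')}\n")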
Sources
- Raspberry Pi Setup Guide: https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up
- Raspberry Pi Camera Setup Guide: https://projects.raspberrypi.org/en/projects/getting-started-with-picamera
- pip Installation tutorial: https://pip.pypa.io/en/stable/installing/
- TensorFlow Keras tutorial (1/2): https://www.youtube.com/watch?v=cvNtZqphr6A
- TensorFlow Keras tutorial (2/2): https://www.youtube.com/watch?v=RqLD1INA_cQ
- Installing dlib (computer and Raspberry Pi): https://www.pyimagesearch.com/2018/01/22/install-dlib-easy-complete-guide/
- Converting to a .tflite model: https://www.tensorflow.org/lite/convert
- Installing TensorFlow Lite: https://www.tensorflow.org/lite/guide/python
- Using a TensorFlow Lite Model: https://www.tensorflow.org/lite/guide/inference#load_and_run_a_model_in_python
- face-recognition Library: https://pypi.org/project/face-recognition/
- Converting Video to Images with OpenCV: https://medium.com/@iKhushPatel/convert-video-to-images-images-to-video-using-opencv-python-db27a128a481