I work in a research lab, and taking my gloves off just to do some basic arithmetic on my phone has always seemed inefficient and annoying, since once the gloves are off it's a nightmare to put them back on right away. So I designed this gesture calculator, which lets me do all those basic calculations without taking my gloves off or worrying about contaminating my phone. I've also wanted to dive into tinyML for a while now, so this felt like the right project for my first TFLite deployment on the PocketBeagle.
Build Instructions
From a hardware point of view, this project is pretty simple: all you need is the PocketBeagle, the MPU6050 IMU, and the OLED screen. Since the PocketBeagle might be an unfamiliar platform for some of you, I am attaching its pin diagram here. The top of the diagram corresponds to the micro-USB port.
- Connect Power Rail to P1_14 and/or P2_23
- Connect Ground Rail to P1_16 and/or P2_21
- Connect SCL from P1_28 to the breadboard
- Connect SDA from P1_26 to the breadboard
- Both OLED screen and MPU6050 use I2C, so connect SCL and SDA as indicated on the packages and in the diagram
- Fit MPU6050, PocketBeagle, and Screen into 3D printed chassis
The end product should look something like this (although your cable management might be better than mine).
There are 4 main documents in the repository linked at the end of the project, as well as 2 additional files needed for configuring the PocketBeagle pins and running the script.
Acquisition
acquisition.py collects raw acceleration and gyroscope data from the MPU6050. I set a threshold of 2 g of acceleration to detect when a gesture begins, and after that I collect a fixed 350 samples, corresponding to around 2 seconds, to keep the input consistent and easy to feed into the TensorFlow model later.
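To make the trigger-then-capture logic concrete, here is a minimal sketch of that loop (not the exact acquisition.py code); it assumes a read_sample() helper that returns one (ax, ay, az, gx, gy, gz) tuple in g's and deg/s from the MPU6050.
import numpy as np

THRESHOLD_G = 2.0    # gesture starts once the acceleration magnitude exceeds 2 g
NUM_SAMPLES = 350    # fixed window, roughly 2 seconds of motion

def capture_gesture(read_sample):
    """Wait for the acceleration threshold, then record a fixed-length window."""
    while True:
        sample = read_sample()
        ax, ay, az = sample[:3]
        if np.sqrt(ax**2 + ay**2 + az**2) > THRESHOLD_G:
            break
    samples = [sample]
    while len(samples) < NUM_SAMPLES:
        samples.append(read_sample())
    return np.array(samples)    # shape (350, 6): accel x/y/z, gyro x/y/z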
Visualization and Wavelet Transform
The raw data from the IMU consists of 350 samples so that it captures the entire motion; however, we don't really need that many samples to capture the overall gist of the motion. That's where the wavelet transform comes in. It preserves the shape of the signal while compressing it in time, essentially reducing the number of samples with little loss of useful information. The result of the wavelet transform is two vectors: one with approximation coefficients and one with detail coefficients (cA1, cD1). This doubles the number of features available for classification by splitting the signal into two separate signals. We can create even more features by repeating the transform with a different wavelet (cA2, cD2).
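For anyone unfamiliar with it, the transform is a one-liner with PyWavelets; the wavelet families below ('db4' and 'sym4') are placeholders, and the notebook may use different ones.
import numpy as np
import pywt

signal = np.random.randn(350)           # one raw IMU channel, e.g. acceleration X
cA1, cD1 = pywt.dwt(signal, 'db4')      # approximation + detail coefficients
cA2, cD2 = pywt.dwt(signal, 'sym4')     # a second wavelet doubles the features again
print(len(signal), len(cA1), len(cD1))  # each output is roughly half the input length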
Data visualization.ipynb also generates visualizations of all the data collected for a gesture, which allowed me to analyze what different gestures look like as signals and adapt my hand motions accordingly.
Training the Machine Learning Model
All of the data processing and model training can be found in TrainingModel.ipynb in the repo. The dataframe from the wavelet transform contains 24 columns, corresponding to the 6 raw data columns times 2 outputs per wavelet transform times 2 transforms. The dataset was first normalized, mapping the values to the range 0 to 1, and then the values were flattened into a single vector per gesture for easier integration with TensorFlow.
# Min-max normalize the wavelet features to the 0-1 range
normalwavedata = (fullwavedata - fullwavedata.min()) / (fullwavedata.max() - fullwavedata.min())

# Flatten the wavelet coefficients into one column per recorded gesture
formatwavedata = pd.DataFrame()
for idx, gesture in enumerate(gestures):
    for i in range(1, num_samples + 1):
        index = idx * num_samples * wavelen + (i - 1) * wavelen
        wavedataf = normalwavedata.iloc[index:index + wavelen].to_numpy().flatten().tolist()
        formatwavedata[idx * num_samples + i - 1] = wavedataf
        del wavedataf
formatwavedata = formatwavedata.transpose().to_numpy()   # rows = gestures, columns = features
Afterward, the dataset was split into 3 parts: training (70%), testing (15%), and validation (15%). I used StratifiedShuffleSplit instead of the typical train_test_split. Since my dataset was quite small, there was a lot of variability depending on which gestures were more prevalent in the training set, so the stratified shuffle ensures the same distribution of gestures in all three parts.
from sklearn.model_selection import StratifiedShuffleSplit

# Stratified splits keep the gesture distribution identical across train/test/val
testsplit = StratifiedShuffleSplit(n_splits=1, test_size=0.15)
valsplit = StratifiedShuffleSplit(n_splits=1, test_size=0.15/0.85)

for train_index, test_index in testsplit.split(formatwavedata, labels):
    X_train, X_test = formatwavedata[train_index], formatwavedata[test_index]
    y_train, y_test = labels[train_index], labels[test_index]

for train_index, val_index in valsplit.split(X_train, y_train):
    X_train, X_val = X_train[train_index], X_train[val_index]
    y_train, y_val = y_train[train_index], y_train[val_index]
For the actual model, I used a sequential Keras model from TensorFlow. After some parameter optimization, I arrived at my current architecture, which provides a good compromise between accuracy and size (since we are running on a SoC, we want the model to be relatively small for good performance). A dropout layer was added after every hidden Dense layer to prevent overfitting on the training dataset.
opt = tf.keras.optimizers.Adam(learning_rate=0.0001)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1024, activation='relu', name='data'))  # relu is used for performance
model.add(tf.keras.layers.Dropout(0.25))  # dropout layers help prevent overfitting
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Dense(len(gestures), activation='softmax', name='result'))

model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
Even though the raw gesture data looked quite noisy and complex, after processing the data and training the neural network, the model still achieves over 90% accuracy on the test data.
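For reference, training and evaluation in the notebook come down to a standard fit/evaluate call; the epoch count and batch size below are placeholders rather than the exact values from TrainingModel.ipynb.
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=100, batch_size=32)   # placeholder hyperparameters

test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.1%}")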
To truly harness the power of tinyML, you need to make the model compatible with the PocketBeagle, so I converted it to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
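The converted model then just needs to be written to disk so the inference script can load it; the filename here matches the one used in predict.py below.
with open('test_model.tflite', 'wb') as f:
    f.write(tflite_model)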
Inference on the Go
The last code file in the repository, predict.py, contains code similar to what I already did above. It acquires data using the same procedure as acquisition.py, then performs the wavelet transforms and normalization; the only difference is that I optimized the code to rely mostly on NumPy so it runs well on the PocketBeagle. To run the actual inference, you first need to load the model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path='test_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
Then, you can just pass the data to the interpreter and it will produce an array with the confidence for each gesture.
input_data = np.float32(np.resize(gesturewave, (1, 1152)))
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
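The index with the highest confidence is the predicted gesture; assuming the same gestures label list used during training is available, picking it out looks something like this.
predicted_index = int(np.argmax(output_data[0]))              # most confident class
print(gestures[predicted_index], output_data[0][predicted_index])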
Operation Instructions
The device is set up to run the inference code upon boot, so you just need to plug in the PocketBeagle and, after waiting for a couple of minutes, it should be up and ready to predict gestures. The flow of the calculator is as follows:
- Draw out the first number in the air
- Flick your wrist (this advances the calculator to the operator selector)
- Draw out a digit 1 - 5 to select the operator you want
- Draw out the second number in the air
- Flick your wrist again to calculate the result
Although the model achieves over 90% accuracy on the test data during training, when actually using the device and performing the gestures the accuracy is definitely lower, most likely due to the inherent variation in how different gestures are performed and how noisy the signal is. Gathering training data from multiple people performing the gestures would help the network generalize better. Implementing more operators for the calculator, and a better way of selecting them, could also be done in a future iteration of this project. Until then, I think the most important addition would be a better casing that allows for more consistent readings and stabilizes the OLED display.
Since the device is worn as a wristband, adding some form of biometric sensor (e.g. a PPG) would allow a similar device to be used to monitor health status or physical activity by training the model on different parameters.