Edge computing is a computing paradigm in which devices are located at the user's physical location, allowing them to interact directly with the user's source of information (data). This makes it possible to acquire and analyse signals locally, without sending them over the network for analysis. This ease of access to the data source has been made possible by the extraordinary evolution of these devices over the last decade. Considering Gordon Moore's law, which states that "the number of transistors in integrated circuits will double approximately every 24 months", it is easy to understand why such small yet powerful devices are available today.
Thanks to this increase in transistor density, microcontrollers are now capable of solving complex equations and even running Deep Learning (DL) models.
These edge devices are essential today: humanity generates so much data daily that it would be complicated and costly to send it all to server farms for analysis. If we add the energy these farms require for their operation, it becomes clear that edge computing and Edge-AI are the solution to many of today's problems.
With this in mind, this project integrates Edge-AI tooling to create a system capable of classifying exercises for hand rehabilitation (Figure 1).
Rehabilitation exercises are essential for elderly people and people who have undergone surgery, as they represent the possibility of regaining hand mobility (after surgery) or avoiding further deterioration of mobility during ageing. These exercises are prescribed by experts such as orthopaedic doctors or physiotherapists, who know which activities are most suitable for rehabilitation and usually supervise them in person. However, this is sometimes not possible - think of a person living in a rural area far away from a specialised centre, for whom merely making the journey can be a real problem. Fortunately, technology can help us solve this problem, thanks to Edge-AI.
Devices to be used
The project uses mainly two devices: a UART camera and the Wio-Terminal. The integration of these two elements allows the classification of the exercises. The UART camera is sold by Seeed Studio; it works at 5 V, has 300,000 pixels and supports several resolutions, such as 640*480, 320*240 and 160*120. Its serial communication speed ranges from 9600 to 115200 baud (in my case, 9600). It supports the RS485 and RS232 communication protocols, JPEG photo compression with optional high, medium and low quality, AGC, automatic exposure control, automatic white balance control and adjustable focus (Figure 2).
On the other hand, the Wio-Terminal (Figure 3) is the device responsible for carrying out all the processing associated with the project: receiving the images, resizing them and running the classification model.
The Wio-Terminal and the camera are connected via the serial port located on pins 8 (BCM-14) and 10 (BCM-15), taking care that the camera's TX pin is connected to the Wio-Terminal's RX and the camera's RX to the Wio-Terminal's TX (Figure 4).
Image Acquisition
To perform the exercise classification, it is necessary to have a set of images representing the exercises. These images must be captured from different angles and poses to give the model a better representation of the data to be classified.
Since we are working with small devices with limited RAM and computational capacity, some small transformations to these images are necessary. Images taken with a smartphone camera, for example, are typically huge and high-resolution, making them difficult for an edge device to analyse, so they must be resized. After several attempts, it was determined that the optimal image size for correct classification on the device used in this project is 32x32 pixels.
Approximately 500 images were captured, 100 for each class of selected exercise plus a class for no activity. These images had an original size of 160x120 and were resized to 32x32 to facilitate their classification.
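The dataset-side resizing can be sketched in Python. This stdlib-only version is an illustration, not the tooling actually used (a library such as Pillow would be the usual choice); it averages the 2x2 block of source pixels nearest each mapped coordinate, the same scheme later applied on the device:

```python
def downscale(pixels, old_w, old_h, new_w=32, new_h=32):
    """Downscale a flat (row-major) grayscale image by averaging the
    2x2 block of source pixels nearest each mapped coordinate."""
    out = []
    for y in range(new_h):
        for x in range(new_w):
            # Map the destination pixel back onto the source image.
            gx = int(x / new_w * (old_w - 1))
            gy = int(y / new_h * (old_h - 1))
            # Average the 2x2 neighbourhood.
            total = (pixels[gy * old_w + gx] +
                     pixels[gy * old_w + gx + 1] +
                     pixels[(gy + 1) * old_w + gx] +
                     pixels[(gy + 1) * old_w + gx + 1])
            out.append(total / 4)
    return out

# A 160x120 image of constant intensity stays constant after resizing.
flat = [100] * (160 * 120)
small = downscale(flat, 160, 120)
print(len(small))  # 1024 values, i.e. 32x32
```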
The model training was performed using the Edge Impulse platform, a powerful tool for creating DL models for Edge devices. On the Edge Impulse website, all the documentation necessary to perform the first training can be found.
Figure 5 shows the four original images corresponding to each class to be classified.
Figure 6 shows the same images resized to 32x32.
Once the images have been captured, the next step is to check that our device captures the images correctly. To do this, we visualise the image captured by the camera (Figure 7).
Once this visualisation has been carried out, the image must be reduced in size; remember that the optimum size in our case was 32x32. There are many techniques for this resizing, but we used a straightforward one: averaging the pixels neighbouring the selected pixel.
To do the resizing, we first transform the image, which is a 2D array, into a 1D array of 19,200 values (160x120). Once this transformation is done, we resize it to 32x32 using the following code.
const int oldWidth = 160;
const int oldHeight = 120;
const int newWidth = 32;
const int newHeight = 32;
// features is the 1D array holding the original 160x120 image.
// newImage is the new 1D array of 32x32 = 1024 values that
// corresponds to the resized image.
float newImage[newWidth * newHeight] = {0};
int k = 0;
for (int y = 0; y < newHeight; y++) {
  for (int x = 0; x < newWidth; x++) {
    // Map the destination pixel back onto the original image.
    float gx = x / (float)newWidth * (oldWidth - 1);
    float gy = y / (float)newHeight * (oldHeight - 1);
    int gxi = (int)gx;
    int gyi = (int)gy;
    // Average the 2x2 block of neighbouring pixels.
    float p00 = features[(gyi * oldWidth) + gxi];
    float p10 = features[(gyi * oldWidth) + (gxi + 1)];
    float p01 = features[((gyi + 1) * oldWidth) + gxi];
    float p11 = features[((gyi + 1) * oldWidth) + (gxi + 1)];
    newImage[k] = (p00 + p10 + p01 + p11) / 4;
    k = k + 1;
  }
}
This newly resized image is the one we will use as input to the DL model, which returns the name of the exercise the user is performing.
To corroborate that the resizing has been done correctly, the 1D array data is sent over the serial port. This data is copied and used to reconstruct the image with a Python script; the figure shows this reconstruction.
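A minimal sketch of such a reconstruction script (the function name, the assumed comma-separated dump format and the use of matplotlib are illustrative assumptions; the original script is not shown):

```python
def reconstruct(serial_dump: str, width: int = 32, height: int = 32):
    """Rebuild a 2D image (list of rows) from the flat pixel values
    dumped over the serial port as comma-separated numbers."""
    values = [float(v) for v in serial_dump.split(",") if v.strip()]
    assert len(values) == width * height, "unexpected pixel count"
    # Reshape the flat 1D list back into rows of `width` pixels.
    return [values[r * width:(r + 1) * width] for r in range(height)]

# Fake dump for illustration: a simple gradient image.
dump = ",".join(str(i % 256) for i in range(32 * 32))
image = reconstruct(dump)
print(len(image), len(image[0]))  # 32 32
# To display the reconstruction, e.g.:
# import matplotlib.pyplot as plt
# plt.imshow(image, cmap="gray"); plt.show()
```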
Model training
Training the model is a straightforward, visual task that quickly yields a model optimised for our device. Compared with the traditional way, i.e. Python, Keras and TensorFlow, Edge Impulse saves much time. However, if you want to experiment with models, hyperparameters and other details, you must go back to basics: a Colaboratory notebook and programming by hand. I won't go into detail on how the training is done, how to create the project or how to upload the images; this documentation can be found on Google or the Edge Impulse website.
As mentioned above, the images are 32x32. Once the photos have been uploaded, the features extracted and the model trained, the next step is to review the results and statistics the tool offers. An important note: do not use transfer learning, as this training technique expects 96x96 or 160x160 images. That image size is too big for our device, and we would get an error because the data cannot be accommodated in the arena. As a side note, I do not recommend using the EON Tuner for this project; it will return a seemingly super model with beautiful confusion matrices, but the vast majority of its suggestions use MobileNetV1 or MobileNetV2 networks with transfer learning and 96x96 or 64x64 inputs.
Live classification is a handy tool; it allows us to perform various tests with validation images (Figure 8) or with signals acquired through the serial port.
Once our model has been validated with single images and we have seen that it can discriminate each class, the next step is to test the model using all the pictures in our test set. The statistics resulting from this test (confusion matrix and F1 score) are perhaps the most essential data for validating the model, as they indicate how well it classifies information it has not seen before. Another possible experiment is to build a small set of validation images captured in different positions and under different environmental conditions, such as light intensity and shadow.
A confusion matrix is a tool for visualising the performance of a supervised learning algorithm. Each column of the matrix represents the number of predictions for each class, while each row represents the instances of the actual class. One of the benefits of confusion matrices is that they make it easy to see whether the system is confusing two classes.
On the other hand, the F1-score is a measure of the test's accuracy; it is calculated from the precision and the recall of the test. Precision is the number of true positive results divided by the number of all results identified as positive, including those incorrectly identified as positive. Recall is the number of true positive results divided by the number of all samples that should have been identified as positive.
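As a concrete illustration (the counts below are made up for the example, not taken from the project's results), precision, recall and F1 for one class can be computed from the raw counts like this:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1 from true positives (tp),
    false positives (fp) and false negatives (fn) for one class."""
    precision = tp / (tp + fp)          # correct / all predicted positive
    recall = tp / (tp + fn)             # correct / all actual positive
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts: 29 images classified correctly,
# 3 false positives, 6 misses.
p, r, f1 = precision_recall_f1(tp=29, fp=3, fn=6)
print(round(p, 3), round(r, 3), round(f1, 3))
```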
The confusion matrix and F1-score obtained in the exercise-classification experiments can be seen in Figure 9.
All in all, we have an 88.4% accuracy rate, which is not bad at all. However, we can see some uncertainties and some misclassifications. The TableTop class has an accuracy of 82.9%; the remainder is split between the Arrow class (5.7%), the Claw class (2.9%) and uncertain (8.6%). In the case of Claw and Arrow, the confusion may be due to a similarity between the exercises; as for the uncertain results, the model simply cannot discriminate to which class those images belong.
Conclusion
- Edge Impulse is a powerful tool. This is the first time I have used it in a project; usually, I program my models using Python, Keras, PyTorch and TensorFlow and embed them in embedded systems with the Python library tinymlgen.
- We were able to classify images of hand rehab exercises with reasonable accuracy.
- A portable, inexpensive and easy-to-use system was developed, suitable for remote locations where access to specialised medical personnel is difficult.
Future work
- Use other devices, such as ESP32, Nicla Vision, Portenta X8 and Portenta H7.
- Implement a LoRa data delivery system using the Helium network or a SenseCAP M1.
- Integration of servers, such as AWS, for image storage.
- Use AWS to allow specialised medical staff to see the progress of rehabilitation.