Introduction
Edge computing is a paradigm in which computing devices are placed close to the user and the data source. This proximity enables direct interaction between the devices and the data they consume, allowing signals to be acquired and analyzed on the spot without transmitting them over the network for processing. The remarkable evolution of these devices over the past decade, in line with Gordon Moore's observation that the number of transistors in integrated circuits doubles approximately every 24 months, has made this seamless access to data sources possible. As a consequence of that exponential growth in transistor counts, we now have small yet powerful devices.
This abundance of transistors has empowered microcontrollers to tackle complex computations and even run Deep Learning (DL) models. In today's data-rich environment, where humanity generates vast amounts of data daily, relying on server farms for every analysis would be both cumbersome and costly. Once the substantial energy requirements of those server farms are factored in, it becomes evident that edge computing and Edge-AI offer solutions to many contemporary challenges.
In light of these considerations, the proposed project combines Edge-AI hardware with an edge-oriented training tool (Edge Impulse) to develop a system capable of classifying hand rehabilitation exercises (Figure 1).
Rehabilitation exercises play a vital role for elderly individuals and for those recovering from surgery: they offer the opportunity to regain hand mobility after an operation or to slow the decline in mobility that comes with aging. Typically prescribed by specialists such as orthopedic surgeons or physiotherapists, these exercises are carefully selected for each patient's needs and are often performed under the direct supervision of those specialists. For people living in rural areas far from specialized centers, however, the burden of traveling to access that care can pose a significant obstacle. Technology, and Edge-AI in particular, offers a promising way to address this challenge.
ARM Cortex-M85 RA8 Vision Board
The Vision Board is a new artificial vision development board created through a collaboration between RT-Thread and Renesas Electronics. It integrates a 480 MHz Arm Cortex-M85 chip enhanced with cutting-edge Helium and TrustZone technologies (Figure 2).
Robust RA8 Chip Performance:
- Core: The RA8 chip boasts a formidable 480 MHz Arm Cortex-M85 core, incorporating cutting-edge Helium and TrustZone technologies.
- Storage: 2 MB/1 MB of flash memory and 1 MB of SRAM, including TCM, with 512 KB under ECC protection.
- Peripherals: The chip offers compatibility with xSPI Quad-SPI, featuring XIP and real-time decryption/DOTF, as well as CAN-FD, Ethernet, USBFS/HS, 16-bit camera interface, and I3C, among others.
- Advanced Security: The chip offers strong cryptographic algorithms, TrustZone technology, immutable storage, tamper protection with DPA/SPA attack resistance, secure debugging, secure factory programming, and lifecycle management support. At 6.39 CoreMark/MHz, it caters to demanding IoT applications that require high computing performance along with DSP or ML capabilities.
Image Acquisition
To classify exercises accurately, a set of images representing the exercises is required. These images need to be captured from various angles and poses to provide the model with a comprehensive representation of the data.
Given the limited RAM and computational capacity of the small devices we are working with, some preprocessing of these images is necessary. Smartphone camera images are typically large and high-resolution, which can be a problem for edge devices during analysis, so resizing is required. After experimentation, 32x32 pixels turned out to be the optimal image size for correct classification on the device used in this project.
Approximately 500 images were captured, with 100 images for each selected exercise class plus a "no activity" class. These images were originally sized at 160x120 and were resized to 32x32 to facilitate classification.
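For reference, the resizing step can be scripted in a few lines of Python with the Pillow library. This is a minimal sketch, not the exact script used in the project; the folder names are hypothetical and assume one subfolder per class.

```python
from pathlib import Path
from PIL import Image

# Hypothetical layout: one subfolder per class, e.g. dataset/arrow,
# dataset/claw, dataset/tabletop, dataset/no_activity
SRC = Path("dataset")          # original 160x120 captures
DST = Path("dataset_32x32")    # resized copies for training

for img_path in SRC.rglob("*.jpg"):
    out_path = DST / img_path.relative_to(SRC)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(img_path) as img:
        # 160x120 is 4:3, so resizing to 32x32 squashes the aspect
        # ratio, similar to Edge Impulse's "squash" resize mode.
        img.resize((32, 32), Image.LANCZOS).save(out_path)
```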
Model training was conducted using the Edge Impulse platform, a powerful tool for creating Deep Learning models for Edge devices. The Edge Impulse website provides all the documentation necessary to perform the initial training.
Figure 6 shows the images after they have been resized to 32x32 pixels.
The first version of this project was developed using a Wio Terminal and a serial camera. You can view this project here.
In that initial project, the system captured a single image and then classified it. With the new Vision Board, classification happens in real time, a significant advancement: users can now see immediately whether the exercise they are performing is correct.
Model Training
The model training process with Edge Impulse is straightforward and visually intuitive, which allows for quick iterations and a model optimized for our device. Compared with the traditional workflow using Python, Keras, and TensorFlow, Edge Impulse saves a significant amount of time. However, if you wish to experiment with different models, hyperparameters, and other aspects, you may need to go back to basics and program the model yourself in a Colaboratory notebook.
I won't delve into the specifics of how the training is conducted, how to create the project, or how to upload the images, as this information can be readily found on Google or the Edge Impulse website.
As previously mentioned, the images are resized to 32x32 pixels. Once the photos are uploaded, features are extracted, and the model is trained, the next step is to analyze the results and statistics the tool provides. Note that transfer learning is not recommended for this project: it typically relies on larger input sizes such as 96x96 or 160x160, and the resulting data would not fit in the tensor arena (the RAM buffer that TensorFlow Lite Micro allocates for the model's intermediate activations), producing errors on the device.
For the same reason, I do not advise using the EON Tuner for this project. It may produce a supermodel with a visually appealing confusion matrix, but the majority of models it generates use MobileNetV1 or MobileNetV2 with transfer learning and larger input sizes such as 96x96 or 64x64 pixels, which again will not fit on this device.
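If you do want to experiment in a Colaboratory notebook instead, a small convolutional network trained from scratch on the 32x32 images is a sensible alternative to transfer learning. The sketch below is only an illustration under assumptions (five classes, placeholder training data), not the architecture Edge Impulse generates; the final conversion step shows how to check the model size that must fit on the device.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # assumption: four exercise classes plus "no activity"

# A small CNN sized for 32x32 RGB inputs; roughly the scale of model
# whose activations fit in a microcontroller's tensor arena, unlike
# MobileNet at 96x96.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.25),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder stand-ins for the resized 32x32 dataset.
x_train = np.random.rand(100, 32, 32, 3).astype("float32")
y_train = np.random.randint(0, NUM_CLASSES, 100)
model.fit(x_train, y_train, epochs=5, batch_size=16)

# Convert to a quantized TFLite model (dynamic-range quantization;
# full int8 would additionally need a representative dataset) and
# print the size that has to fit in the device's flash.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print(f"Quantized model size: {len(tflite_model) / 1024:.1f} KB")
```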
Live classification is a valuable tool, as it allows us to run various tests or validations with images or signals in real time (see Figure 7).
Once the model has been validated with single images and we have seen that it can differentiate each class, the next step is to test it against all the images in the test set. The statistics from this test, such as the confusion matrix and F1 score, are crucial for validating the model's performance: they show how well the model performs and how well it generalizes to unseen data.
The confusion matrix is a powerful tool for visualizing the performance of a supervised learning algorithm. Each row of the matrix represents the instances of an actual class, while each column represents the predictions for a class. It makes it easy to see whether the system is confusing two classes and where the model is making errors.
The F1 score, on the other hand, measures a test's accuracy and is calculated from its precision and recall: F1 = 2 × (precision × recall) / (precision + recall). Precision is the number of true positives divided by the total number of positive predictions, including the incorrect ones; recall is the number of true positives divided by the total number of samples that should have been identified as positive.
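Edge Impulse computes both metrics for you, but outside the platform they are one-liners with scikit-learn. A minimal sketch with placeholder labels (the arrays below are illustrative; in practice they come from running the model over the test set):

```python
from sklearn.metrics import confusion_matrix, f1_score

# Placeholder ground-truth and predicted labels for a 3-class example
y_true = ["arrow", "claw", "tabletop", "tabletop", "claw", "arrow"]
y_pred = ["arrow", "claw", "arrow", "tabletop", "claw", "arrow"]

labels = ["arrow", "claw", "tabletop"]

# Rows are the actual classes, columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# F1 = 2 * (precision * recall) / (precision + recall), here averaged
# equally across classes ("macro" averaging).
print(f1_score(y_true, y_pred, labels=labels, average="macro"))
```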
The confusion matrix and F1 score from the exercise classification experiments can be seen in Figure 8.
Overall, we achieved an accuracy of 88.4%, which is quite satisfactory. On closer examination, however, there are some uncertainties and misclassifications. The TableTop class, for instance, reaches an accuracy of only 82.9%, leaving 17.1% of instances misclassified or uncertain: 5.7% are labeled as Arrow, 2.9% as Claw, and 8.6% are categorized as uncertain.
The misclassifications involving the Claw and Arrow classes may stem from the visual similarity of those exercises, which confuses the model. As for the uncertain classifications, the model simply cannot discriminate confidently enough to assign those images to a specific class.
Conclusion
Edge Impulse proved to be a powerful tool in this project. It was my first time using this platform, as I typically develop models using Python, Keras, PyTorch, and TensorFlow and deploy them to embedded systems using the tinymlgen Python library. With Edge Impulse, we were able to classify images of hand rehabilitation exercises with reasonable accuracy. The result is a portable, inexpensive, and user-friendly system suitable for remote locations where access to specialized medical personnel is limited.
Future Work
Moving forward, there are several avenues for further exploration:
1. Experiment with other devices such as the ESP32, Nicla Vision, Portenta X8, and Portenta H7 to expand the system's capabilities and compatibility.
2. Implement a LoRa data delivery system over the Helium network (e.g., via a SenseCAP M1 hotspot) to enable remote monitoring and data transmission.
3. Explore the integration of cloud services such as AWS for image storage and analysis, allowing for centralized data management and scalability.
4. Utilize AWS to enable specialized medical personnel to remotely monitor and track the progress of rehabilitation, providing valuable insights and guidance to patients.
These future endeavors aim to enhance the functionality, accessibility, and effectiveness of the rehabilitation system, ultimately improving patient outcomes and quality of care.