Introduction
The advancement of artificial intelligence (AI) and computer vision has opened new possibilities in automation and human-machine interaction. In this context, the Grove Vision AI V2 module emerges as a powerful and versatile tool. Designed for easy integration into electronics and robotics projects, this module enables developers to implement advanced image and object recognition capabilities without requiring extensive programming experience or handling complex hardware.
The Grove Vision AI V2, developed by Seeed Studio, is a significant upgrade from its predecessor, offering improvements in accuracy, speed, and functionality. This compact module includes a dedicated processor for artificial vision, allowing the execution of machine learning models directly on the device. This eliminates the need to transmit data to an external server for processing, thereby reducing latency and enhancing data privacy and security.
Equipped with a camera and various communication interfaces, the Grove Vision AI V2 is compatible with a range of platforms such as Arduino, Raspberry Pi, and other popular development boards. Its ease of integration and use makes it an ideal solution for applications in smart homes, industrial automation, security and surveillance, as well as educational projects.
This assistant, based on the Grove Vision AI V2, can recognize and classify a wide variety of objects, from human faces to vehicles and traffic signs, offering a flexible platform for developing computer vision applications. Additionally, its machine learning capabilities allow for the adaptation and training of new models tailored to different tasks and environments, making this module an indispensable tool for innovators and technology enthusiasts.
- Industrial Automation: Quality inspection, predictive maintenance, voice control, etc.
- Smart Cities: Device monitoring, energy management, etc.
- Transportation: Status monitoring, location tracking, etc.
- Smart Agriculture: Environmental monitoring, etc.
- Mobile IoT Devices: Wearable devices, handheld devices, etc.
For more information about the Grove Vision AI V2 system, please refer to the official documentation and resources provided by Seeed Studio. These resources offer comprehensive details on setup, integration, and usage, enabling you to fully utilize the capabilities of this powerful module in your projects.
The Assistant Part 1
Edge computing represents a paradigm in which computing devices are situated at the user's physical location. This configuration enables direct interaction between the devices and the user's data sources, allowing signals and images to be acquired and analyzed instantly without transmitting data over the network for processing. The remarkable evolution of these devices over the past decade, in line with Gordon Moore's law, which states that "the number of transistors on integrated circuits will approximately double every 24 months," has made this direct access to data sources possible. As a result, we are now witnessing the emergence of small yet powerful devices, thanks to the exponential increase in the number of transistors.
The proliferation of transistors has enabled microcontrollers to tackle complex equations and even integrate Deep Learning (DL) models. In today's data-rich environment, where humanity generates vast amounts of data daily, relying on server farms for analysis would be both cumbersome and costly. Given the substantial energy requirements of these server farms, it becomes clear that edge computing and Edge AI offer solutions to many contemporary challenges.
Considering these aspects, the proposed project merges Edge AI with cloud computing and edge learning tools to develop a system capable of performing multi-class classification efficiently.
This means that the assistant can have a model embedded in the Himax WE2 chip (e.g., hand exercise classification), while the ESP32 can access the images and send them to a web server (in the cloud) for analysis to detect persons, objects, types of food, and so on.
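To make this data flow concrete, below is a minimal Python sketch of the kind of request the ESP32 side performs, handy for testing a server such as the Flask service described later from a PC. The server address, the /classify route, and the image JSON field are illustrative assumptions, not the project's fixed API:

import base64
import requests

# Read a captured frame and encode it as base64, as the ESP32 does before sending it.
with open("frame.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

# POST the encoded image to the classification server (address and route are placeholders).
response = requests.post("http://192.168.1.50:5000/classify", json={"image": b64_image})
print(response.json())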
Assistant Design
The primary objective of the assistant in this project is to accompany elderly individuals or those who are alone. This assistant incorporates a series of hardware and software tools that allow it to interact with the user.
Hardware Design
- Torso Control: The robot's body is composed of a torso controlled by a Seeeduino XIAO. This microcontroller manages 12 LX-16A servo motors.
- Movement Monitoring: The XIAO stores a set of reference angles representing typical upper-limb exercises. The assistant uses a Human Pose Detection model to monitor whether the exercises are being performed as accurately as possible.
- Display: The assistant is designed to be as small and versatile as possible. It is equipped with a 128x64 pixel OLED screen, which can display various functionalities, such as heart signal monitoring, as well as an animated face to empathize with the user.
Software Features
- Human Pose Detection: This model ensures that the exercises are performed correctly by comparing the user's movements to predefined angles (a small sketch of this comparison follows below).
- Interactivity: The animated face on the OLED screen helps in building a connection with the user, making the interaction more engaging.
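As a rough illustration of this comparison (a minimal sketch, not the actual firmware, since the real check runs on the XIAO; the exercise names, reference angles, and tolerance are assumptions), the measured joint angles can be matched against the stored reference set within a tolerance:

REFERENCE_EXERCISES = {
    # Hypothetical reference angles (degrees) for shoulder, elbow and wrist.
    "arm_raise": [90.0, 170.0, 180.0],
    "elbow_flex": [20.0, 45.0, 180.0],
}

def exercise_matches(measured, reference, tolerance_deg=15.0):
    # True if every measured angle is within the tolerance of its reference angle.
    return all(abs(m - r) <= tolerance_deg for m, r in zip(measured, reference))

# Example: angles estimated by the pose model for the current frame.
current_angles = [85.0, 160.0, 175.0]
print(exercise_matches(current_angles, REFERENCE_EXERCISES["arm_raise"]))  # True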
Integration
The assistant's design leverages the capabilities of Edge AI and cloud computing to deliver efficient and responsive assistance. By embedding a model within the Himax WE2 Chip and utilizing the ESP32 for image processing and cloud communication, the system provides robust multi-class classification and monitoring.
Overall, the assistant is a compact, versatile, and intelligent companion designed to support and monitor the well-being of its users.
Internally, the assistant leverages the Grove Vision AI V2 system from Seeed Studio and a XIAO ESP32-C3. These two devices work together to provide the user with a seamless experience, allowing effective interaction and assistance with their daily activities.
AI Models Embedded Within Grove Vision AI V2
Choosing which classification model to embed within the Grove Vision AI V2 was a challenging decision. The assistant needed to classify various objects using the same image, making the selection process critical.
We considered integrating several models, including:
- Types of food using the Food-101 dataset
- Hand gestures for basic therapy
Ultimately, we decided to embed the vision system with the Pose Detection model. The reasons for this choice will be explained later. The Pose Detection model can be found at the following link.
To fully exploit the system's potential, a web service was created using Flask, as explained below.
Classification Using Other Models in the Network
To perform the classification of other models using network systems, the following infrastructure has been designed. The system operates on three hierarchical levels. First, there is the Edge node, which is closest to the user and represented in this project by the companion robot. This node has direct access to the user's daily activities. Second, there is the Fog layer, consisting of a small server located in the user's home. This server can be built using a Jetson Orin Nano or a Jetson AGX Xavier.
However, in this project, because these development systems were unavailable, a reTerminal and two Intel Neural Compute Stick 2 devices were used instead.
In this Fog layer, other models are stored, and the assistant can access them through a web service. This web service was programmed using Flask.
Flask is a lightweight and flexible web development framework for Python. It is known for its simplicity and ease of use, making it ideal for both beginner developers and more advanced projects that require custom configurations. To install Flask, we used the following command:
pip install Flask
Basic example of a web server with Flask:
from flask import Flask, render_template

app = Flask(__name__)

# Root route: returns a simple greeting.
@app.route('/')
def home():
    return "Hello, Flask!"

# Route with a URL parameter, rendered through a template.
@app.route('/hello/<name>')
def hello_name(name):
    return render_template('hello.html', name=name)

if __name__ == '__main__':
    app.run(debug=True)
This web server receives the image sent by the XIAO ESP32-C3. The image is encoded in base64 format; the server decodes it into a format that can be processed by OpenCV.
import base64
import numpy as np
import cv2

index = 4

# Assume you have the base64 string (base64_string)
# Decode the base64 string into bytes
image_bytes = base64.b64decode(base64_string[index])

# Convert the bytes into a numpy array
image_array = np.frombuffer(image_bytes, dtype=np.uint8)

# Decode the numpy array into an OpenCV image
image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
image = cv2.rotate(image, cv2.ROTATE_180)
h, w, d = np.shape(image)
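For context, the sketch below shows how this decoding step might sit inside a Flask endpoint. The /classify route name and the image JSON field are assumptions for illustration, not necessarily the project's actual API:

import base64
import cv2
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical endpoint: the XIAO ESP32-C3 POSTs JSON such as {"image": "<base64>"}.
@app.route('/classify', methods=['POST'])
def classify():
    b64_image = request.get_json()['image']
    image_bytes = base64.b64decode(b64_image)
    image_array = np.frombuffer(image_bytes, dtype=np.uint8)
    image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
    image = cv2.rotate(image, cv2.ROTATE_180)
    # Here the decoded frame would be passed to the chosen model (person, food, etc.).
    h, w, _ = image.shape
    return jsonify({"height": h, "width": w})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)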
The decoded image can then be passed to a person detector, a food classifier, or any other type of model that the user wants to add to the server. However, the reTerminal by itself cannot handle this classification efficiently. In order to improve the performance of the reTerminal when classifying the images sent by the assistant, two Intel® Neural Compute Stick 2 devices have been installed.
The Intel® Neural Compute Stick 2 (NCS2) is a USB device designed by Intel that enables efficient and powerful artificial intelligence (AI) inference at the edge. It is intended for developing and deploying computer vision and deep neural network applications.
The steps for installing everything necessary for its proper operation can be found at the following link: link.
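As a rough sketch of how a model converted to OpenVINO's IR format might be loaded onto one of the sticks (exact API details vary between OpenVINO releases, and the file names, input size, and frame below are placeholders, not the project's actual models):

import cv2
import numpy as np
from openvino.runtime import Core  # OpenVINO 2022.x runtime API

# Placeholder frame standing in for the image decoded from the assistant's request.
image = cv2.imread("frame.jpg")

core = Core()
# Load a model already converted to OpenVINO IR format (file name is a placeholder).
model = core.read_model("model.xml")

# "MYRIAD" targets the Intel Neural Compute Stick 2; "CPU" would run on the reTerminal itself.
compiled_model = core.compile_model(model, device_name="MYRIAD")
output_layer = compiled_model.output(0)

# Resize to the model's expected input and reorder to NCHW (input size is a placeholder).
blob = cv2.resize(image, (224, 224)).transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)

# Run inference on the stick and read the predictions.
predictions = compiled_model([blob])[output_layer]
print(predictions)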
Hand gestures for basic therapy
Exercises designed for rehabilitation in the elderly or those recovering from surgery play a crucial role, as they offer the opportunity to regain hand mobility (in the case of surgery) or to prevent further decline in mobility due to aging. Typically, these exercises are prescribed by experts such as orthopedic surgeons or physical therapists, who carefully select them based on individual needs and often directly supervise their execution. However, logistical challenges can arise, especially for individuals living in rural areas far from specialized centers. In such cases, the logistical burden of traveling to access specialized care can pose a significant obstacle. Fortunately, technology, particularly Edge Artificial Intelligence (Edge-AI), offers a promising solution to address this challenge by facilitating remote rehabilitation exercises for hand mobility (Figure 1).
All of this is integrated into a small, portable assistant that not only allows the user to classify these conditions but also to adapt the models to their specific needs. This customization capability aims to provide greater versatility and robustness in performing classifications, thereby optimizing the system's performance and accuracy.
Model training
Training the model in Edge Impulse is a straightforward, visual task; it is quick and yields a model optimised for our device. Compared with the traditional workflow (Python, Keras, TensorFlow), it saves a great deal of time. However, if you want to experiment with architectures, hyperparameters, and other details, you must go back to basics: open a Colaboratory notebook and start programming. I won't go into detail on how the training is done, how to create the project, or how to upload the images; this documentation can be found on Google or the Edge Impulse website.
As mentioned above, the images are 32x32. Once the photos have been uploaded, the features extracted, and the model trained, the next step is to review the results and statistics the tool offers. An important note: do not use transfer learning. That training technique expects 96x96 or 160x160 images, which are too large for this device and would cause an error because the data cannot be accommodated in the tensor arena. On a side note, I do not recommend using the EON Tuner for this project; it will return a seemingly super model with beautiful confusion matrices, but the vast majority of its suggestions use MobileNetV1 or MobileNetV2 networks with transfer learning and 96x96 or 64x64 inputs.
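For readers who do want to go back to basics, here is a minimal sketch of what such a notebook might contain, assuming 32x32 RGB inputs (the channel count is an assumption) and the three exercise classes used later (TableTop, Arrow, Claw); the layer sizes and hyperparameters are illustrative only:

import tensorflow as tf

NUM_CLASSES = 3  # TableTop, Arrow, Claw

# A small CNN suited to 32x32 inputs; the layer sizes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_images and train_labels would come from the dataset exported from Edge Impulse.
# model.fit(train_images, train_labels, epochs=30, validation_split=0.2)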
Live classification is a handy tool; it allows us to perform various tests with individual validation images (Figure 8).
Once our model has been validated with single images and we have seen that it can discriminate each class, the next step is to test the model using all the pictures in our test set. The statistics that result from this test (confusion matrix and F1 score) are perhaps among the most essential data for validating the model, although a good result here does not guarantee that the model will classify equally well on data it has never seen. Another possible experiment is to build a small set of validation images captured in different positions and under different environmental conditions, such as light intensity, shadow, etc.
A confusion matrix is a tool that allows the visualisation of the performance of an algorithm used in supervised learning. Each column of the matrix represents the number of predictions for each class, while each row represents the instances of the actual class. One of the benefits of confusion matrices is that they make it easy to see whether the system is confusing two classes.
On the other hand, the F1 score is a measure of the accuracy of the test; it is calculated from the precision and recall of the test. Precision is the number of true positive results divided by the number of all positive results, including those not correctly identified, while recall is the number of true positive results divided by the number of all samples that should have been identified as positive. The short sketch below makes these definitions concrete.
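As a quick illustration (the matrix values here are made up for the example, not the project's results), precision, recall, and F1 can be computed per class directly from a confusion matrix:

import numpy as np

# Hypothetical 3-class confusion matrix (rows = actual class, columns = predicted class).
confusion = np.array([
    [29, 2, 4],
    [1, 31, 3],
    [2, 1, 27],
])

for i, name in enumerate(["TableTop", "Arrow", "Claw"]):
    true_positives = confusion[i, i]
    precision = true_positives / confusion[:, i].sum()  # TP / everything predicted as this class
    recall = true_positives / confusion[i, :].sum()      # TP / everything actually in this class
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")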
The confusion matrix and F1-Score results can be seen in the following figure in the experiments conducted to perform the exercise classification (Figure 9).
All in all, we have an 88.4% accuracy rate, which is not bad at all. However, we can see some uncertainties and some misclassifications. The TableTop class has an accuracy of 82.9%; the missing 17.1% is split into 5.7% classified as Arrow, 2.9% as Claw, and 8.6% as uncertain. In the case of Claw and Arrow, this may be because the exercises look similar, which is why the model gets confused; as for the uncertain cases, the model cannot discriminate correctly and assign those images to a class.
The Assistant Part 2
Although the initial idea of the assistant was conceived as a portable system behaving like an intelligent companion assistant, there arose the necessity to incorporate a body into this assistant in order to fully utilize all the features of the Grove Vision AI V2 and the XIAO technology that Seeed Studio has been developing in recent years.
The robot body is equipped with 12 LX-16A servomotors, each with its own control system. This system serves as an interface to control the servos from the assistant.
Concept Overview
The primary concept behind this design is to utilize the assistant as a tool for suggesting and monitoring physical activities for elderly individuals. Because only a single Grove port is available, through which the XIAO ESP32 receives inferences from the vision system, it became necessary to incorporate a Seeeduino XIAO. This additional component handles the reception of data, specifically the points obtained from the estimation of the person's pose, as illustrated in the following image.
The assistant's design focuses on enhancing the quality of life for elderly individuals by providing a reliable means of tracking and suggesting physical activities. This is achieved by integrating advanced vision systems capable of inferring physical movements and translating these into actionable insights.
To accommodate the need for additional data reception, a Seeeduino XIAO was incorporated into the system. This addition addresses the limitation posed by the single available Grove port on the XIAO ESP32. The Seeeduino XIAO's role is to handle the influx of data points generated by the pose estimation process, ensuring seamless data transfer and processing.
The data points obtained from the pose estimation system are critical for accurately monitoring the physical activities of elderly users. This system captures the user’s movements and provides precise feedback on their physical activities, allowing the assistant to offer tailored suggestions and track progress effectively. The image below demonstrates the visualization and processing of the pose detected by the assistant's estimation data.
The images below display a set of postures that the user needs to replicate.
By integrating the Seeeduino XIAO with the XIAO ESP32, the assistant becomes a more robust and capable tool. This setup allows for real-time monitoring and feedback, which is essential for ensuring that elderly users can engage in physical activities safely and effectively. The assistant not only tracks their movements but also provides valuable suggestions to help them stay active and healthy.
Once these points were obtained, the angles were calculated and sent to the robot, which then replicates the user's movements. However, the system can also function in reverse. In this mode, the robot performs a series of preprogrammed movements that the user must replicate. The importance of the Grove Vision AI V2 lies in its ability to determine whether the user's activity is performed correctly, ensuring that the exercises are executed as intended.
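As an illustration of this angle step (a minimal sketch, not the project's actual code; the keypoint names and coordinates are placeholders), the joint angle at the elbow, for example, can be computed from three pose keypoints:

import math

def joint_angle(a, b, c):
    # Angle at point b (degrees) formed by the segments b->a and b->c, e.g. shoulder-elbow-wrist.
    angle_ab = math.atan2(a[1] - b[1], a[0] - b[0])
    angle_cb = math.atan2(c[1] - b[1], c[0] - b[0])
    angle = abs(math.degrees(angle_ab - angle_cb))
    return 360 - angle if angle > 180 else angle

# Placeholder (x, y) keypoints from the pose model: shoulder, elbow, wrist.
shoulder, elbow, wrist = (120, 80), (150, 140), (200, 150)
elbow_angle = joint_angle(shoulder, elbow, wrist)
print(f"Elbow angle: {elbow_angle:.1f} degrees")  # this value would be mapped to the corresponding servo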
Conclusions
The developed assistant presents significant advantages for both physical rehabilitation and health monitoring in elderly individuals. Utilizing a combination of powerful and accessible hardware, such as the Seeeduino XIAO and the Grove Vision AI V2, we have created a system capable of performing complex tasks efficiently and accurately.
- Movement Replication: The system can replicate the user's movements by calculating the necessary angles from key points obtained by the Grove Vision AI V2.
- Inverse Functionality: It can also operate inversely, with the robot performing preprogrammed movements that the user must imitate, aiding in rehabilitation exercises and ensuring they are performed correctly.
- WiFi Capabilities: Equipped with WiFi capabilities, the assistant can send collected data to a server for analysis and storage, allowing remote evaluation by health professionals.
Future Work