As living standards continue to improve, people are paying more and more attention to their health, and exercise is an effective way to improve it. Given the cost of hiring a professional trainer, you can try using AI to help you track your workouts instead.
In this project, we use YOLOv8-Pose and LSTM to monitor the exercise of a single person in an input video in real time, record the type of exercise and the number of repetitions, and output the detection results to the screen.
Step 1: Set Up NVIDIA Jetson
The core hardware platform required for this project is an NVIDIA Jetson, so the first thing you need to do is get an edge computing device and flash JetPack onto your reComputer.
You can refer to this link for the content of this section.
Step 2: Install YOLOv8 on NVIDIA Jetson
YOLOv8 is the latest version of the acclaimed real-time object detection and image segmentation model. YOLOv8 is built on cutting-edge advancements in deep learning and computer vision, offering unparalleled performance in terms of speed and accuracy. Its streamlined design makes it suitable for various applications and easily adaptable to different hardware platforms, from edge devices to cloud APIs.
You can follow this link to complete the YOLOv8 installation.
- Clone the following repo
git clone https://github.com/ultralytics/ultralytics.git
- Open requirements.txt
cd ultralytics
vi requirements.txt
- Edit the following lines. In vi, press i first to enter insert mode; when you are finished, press ESC and type :wq to save and quit
# Base ----------------------------------------
matplotlib>=3.2.2
opencv-python>=4.6.0
Pillow>=7.1.2
PyYAML>=5.3.1
requests>=2.23.0
scipy>=1.4.1
torch>=1.7.0
torchvision>=0.8.1
tqdm>=4.64.0
- Install YOLOv8
pip3 install -e .
- Install PyTorch and Torchvision (Refer to here)
Note: Since the Jetson platform is aarch64, we can't simply run 'pip install torch torchvision'; builds of PyTorch and Torchvision made for aarch64 are required
- Run the following command to make sure yolo is installed properly
yolo detect predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg'
If the annotated output image appears in the ultralytics/runs/detect/predict folder, it indicates that you have successfully installed YOLOv8 on NVIDIA Jetson.
Step 3: Exercise Counter
The YOLOv8-Pose model can detect 17 keypoints on the human body. We then select discriminative keypoints based on the characteristics of each exercise and calculate the angle between the keypoint connecting lines. When the angle crosses a certain threshold, the target can be considered to have completed one repetition of the action. Using this mechanism, it is possible to build an interesting exercise counter application.
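To make the mechanism concrete, here is a minimal sketch (not the demo's actual code; the thresholds and keypoint choices are illustrative) of computing a joint angle from three keypoints and counting a repetition when the angle dips below and then rises back above a threshold:
import numpy as np

def joint_angle(a, b, c):
    # Angle (degrees) at point b, formed by the segments b->a and b->c
    a, b, c = np.array(a, float), np.array(b, float), np.array(c, float)
    cosine = np.dot(a - b, c - b) / (np.linalg.norm(a - b) * np.linalg.norm(c - b) + 1e-8)
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

class RepCounter:
    # Counts one repetition each time the joint angle dips below `low`
    # and then rises back above `high` (thresholds are illustrative)
    def __init__(self, low=90.0, high=160.0):
        self.low, self.high = low, high
        self.down = False
        self.count = 0

    def update(self, angle):
        if angle < self.low:
            self.down = True
        elif angle > self.high and self.down:
            self.down = False
            self.count += 1
        return self.count

# Example: hip-knee-ankle angle for a squat (coordinates are made up)
counter = RepCounter()
for hip, knee, ankle in [((0, 0), (0, 1), (0, 2)),   # standing: ~180 degrees
                         ((0, 0), (2, 1), (0, 2)),   # squatting: ~53 degrees
                         ((0, 0), (0, 1), (0, 2))]:  # standing again -> 1 rep
    print(counter.update(joint_angle(hip, knee, ankle)))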
- First, you need to go to the YOLOv8 installation directory and clone the exercise counter demo
cd <path to ultralytics>
git clone https://github.com/yuyoujiang/exercise-counting-with-YOLOv8.git
- Then, prepare the model weights and the input video (Refer to here)
YOLOv8-Pose pretrained pose models are PyTorch models, and you can use them directly for inference on the Jetson device. However, for better speed, you can convert the PyTorch models to TensorRT-optimized models by following the instructions below.
- Execute the following command to convert this PyTorch model into a TensorRT model
# TensorRT FP32 export
yolo export model=<model_path> format=engine device=0
# TensorRT FP16 export
yolo export model=<model_path> format=engine half=True device=0
Tip: Click here to learn more about yolo export
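Once exported, the .engine file can be loaded just like the .pt weights. A quick sanity check (the model file name here is only an example):
from ultralytics import YOLO

# Load the TensorRT-optimized pose model (file name is an example)
model = YOLO('yolov8s-pose.engine')

# Run pose inference; the results carry the detected keypoints
results = model('https://ultralytics.com/images/bus.jpg')
print(results[0].keypoints.xy)  # pixel coordinates of the 17 keypoints per detected person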
- Prepare a video and run it!
# For video
python3 demo.py --sport <exercise_type> --model <path_to_your_model> --show True --input <path_to_your_video>
# For webcam
python3 demo.py --sport <exercise_type> --model <path_to_your_model> --show True --input 0
Note: To run the exercise counter, use the commands above with <exercise_type> set to one of: situp, pushup, or squat
Step 4: Exercise Type Detection
Frankly, there are many ways to implement exercise type detection. For example, you could treat it as a video classification task. In this project, we use a faster scheme.
First, YOLOv8 can easily predict the pose keypoints in each frame. We then construct a simple neural network that takes these keypoints as input to detect the exercise type. Accordingly, this section provides a complete neural network application case.
Step 4.1: Construct a Network Model
The input of the model is the keypoints from several consecutive frames, and the output is the type of exercise performed by the person in the video. Since the input sequence is continuous, we use an LSTM as the feature extraction network. The extracted features are then fed into a fully connected layer to predict the classification probability of each label.
import torch
import torch.nn as nn


class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim, device=torch.device('cuda:0')):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.device = device
        # LSTM feature extractor over the keypoint sequence
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True).to(self.device)
        # Fully connected classification head
        self.fc = nn.Linear(hidden_dim, output_dim).to(self.device)

    def forward(self, x):
        # Zero-initialized hidden and cell states for each batch
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        # Classify from the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        out = nn.functional.softmax(out, dim=1)
        return out
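As a quick check that the class behaves as expected, you can instantiate it and feed it a dummy clip. The hyperparameters below are illustrative, not necessarily those used in train.py (34 = 17 keypoints x 2 coordinates):
# Illustrative hyperparameters: 34 inputs (17 keypoints x 2 coords), 3 exercise classes
model = LSTM(input_dim=34, hidden_dim=64, num_layers=2, output_dim=3,
             device=torch.device('cpu'))
dummy = torch.randn(8, 30, 34)  # a batch of 8 clips, 30 frames each
probs = model(dummy)
print(probs.shape)  # torch.Size([8, 3]); each row sums to 1 after softmax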
Step 4.2: Collect Data
Here we need to prepare enough data to train the exercise detection model. We provide a simple script that implements the data acquisition process, from video to pose keypoints.
cd <path to get_data_from_video.py>
python get_data_from_video.py --model <path to model> --input_video <path to video> --data_save_path <path to data save path>
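If you want to see what this acquisition step boils down to, here is a hedged sketch (the CSV layout used by get_data_from_video.py may differ; file names are examples):
import csv
import cv2
from ultralytics import YOLO

model = YOLO('yolov8s-pose.pt')           # pose weights; file name is an example
cap = cv2.VideoCapture('pushup_001.mp4')  # example input video

with open('001.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        keypoints = model(frame, verbose=False)[0].keypoints
        if keypoints is not None and len(keypoints.xyn) > 0:
            # Normalized (x, y) of the first person's 17 keypoints, flattened to 34 values
            writer.writerow(keypoints.xyn[0].reshape(-1).tolist())
cap.release()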
The organized data set should be structured like this:
├─data
│ ├─pushup
│ │ ├─001.csv
│ │ ├─002.csv
│ │ ├─...
│ ├─situp
│ │ ├─001.csv
│ │ ├─002.csv
│ │ ├─...
│ ├─squat
│ │ ├─001.csv
│ │ ├─002.csv
│ │ ├─...
│ ├─...
│
Note that the sample only demonstrates the construction process of the dataset, so if you want to make your model more robust, add enough training data to the dataset.
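Loading such a tree back into training tensors is straightforward. Here is a minimal sketch, assuming each CSV row holds the 34 keypoint values of one frame and clips are cut into fixed-length windows:
import glob
import os
import numpy as np

WINDOW = 30  # frames per training sample (illustrative)

def load_dataset(root='data'):
    clips, labels = [], []
    classes = sorted(os.listdir(root))  # e.g. ['pushup', 'situp', 'squat']
    for label, name in enumerate(classes):
        for path in glob.glob(os.path.join(root, name, '*.csv')):
            frames = np.loadtxt(path, delimiter=',')  # shape: (num_frames, 34)
            for start in range(0, len(frames) - WINDOW + 1, WINDOW):
                clips.append(frames[start:start + WINDOW])
                labels.append(label)
    return np.stack(clips), np.array(labels), classes

X, y, classes = load_dataset()
print(X.shape, y.shape, classes)  # e.g. (N, 30, 34), (N,), ['pushup', 'situp', 'squat']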
Step 4.3: Model Training
Try training the model with this script:
cd <path to train.py>
python train.py --save_dir <./checkpoint>
To make it easier for users to read and modify, only the necessary code is retained throughout the script, so you'll be able to understand the whole script quickly.
Two files are generated when the script finishes: the trained model weights and an index-to-category mapping.
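train.py is the reference, but the essence of the loop fits in a few lines. A sketch, continuing from the loading snippet in Step 4.2 (note that since forward() already applies softmax, the loss is NLL on log-probabilities rather than CrossEntropyLoss on raw logits; hyperparameters are illustrative):
import torch
import torch.nn.functional as F

# X, y, classes come from the loading sketch above
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = LSTM(input_dim=34, hidden_dim=64, num_layers=2,
             output_dim=len(classes), device=device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.tensor(X, dtype=torch.float32).to(device)
targets = torch.tensor(y, dtype=torch.long).to(device)

for epoch in range(100):
    optimizer.zero_grad()
    probs = model(inputs)                                 # (N, num_classes)
    loss = F.nll_loss(torch.log(probs + 1e-8), targets)   # NLL because forward() ends in softmax
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), 'exercise_lstm.pt')  # file name is an example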
Step 4.4: Inference
Run the following command to test whether your neural network works as expected:
cd <path to Inference.py>
python Inference.py
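Under the hood, inference amounts to feeding a sliding window of the most recent keypoints through the trained LSTM and taking the most probable class. A rough sketch of that step (the names and window length are illustrative):
import collections
import torch

window = collections.deque(maxlen=30)  # the last 30 frames of keypoints (illustrative)

def classify(frame_keypoints, model, classes, device):
    # frame_keypoints: the 34 normalized (x, y) values from YOLOv8-Pose for one frame
    window.append(frame_keypoints)
    if len(window) < window.maxlen:
        return None  # not enough history yet
    clip = torch.tensor([list(window)], dtype=torch.float32).to(device)  # (1, 30, 34)
    with torch.no_grad():
        probs = model(clip)
    return classes[int(probs.argmax(dim=1))]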
Step 5: Final Product
Here we can combine the counting and detection tasks. Execute the following script for more intelligent functionality:
python demo_pro.py --detector_model <path_to_your_checkpoint> --show True --input <path_to_your_input_video>