Desk workers (including programmers like you and me) are damaging their health every day due to a lack of stretching. So, I decided to create a stretching app for desk workers. The app displays simple stretches that can be done at the desk, detects your pose with the camera, and compares it with the correct pose to check whether you are doing the stretch correctly.
If you use a laptop, energy efficiency is important, so using AMD Ryzen AI to make the app energy efficient is a great fit.
In this document, I will explain how to develop the app, which runs on AMD Ryzen AI.
## Features

- Pose estimation from the camera and comparison with the correct pose
- High power efficiency, with half the inference time compared to CPU-only (in the case of the Ryzen 9 Pro 7940HS) thanks to the Ryzen NPU
## Installation

After installing the Ryzen AI Software, you can install the app by following the steps below.
Open an Anaconda PowerShell Prompt and run the following commands:
```sh
git clone https://github.com/ryomo/stretchcam.git
cd stretchcam

# Create a new conda environment from the existing Ryzen AI environment
conda create --name stretchcam --clone <your-ryzen-ai-env>
conda activate stretchcam

# Install Kivy and other dependencies
conda install kivy=2.1.0 -c conda-forge

# NOTE: If `opencv-python` is already installed, uninstall it first to avoid
# conflicts with `opencv-contrib-python`.
pip install -r requirements.txt
```
- If you don't mind polluting your Ryzen AI environment, you can install the app directly into it.
## Run the app

```sh
conda activate stretchcam
python main.py
```
## Quantization

Yes, this is the most important part. Please refer to the `quantize.py` file.
NOTE: In the quantization phase, I recommend disabling the NPU cache. Enabling the cache skips the compilation process, so you may not notice compilation errors. At the top of the `quantize.py` file, I put `os.environ["XLNX_ENABLE_CACHE"] = "0"` to disable the cache.
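For reference, this is how it appears at the top of `quantize.py` (as described above):

```python
import os

# Disable the NPU cache so compilation always runs and its errors are visible.
os.environ["XLNX_ENABLE_CACHE"] = "0"
```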
### Step-0 Download the model

```python
path = kagglehub.model_download("google/movenet/tensorFlow2/" + model_name)
```
This step simply downloads the MoveNet model from Kaggle. Alternatively, you can also download the model from the web. See google/movenet.
### Step-1 Convert the model to ONNX

I chose the Vitis AI Quantizer for ONNX flow to quantize the model, so the TensorFlow model needs to be converted to ONNX format first.
```python
completed_process = subprocess.run(
    ["python", "-m", "tf2onnx.convert", "--opset", "13", "--saved-model", input_model_dir, "--output", output_model],
)
```
This Python snippet is equivalent to the command `python -m tf2onnx.convert --opset 13 --saved-model input_model_dir --output output_model`. `tf2onnx` is a great tool for converting TensorFlow models to ONNX format.
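One small hardening note (not in the original script): since the conversion runs in a subprocess, it may be worth failing fast when it does not succeed. A minimal sketch:

```python
# Raise CalledProcessError if tf2onnx exited with a non-zero status.
completed_process.check_returncode()
```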
### Step-2 Pre-process the model

```python
shape_inference.quant_pre_process(
    input_model,
    output_model,
    auto_merge=True,  # If False (the default), an 'Incomplete symbolic shape inference' exception will be raised.
)
```
This step is recommended in https://ryzenai.docs.amd.com/en/1.1/vai_quant/vai_q_onnx.html#recommended-pre-processing-on-the-float-model. `auto_merge=True` is important because it prevents the 'Incomplete symbolic shape inference' exception.
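If you want to confirm that the pre-processed model is still well-formed, the `onnx` package can validate it. A minimal sketch, assuming `output_model` is the path written by the pre-processing step above:

```python
import onnx

# Load the pre-processed model and run ONNX's structural validation on it.
model = onnx.load(output_model)
onnx.checker.check_model(model)
```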
### Step-3.1 Quantize the model

The documentation is here: https://ryzenai.docs.amd.com/en/1.1/vai_quant/vai_q_onnx.html#quantizing-using-the-vai-q-onnx-api.
This is the most difficult part of the process, so I will explain it starting from the simplest version below.
```python
calibration_data_reader = None

vai_q_onnx.quantize_static(
    input_model,
    output_model,
    calibration_data_reader,
    # Recommended settings for CNNs on NPU
    # https://ryzenai.docs.amd.com/en/1.1/vai_quant/vai_q_onnx.html#cnns-on-npu
    quant_format=vai_q_onnx.QuantFormat.QDQ,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    activation_type=vai_q_onnx.QuantType.QUInt8,
    weight_type=vai_q_onnx.QuantType.QInt8,
    enable_ipu_cnn=True,
)
```
- You can set `calibration_data_reader` to `None`. Creating a real one is necessary to improve accuracy, but it is not necessary for the first run.
- MoveNet is a CNN model, so you can set `quant_format`, `calibrate_method`, `activation_type`, and `weight_type` to the recommended settings for CNNs on the NPU.
- Set `enable_ipu_cnn` to `True` to run the quantized model on the NPU. You may notice that the docs mention `enable_dpu`, but it is deprecated and you will see a warning message.
After editing the `quantize.py` file, run the following command:

```sh
python quantize.py
```
You will see the following message:

```
[Vitis AI EP] No. of Operators : CPU 666
```

This means that the quantized model is not running on the NPU yet. Next, you need to fix this.
### Step-3.2 Fix the error

Before going further, I recommend setting `enable_step0`, `enable_step1`, and `enable_step2` to `False` in `config/default.ini` to avoid running the same steps again.
In PowerShell, you will see lots of messages. Most of them are warnings, but the message starting with `F` seems to be the fatal error you need to fix. Below is the fatal error message:
```
F20240730 21:04:17.332799 29900 ReplaceConstPass.cpp:88] Check failed: xir::create_data_type<float>() == op_const->get_output_tensor()->get_data_type() || xir::create_data_type<double>() == op_const->get_output_tensor()->get_data_type() The data type of xir::Op{name = Resize__349:0_vaip_161_transfered_DwDeConv_weights, type = const}'s output tensor, xir::Tensor{name = Resize__349:0_vaip_161_transfered_DwDeConv_weights, type = INT32, shape = {1, 4, 4, 64}} only supports float now.
```
This error seems to be on the Ryzen AI Software side, so you need to work around it by excluding `Resize__349:0_vaip_161_transfered_DwDeConv_weights` from quantization.
So, I inspected the `Resize__349` node with Netron and added the `nodes_to_exclude=["Resize__349"]` option to `vai_q_onnx.quantize_static()` to exclude the node. But it didn't work, and I don't know why.
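For reference, the failed attempt looked roughly like this (a sketch built from the `quantize_static()` call above; as noted, it did not resolve the error for me):

```python
vai_q_onnx.quantize_static(
    input_model,
    output_model,
    calibration_data_reader,
    quant_format=vai_q_onnx.QuantFormat.QDQ,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    activation_type=vai_q_onnx.QuantType.QUInt8,
    weight_type=vai_q_onnx.QuantType.QInt8,
    enable_ipu_cnn=True,
    # Exclude the node that triggers the fatal XIR error (did not help here).
    nodes_to_exclude=["Resize__349"],
)
```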
Next, I tried the `op_types_to_quantize` option. In this model, `Conv` is the most frequently used op type, followed by `Clip` and then `Add`. But adding `Add` to `op_types_to_quantize` caused the same error, so I only added `Conv` and `Clip`. (A quick way to reproduce the node counts in the list below is sketched right after it.)
```python
op_types_to_quantize=[
    # Op Type, Node Count, Note
    "Conv",  # 74
    "Clip",  # 35
    # "Add",  # 19 error
    # "Unsqueeze",  # 12
    # "Cast",  # 10
    # "Reshape",  # 9
    # "Relu",  # 7
    # "Sub",  # 6
    # "Mul",  # 5
    # "Concat",  # 5
    # "Transpose",  # 4
    # "Squeeze",  # 4
    # "GatherND",  # 4
    # "Split",  # 3
    # "Resize",  # 3 error
    # "Div",  # 3
    # "Sigmoid",  # 2
    # "Pow",  # 2
    # "ArgMax",  # 2
    # "Sqrt",  # 1 error
],
```
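If you want to reproduce these node counts yourself, you can tally the op types with the `onnx` package. A minimal sketch, assuming `output_model` points at the ONNX model converted in Step-1:

```python
from collections import Counter

import onnx

# Count how many nodes of each op type the graph contains.
model = onnx.load(output_model)
counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in counts.most_common():
    print(f"{op_type}: {count}")
```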
Now you can run the quantization process again:

```sh
python quantize.py
```

You will see the messages below:

```
[Vitis AI EP] No. of Operators : CPU 106 IPU 439 80.55%
[Vitis AI EP] No. of Subgraphs : CPU 5 IPU 4 Actually running on IPU 4
```
Although not 100% of the operators run on the NPU, the quantized model is now running on the NPU. Now, you can run `python main.py` to check the performance.
However, you will see that the pose estimation is not working. This is because the accuracy is low, and `keypoint_score_th = 0.4` in `config/default.ini` is too high for such a low-accuracy model. Next, I will explain how to improve the accuracy.
### Step-3.3 Improve the accuracy

#### CalibrationDataReader
To improve the accuracy, you need to create a calibration data reader.
First, create a calibration image dataset. You can capture the calibration images with the camera and put them in the `datasets/mypose` directory. 10 images seem to be enough, but if you want to use more, change `calibration_image_count = 100` in the configuration file.
Next, create a class that reads the calibration images. Refer to `library/calibration_data_reader.py`:
```python
import os

import cv2
from onnxruntime.quantization import CalibrationDataReader

# `Inference` is the project's own preprocessing helper; see the repository.
from library.inference import Inference


class ImageDataReader(CalibrationDataReader):
    """A class that reads image data for calibration."""

    def __init__(
        self,
        image_folder,
        input_size: int,
        process_num=100,
        model_input_name="input",
        preprocess_image_astype="int32",
    ):
        self.image_folder = image_folder
        self.input_size = input_size
        self.model_input_name = model_input_name
        self.process_count = 0
        self.process_num = process_num
        self.preprocess_image_astype = preprocess_image_astype

        # Files in the image_folder to be enumerated
        images = os.listdir(image_folder)
        self.enumerate_images = iter(images)

        # Count the number of images
        self.image_count = len(images)
        print(f"Found {self.image_count} images in {image_folder}")

    def get_next(self):
        """Generate the input data dict for an ONNX InferenceSession run."""
        # Limit the number of images to be processed
        if self.process_count >= self.process_num:
            return None

        image_file = next(self.enumerate_images, None)
        if image_file is None:
            return None

        # Read image and preprocess
        image = cv2.imread(os.path.join(self.image_folder, image_file))
        image_data = Inference.preprocess(
            image, self.input_size, self.preprocess_image_astype
        )

        # Print progress
        self.process_count += 1
        if self.process_count % 100 == 0:
            print(f"Processed {self.process_count} images")

        return {self.model_input_name: image_data}
```
This looks like a lot of work, but it's actually quite simple. The point is to read the image and preprocess it in the `get_next()` method. You can also refer to `test_calibration_data_reader.py` to see how it works.
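As a quick illustration of how the quantizer consumes the reader, here is a hypothetical smoke test (not part of the project; `input_size=192` assumes the MoveNet Lightning input resolution):

```python
# Drain the reader the way the quantizer would during calibration.
reader = ImageDataReader("datasets/mypose", input_size=192, process_num=10)
while (batch := reader.get_next()) is not None:
    print({name: data.shape for name, data in batch.items()})
```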
After creating the calibration data reader, set `calibration_data_reader` in `quantize.py`:

```python
calibration_data_reader = ImageDataReader(image_dir, input_size, image_count)
```
Now, you can run `python quantize.py` and `python main.py` to check the accuracy. Better accuracy has been achieved, right?
#### Cross Layer Equalization
To improve the accuracy further, you can use Cross Layer Equalization (CLE). It is very effective for CNN models like MoveNet and easy to use, so I recommend it. Just add the following options to `vai_q_onnx.quantize_static()`. Please refer to `quantize.py` for the complete code.
```python
# Enable CLE for better accuracy
# https://ryzenai.docs.amd.com/en/1.1/vai_quant/vai_q_onnx.html#quantizing-using-cross-layer-equalization
include_cle=True,
extra_options={
    "ActivationSymmetric": True,
    "ReplaceClip6Relu": True,
    "CLESteps": 1,
    "CLEScaleAppendBias": True,
},
```
Done! Now, run `python quantize.py` and `python main.py` again. You will see that the accuracy has improved.
You can set `enable_npu` to `True` or `False` in `config/default.ini` to compare the inference times between the NPU and the CPU.
- `enable_npu = False`: (inference time on the CPU; figure not shown)
- `enable_npu = True`: (inference time on the NPU; figure not shown)
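For context, switching `enable_npu` essentially means choosing the ONNX Runtime execution provider. Below is an illustrative sketch, not the app's actual code; the model path and the `vaip_config.json` location are assumptions based on the Ryzen AI 1.1 docs:

```python
import onnxruntime as ort


def create_session(model_path: str, enable_npu: bool) -> ort.InferenceSession:
    if enable_npu:
        # The Vitis AI EP dispatches the supported subgraphs to the NPU.
        return ort.InferenceSession(
            model_path,
            providers=["VitisAIExecutionProvider"],
            provider_options=[{"config_file": "vaip_config.json"}],
        )
    # CPU-only baseline for comparison.
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


session = create_session("movenet_quantized.onnx", enable_npu=True)
```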
The pre/post-processing operations currently run on the CPU, but they can also run on the NPU (see https://ryzenai.docs.amd.com/en/1.1/onnx_e2e.html). This would improve the performance of the app further. This time, I didn't do it because time was limited.
## Ryzen AI Software
Some op types are not quantized by the Ryzen AI Software yet, so quantized models may not be fast enough. When some op types run on the CPU instead of the NPU, performance seems to degrade because of the memory transfers between the CPU and the NPU. In the future, I would like to see more op types quantized.
Additionally, the Ryzen AI Software only supports Int8 quantization at the moment. I hope that float quantization will be supported soon, which would improve the accuracy of some types of quantized models.
NOTE1: In this project, I'm using Ryzen AI Software 1.1, but 1.2 was released recently.
NOTE2: Ryzen AI Software depends heavily on ONNX Runtime, so some of these issues may be related to ONNX Runtime.
## Hardware Side

My Ryzen 9 Pro 7940HS is a great processor, but it is not the best for AI workloads. I want to test the Ryzen AI 9 HX 375 in the near future, which is expected to bring a significant performance improvement.
## Other notes

### Why didn't I use amd/movenet from the model zoo?

There is a MoveNet model in the RyzenAI Pre-Optimized Model Zoo, and it is easy to use. However, that model is made from an unofficial MoveNet implementation, which is not as accurate as the official models from Google. Besides, I wanted to quantize the model myself :) So, I decided to use the official model and quantize it on my own.
NOTE: The unofficial MoveNet model was developed before Google released the official one, so it is not a bad model at all.
### Why didn't I use YOLOv8 Pose?

https://docs.ultralytics.com/tasks/pose/

YOLOv8 Pose is a great model, but I encountered some issues when converting it to ONNX format. Additionally, searching for information about YOLO is not easy because there are lots of AI-generated comments in the GitHub issues and discussions; sadly, I couldn't find the information I needed to solve the issues. So, I gave up on YOLOv8 Pose.