Desk workers (including programmers like you and me) are damaging their health every day due to a lack of stretching. So, I decided to create a stretching app for desk workers. The app displays simple stretches that can be done at the desk, detects your pose with the camera, and compares it with the correct pose to check whether you are doing the stretch correctly.
If you use a laptop, energy efficiency is important, so using AMD Ryzen AI to make the app energy efficient is a great fit.
In this document, I will explain how to develop the app, which runs on AMD Ryzen AI.
## Features

- Pose estimation from the camera and comparison with the correct pose
- High power efficiency, with half the inference time compared to CPU-only (in the case of the Ryzen 9 Pro 7940HS) thanks to the Ryzen NPU
## Installation

After installing the Ryzen AI Software, you can install the app by following the steps below.
Open an Anaconda PowerShell Prompt and run the following commands:
```sh
git clone https://github.com/ryomo/stretchcam.git
cd stretchcam

# Create a new conda environment from the existing Ryzen AI environment
conda create --name stretchcam --clone <your-ryzen-ai-env>
conda activate stretchcam

# Install Kivy and other dependencies
conda install kivy=2.1.0 -c conda-forge

# NOTE: If `opencv-python` is already installed, uninstall it first to avoid
# conflicts with `opencv-contrib-python`.
pip install -r requirements.txt
```
- If you don't mind polluting your Ryzen AI environment, you can install the app directly into it.
## Run the app

```sh
conda activate stretchcam
python main.py
```
## Quantization

Yes, this is the most important part. Please refer to the `quantize.py` file.
NOTE: In the quantization phase, I recommend disabling the NPU cache. Enabling the cache skips the compilation process, so you may not notice compilation errors. At the top of the `quantize.py` file, I put `os.environ["XLNX_ENABLE_CACHE"] = "0"` to disable the cache.
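For reference, this is how it appears at the top of `quantize.py` (as described above):

```python
import os

# Disable the NPU cache so compilation always runs and its errors are visible.
os.environ["XLNX_ENABLE_CACHE"] = "0"
```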
### Step-0 Download the model

```python
path = kagglehub.model_download("google/movenet/tensorFlow2/" + model_name)
```
This step simply downloads the MoveNet model from Kaggle. Alternatively, you can also download the model from the web. See google/movenet.
### Step-1 Convert the model to ONNX

I chose the Vitis AI Quantizer for ONNX flow to quantize the model, so the TensorFlow model needs to be converted to ONNX format first.
```python
completed_process = subprocess.run(
    ["python", "-m", "tf2onnx.convert", "--opset", "13", "--saved-model", input_model_dir, "--output", output_model],
)
```
This Python snippet is equivalent to the command `python -m tf2onnx.convert --opset 13 --saved-model input_model_dir --output output_model`. `tf2onnx` is a great tool for converting TensorFlow models to ONNX format.
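One small hardening note (not in the original script): since the conversion runs in a subprocess, it may be worth failing fast when it does not succeed. A minimal sketch:

```python
# Raise CalledProcessError if tf2onnx exited with a non-zero status.
completed_process.check_returncode()
```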
### Step-2 Pre-process the model

```python
shape_inference.quant_pre_process(
    input_model,
    output_model,
    auto_merge=True,  # If False (the default), an 'Incomplete symbolic shape inference' exception will be raised.
)
```
This step is recommended in https://ryzenai.docs.amd.com/en/1.1/vai_quant/vai_q_onnx.html#recommended-pre-processing-on-the-float-model. `auto_merge=True` is important because it prevents the 'Incomplete symbolic shape inference' exception.
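If you want to confirm that the pre-processed model is still well-formed, the `onnx` package can validate it. A minimal sketch, assuming `output_model` is the path written by the pre-processing step above:

```python
import onnx

# Load the pre-processed model and run ONNX's structural validation on it.
model = onnx.load(output_model)
onnx.checker.check_model(model)
```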
### Step-3.1 Quantize the model

The documentation is here: https://ryzenai.docs.amd.com/en/1.1/vai_quant/vai_q_onnx.html#quantizing-using-the-vai-q-onnx-api.
This is the most difficult part of the process, so I will explain it starting from the simplest version below.
```python
calibration_data_reader = None

vai_q_onnx.quantize_static(
    input_model,
    output_model,
    calibration_data_reader,
    # Recommended settings for CNNs on NPU
    # https://ryzenai.docs.amd.com/en/1.1/vai_quant/vai_q_onnx.html#cnns-on-npu
    quant_format=vai_q_onnx.QuantFormat.QDQ,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    activation_type=vai_q_onnx.QuantType.QUInt8,
    weight_type=vai_q_onnx.QuantType.QInt8,
    enable_ipu_cnn=True,
)
```
- You can set `calibration_data_reader` to `None`. Creating a real one is necessary to improve accuracy, but it is not necessary for the first run.
- MoveNet is a CNN model, so you can set `quant_format`, `calibrate_method`, `activation_type`, and `weight_type` to the recommended settings for CNNs on the NPU.
- Set `enable_ipu_cnn` to `True` to run the quantized model on the NPU. You may notice that the docs mention `enable_dpu`, but it is deprecated and you will see a warning message.
After editing the `quantize.py` file, run the following command:

```sh
python quantize.py
```
You will see the following message:

```
[Vitis AI EP] No. of Operators : CPU 666
```

This means that the quantized model is not running on the NPU yet. Next, you need to fix this.
### Step-3.2 Fix the error

Before going further, I recommend setting `enable_step0`, `enable_step1`, and `enable_step2` to `False` in `config/default.ini` to avoid running the same steps again.
In PowerShell, you will see lots of messages. Most of them are warnings, but the message starting with `F` seems to be the fatal error you need to fix. Below is the fatal error message:
```
F20240730 21:04:17.332799 29900 ReplaceConstPass.cpp:88] Check failed: xir::create_data_type<float>() == op_const->get_output_tensor()->get_data_type() || xir::create_data_type<double>() == op_const->get_output_tensor()->get_data_type() The data type of xir::Op{name = Resize__349:0_vaip_161_transfered_DwDeConv_weights, type = const}'s output tensor, xir::Tensor{name = Resize__349:0_vaip_161_transfered_DwDeConv_weights, type = INT32, shape = {1, 4, 4, 64}} only supports float now.
```
This error seems to be on the Ryzen AI Software side, so you need to work around it by excluding `Resize__349:0_vaip_161_transfered_DwDeConv_weights` from quantization.
So, I inspected the `Resize__349` node with Netron and added the `nodes_to_exclude=["Resize__349"]` option to `vai_q_onnx.quantize_static()` to exclude the node. But it didn't work, and I don't know why.
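For reference, the failed attempt looked roughly like this (a sketch built from the `quantize_static()` call above; as noted, it did not resolve the error for me):

```python
vai_q_onnx.quantize_static(
    input_model,
    output_model,
    calibration_data_reader,
    quant_format=vai_q_onnx.QuantFormat.QDQ,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    activation_type=vai_q_onnx.QuantType.QUInt8,
    weight_type=vai_q_onnx.QuantType.QInt8,
    enable_ipu_cnn=True,
    # Exclude the node that triggers the fatal XIR error (did not help here).
    nodes_to_exclude=["Resize__349"],
)
```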
Next, I tried the `op_types_to_quantize` option. In this model, `Conv` is the most frequently used op type, followed by `Clip` and then `Add`. But adding `Add` to `op_types_to_quantize` caused the same error, so I only added `Conv` and `Clip`. (A quick way to reproduce the node counts in the list below is sketched right after it.)
```python
op_types_to_quantize=[
    # Op Type, Node Count, Note
    "Conv",  # 74
    "Clip",  # 35
    # "Add",  # 19 error
    # "Unsqueeze",  # 12
    # "Cast",  # 10
    # "Reshape",  # 9
    # "Relu",  # 7
    # "Sub",  # 6
    # "Mul",  # 5
    # "Concat",  # 5
    # "Transpose",  # 4
    # "Squeeze",  # 4
    # "GatherND",  # 4
    # "Split",  # 3
    # "Resize",  # 3 error
    # "Div",  # 3
    # "Sigmoid",  # 2
    # "Pow",  # 2
    # "ArgMax",  # 2
    # "Sqrt",  # 1 error
],
```
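If you want to reproduce these node counts yourself, you can tally the op types with the `onnx` package. A minimal sketch, assuming `output_model` points at the ONNX model converted in Step-1:

```python
from collections import Counter

import onnx

# Count how many nodes of each op type the graph contains.
model = onnx.load(output_model)
counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in counts.most_common():
    print(f"{op_type}: {count}")
```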
Now you can run the quantization process again:

```sh
python quantize.py
```

You will see the messages below:

```
[Vitis AI EP] No. of Operators : CPU 106 IPU 439 80.55%
[Vitis AI EP] No. of Subgraphs : CPU 5 IPU 4 Actually running on IPU 4
```
Although not 100% of the operators run on the NPU, the quantized model is now running on the NPU. Now, you can run `python main.py` to check the performance.
However, you will see that the pose estimation is not working. This is because the accuracy is low, and `keypoint_score_th = 0.4` in `config/default.ini` is too high for such a low-accuracy model. Next, I will explain how to improve the accuracy.
### Step-3.3 Improve the accuracy

#### CalibrationDataReader
To improve the accuracy, you need to create a calibration data reader.
First, create a calibration image dataset. You can capture the calibration images with the camera and put them in the `datasets/mypose` directory. 10 images seem to be enough, but if you want to use more, change `calibration_image_count = 100` in the configuration file.
Next, create a class that reads the calibration images. Refer to `library/calibration_data_reader.py`:
```python
import os

import cv2
from onnxruntime.quantization import CalibrationDataReader

# `Inference` is the project's own preprocessing helper; see the repository.
from library.inference import Inference


class ImageDataReader(CalibrationDataReader):
    """A class that reads image data for calibration."""

    def __init__(
        self,
        image_folder,
        input_size: int,
        process_num=100,
        model_input_name="input",
        preprocess_image_astype="int32",
    ):
        self.image_folder = image_folder
        self.input_size = input_size
        self.model_input_name = model_input_name
        self.process_count = 0
        self.process_num = process_num
        self.preprocess_image_astype = preprocess_image_astype

        # Files in the image_folder to be enumerated
        images = os.listdir(image_folder)
        self.enumerate_images = iter(images)

        # Count the number of images
        self.image_count = len(images)
        print(f"Found {self.image_count} images in {image_folder}")

    def get_next(self):
        """Generate the input data dict for an ONNX InferenceSession run."""
        # Limit the number of images to be processed
        if self.process_count >= self.process_num:
            return None

        image_file = next(self.enumerate_images, None)
        if image_file is None:
            return None

        # Read image and preprocess
        image = cv2.imread(os.path.join(self.image_folder, image_file))
        image_data = Inference.preprocess(
            image, self.input_size, self.preprocess_image_astype
        )

        # Print progress
        self.process_count += 1
        if self.process_count % 100 == 0:
            print(f"Processed {self.process_count} images")

        return {self.model_input_name: image_data}
```
This looks like a lot of work, but it's actually quite simple. The point is to read the image and preprocess it in the `get_next()` method. You can also refer to `test_calibration_data_reader.py` to see how it works.
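As a quick illustration of how the quantizer consumes the reader, here is a hypothetical smoke test (not part of the project; `input_size=192` assumes the MoveNet Lightning input resolution):

```python
# Drain the reader the way the quantizer would during calibration.
reader = ImageDataReader("datasets/mypose", input_size=192, process_num=10)
while (batch := reader.get_next()) is not None:
    print({name: data.shape for name, data in batch.items()})
```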
After creating the calibration data reader, set `calibration_data_reader` in `quantize.py`:

```python
calibration_data_reader = ImageDataReader(image_dir, input_size, image_count)
```
Now, you can run `python quantize.py` and `python main.py` to check the accuracy. Better accuracy has been achieved, right?
#### Cross Layer Equalization
To improve the accuracy further, you can use Cross Layer Equalization (CLE). It is very effective for CNN models like MoveNet and easy to use, so I recommend it. Just add the following options to `vai_q_onnx.quantize_static()`. Please refer to `quantize.py` for the complete code.
```python
# Enable CLE for better accuracy
# https://ryzenai.docs.amd.com/en/1.1/vai_quant/vai_q_onnx.html#quantizing-using-cross-layer-equalization
include_cle=True,
extra_options={
    "ActivationSymmetric": True,
    "ReplaceClip6Relu": True,
    "CLESteps": 1,
    "CLEScaleAppendBias": True,
},
```
Done! Now, run `python quantize.py` and `python main.py` again. You will see that the accuracy has improved.
You can set `enable_npu` to `True` or `False` in `config/default.ini` to compare the inference times between the NPU and the CPU.
- `enable_npu = False`: (inference time on the CPU; figure not shown)
- `enable_npu = True`: (inference time on the NPU; figure not shown)
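For context, switching `enable_npu` essentially means choosing the ONNX Runtime execution provider. Below is an illustrative sketch, not the app's actual code; the model path and the `vaip_config.json` location are assumptions based on the Ryzen AI 1.1 docs:

```python
import onnxruntime as ort


def create_session(model_path: str, enable_npu: bool) -> ort.InferenceSession:
    if enable_npu:
        # The Vitis AI EP dispatches the supported subgraphs to the NPU.
        return ort.InferenceSession(
            model_path,
            providers=["VitisAIExecutionProvider"],
            provider_options=[{"config_file": "vaip_config.json"}],
        )
    # CPU-only baseline for comparison.
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


session = create_session("movenet_quantized.onnx", enable_npu=True)
```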
The pre/post-processing operations currently run on the CPU, but they can also run on the NPU (see https://ryzenai.docs.amd.com/en/1.1/onnx_e2e.html). This would improve the performance of the app further. This time, I didn't do it because time was limited.
## Ryzen AI Software
Some op types are not quantized by the Ryzen AI Software yet, so quantized models may not be fast enough. When some op types run on the CPU instead of the NPU, performance seems to degrade because of the memory transfers between the CPU and the NPU. In the future, I would like to see more op types quantized.
Additionally, the Ryzen AI Software only supports Int8 quantization at the moment. I hope that float quantization will be supported soon, which would improve the accuracy of some types of quantized models.
NOTE1: In this project, I'm using Ryzen AI Software 1.1, but 1.2 was released recently.
NOTE2: Ryzen AI Software depends heavily on ONNX Runtime, so some of these issues may be related to ONNX Runtime.
## Hardware Side

My Ryzen 9 Pro 7940HS is a great processor, but it is not the best for AI workloads. I want to test the Ryzen AI 9 HX 375 in the near future, which is expected to bring a significant performance improvement.
## Other notes

### Why didn't I use amd/movenet from the model zoo?

There is a MoveNet model in the RyzenAI Pre-Optimized Model Zoo, and it is easy to use. However, that model is made from an unofficial MoveNet implementation, which is not as accurate as the official models from Google. Besides, I wanted to quantize the model myself :) So, I decided to use the official model and quantize it on my own.
NOTE: The unofficial MoveNet model was developed before Google released the official one, so it is not a bad model at all.
### Why didn't I use YOLOv8 Pose?

https://docs.ultralytics.com/tasks/pose/

YOLOv8 Pose is a great model, but I encountered some issues when converting it to ONNX format. Additionally, searching for information about YOLO is not easy because there are lots of AI-generated comments in the GitHub issues and discussions; sadly, I couldn't find the information I needed to solve the issues. So, I gave up on YOLOv8 Pose.