AMD’s Ryzen AI family of laptop processors now integrates a Neural Processing Unit (NPU), which frees the CPU and GPU for other tasks and improves power efficiency. This is possible because Ryzen AI is built on the XDNA architecture, which is purpose-built to run AI workloads locally. Taking advantage of this, I will simulate a traffic analysis tool that needs real-time object detection: the coordinates and timestamps of the bounding boxes are the input to a Python program that estimates vehicle speed, counts vehicles, and determines their direction, which is useful for collecting data for traffic planning or enforcement. Because it runs efficiently and in real time locally on an AMD laptop or mini PC, the project is flexible and cost-efficient to apply to real-world applications.
To make the process easier, we will use the Hugging Face YOLOv8 pre-trained model, which has been optimized for AMD’s IPU/NPU.
Steps:
1. Prepare your system, install pre-requisites and dependencies
In May 2023, AMD launched products with a dedicated AI engine that complements Windows x86 processors. This special AI chip will be explored in this object detection project. We use a Minisforum UM790 Pro with an AMD Ryzen 9 7940HS, whose IPU/NPU must be enabled in the firmware settings. To check whether the IPU/NPU on your AMD Ryzen AI laptop/mini PC is enabled, follow these instructions: in Windows Search, enter “Device Manager”, expand “System Devices” and look for “AMD IPU Device”. If it doesn’t appear in the list, you’ll need to enable it by rebooting into the firmware settings from the Recovery options.
In Windows Search, enter “Advanced Startup”, open “Recovery Options” and click “Restart Now”. After the PC reboots, select “Troubleshoot” > “Advanced options” > “UEFI Firmware Settings”, then Restart. In the firmware menu, go to “Advanced” > “CPU Configuration”, set IPU Control to “Enabled”, then choose “Save & Exit”. After the reboot, download the NPU driver from this link: NPU Driver, then extract the downloaded zip file. Open a command prompt in admin mode and execute the bat file:
.\amd_install_kipudrv.bat
Then confirm from Device Manager -> System Devices -> AMD IPU Device that the NPU driver is installed, as shown in the following image.
The next step is to make sure we have the dependencies to install Ryzen AI SW: Visual Studio 2019, CMake >=3.26, Python >=3.9, latest Anaconda/Miniconda.
Now, ensure that all the pre-requisites outlined previously have been met and that the Windows PATH variable is properly set for each component. For example, Anaconda/Miniconda requires the following paths to be set in the PATH variable:
path\to\anaconda3\
path\to\anaconda3\Scripts\
path\to\anaconda3\Lib\bin\
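As a quick sanity check of the PATH setup, the following is a minimal sketch that asks Python which of the tools can be resolved through PATH. The tool names in REQUIRED_TOOLS are assumptions chosen for illustration (the executable names on your machine may differ), and the check only confirms that a command is found, not that its version meets the requirements above.

```python
import shutil

# Tool names are assumptions for illustration; adjust them to your setup.
REQUIRED_TOOLS = ["conda", "cmake", "python"]


def find_missing_tools(tools):
    """Return the tools that cannot be resolved through the PATH variable."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If a tool is reported as missing, add its install directory to PATH and open a new command prompt before trying again.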
2. Install Ryzen AI Software
Now download the Ryzen AI SW package and extract it.
Then, open command prompt in admin mode, navigate to the extracted folder and install Ryzen AI SW with:
.\install.bat -env <env name>
This creates the Conda environment and installs the Vitis AI Quantizer for ONNX, ONNX Runtime, and the Vitis AI Execution Provider (EP). Now activate the Conda environment:
conda activate <env name>
Then run the test with:
cd ryzen-ai-sw-1.1\quicktest
python quicktest.py
This test runs a simple CNN model. If it runs successfully, the output will look like the following, which indicates that the model is running on the NPU and that the Ryzen AI SW installation was successful.
[Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
[Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
...
Test Passed
...
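The operator summary line is the part worth reading closely: in the sample above, 398 of 400 operators (99.50%) were assigned to the IPU. The following is a small sketch that extracts those numbers from captured log text. The regular expression is written against the line format shown above and is an assumption; other Ryzen AI SW versions may print the summary differently.

```python
import re

# Pattern written against the operator summary line shown above (assumed format).
OPERATOR_LINE = re.compile(
    r"No\. of Operators\s*:\s*CPU\s+(\d+)\s+IPU\s+(\d+)\s+([\d.]+)%"
)


def parse_operator_split(log_text):
    """Return (cpu_ops, ipu_ops, ipu_percent), or None if no summary line is found."""
    match = OPERATOR_LINE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


sample = "[Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%"
print(parse_operator_split(sample))  # (2, 398, 99.5)
```

A low IPU percentage would suggest that most of the model is falling back to the CPU.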
Once everything is installed, we can configure the IPU/NPU execution profiles before running the program from a new environment. Follow the Runtime setup guide at https://ryzenai.docs.amd.com/en/latest/runtime_setup.html until you have the vaip_config.json file. It is recommended to create a copy of vaip_config.json in your project directory and point to this copy when initializing the inference session.
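The copy-and-point step can be sketched as below. The helper names copy_vaip_config and create_npu_session are hypothetical, and the source location of vaip_config.json depends on your installation. The session call uses ONNX Runtime's InferenceSession with the Vitis AI execution provider and a config_file provider option; treat it as a sketch and check it against the Runtime setup guide for your Ryzen AI SW version.

```python
import shutil
from pathlib import Path


def copy_vaip_config(source_config, project_dir):
    """Copy vaip_config.json into the project directory and return the new path."""
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)
    target = project_dir / "vaip_config.json"
    shutil.copyfile(source_config, target)
    return target


def create_npu_session(model_path, config_path):
    """Create an ONNX Runtime session that requests the Vitis AI execution provider."""
    # Imported here so the copy helper can be used without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(
        str(model_path),
        providers=["VitisAIExecutionProvider"],
        provider_options=[{"config_file": str(config_path)}],
    )
```

Keeping a per-project copy means that changes to the configuration for one project do not affect others that share the same installation.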
3. Build the program based on YOLOv8 object detection
For the next steps, we will use YOLO object detection of cars, which is widely used. Hugging Face's model zoo has a YOLOv8 model that has been quantized and optimized for AMD Ryzen AI, so we don't need to create and optimize our own ONNX model; we will download it in Step 4. First, though, we create a Python program that detects cars in a video of a two-way highway using the standard Ultralytics YOLOv8 model, which is not optimized for AMD's IPU, to make sure the program works as desired.
This Python code combines OpenCV, the YOLOv8 model from Ultralytics, and a custom tracker to perform object detection and tracking on a video. The primary goal is to detect and count the cars that pass certain lines in each direction.
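The counting idea can be illustrated with a minimal sketch. It assumes a single horizontal counting line and that each tracked car is reduced to the y coordinate of its centroid. The line position LINE_Y, the function update_counts, and the sample frames are all hypothetical; carcounter.py may use different lines and logic.

```python
LINE_Y = 250  # hypothetical y position of the counting line, in pixels


def update_counts(previous_y, current_y, line_y, counts):
    """Update direction counts when a tracked centroid crosses a horizontal line."""
    if previous_y < line_y <= current_y:
        counts["down"] += 1  # moved from above the line to on or below it
    elif previous_y >= line_y > current_y:
        counts["up"] += 1    # moved from on or below the line to above it
    return counts


# Made-up centroid y positions per frame, keyed by track ID.
frames = [
    {1: 230, 2: 300},
    {1: 245, 2: 280},
    {1: 260, 2: 240},
]

counts = {"down": 0, "up": 0}
last_y = {}
for frame in frames:
    for track_id, y in frame.items():
        if track_id in last_y:
            update_counts(last_y[track_id], y, LINE_Y, counts)
        last_y[track_id] = y

print(counts)  # {'down': 1, 'up': 1}
```

Comparing the previous and current positions, rather than testing whether a centroid sits exactly on the line, avoids missing fast cars that jump over the line between two frames.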
To build this program, we need to install the dependencies: Python >=3.9 (done), opencv-python for webcam/video capture, and the Ultralytics library for YOLOv8.
Open a command prompt, then run the following commands one by one to create a Conda environment and install the required libraries:
conda create -n YOLOv8Env python=3.9
conda activate YOLOv8Env
pip install opencv-python
pip install ultralytics==8.0.0
Now open VS Code and build your custom program, or download my Python code carcounter.py and tracker.py (in the Code section), then run it from the command prompt:
python carcounter.py
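For readers building their own program, the role of a custom tracker is to give each car a stable ID from frame to frame so that it is counted only once. Below is a minimal nearest-centroid sketch of that idea. It is not the tracker.py from the Code section; the class name CentroidTracker and the max_distance threshold are assumptions for illustration.

```python
import math


class CentroidTracker:
    """Assign stable IDs to boxes by matching each one to the nearest known centroid."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # assumed matching threshold, in pixels
        self.centroids = {}               # track_id -> (cx, cy) from the last frame
        self.next_id = 0

    def update(self, boxes):
        """Take a list of (x1, y1, x2, y2) boxes and return (x1, y1, x2, y2, track_id)."""
        results = []
        updated = {}
        for x1, y1, x2, y2 in boxes:
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
            best_id, best_dist = None, self.max_distance
            for track_id, (px, py) in self.centroids.items():
                dist = math.hypot(cx - px, cy - py)
                # Reuse an ID only if it is close enough and not taken in this frame.
                if dist < best_dist and track_id not in updated:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                best_id = self.next_id
                self.next_id += 1
            updated[best_id] = (cx, cy)
            results.append((x1, y1, x2, y2, best_id))
        self.centroids = updated  # tracks not matched in this frame are dropped
        return results
```

This greedy matching is simple but can swap IDs when cars overlap or a detection is missed for a frame; a production tracker would handle those cases more carefully.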
To use the Hugging Face optimum-amd library in the Ryzen AI environment, make sure you have completed Steps 1 and 2, then install optimum-amd:
git clone https://github.com/huggingface/optimum-amd.git
cd optimum-amd
pip install -e .[ryzenai]
pip install Pillow
For reference on using the optimum-amd pipeline with Ryzen AI’s YOLO models, you can check this Hugging Face link.
Now we check that all the pre-requisites are in place by running this Python code (the cpuinfo module comes from the py-cpuinfo package):
import platform
import sys
import cpuinfo
from pprint import pprint
print("Python version:", sys.version)
print("Platform:", platform.platform())
print("Processor Architecture:", platform.architecture())
print("Machine:", platform.machine())
print("System:", platform.system())
cpu_info = cpuinfo.get_cpu_info()
print("Processor:", cpu_info["brand_raw"])
Results:
Python version: 3.9.20 (main, Jul 23 2024, 18:19:13) [MSC v.1916 64 bit (AMD64)]
Platform: Windows-10-10.0.22634-SP0
Processor Architecture: ('64bit', 'WindowsPE')
Machine: AMD64
System: Windows
Processor: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
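The script above confirms the Python build and the processor, but not that the Vitis AI execution provider is visible to ONNX Runtime. The following is a small additional check, offered as a sketch: the function name vitis_ai_provider_available is hypothetical, and a True result only means the provider is listed by the installed ONNX Runtime build, not that a model will actually run on the NPU.

```python
def vitis_ai_provider_available():
    """Return True if the installed ONNX Runtime lists the Vitis AI execution provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        # ONNX Runtime is not installed in the active environment.
        return False
    return "VitisAIExecutionProvider" in ort.get_available_providers()


print("Vitis AI EP available:", vitis_ai_provider_available())
```

If this prints False, check that the Conda environment created by the Ryzen AI SW installer is the one currently activated.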
If this succeeds, download yolov8m.onnx from https://huggingface.co/amd/yolov8m/tree/main, then download an image of cars on a road and save it as vehicles.png in the same directory.
Then run the following Python code:
from pprint import pprint
from PIL import Image
from optimum.amd.ryzenai import pipeline
from optimum.amd.ryzenai.utils import plot_bbox
# Quantized YOLOv8m model published by AMD on Hugging Face
model_id = "amd/yolov8m"
detector = pipeline("object-detection", model=model_id, model_type="yolov8")
# Load the test image and run detection
image = Image.open("vehicles.png")
outputs = detector(image)
pprint(outputs)
# Draw the detected bounding boxes on a copy of the image
plot_bbox(image.copy(), outputs)
After that, we get the image annotated with bounding boxes around the detected cars, like the example below:
For more details, you can check this link as a reference from the Hugging Face team.
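To feed the detections into counting or speed logic, each bounding box is usually reduced to a centroid. The sketch below assumes that every detection is a dictionary with "label", "score", and a "box" holding xmin, ymin, xmax, and ymax, which is the layout used by Hugging Face object-detection pipelines. That layout is an assumption here, so compare it with what pprint(outputs) prints on your machine. The function name car_centroids, the score threshold, and the sample data are hypothetical.

```python
def car_centroids(outputs, min_score=0.5, wanted_label="car"):
    """Return (cx, cy) centroids for detections of the wanted label above a score threshold."""
    centroids = []
    for det in outputs:
        if det["label"] != wanted_label or det["score"] < min_score:
            continue
        box = det["box"]
        cx = (box["xmin"] + box["xmax"]) / 2.0
        cy = (box["ymin"] + box["ymax"]) / 2.0
        centroids.append((cx, cy))
    return centroids


# Made-up detections in the assumed output layout.
sample_outputs = [
    {"label": "car", "score": 0.91, "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 80}},
    {"label": "person", "score": 0.88, "box": {"xmin": 0, "ymin": 0, "xmax": 10, "ymax": 30}},
    {"label": "car", "score": 0.30, "box": {"xmin": 200, "ymin": 50, "xmax": 260, "ymax": 90}},
]
print(car_centroids(sample_outputs))  # [(60.0, 50.0)]
```

Filtering by label and score before tracking keeps pedestrians and weak detections from being counted as cars.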
5. Modify the program with YOLOv8m from Ryzen AI Model Zoo
Before creating this program, make sure you have a sample video (.mp4), for example traffic at a road intersection or on a two-way highway (search for free-to-download traffic footage for object detection). At this stage we modify the Python program created in Step 3, replacing the Ultralytics library with the Hugging Face optimum-amd pipeline we learned in Step 4 and making several adjustments. We also add speed detection, using the location (coordinates) of each bounding box and its timestamp.
For details, see file: speedncount_amd.py in the Code Section.
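The speed estimate from bounding-box coordinates and timestamps can be sketched as follows. This is a simplified illustration rather than the code in speedncount_amd.py: it assumes a fixed meters_per_pixel calibration factor (a hypothetical value that has to be measured for each camera setup) and ignores perspective distortion. The function name estimate_speed_kmh is also hypothetical.

```python
import math


def estimate_speed_kmh(p_start, p_end, t_start, t_end, meters_per_pixel):
    """Estimate speed in km/h from two centroid positions (pixels) and timestamps (seconds)."""
    dt = t_end - t_start
    if dt <= 0:
        raise ValueError("t_end must be later than t_start")
    pixels = math.hypot(p_end[0] - p_start[0], p_end[1] - p_start[1])
    meters = pixels * meters_per_pixel  # assumed calibration: real distance per pixel
    return meters / dt * 3.6            # convert m/s to km/h


# A centroid that moves 100 px in 0.5 s at 0.05 m/px travels 5 m, about 36 km/h.
speed = estimate_speed_kmh((0, 0), (100, 0), 0.0, 0.5, meters_per_pixel=0.05)
print(round(speed, 1))
```

For a video file, the timestamps can be derived from the frame index divided by the frame rate, so the estimate does not depend on how fast the frames are processed.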
The following video demo shows our program running: it counts the cars going in and out by direction and also estimates the speed of each car.
Conclusion:
In this project we configured the necessary settings and tested the capabilities of AMD chips with an integrated IPU/NPU (on Ryzen AI PCs/laptops), which can reduce CPU workload. We successfully detected car speed, direction, and count, first building the program with Ultralytics YOLOv8 and then modifying it to use the Ryzen AI version of YOLOv8 and the Hugging Face optimum-amd library. By dividing the workload between the IPU and the CPU/GPU in this way, energy-efficient and low-latency solutions can be applied to more use cases in the future.