Project VisionGest: Gesture Recognition Software for AMD Platforms
Abstract
Project VisionGest is a multi-purpose gesture recognition software project designed to run on the Vitis AI Runtime (VART), using the DPU cores of the NPU for efficient AI inference on the new AMD Ryzen processors. The primary aim is to control various tasks on AMD PCs, such as presentations and other visual instructions. Additionally, we are exploring the potential of this technology for vision-based Continuous Sign Language Recognition (CSLR), with the long-term goal of facilitating communication for the deaf and speech-impaired community through a real-time bidirectional sign language translation tool that can be used offline.
1. Introduction
1.1 Background
Gesture recognition technology has a wide range of applications, from controlling slide shows to providing visual instructions to computers. Additionally, the World Health Organization (WHO) reports that over 400 million people globally are affected by deafness and speech impairments. Effective communication for these individuals remains a challenge, necessitating innovative solutions.
1.2 Objective
The primary objective of Project VisionGest is to develop gesture recognition software that can be used for various tasks on AMD platforms. An additional goal is to explore the feasibility of using this technology for real-time bidirectional sign language translation that works offline, helping deaf and speech-impaired individuals communicate with family, friends, and colleagues in the workplace.
2. Methodology
2.1 Setting Up the PC: Hardware Information
CPU: AMD Ryzen 9 7940HS Processor
GPU: AMD Radeon 780M
System Memory: 16 GB x 2
Storage: 512 GB
Extra Trick to Set Up and Use the PC Without a Dedicated Keyboard, Mouse, or Monitor
(If you don't need this, skip ahead to section 2.2, Setting Up Models to Run on VART.)
If your primary device is a laptop and you need to work on another, more powerful PC through it, you can follow my approach. I have several PCs that I use for training models and development work. Instead of attaching dedicated peripherals to each PC, I connect to all of them from my laptop over the local network and switch between them like tabs.
Steps:
1] Connect your laptop to the host PC using an Ethernet cable.
2] Open the Network Connections menu via the Windows Search box.
3] Right-click the active Internet connection and select Properties.
4] In the Sharing tab, enable "Allow other network users to connect" and choose the relevant Ethernet port.
5] Your AMD PC should now have internet access through the Ethernet cable. Get its IP address on the AMD PC using CMD or graphically as below.
6] Now open the built-in Remote Desktop software on your laptop.
7] Here you can add new PCs and edit their properties. Use the IP address of your AMD machine as the PC name and give it a display name such as "AMD AI PC from Local Ethernet".
8] Now go to the home page of the Microsoft Remote Desktop application
Click on your AMD PC and enter your Microsoft account password (not your AMD PC PIN) when prompted.
You're all set! Now you can log in to multiple working machines without needing extra keyboards, mice, or displays!!!
How to Check if the NPU is Enabled on Your PC
1. Open Device Manager:
- Press `Win + X` and select "Device Manager" from the menu, or search for "Device Manager" in the Windows Search box and open it.
2. Navigate to System Devices:
- In the Device Manager window, scroll down and expand the "System Devices" category.
3. Check for AMD IPU Devices:
- Look through the list of devices under "System Devices" for any entries labeled "AMD IPU Devices." This indicates that the IPU or the NPU (Neural Processing Unit) is enabled on your PC.
Enabling the NPU if AMD IPU Devices Do Not Appear
If "AMD IPU Devices" do not appear in Device Manager, you may need to enable the device in the BIOS. Follow these steps to enable the NPU:
1. Access Advanced Startup:
- In the Windows Search bar, type "Advanced Startup" and open the System Settings menu. Under "Recovery options," in the Advanced startup section, click "Restart Now."
- Alternatively, click the Windows Start button, select "Settings," then "Recovery."
2. Enter Advanced Startup:
- When the PC restarts, you will enter the Advanced Startup page. From there, select "Troubleshoot."
3. Select Advanced Options:
- In the Troubleshoot menu, select "Advanced options."
4. Access UEFI Firmware Settings:
- In Advanced options, select "UEFI Firmware Settings." Your machine will restart and enter the BIOS menu. (Here, the PC used is a MINIS FORUM.)
5. Navigate to CPU Configuration:
- In the BIOS menu, under the “Advanced” tab, select "CPU Configuration."
6. Enable IPU Control:
- In CPU Configuration, locate the IPU Control setting and change it from “Disabled” to “Enabled.”
7. Verify in Device Manager:
- Restart your PC and check Device Manager again under System Devices to ensure that "AMD IPU Devices" now appear.
2.2 Setting Up Models to Run on VART
Main Components of the Ryzen AI Software
The Ryzen AI software consists of the following key components:
1. Vitis AI Quantizer:
- This component is responsible for optimizing AI models by reducing their precision. This process, known as quantization, helps to accelerate inference and reduce memory usage while maintaining model accuracy.
2. Vitis AI Execution Provider:
- This component facilitates efficient execution of AI models on AMD hardware. It ensures that AI tasks are seamlessly integrated with the underlying hardware capabilities, optimizing performance and resource utilization.
Functionality
This software tool allows TensorFlow and PyTorch models to run on AMD's XDNA architecture after converting them to the ONNX format.
Model Conversion and Deployment:
- Models are quantized and converted into the ONNX format for deployment.
- Using the Vitis AI ONNX execution provider, the neural network is automatically partitioned into subgraphs (see the sketch after this list).
- Subgraphs containing operators supported by the IPU (referred to as the NPU) are executed on the IPU.
- The remaining subgraphs are executed on the CPU.
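To make the deployment flow concrete, here is a minimal Python sketch of loading a quantized ONNX model through ONNX Runtime with the Vitis AI execution provider, assuming the `VitisAIExecutionProvider` name and a `vaip_config.json` configuration file from the Ryzen AI setup; the model path and input shape are placeholders:

```python
# Minimal sketch: run a quantized ONNX model through the Vitis AI
# execution provider so that supported subgraphs run on the NPU and the
# rest fall back to the CPU. Paths and shapes below are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "yolov8_gestures_quantized.onnx",          # placeholder model path
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}, {}],
)

# Dummy input matching a typical 640x640 RGB image tensor (NCHW, float32).
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])
```

Listing the CPU provider second gives the fallback path described above: any subgraph the NPU cannot handle is executed on the CPU.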
Links to extra resources:
- Users can use pre-trained models from Hugging Face
- Getting Started with Ryzen AI Software
- AI Developer Contest PC AI Study Guide
Steps to Set Up RyzenAI
1. Install IPU Drivers
- Download the NPU Driver:
1. Download the file from [RyzenAI Installation Guide].
2. File Name: `ipu_stack_rel_silicon_prod_1.1.zip`.
3. Extract the ZIP file.
- Install the Driver:
1. Open the Command Prompt in Administrator mode.
2. Navigate to the extracted folder.
3. Run the installation script: `.\amd_install_kipudrv.bat`.
PS C:\Users\root\Desktop\AMD Project\Installations\ipu_stack_rel_silicon_prod_1.1> .\amd_install_kipudrv.bat
C:\Windows\System32\pnputil.exe
IPU driver is already installed.
Do you want to uninstall the available driver and continue with this installation? [Y/n]: Y
Uninstalling IPU driver by uninstalling "oem4.inf"
oem4.inf is a kipudrv.inf which will be deleted
Microsoft PnP Utility
Driver package uninstalled.
Driver package deleted successfully.
oem50.inf is a kipudrv.inf which will be deleted
Microsoft PnP Utility
Driver package uninstalled.
Driver package deleted successfully.
Microsoft PnP Utility
Scanning for device hardware changes.
Scan complete.
Microsoft PnP Utility
Processing inf : kipudrv.inf
Successfully installed the driver.
Driver package added successfully.
Published name : oem4.inf
Total attempted: 1
Number successfully imported: 1
SUCCESS: Specified value was saved.
kipudrv.inf install successful
PS C:\Users\root\Desktop\AMD Project\Installations\ipu_stack_rel_silicon_prod_1.1>
2. Install Required Software
1] Visual Studio 2019:
- Download from [Visual Studio 2019] (the source I used).
- No login required.
2] CMake:
- Download from [CMake Downloads]
3] Miniconda Quick Command Line Installation:
- For Windows 11, run the following commands:
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe -o miniconda.exe
start /wait "" miniconda.exe /S
del miniconda.exe
Documentation: https://docs.anaconda.com/free/miniconda/#quick-command-line-install
3. Install RyzenAI Software
- Run the `install.bat` script which performs the following actions:
1. Creates a Conda environment.
2. Installs the Vitis AI Quantizer for ONNX.
3. Installs the ONNX Runtime.
4. Installs the Vitis AI Execution Provider.
5. Configures the environment for the NPU throughput profile.
6. Prints the name of the Conda environment before exiting.
- Test the Installation:
Restart the terminal, activate your Conda environment (mine is named "npu"), and run the test script located in `ryzen-ai-sw-1.1\quicktest`.
You should see the CMD output shown below;
Transfer Learning with YOLOv8
As the first step, I tested the transfer learning approach with YOLOv8 models for gesture recognition. Initially, I ran the standard YOLOv8 model before applying transfer learning to adapt the model for hand gestures. Here are the steps I followed:
Setup Dependencies:
- Git
- OpenCV (version 4.6.0)
- glog
- gflags
- CMake (version >= 3.26)
- Python (version >= 3.9, recommended: 3.9.13 64-bit)
- IPU driver & IPU xclbin release >= 20230726
- VOE package >= jenkins-nightly-build-id==205
Git installation is straightforward; instructions are readily available online.
OpenCV Installation:
Build from Source:
Start a Git Bash and clone the repository:
git clone https://github.com/opencv/opencv.git -b 4.6.0
Switch to the Conda Prompt and compile the source code with CMake:
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_CONFIGURATION_TYPES=Release -A x64 -T host=x64 -G "Visual Studio 16 2019" -DCMAKE_INSTALL_PREFIX="C:\Program Files\opencv" -DCMAKE_PREFIX_PATH="./opencv" -DCMAKE_BUILD_TYPE=Release -DBUILD_opencv_python2=OFF -DBUILD_opencv_python3=OFF -DBUILD_WITH_STATIC_CRT=OFF -B build -S opencv
cmake --build build --config Release
cmake --install build --config Release
Install gflags:
Build from Source:
Clone the repository:
git clone https://github.com/gflags/gflags.git
Compile the source code with CMake:
cd gflags
mkdir mybuild
cd mybuild
cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_CONFIGURATION_TYPES=Release -A x64 -T host=x64 -G "Visual Studio 16 2019" -DCMAKE_INSTALL_PREFIX="C:\Program Files\gflag" -B build -S ../
cmake --build build --config Release
cmake --install build --config Release
cd ../..
Install glog:
Clone the repository:
git clone https://github.com/google/glog.git
Compile the source code with CMake:
cd glog
mkdir mybuild
cd mybuild
cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_CONFIGURATION_TYPES=Release -A x64 -T host=x64 -G "Visual Studio 16 2019" -DCMAKE_INSTALL_PREFIX="C:\Program Files\glog" -B build -S ../
cmake --build build --config Release
cmake --install build --config Release
cd ../..
Install the Ryzen AI Laptop Dependencies:
- Ensure all dependencies are installed on the Ryzen AI laptop. This includes Git, OpenCV, glog, gflags, CMake, Python, IPU drivers, and the VOE package.
Testing the Yolov8 Demo
As the first step, I cloned the RyzenAI-SW Git repo, set it up, and navigated to the tutorial\yolov8_e2e directory.
Use 'dir' to check the contents of the folder; you'll see a run_jpeg.bat file along with a quantized .onnx model. Use sample_yolov8.jpg to test the model as below;
This is the result I got after running the model; you'll see a result.jpg file created as below.
Here is a comparison of the original sample_yolov8.jpg and the generated result.jpg;
Then I checked it with the camera feed as below;
To run with a live camera, you need to change the display and camera settings manually as follows:
- Go to Display settings and change Scale to 100% in the Scale & layout section.
- Go to Bluetooth & devices -> Cameras -> USB2.0 FHD UVC WebCam, and turn off Background effects in the Windows Studio Effects section.
Then run camera_nx.bat. After you run the .bat file, you'll see the output below;
Here's what I got when I did the testing at my university's Digital Electronics Laboratory (UoM);
When you need to stop the model, press Ctrl+C and confirm termination of the job by entering 'y';
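If you prefer a Python loop over the provided camera_nx.bat, here is a rough sketch that feeds webcam frames to an ONNX Runtime session configured as in section 2.2. The model path is a placeholder, and the pre- and post-processing are deliberately simplified (plain resize, no letterboxing or NMS), so treat it as a starting point rather than a replacement for the demo script:

```python
# Rough sketch of a live-camera loop feeding frames to an ONNX Runtime
# session (set up with the Vitis AI execution provider as shown earlier).
# Pre/post-processing is simplified; the real YOLOv8 demo also performs
# letterboxing, confidence filtering, and NMS.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "yolov8_gestures_quantized.onnx",          # placeholder model path
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}, {}],
)
input_name = session.get_inputs()[0].name

cap = cv2.VideoCapture(0)                      # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Resize to the model input size and convert BGR HWC -> RGB NCHW float32.
    blob = cv2.resize(frame, (640, 640))[:, :, ::-1].astype(np.float32) / 255.0
    blob = np.ascontiguousarray(np.transpose(blob, (2, 0, 1))[None, ...])
    outputs = session.run(None, {input_name: blob})
    # ... decode boxes/classes from `outputs` and draw them on `frame` ...
    cv2.imshow("VisionGest demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):      # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```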
Transfer Learning for Gestures
To implement transfer learning for hand gesture recognition using YOLOv8 models, we adopt a pre-trained YOLOv8 model to recognize specific gestures from a video sign language dataset. Here's a streamlined overview of the approach:
1. Dataset Preparation:
- Gather a diverse video sign language dataset with various gestures. (A few of the best datasets I used during my tests are described in the last section.)
2. Pre-processing and Augmentation:
- Frame Extraction: Use tools like OpenCV to extract frames from the video dataset (see the sketch after this list).
- Data Augmentation: Apply techniques like rotation, flipping, and scaling to increase dataset diversity.
3. Model Adaptation:
- Loading Pre-trained Weights: Load YOLOv8 weights trained on a general dataset like COCO.
- Model Modification: Adjust the YOLOv8 output layer to match the number of gesture classes in your dataset.
- Transfer Learning: Fine-tune the model on the hand gesture dataset with a lower learning rate to adapt pre-trained features to the new task.
4. Training:
- Hyperparameter Tuning: Adjust parameters such as learning rate, batch size, and epochs. Use a validation set to prevent overfitting.
- Training Process: Train the model and monitor metrics like loss and accuracy.
5. Quantization:
- Model Conversion: Convert the trained model to the ONNX format.
- Quantization Techniques: Apply quantization to optimize the model for AMD platforms using Vitis AI. Use quantization-aware training (QAT) if necessary.
6. Validation and Testing:
- Inference Testing: Validate the quantized model on a test set of gestures using both single image and live video feeds.
- Performance Metrics: Evaluate precision, recall, and F1-score to ensure accuracy and efficiency.
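As a rough sketch of steps 2 to 5 above, the following Python snippet extracts frames with OpenCV and fine-tunes a COCO-pretrained YOLOv8 model with the Ultralytics API before exporting it to ONNX. The video path, dataset YAML, and hyperparameters are placeholders, and the extracted frames still need to be annotated with gesture bounding boxes before training:

```python
# Sketch of the transfer-learning workflow: extract frames from sign
# language videos with OpenCV, then fine-tune a pre-trained YOLOv8 model
# with the Ultralytics API and export it to ONNX for quantization.
# Dataset paths, the YAML file, and hyperparameters are placeholders.
from pathlib import Path

import cv2
from ultralytics import YOLO

def extract_frames(video_path: str, out_dir: str, every_n: int = 5) -> None:
    """Save every n-th frame of a video as a JPEG for later annotation."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
        idx += 1
    cap.release()

extract_frames("signs/sample_clip.mp4", "dataset/images/train")

# Fine-tune a COCO-pretrained YOLOv8 model on the annotated gesture dataset.
# 'gestures.yaml' lists the train/val image folders and gesture class names.
model = YOLO("yolov8n.pt")
model.train(data="gestures.yaml", epochs=50, imgsz=640, lr0=1e-3, batch=16)

# Export the fine-tuned model to ONNX so it can be quantized for the NPU.
model.export(format="onnx", opset=17)
```

Using a small learning rate (lr0) keeps the pre-trained backbone features largely intact while the detection head adapts to the new gesture classes.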
Quantizing Custom PyTorch CSLR Models
To quantize our own PyTorch-based Continuous Sign Language Recognition (CSLR) model, which I describe in the last section, we follow these steps:
1. Prepare the Model: Ensure the model is fully trained and validated.
2. Convert to ONNX: Export the PyTorch model to ONNX format.
3. Apply Quantization: Use tools like ONNX Runtime or Vitis AI to perform quantization-aware training (QAT) or post-training quantization (PTQ); a PTQ sketch follows these steps.
4. Optimize and Validate: Test the quantized model for accuracy and efficiency, ensuring it meets performance requirements on the target AMD platform.
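As a rough illustration of steps 2 and 3, the sketch below exports a PyTorch model to ONNX with `torch.onnx.export` and applies post-training static quantization via ONNX Runtime. The tiny stand-in network, input shape, and random calibration clips are placeholders for our actual CSLR model; the Vitis AI ONNX quantizer offers an equivalent flow tuned for the NPU:

```python
# Sketch: export a trained PyTorch CSLR model to ONNX and apply
# post-training static quantization with ONNX Runtime. The stand-in
# network, input shape, and calibration clips are placeholders.
import numpy as np
import torch
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class TinyCSLRModel(torch.nn.Module):
    """Stand-in for our CSLR network: per-frame CNN + GRU + gloss classifier."""
    def __init__(self, num_glosses: int = 100):
        super().__init__()
        self.backbone = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
        )
        self.temporal = torch.nn.GRU(16, 32, batch_first=True)
        self.head = torch.nn.Linear(32, num_glosses)

    def forward(self, clips):                   # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.temporal(feats)
        return self.head(out)                   # per-frame gloss logits

model = TinyCSLRModel().eval()

# Step 2: export to ONNX. Dummy input: 16 video frames of 3x224x224 each.
dummy = torch.randn(1, 16, 3, 224, 224)
torch.onnx.export(
    model, dummy, "cslr_fp32.onnx",
    input_names=["clips"], output_names=["gloss_logits"], opset_version=17,
)

# Step 3: post-training static quantization with a small calibration set.
class ClipCalibrationReader(CalibrationDataReader):
    def __init__(self, clips):
        self._iter = iter(clips)
    def get_next(self):
        clip = next(self._iter, None)
        return None if clip is None else {"clips": clip}

calib_clips = [np.random.rand(1, 16, 3, 224, 224).astype(np.float32)
               for _ in range(8)]               # placeholder calibration data
quantize_static(
    "cslr_fp32.onnx", "cslr_int8.onnx",
    ClipCalibrationReader(calib_clips),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)
```

In practice the calibration clips should be real samples from the sign language dataset so the activation ranges match what the model sees at inference time.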
2.4 Technology Stack
Understanding how NPUs (Neural Processing Units) and IPUs (Intelligence Processing Units) operate is crucial for developing effective algorithms. Below is an overview of the technology stack used in this project.
- Vitis AI Library: A set of high-level libraries and APIs built for efficient AI inference with DPU cores.
- Vitis AI Runtime (VART): Provides unified APIs and easy-to-use interfaces for AI model deployment on AMD platforms.
The diagram below shows how ML frameworks are implemented as IPU programs;
Graphcore Intelligence Processing Unit (IPU)
The Graphcore Intelligence Processing Unit (IPU) is a cutting-edge hardware solution specifically designed for AI and machine learning workloads. The IPU architecture is optimized to handle the complex computations involved in training and inference for neural networks, providing significant performance improvements over traditional CPUs and GPUs. Key features of the IPU include:
- Parallel Processing: The IPU architecture supports massive parallelism, allowing it to execute thousands of operations simultaneously, which is crucial for AI tasks.
- Efficient Memory Management: The IPU is designed to optimize memory usage, reducing latency and improving the speed of AI computations.
- Flexible Programming Model: The IPU supports a flexible programming model that enables developers to efficiently map their AI algorithms onto the hardware.
Graph-theory-based representation of variables and processing
How a program is split into parallel sub-programs
For more detailed information on the IPU and its capabilities, you can refer to the Graphcore IPU Programmer’s Guide. The above IPU block diagram images are from this source documentation.
2.5 Ongoing Research and Implementations
The demo video below shows one of my previous projects, HyperTalk, which is one of the main contributing factors behind Project VisionGest;
This previous project, the HyperTalk mobile app and website, needed a powerful backend server to run the model and therefore could not work offline. The existing mobile app and website provide real-time bidirectional sign language translation to facilitate communication for individuals with hearing and speech impairments. The app's primary features include:
1. Sign Language Camera Feed to Voice: Translates sign language gestures captured via a camera into spoken language.
2. Voice Feed to Sign Language Animations: Converts spoken language into sign language animations for visual interpretation.
Enhancements and Research Focus
My primary research goal is to enhance these features, particularly focusing on continuous word-level sign language recognition. This improvement aims for more accurate and faster translations across various sign languages, using the following datasets:
1] Phoenix 2014 Dataset (German Sign Language Videos)
https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX/
2] OpenASL Dataset (American Sign Language Videos)
https://paperswithcode.com/dataset/openasl
3] CSL Dataset (Chinese Sign Language Videos)
https://ustc-slr.github.io/datasets/2021_csl_daily/
4] BOBSL Dataset (British Sign Language Videos)
https://paperswithcode.com/dataset/bobsl
3. Conclusion
My primary aim is to enable gesture recognition for various tasks on AMD PCs, such as controlling presentations and other visual instructions. We are also exploring the potential of this technology for vision-based Continuous Sign Language Recognition (CSLR), with the ultimate goal of facilitating communication for the deaf and speech-impaired community. This involves developing a real-time bidirectional sign language translation tool that can be accessed offline. Our vision is to create a more accurate, practical product that positively impacts the lives of those relying on sign language for communication.