In this project I am going to demonstrate how AMD's new NPU hardware can run AI workloads locally on a workstation without any cloud connectivity.
The AMD Ryzen™ 7040 series processors utilize a leading-edge 4nm process node to deliver impressive performance and extended battery life in thin and light laptops. Some models feature the new Ryzen AI Engine, the first dedicated AI engine on a Windows x86 processor. Ryzen AI technology is designed to offer AI-driven capabilities that are unprecedented in real-time applications directly on laptops. One standout feature of Ryzen AI is its ability to handle up to eight concurrent spatially isolated AI workloads, allowing users to run multiple hardware-accelerated AI applications simultaneously without affecting each other’s performance.
- High-Level Specifications: Operating at 35-54 W, achieving up to 10 TOPS
- Platform: Designed for commercial and consumer-grade laptops
- Applications: Enhancing computer vision effects, productivity tools, and multimedia experiences
- Model Support: Suitable for vision models and large language models
The hardware I received for this project is the Venus UM790 Pro mini-PC, which features an AMD Ryzen™ 9 7940HS processor. This processor has a dedicated NPU/IPU (Neural Processing Unit) that can run Ryzen AI workloads locally without relying solely on the CPU or GPU.
The biggest benefits of the NPU are:
- It reduces CPU usage for AI workloads
- It reduces system power consumption
- It enables multiple AI workloads to run without causing performance issues
- It allows new AI applications to run locally, ensuring privacy
Here is my UM790 Pro setup. The mini PC turns into a desktop when it is connected to I/O devices such as a monitor, keyboard, and mouse.
Specification of UM790 Pro mini-PC:
Under the hood of UM790 Pro
I was curious to peek inside the UM790 Pro enclosure. Here is how it looks after removing the bottom cover: we can access the RAM, SSD, and network card for replacement or upgrade.
Hardware Setup
Before getting started, connect the following I/O devices to the mini PC:
- 120 W power adapter (barrel jack connector)
- USB mouse + keyboard as input
- HDMI/USB monitor as output
- Log into Windows 11 and enable internet access via Wi-Fi/Ethernet
Before we can understand the context of the NPU, let's talk a bit about the evolution of AI and relevant technologies!
How AI is shaping the Future
Artificial Intelligence (AI) is reshaping our computing landscape much like the World Wide Web did when it was introduced in 1989. Similar to the web, AI will continue to evolve, and today’s applications are merely the precursor to an entirely new era of computing.
Imagine a future where every consumer laptop or desktop comes equipped with a specialized AI coprocessor designed specifically for efficient machine learning tasks. From enhancing image recognition and natural language processing to revolutionizing virtual assistants and personalized recommendations, the impact on our daily computing experiences will be profound.
Assistive AI: Creativity Transformation
For professionals engaged in content creation, graphic design, presentations, or personal projects, picture this scenario: as you sketch or outline your ideas in your preferred software, an AI system activates. It analyzes your work in real time, offering suggestions on techniques and artistic styles that align with your creative direction. When experimenting with colors, the AI provides complementary color palette suggestions based on the mood and theme of your work.
Struggling with a specific element, such as drawing a realistic animal or selecting the right arrangement? The AI generates a selection of reference images or designs to assist you. As you work, dynamic filters and effects are applied to your artwork, allowing you to preview different visual outcomes instantly and make informed creative decisions. Over time, your personal AI learns from your artistic choices, becoming attuned to your unique style and providing increasingly relevant suggestions.
What Is an NPU?
As the focus on machine learning workloads intensifies, hardware designers are prioritizing enhancements to accommodate these tasks. Modern CPUs and general-purpose GPUs already integrate features tailored for machine learning, such as hardware support for reduced precision number formats. However, there remains a fundamental limitation: CPUs and GPUs must also handle diverse workloads beyond machine learning.
To overcome this limitation and achieve greater efficiency in machine learning, a specialized architecture designed exclusively for these tasks is emerging.
Enter the Neural Processing Unit (NPU). This new class of processing unit excels at machine learning inference tasks. An NPU significantly enhances the speed and efficiency of machine learning computations on your computer. NPUs are sometimes referred to as Inference Processing Units (IPUs). In certain Ryzen AI documentation, you might encounter the term “IPU” instead of “NPU.” Once you’ve installed the Ryzen AI driver on your computer, you’ll notice a device named IPU in the Windows Device Manager.
Ryzen AI NPU: Empowering AI Workloads
In 2023, AMD introduced the Ryzen 7000 desktop and laptop chips. Alongside the primary x86 CPU, select Ryzen 7000-series chips feature a novel coprocessor: the Neural Processing Unit (NPU), built on the XDNA™ AI Engine architecture. This cutting-edge NPU is aptly named Ryzen AI.
Unlike the main x86 Zen 4 CPU, which follows a Von Neumann architecture, Ryzen AI employs a specialized dataflow architecture. This design optimizes performance and responsiveness for AI applications while operating at lower power levels.
When you execute AI workloads locally on an NPU, you benefit from:
- AI-Enhanced Video Conferencing: Exclusive features available to systems equipped with an NPU enhance video conferencing experiences.
- Higher AI Workload Performance: The NPU delivers superior performance for AI tasks.
- Reduced Latency: Faster responses due to lower latency are crucial for real-time applications. For instance, video streams won’t stutter, and audio won’t glitch during video calls.
- Energy Efficiency: The NPU’s efficiency translates to longer battery life, allowing you to work uninterrupted on a single charge.
By integrating a dedicated dataflow processing unit into your Ryzen 7000 processor, you unlock new and exciting ways to experience machine learning. Your laptop becomes smarter, more capable, and enjoys extended battery life.
Setup & Installations
- Enable IPU/NPU
The IPU/NPU is not enabled in the UEFI BIOS settings by default.
If you are already logged into the PC's Windows 11 OS:
Press the Windows key and type 'bios'.
Go to System > Recovery > Advanced startup and click Restart now.
After the 1st restart:
Choose an option > Troubleshoot > Advanced options > UEFI Firmware Settings, then click Restart.
After the 2nd restart:
Setup > Advanced > CPU Configuration > IPU Control > Enabled
Now save and exit, and boot into Windows 11.
Check Device Manager and you will see your IPU hardware:
Device Manager > System Devices > AMD IPU
- Install NPU Driver
The Ryzen AI Software supports the following processors running Windows 11:
- AMD Ryzen™ 7940HS, 7840HS, 7640HS, 7840U, 7640U
- AMD Ryzen™ 8640U, 8640HS, 8645H, 8840U, 8840HS, 8845H, 8945H
Therefore, Ryzen AI software on the NPU will only work with laptops/desktops that have one of these processors (only these CPUs have the NPU/IPU hardware).
Go to the link below and follow the installation instructions from AMD: Installation Instructions — Ryzen AI Software 1.1 documentation (amd.com)
Download the NPU driver and install it by following these steps:
- Extract the downloaded zip file.
- Open a terminal in administrator mode and execute the .\amd_install_kipudrv.bat file.
- Ensure that the NPU driver is installed from Device Manager > System Devices > AMD IPU Device, as shown in the following image.
You will have to create an AMD Developer account to download the NPU drivers.
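If you prefer the command line over Device Manager, one way to confirm the device is visible (my addition; the exact friendly name can vary by driver version) is to run the following in an administrator PowerShell:
Get-PnpDevice -PresentOnly | Where-Object { $_.FriendlyName -like '*IPU*' }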
- Install Visual Studio 2019
Download and Install Visual Studio 2019
- Install Python 3.9
Don't download and install Python 3.9 directly. Instead, click Modify in the Visual Studio Installer, select the Python 3.9.13 package, and let it download and install.
- Install Anaconda
Before proceeding with the installation of the Ryzen AI Software, it is essential to verify that all the necessary requirements listed earlier have been fulfilled. Additionally, ensure that the Windows PATH variable includes the correct settings for each component.
For instance, Anaconda necessitates the inclusion of specific paths in the PATH variable:
- path\to\anaconda3
- path\to\anaconda3\Scripts
- path\to\anaconda3\Lib\bin
These adjustments to the PATH variable can be made through the Environment Variables section found within the System Properties window.
Download and Install Anaconda into the right directory.
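Before moving on, you can sanity-check these PATH entries from Python. This is a small sketch of my own (conda_root is an assumed install location; adjust it to wherever Anaconda actually lives on your system):

import os

# Assumed Anaconda install location - change to match your system
conda_root = os.path.expanduser(r'~\anaconda3')
required = [conda_root,
            os.path.join(conda_root, 'Scripts'),
            os.path.join(conda_root, 'Lib', 'bin')]
path_entries = os.environ['PATH'].split(os.pathsep)
for p in required:
    print(p, '->', 'found' if p in path_entries else 'MISSING')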
- Install Git
Download and Install Git version 2.45 for Windows 11
- Install Ryzen AI Software
Download the ryzen-ai-sw-1.1.zip Ryzen AI Software installation package and extract it. You will need to log into your AMD Developer account to download this.
Open an Anaconda or Windows command prompt in the extracted folder and run the installation script as shown below. Make sure to enter “Y” when prompted to accept the EULA.
.\install.bat
If all the prerequisites are installed correctly, you will see 'OK' for each of the packages needed by Ryzen AI. If something is not installed, reinstall it into the right path. In my case, I have installed everything into the following path:
The install.bat script does the following:
- Creates a conda environment
- Installs the Vitis AI Quantizer for ONNX
- Installs the ONNX Runtime
- Installs the Vitis AI Execution Provider
- Configures the environment to use the throughput profile of the NPU
- Prints the name of the conda environment before exiting
The default Ryzen AI Software packages are now installed in the conda environment created by the installer. To begin using the Ryzen AI Software, activate the conda environment generated by the installer (its name is displayed during the installation process).
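For example, if the installer created an environment named ryzen-ai-1.1 (an assumed name on my part; use whatever name the installer actually prints), you would activate it with:
conda activate ryzen-ai-1.1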
Note: The Ryzen AI Software installation directory (where the zip file was extracted) contains essential files needed during runtime for inference sessions. These files comprise the NPU binaries (*.xclbin) and the default runtime configuration file (vaip_config.json) for the Vitis AI Execution Provider. Therefore, it is crucial not to delete the installation directory and to store it in a convenient location.
Refer to the Runtime Setup page for detailed instructions on setting up the environment before running an inference session on the NPU.
- Install Riallto
To proceed, you'll need a laptop or computer equipped with an AMD Ryzen AI processor. Note that support for the Ryzen NPU and Riallto is currently exclusive to Windows 11.
In the Windows Device Manager, the Ryzen AI NPU appears as an IPU (Inference Processing Unit), which is synonymous with an NPU. 'Full' installation options require the IPU driver version 10.1109.8.128 to be installed. For instructions on installing the Windows driver for the Ryzen AI NPU, refer to the following page:
You will need to provide your machine's MAC address to get the license:
Make sure to install the license from AMD into the correct directory, as follows:
Download Riallto Installer:
Riallto includes runtime software for loading Ryzen AI applications and a software toolchain for compiling and building applications. Additionally, Riallto provides a set of Jupyter Notebook tutorials.
Download the latest v1.1 Riallto installer and execute it on your Ryzen AI laptop. This installation will set up the Riallto software framework and provide access to Riallto Jupyter notebooks, which you can browse and run on your laptop. Ensure you select the appropriate installation option as prompted; the 'Full' version requires WSL2 and the AIE build license (refer to Prerequisites above for details).
If you do not have a Ryzen AI laptop, you can access the Riallto notebooks as webpages, which constitute the majority of the content on the current webpages you are browsing. Explore the NPU Architecture Features section to gain insights into the NPU, and consult the Building Applications section to learn how to create custom applications for it. The final section demonstrates how to perform Machine Learning with ONNX on Ryzen AI.
Launch Riallto:
Once Riallto is installed on your laptop, click on the desktop icon to initiate it. This action will start a JupyterLab server instance in your web browser.
Press the Windows key and type Riallto, then hit Enter to launch the Riallto framework. It will open a Jupyter Notebook in the web browser, where you can write Python code and run machine learning models on the AMD NPU.
Open a notebook from Riallto as follows; we will run Python code chunks in this window:
Please note that Riallto is more of an educational tool for experimenting with AI models on AMD hardware; it is not really suitable for building finished AI products.
Step 1: Import Packages
Run the following cell to import all the necessary packages to be able to run inference on the Ryzen AI NPU.
This code runs in the Jupyter Notebook opened by launching Riallto. It is basically a webpage that runs Python code, one chunk at a time!
Copy the Python code blocks given below, paste them into cells, and execute the code by clicking the Play icon:
import onnx
import onnxruntime as ort
import enum
import numpy as np
import cv2
import pickle
import os
import glob
import tarfile
import urllib.request
import matplotlib.pyplot as plt
from PIL import Image
from mpl_toolkits.axes_grid1 import ImageGrid
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sn
import pandas as pd
Step 2: Prepare the Data
A pre-trained ResNet-50 model from PyTorch Hub for the CIFAR-100 dataset will be deployed.
Download the CIFAR-100 dataset
Next, execute the following cells to download the CIFAR-100 dataset. The dataset is stored in data/cifar-100-batches-py/.
global models_dir, data_dir
models_dir = ".\\onnx"
data_dir = ".\\onnx\\data"

# Download data - One-time only
datadirname = ".\\onnx\\data"
if not os.path.exists(datadirname):
    data_download_tar = "cifar-100-python.tar.gz"
    urllib.request.urlretrieve("https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz", data_download_tar)
    file = tarfile.open(data_download_tar)
    file.extractall(data_dir)
    file.close()

# Delete cifar-100-python.tar.gz source file after all images are extracted
data_images_path = os.path.join(os.getcwd(), "cifar-100-python.tar.gz")
files = glob.glob(data_images_path)
for f in files:
    os.remove(f)
The CIFAR-100 dataset is an extension of the popular CIFAR-10 dataset.
The CIFAR-10 dataset comprises 60,000 color images, each with dimensions of 32x32 pixels. It consists of 10 distinct classes, with 6,000 images per class. The training set contains 50,000 images, while the test set has 10,000 images. Specifically, there are five training batches and one test batch, each containing 10,000 images. Within the test batch, each class is represented by 1,000 randomly selected images.
Let’s break down the details of the CIFAR-100 dataset:
Dataset Overview:
- The CIFAR-100 dataset consists of 100 classes, each containing 600 images.
- There are 500 training images and 100 testing images per class.
- Images are of size 32x32 pixels and are color (RGB) images.
Class Organization:
- The 100 classes are grouped into 20 superclasses.
- Each image has two labels:
  - “Fine” label: the specific class to which it belongs (e.g., “apple,” “lion,” “oak”).
  - “Coarse” label: the superclass to which it belongs (e.g., “fruit and vegetables,” “large carnivores,” “trees”).
Superclasses and Their Classes:
- Here’s the breakdown of the superclasses and their corresponding classes:
Let’s break down the layout of the Python version of the CIFAR dataset:
The archive contains the following files:
- data_batch_1, data_batch_2, …, data_batch_5: These files are Python “pickled” objects created using cPickle.
- test_batch: The test data batch.
Each batch file is a dictionary with the following elements:
- data: A 10000x3072 numpy array of uint8s. Each row represents a 32x32 color image. The first 1024 entries contain the red channel values, the next 1024 contain green, and the final 1024 contain blue. Images are stored in row-major order.
- labels: A list of 10000 numbers in the range 0-9. The number at index i indicates the label of the i-th image in the data array.
The batches.meta file is also a Python dictionary object with the following entries:
- label_names: A 10-element list providing meaningful names for the numeric labels. For example, label_names[0] corresponds to “airplane,” label_names[1] to “automobile,” and so on.
The binary version includes files like data_batch_1.bin, data_batch_2.bin, …, data_batch_5.bin, and test_batch.bin. Each file is formatted as follows:
- <1 x label><3072 x pixel>: The first byte represents the label (a number in the range 0-9), followed by 3072 bytes representing pixel values. The first 3073 bytes therefore correspond to the label and pixel values of the first image.
The CIFAR100 classes are enumerated in the Cifar100Classes class below:
class Cifar100Classes(enum.Enum):
    apple = 0
    aquarium_fish = 1
    baby = 2
    bear = 3
    beaver = 4
    bed = 5
    bee = 6
    beetle = 7
    bicycle = 8
    bottle = 9
    bowl = 10
    boy = 11
    bridge = 12
    bus = 13
    butterfly = 14
    camel = 15
    can = 16
    castle = 17
    caterpillar = 18
    cattle = 19
    chair = 20
    chimpanzee = 21
    clock = 22
    cloud = 23
    cockroach = 24
    couch = 25
    crab = 26
    crocodile = 27
    cup = 28
    dinosaur = 29
    dolphin = 30
    elephant = 31
    flatfish = 32
    forest = 33
    fox = 34
    girl = 35
    hamster = 36
    house = 37
    kangaroo = 38
    keyboard = 39
    lamp = 40
    lawn_mower = 41
    leopard = 42
    lion = 43
    lizard = 44
    lobster = 45
    man = 46
    maple_tree = 47
    motorcycle = 48
    mountain = 49
    mouse = 50
    mushroom = 51
    oak_tree = 52
    orange = 53
    orchid = 54
    otter = 55
    palm_tree = 56
    pear = 57
    pickup_truck = 58
    pine_tree = 59
    plain = 60
    plate = 61
    poppy = 62
    porcupine = 63
    possum = 64
    rabbit = 65
    raccoon = 66
    ray = 67
    road = 68
    rocket = 69
    rose = 70
    sea = 71
    seal = 72
    shark = 73
    shrew = 74
    skunk = 75
    skyscraper = 76
    snail = 77
    snake = 78
    spider = 79
    squirrel = 80
    streetcar = 81
    sunflower = 82
    sweet_pepper = 83
    table = 84
    tank = 85
    telephone = 86
    television = 87
    tiger = 88
    tractor = 89
    train = 90
    trout = 91
    tulip = 92
    turtle = 93
    wardrobe = 94
    whale = 95
    willow_tree = 96
    wolf = 97
    woman = 98
    worm = 99
Run the following two cells to display a subset of the test images.
def unpickle(file):
    with open(file,'rb') as fo:
        dict = pickle.load(fo, encoding='latin1')
    return dict

datafile = r'./onnx/data/cifar-100-batches-py/test_batch'
metafile = r'./onnx/data/cifar-100-batches-py/batches.meta'

test_batch = unpickle(datafile)
metadata = unpickle(metafile)

images = test_batch['data']
labels = test_batch['labels']
images = np.reshape(images,(10000, 3, 32, 32))

im = []
dirname = 'onnx/onnx_test_images'
if not os.path.exists(dirname):
    os.mkdir(dirname)
for i in range(100):
    im.append(cv2.cvtColor(images[i].transpose(1,2,0), cv2.COLOR_RGB2BGR))

fig = plt.figure(figsize=(10, 10))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(4, 5),
                 axes_pad=0.3)

for ax, image, label in zip(grid, im, labels):
    ax.axis("off")
    ax.imshow(image)
    ax.set_title(f'Actual label: {Cifar100Classes(label).name}', fontdict={'fontsize':4})

plt.show()
The following PNG images are shown as output:
Run the next cell to set up the XLNX_VART_FIRMWARE environment variable to point to the NPU binary. The NPU binary 1x4.xclbin is an AI design that provides up to 2 TOPS of performance. Up to four such AI streams can be run in parallel on the NPU without any visible loss of performance.
os.environ['XLNX_VART_FIRMWARE'] = os.path.join("onnx", "xclbins","1x4.xclbin")
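As a quick sanity check (my addition, not part of the original notebook), you can confirm the variable is set and that the path resolves from the current working directory:

print(os.environ['XLNX_VART_FIRMWARE'])
print('xclbin found:', os.path.exists(os.environ['XLNX_VART_FIRMWARE']))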
Step 3: Load quantized ONNX model
Run the following cell to load the provided ONNX quantized model. We will use the following pre-trained quantized file:
- The trained quantized ResNet-50 model on the CIFAR-100 dataset is saved at the following location: onnx/resnet.qdq.U8S8.onnx
If you would like to re-train and quantize your model, please review the PyTorch ONNX re-train notebook.
quantized_model_path = r'./onnx/resnet.qdq.U8S8.onnx'
model = onnx.load(quantized_model_path)
Step 4: Deploy the quantized ONNX model on the Ryzen AI NPU
For more information on provider options, visit ONNX Runtime with Vitis AI Execution Provider.
The file onnx/vaip_config.json is required when configuring the Vitis AI Execution Provider (VAI EP) inside the ONNX Runtime code.
providers = ['VitisAIExecutionProvider']
cache_dir = os.path.join(os.getcwd(), "onnx")
provider_options = [{
    'config_file': 'onnx/xclbins/vaip_config.json',
    'cacheDir': str(cache_dir),
    'cacheKey': 'modelcachekey'
}]

session = ort.InferenceSession(model.SerializeToString(), providers=providers,
                               provider_options=provider_options)
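To verify that the session will actually dispatch to the NPU rather than silently falling back to the CPU, you can inspect the providers attached to the session (a small check of my own, using the standard ONNX Runtime API):

# 'VitisAIExecutionProvider' should appear in this list
print(session.get_providers())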
The first 100 images are extracted from the CIFAR-100 test dataset and converted to the .png format.
The .png images are read, classified and visualized by running the quantized ResNet-50 model on the NPU.
# Extract and dump first 100 images
for i in range(100):
    im = images[i]
    im = im.transpose(1,2,0)
    im = cv2.cvtColor(im,cv2.COLOR_RGB2BGR)
    im_name = f'./{dirname}/image_{i}.png'
    cv2.imwrite(im_name, im)

viz_predicted_labels = []
misclassified_images = []
misclassified_labels = []
show_imlist = []

# Pick dumped images and predict
for i in range(100):
    image_name = f'./{dirname}/image_{i}.png'
    image = Image.open(image_name).convert('RGB')
    # Resize the image to match the input size expected by the model
    image = image.resize((32, 32))
    image_array = np.array(image).astype(np.float32)
    image_array = image_array/255
    # Reshape the array to match the input shape expected by the model
    image_array = np.transpose(image_array, (2, 0, 1))
    # Add a batch dimension to the input image
    input_data = np.expand_dims(image_array, axis=0)
    # Run the model
    outputs = session.run(None, {'input': input_data})
    # Process the outputs
    predicted_class = np.argmax(outputs[0])
    predicted_label = metadata['label_names'][predicted_class]
    viz_predicted_labels.append(predicted_class)
    label = metadata['label_names'][labels[i]]
    # print(f'Image {i}: Actual Label {label}, Predicted Label {predicted_label}')
    if (label != predicted_label):
        misclassified_images.append(i)
        misclassified_labels.append(predicted_label)
    show_imlist.append(cv2.cvtColor(images[i].transpose(1,2,0), cv2.COLOR_RGB2BGR))

fig = plt.figure(figsize=(10, 10))
grid = ImageGrid(fig, 111,            # similar to subplot(111)
                 nrows_ncols=(4, 5),  # creates 4x5 grid of axes
                 axes_pad=0.3,        # pad between axes in inches
                 )

for ax, image, label in zip(grid, show_imlist, viz_predicted_labels):
    ax.axis("off")
    ax.imshow(image)
    ax.set_title(f'Predicted label: {Cifar100Classes(label).name}', fontdict={'fontsize':4})

plt.show()
Then display the misclassifications:
show_imlist_mis = []
for i in misclassified_images:
    show_imlist_mis.append(cv2.cvtColor(images[i].transpose(1,2,0), cv2.COLOR_RGB2BGR))

varpltsize = len(misclassified_images)

fig = plt.figure(figsize=((1 * 2 * varpltsize), 1 * 2 * varpltsize))
grid = ImageGrid(fig, 111,  # similar to subplot(111)
                 nrows_ncols=(1, len(misclassified_images)),
                 axes_pad=0.3,  # pad between axes in inches
                 )

for ax, image, label in zip(grid, show_imlist_mis, misclassified_labels):
    ax.axis("off")
    ax.imshow(image)
    ax.set_title(f'Predicted label: {label}', fontdict={'fontsize':8})

plt.show()
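Optionally (my addition, not part of the original notebook), you can summarize this 100-image run with the accuracy_score helper imported in Step 1:

# Fraction of the first 100 test images classified correctly
print(f'Misclassified {len(misclassified_images)} of 100 images')
print(f'Accuracy: {accuracy_score(labels[:100], viz_predicted_labels):.2f}')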
Step 5: Inference for more test images
Note: the cell below may extract up to 5,000 images. You can delete the extracted images by following the instructions in Delete all Extracted Images.
The first 5,000 images are extracted from the CIFAR-100 test dataset and converted to the .png format.
The .png images are read, classified and visualized by running the quantized ResNet-50 model on the NPU.
max_images = len(images)//2 # 5000 test images

# Extract and dump all images in the test set
for i in range(max_images):
    im = images[i]
    im = im.transpose(1,2,0)
    im = cv2.cvtColor(im,cv2.COLOR_RGB2BGR)
    im_name = f'./{dirname}/image_{i}.png'
    cv2.imwrite(im_name, im)

cm_predicted_labels = []
cm_actual_labels = []

# Pick dumped images and predict
for i in range(max_images):
    image_name = f'./{dirname}/image_{i}.png'
    try:
        image = Image.open(image_name).convert('RGB')
    except:
        print(f"Warning: Image {image_name} may be locked, moving on to next image")
        continue
    # Resize the image to match the input size expected by the model
    image = image.resize((32, 32))
    image_array = np.array(image).astype(np.float32)
    image_array = image_array/255
    # Reshape the array to match the input shape expected by the model
    image_array = np.transpose(image_array, (2, 0, 1))
    # Add a batch dimension to the input image
    input_data = np.expand_dims(image_array, axis=0)
    # Run the model
    outputs = session.run(None, {'input': input_data})
    # Process the outputs
    predicted_class = np.argmax(outputs[0])
    predicted_label = metadata['label_names'][predicted_class]
    cm_predicted_labels.append(predicted_class)
    label = metadata['label_names'][labels[i]]
    cm_actual_labels.append(labels[i])
    if i%990 == 0:
        print(f'Status: Running Inference on image {i}... Actual Label: {label}, Predicted Label: {predicted_label}')
Status: Running Inference on image 0... Actual Label: cat, Predicted Label: cat
Status: Running Inference on image 990... Actual Label: automobile, Predicted Label: automobile
Status: Running Inference on image 1980... Actual Label: truck, Predicted Label: truck
Status: Running Inference on image 2970... Actual Label: dog, Predicted Label: dog
Status: Running Inference on image 3960... Actual Label: bird, Predicted Label: bird
Status: Running Inference on image 4950... Actual Label: bird, Predicted Label: bird
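Since Step 1 imported accuracy_score, confusion_matrix, seaborn, and pandas but the walkthrough has not used them yet, a natural follow-up is to summarize the full run. This is a minimal sketch of my own (not from the original notebook), using only the cm_actual_labels and cm_predicted_labels lists built above:

# Overall accuracy across the extracted test images
acc = accuracy_score(cm_actual_labels, cm_predicted_labels)
print(f'Accuracy on {len(cm_actual_labels)} test images: {acc:.4f}')

# Class-by-class confusion matrix rendered as a heatmap
cm = confusion_matrix(cm_actual_labels, cm_predicted_labels)
df_cm = pd.DataFrame(cm)
plt.figure(figsize=(10, 7))
sn.heatmap(df_cm, annot=False)
plt.xlabel('Predicted label')
plt.ylabel('Actual label')
plt.show()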
How this AI Workflow Worked
The AMD Ryzen AI Software provides tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs. It supports running applications on the Neural Processing Unit (NPU) integrated into the AMD XDNA™ architecture, which marks the introduction of dedicated AI processing silicon on a Windows x86 processor. This capability enables developers to build and deploy models trained in frameworks such as PyTorch and TensorFlow.
By utilizing the embedded NPU for AI tasks instead of relying solely on the CPU or GPU, Ryzen AI-powered laptops conserve battery life, allowing the CPU and GPU resources to handle other computing tasks.
How It Works:
1. Trained Models: Developers can create or train a model using the PyTorch/TensorFlow/Riallto framework.
2. Quantization: The AMD Vitis AI Quantizer quantizes the model to INT8 and saves it in ONNX format (see the sketch after this list). Support for Microsoft Olive is also available, with the Vitis AI quantizer as a plug-in.
3. Deployment: ONNX Runtime with the Vitis AI EP optimizes, partitions, compiles, and executes the quantized ONNX models efficiently on Ryzen AI.
By porting their models to utilize the NPU integrated into the Ryzen AI Processor, developers can enhance applications such as real-time voice transcription and translation, generative AI-based image generation, or chatbots powered by large language models to operate locally on their PCs.
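For readers who want to see what steps 2 and 3 look like in code, here is a minimal sketch assuming the vai_q_onnx package installed by the Ryzen AI installer (the exact API may differ between software versions; resnet.onnx is a hypothetical filename for a float model, and a real workflow would calibrate with actual CIFAR images rather than random tensors):

import numpy as np
import vai_q_onnx
from onnxruntime.quantization import CalibrationDataReader

class ToyCalibReader(CalibrationDataReader):
    # Feeds a few random tensors for calibration; replace with real images
    def __init__(self, n=16):
        self.batches = iter([{'input': np.random.rand(1, 3, 32, 32).astype(np.float32)}
                             for _ in range(n)])
    def get_next(self):
        return next(self.batches, None)

# Step 2 (sketch): quantize a float ONNX model to the INT8 QDQ format
vai_q_onnx.quantize_static(
    'resnet.onnx',            # hypothetical float input model
    'resnet.qdq.U8S8.onnx',   # quantized output, named as in this article
    ToyCalibReader(),
    quant_format=vai_q_onnx.QuantFormat.QDQ,
    activation_type=vai_q_onnx.QuantType.QUInt8,
    weight_type=vai_q_onnx.QuantType.QInt8,
)

Step 3 then deploys the resulting model exactly as shown earlier: create an ort.InferenceSession with the VitisAIExecutionProvider and the vaip_config.json provider options, and call session.run().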
References
- Riallto Framework for AMD Ryzen NPU
Conclusion
AMD’s latest Ryzen 7040 series processors, such as the Ryzen 9 7940HS used here, introduce a dedicated Neural Processing Unit (NPU) specifically designed for local AI inferencing on Windows 11. The NPU enhances performance by handling AI-specific workloads directly within the system, without relying on an internet connection. With co-engineering efforts between Microsoft and AMD, these processors seamlessly integrate with machine learning frameworks, enabling efficient training and inferencing of models.
Additionally, select Ryzen processors enhance Windows-based features that leverage machine learning algorithms. Overall, AMD’s NPUs represent a significant step toward empowering AI applications at the local level. I am looking forward to seeing more innovative software solutions utilizing this new NPU hardware.