In this project I am going to demonstrate how AMD's new NPU hardware can run AI workloads locally on a workstation without any cloud connectivity.
The AMD Ryzen™ 7040 series processors utilize a leading-edge 4nm process node to deliver impressive performance and extended battery life in thin and light laptops. Some models feature the new Ryzen AI Engine, the first dedicated AI engine on a Windows x86 processor. Ryzen AI technology is designed to offer AI-driven capabilities that are unprecedented in real-time applications directly on laptops. One standout feature of Ryzen AI is its ability to handle up to eight concurrent spatially isolated AI workloads, allowing users to run multiple hardware-accelerated AI applications simultaneously without affecting each other’s performance.
- High-Level Specifications: Operating at 35-54 W, achieving up to 10 TOPS
- Platform: Designed for commercial and consumer-grade laptops
- Applications: Enhancing computer vision effects, productivity tools, and multimedia experiences
- Model Support: Suitable for vision models and large language models
The hardware I received for this project is the Venus UM790 Pro mini-PC, which features an AMD Ryzen™ 9 7940HS processor. This processor has a dedicated NPU/IPU (Neural Processing Unit) that can run Ryzen AI workloads locally without relying solely on the CPU or GPU.
The biggest benefits of the NPU are:
- It reduces CPU usage for AI workloads
- It reduces system power consumption
- It enables multiple AI workloads to run without causing performance issues
- It allows new AI applications to run locally, ensuring privacy
Here is my UM790 Pro setup. The mini PC turns into a desktop when it is connected to I/O devices such as a monitor, keyboard, and mouse.
Specification of UM790 Pro mini-PC:
Under the hood of UM790 Pro
I was curious to peek inside the UM790 Pro enclosure. Here is how it looks after removing the bottom cover: we can access the RAM, SSD, and network card for replacement or upgrade.
Hardware Setup
Before getting started, connect the following I/O devices to the mini PC:
- 120 W power adapter (barrel jack connector)
- USB mouse + keyboard as input
- HDMI/USB monitor as output
- Log into Windows 11 and enable internet access via Wi-Fi/Ethernet
Before we can understand the context of the NPU, let's talk a bit about the evolution of AI and relevant technologies!
How AI is shaping the Future
Artificial Intelligence (AI) is reshaping our computing landscape much like the World Wide Web did when it was introduced in 1989. Similar to the web, AI will continue to evolve, and today’s applications are merely the precursor to an entirely new era of computing.
Imagine a future where every consumer laptop or desktop comes equipped with a specialized AI coprocessor designed specifically for efficient machine learning tasks. From enhancing image recognition and natural language processing to revolutionizing virtual assistants and personalized recommendations, the impact on our daily computing experiences will be profound.
Assistive AI: Creativity Transformation
For professionals engaged in content creation, graphic design, presentations, or personal projects, picture this scenario: as you sketch or outline your ideas in your preferred software, an AI system activates. It analyzes your work in real time, offering suggestions on techniques and artistic styles that align with your creative direction. When experimenting with colors, the AI provides complementary color palette suggestions based on the mood and theme of your work.
Struggling with a specific element, such as drawing a realistic animal or selecting the right arrangement? The AI generates a selection of reference images or designs to assist you. As you work, dynamic filters and effects are applied to your artwork, allowing you to preview different visual outcomes instantly and make informed creative decisions. Over time, your personal AI learns from your artistic choices, becoming attuned to your unique style and providing increasingly relevant suggestions.
What Is an NPU?
As the focus on machine learning workloads intensifies, hardware designers are prioritizing enhancements to accommodate these tasks. Modern CPUs and general-purpose GPUs already integrate features tailored for machine learning, such as hardware support for reduced precision number formats. However, there remains a fundamental limitation: CPUs and GPUs must also handle diverse workloads beyond machine learning.
To overcome this limitation and achieve greater efficiency in machine learning, a specialized architecture designed exclusively for these tasks is emerging.
Enter the Neural Processing Unit (NPU). This new class of processing unit excels at machine learning inference tasks. An NPU significantly enhances the speed and efficiency of machine learning computations on your computer. NPUs are sometimes referred to as Inference Processing Units (IPUs). In certain Ryzen AI documentation, you might encounter the term “IPU” instead of “NPU.” Once you’ve installed the Ryzen AI driver on your computer, you’ll notice a device named IPU in the Windows Device Manager.
Ryzen AI NPU: Empowering AI Workloads
In 2023, AMD introduced the Ryzen 7000 desktop and laptop chips. Alongside the primary x86 CPU, select Ryzen 7000-series chips feature a novel coprocessor: the Neural Processing Unit (NPU), built on the XDNA™ AI Engine architecture. This cutting-edge NPU is aptly named Ryzen AI.
Unlike the main x86 Zen 4 CPU, which follows a Von Neumann architecture, Ryzen AI employs a specialized dataflow architecture. This design optimizes performance and responsiveness for AI applications while operating at lower power levels.
When you execute AI workloads locally on an NPU, you benefit from:
- AI-Enhanced Video Conferencing: Exclusive features available to systems equipped with an NPU enhance video conferencing experiences.
- Higher AI Workload Performance: The NPU delivers superior performance for AI tasks.
- Reduced Latency: Faster responses due to lower latency are crucial for real-time applications. For instance, video streams won’t stutter, and audio won’t glitch during video calls.
- Energy Efficiency: The NPU’s efficiency translates to longer battery life, allowing you to work uninterrupted on a single charge.
By integrating a dedicated dataflow processing unit into your Ryzen 7000 processor, you unlock new and exciting ways to experience machine learning. Your laptop becomes smarter, more capable, and enjoys extended battery life.
Setup & Installations
- Enable IPU/NPU
The IPU/NPU is not enabled in the UEFI BIOS settings by default.
If you are already logged into the PC's Windows 11 OS:
Press the Windows key and type 'bios'.
Go to System > Recovery > Advanced startup and click Restart now.
After the 1st restart:
Choose an option > Troubleshoot > Advanced options > UEFI Firmware Settings, then click Restart.
After the 2nd restart:
Setup > Advanced > CPU Configuration > IPU Control > Enabled
Now save and exit, and boot into Windows 11.
Check Device Manager and you will see your IPU hardware:
Device Manager > System Devices > AMD IPU
- Install NPU Driver
The Ryzen AI Software supports the following processors running Windows 11:
- AMD Ryzen™ 7940HS, 7840HS, 7640HS, 7840U, 7640U
- AMD Ryzen™ 8640U, 8640HS, 8645H, 8840U, 8840HS, 8845H, 8945H
Therefore, Ryzen AI software on the NPU will only work with laptops/desktops that have one of these processors (only these CPUs have the NPU/IPU hardware).
Go to the link below and follow the installation instructions from AMD: Installation Instructions — Ryzen AI Software 1.1 documentation (amd.com)
Download the NPU driver and install it by following these steps:
- Extract the downloaded zip file.
- Open a terminal in administrator mode and execute the .\amd_install_kipudrv.bat file.
- Ensure that the NPU driver is installed from Device Manager > System Devices > AMD IPU Device, as shown in the following image.
You will have to create an AMD Developer account to download the NPU drivers.
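If you prefer the command line over Device Manager, one way to confirm the device is visible (my addition; the exact friendly name can vary by driver version) is to run the following in an administrator PowerShell:
Get-PnpDevice -PresentOnly | Where-Object { $_.FriendlyName -like '*IPU*' }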
- Install Visual Studio 2019
Download and Install Visual Studio 2019
- Install Python 3.9
Don't download and install Python 3.9 directly. Instead, click Modify in the Visual Studio Installer, select the Python 3.9.13 package, and let it download and install.
- Install Anaconda
Before proceeding with the installation of the Ryzen AI Software, it is essential to verify that all the necessary requirements listed earlier have been fulfilled. Additionally, ensure that the Windows PATH variable includes the correct settings for each component.
For instance, Anaconda necessitates the inclusion of specific paths in the PATH variable:
- path\to\anaconda3
- path\to\anaconda3\Scripts
- path\to\anaconda3\Lib\bin
These adjustments to the PATH variable can be made through the Environment Variables section found within the System Properties window.
Download and Install Anaconda into the right directory.
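Before moving on, you can sanity-check these PATH entries from Python. This is a small sketch of my own (conda_root is an assumed install location; adjust it to wherever Anaconda actually lives on your system):

import os

# Assumed Anaconda install location - change to match your system
conda_root = os.path.expanduser(r'~\anaconda3')
required = [conda_root,
            os.path.join(conda_root, 'Scripts'),
            os.path.join(conda_root, 'Lib', 'bin')]
path_entries = os.environ['PATH'].split(os.pathsep)
for p in required:
    print(p, '->', 'found' if p in path_entries else 'MISSING')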
- Install Git
Download and Install Git version 2.45 for Windows 11
- Install Ryzen AI Software
Download the ryzen-ai-sw-1.1.zip Ryzen AI Software installation package and extract it. You will need to log into your AMD Developer account to download this.
Open an Anaconda or Windows command prompt in the extracted folder and run the installation script as shown below. Make sure to enter “Y” when prompted to accept the EULA.
.\install.bat
If all the prerequisites are installed correctly, you will see 'OK' for each of the packages needed by Ryzen AI. If something is not installed, reinstall it into the right path. In my case, I have installed everything into the following path:
The install.bat script does the following:
- Creates a conda environment
- Installs the Vitis AI Quantizer for ONNX
- Installs the ONNX Runtime
- Installs the Vitis AI Execution Provider
- Configures the environment to use the throughput profile of the NPU
- Prints the name of the conda environment before exiting
The default Ryzen AI Software packages are now installed in the conda environment created by the installer. To begin using the Ryzen AI Software, activate the conda environment generated by the installer (its name is displayed during the installation process).
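For example, if the installer created an environment named ryzen-ai-1.1 (an assumed name on my part; use whatever name the installer actually prints), you would activate it with:
conda activate ryzen-ai-1.1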
Note: The Ryzen AI Software installation directory (where the zip file was extracted) contains essential files needed during runtime for inference sessions. These files comprise the NPU binaries (*.xclbin) and the default runtime configuration file (vaip_config.json) for the Vitis AI Execution Provider. Therefore, it is crucial not to delete the installation directory and to store it in a convenient location.
Refer to the Runtime Setup page for detailed instructions on setting up the environment before running an inference session on the NPU.
- Install Riallto
To proceed, you'll need a laptop or computer equipped with an AMD Ryzen AI processor. Note that support for the Ryzen NPU and Riallto is currently exclusive to Windows 11.
In the Windows Device Manager, the Ryzen AI NPU appears as an IPU (Inference Processing Unit), which is synonymous with an NPU. 'Full' installation options require the IPU driver version 10.1109.8.128 to be installed. For instructions on installing the Windows driver for the Ryzen AI NPU, refer to the following page:
You will need to provide your machine's MAC address to get the license:
Make sure to install the license from AMD into the correct directory, as follows:
Download Riallto Installer:
Riallto includes runtime software for loading Ryzen AI applications and a software toolchain for compiling and building applications. Additionally, Riallto provides a set of Jupyter Notebook tutorials.
Download the latest v1.1 Riallto installer and execute it on your Ryzen AI laptop. This installation will set up the Riallto software framework and provide access to Riallto Jupyter notebooks, which you can browse and run on your laptop. Ensure you select the appropriate installation option as prompted; the 'Full' version requires WSL2 and the AIE build license (refer to Prerequisites above for details).
If you do not have a Ryzen AI laptop, you can access the Riallto notebooks as webpages, which constitute the majority of the content on the current webpages you are browsing. Explore the NPU Architecture Features section to gain insights into the NPU, and consult the Building Applications section to learn how to create custom applications for it. The final section demonstrates how to perform Machine Learning with ONNX on Ryzen AI.
Launch Riallto:
Once Riallto is installed on your laptop, click on the desktop icon to initiate it. This action will start a JupyterLab server instance in your web browser.
Press the Windows key and type Riallto, then hit Enter to launch the Riallto framework. It will open a Jupyter Notebook in the web browser, where you can write Python code and run machine learning models on the AMD NPU.
Open a notebook from Riallto as follows; we will run Python code chunks in this window:
Please note that Riallto is more of an educational tool for experimenting with AI models on AMD hardware; it is not really suitable for building finished AI products.
Step 1: Import Packages
Run the following cell to import all the necessary packages to be able to run inference on the Ryzen AI NPU.
This code runs in the Jupyter Notebook opened by launching Riallto. It is basically a webpage that runs Python code, one chunk at a time!
Copy the Python code blocks given below, paste them into cells, and execute the code by clicking the Play icon:
import onnx
import onnxruntime as ort
import enum
import numpy as np
import cv2
import pickle
import os
import glob
import tarfile
import urllib.request
import matplotlib.pyplot as plt
from PIL import Image
from mpl_toolkits.axes_grid1 import ImageGrid
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sn
import pandas as pd
Step 2: Prepare the Data
A pre-trained ResNet-50 model from PyTorch Hub for the CIFAR-100 dataset will be deployed.
Download the CIFAR-100 dataset
Next, execute the following cells to download the CIFAR-100 dataset. The dataset is stored in data/cifar-100-batches-py/.
global models_dir, data_dir
models_dir = ".\\onnx"
data_dir = ".\\onnx\\data"

# Download data - One-time only
datadirname = ".\\onnx\\data"
if not os.path.exists(datadirname):
    data_download_tar = "cifar-100-python.tar.gz"
    urllib.request.urlretrieve("https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz", data_download_tar)
    file = tarfile.open(data_download_tar)
    file.extractall(data_dir)
    file.close()

# Delete cifar-100-python.tar.gz source file after all images are extracted
data_images_path = os.path.join(os.getcwd(), "cifar-100-python.tar.gz")
files = glob.glob(data_images_path)
for f in files:
    os.remove(f)
The CIFAR-100 dataset is an extension of the popular CIFAR-10 dataset.
The CIFAR-10 dataset comprises 60,000 color images, each with dimensions of 32x32 pixels. It consists of 10 distinct classes, with 6,000 images per class. The training set contains 50,000 images, while the test set has 10,000 images. Specifically, there are five training batches and one test batch, each containing 10,000 images. Within the test batch, each class is represented by 1,000 randomly selected images.
Let’s break down the details of the CIFAR-100 dataset:
Dataset Overview:
- The CIFAR-100 dataset consists of 100 classes, each containing 600 images.
- There are 500 training images and 100 testing images per class.
- Images are of size 32x32 pixels and are color (RGB) images.
Class Organization:
- The 100 classes are grouped into 20 superclasses.
- Each image has two labels:
  - “Fine” label: the specific class to which it belongs (e.g., “apple,” “lion,” “oak”).
  - “Coarse” label: the superclass to which it belongs (e.g., “fruit and vegetables,” “large carnivores,” “trees”).
Superclasses and Their Classes:
- Here’s the breakdown of the superclasses and their corresponding classes:
Let’s break down the layout of the Python version of the CIFAR dataset:
The archive contains the following files:
- data_batch_1, data_batch_2, …, data_batch_5: These files are Python “pickled” objects created using cPickle.
- test_batch: The test data batch.
Each batch file is a dictionary with the following elements:
- data: A 10000x3072 numpy array of uint8s. Each row represents a 32x32 color image. The first 1024 entries contain the red channel values, the next 1024 contain green, and the final 1024 contain blue. Images are stored in row-major order.
- labels: A list of 10000 numbers in the range 0-9. The number at index i indicates the label of the i-th image in the data array.
The batches.meta file is also a Python dictionary object with the following entries:
- label_names: A 10-element list providing meaningful names for the numeric labels. For example, label_names[0] corresponds to “airplane,” label_names[1] to “automobile,” and so on.
The binary version includes files like data_batch_1.bin, data_batch_2.bin, …, data_batch_5.bin, and test_batch.bin. Each file is formatted as follows:
- <1 x label><3072 x pixel>: The first byte represents the label (a number in the range 0-9), followed by 3072 bytes representing pixel values. The first 3073 bytes therefore correspond to the label and pixel values of the first image.
The CIFAR100 classes are enumerated in the Cifar100Classes class below:
class Cifar100Classes(enum.Enum):
    apple = 0
    aquarium_fish = 1
    baby = 2
    bear = 3
    beaver = 4
    bed = 5
    bee = 6
    beetle = 7
    bicycle = 8
    bottle = 9
    bowl = 10
    boy = 11
    bridge = 12
    bus = 13
    butterfly = 14
    camel = 15
    can = 16
    castle = 17
    caterpillar = 18
    cattle = 19
    chair = 20
    chimpanzee = 21
    clock = 22
    cloud = 23
    cockroach = 24
    couch = 25
    crab = 26
    crocodile = 27
    cup = 28
    dinosaur = 29
    dolphin = 30
    elephant = 31
    flatfish = 32
    forest = 33
    fox = 34
    girl = 35
    hamster = 36
    house = 37
    kangaroo = 38
    keyboard = 39
    lamp = 40
    lawn_mower = 41
    leopard = 42
    lion = 43
    lizard = 44
    lobster = 45
    man = 46
    maple_tree = 47
    motorcycle = 48
    mountain = 49
    mouse = 50
    mushroom = 51
    oak_tree = 52
    orange = 53
    orchid = 54
    otter = 55
    palm_tree = 56
    pear = 57
    pickup_truck = 58
    pine_tree = 59
    plain = 60
    plate = 61
    poppy = 62
    porcupine = 63
    possum = 64
    rabbit = 65
    raccoon = 66
    ray = 67
    road = 68
    rocket = 69
    rose = 70
    sea = 71
    seal = 72
    shark = 73
    shrew = 74
    skunk = 75
    skyscraper = 76
    snail = 77
    snake = 78
    spider = 79
    squirrel = 80
    streetcar = 81
    sunflower = 82
    sweet_pepper = 83
    table = 84
    tank = 85
    telephone = 86
    television = 87
    tiger = 88
    tractor = 89
    train = 90
    trout = 91
    tulip = 92
    turtle = 93
    wardrobe = 94
    whale = 95
    willow_tree = 96
    wolf = 97
    woman = 98
    worm = 99
Run the following two cells to display a subset of the test images.
def unpickle(file):
    with open(file,'rb') as fo:
        dict = pickle.load(fo, encoding='latin1')
    return dict

datafile = r'./onnx/data/cifar-100-batches-py/test_batch'
metafile = r'./onnx/data/cifar-100-batches-py/batches.meta'

test_batch = unpickle(datafile)
metadata = unpickle(metafile)

images = test_batch['data']
labels = test_batch['labels']
images = np.reshape(images,(10000, 3, 32, 32))

im = []
dirname = 'onnx/onnx_test_images'
if not os.path.exists(dirname):
    os.mkdir(dirname)
for i in range(100):
    im.append(cv2.cvtColor(images[i].transpose(1,2,0), cv2.COLOR_RGB2BGR))

fig = plt.figure(figsize=(10, 10))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(4, 5),
                 axes_pad=0.3)

for ax, image, label in zip(grid, im, labels):
    ax.axis("off")
    ax.imshow(image)
    ax.set_title(f'Actual label: {Cifar100Classes(label).name}', fontdict={'fontsize':4})

plt.show()
The following PNG images are shown as output:
Run the next cell to set up the XLNX_VART_FIRMWARE environment variable to point to the NPU binary. The NPU binary 1x4.xclbin is an AI design that provides up to 2 TOPS of performance. Up to four such AI streams can be run in parallel on the NPU without any visible loss of performance.
os.environ['XLNX_VART_FIRMWARE'] = os.path.join("onnx", "xclbins","1x4.xclbin")
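As a quick sanity check (my addition, not part of the original notebook), you can confirm the variable is set and that the path resolves from the current working directory:

print(os.environ['XLNX_VART_FIRMWARE'])
print('xclbin found:', os.path.exists(os.environ['XLNX_VART_FIRMWARE']))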
Step 3: Load quantized ONNX model
Run the following cell to load the provided ONNX quantized model. We will use the following pre-trained quantized file:
- The trained quantized ResNet-50 model on the CIFAR-100 dataset is saved at the following location: onnx/resnet.qdq.U8S8.onnx
If you would like to re-train and quantize your model, please review the PyTorch ONNX re-train notebook.
quantized_model_path = r'./onnx/resnet.qdq.U8S8.onnx'
model = onnx.load(quantized_model_path)
Step 4: Deploy the quantized ONNX model on the Ryzen AI NPU
For more information on provider options, visit ONNX Runtime with Vitis AI Execution Provider.
The file onnx/vaip_config.json is required when configuring the Vitis AI Execution Provider (VAI EP) inside the ONNX Runtime code.
providers = ['VitisAIExecutionProvider']
cache_dir = os.path.join(os.getcwd(), "onnx")
provider_options = [{
    'config_file': 'onnx/xclbins/vaip_config.json',
    'cacheDir': str(cache_dir),
    'cacheKey': 'modelcachekey'
}]

session = ort.InferenceSession(model.SerializeToString(), providers=providers,
                               provider_options=provider_options)
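To verify that the session will actually dispatch to the NPU rather than silently falling back to the CPU, you can inspect the providers attached to the session (a small check of my own, using the standard ONNX Runtime API):

# 'VitisAIExecutionProvider' should appear in this list
print(session.get_providers())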
The first 100 images are extracted from the CIFAR-100 test dataset and converted to the .png format.
The .png images are read, classified and visualized by running the quantized ResNet-50 model on the NPU.
# Extract and dump first 100 images
for i in range(100):
    im = images[i]
    im = im.transpose(1,2,0)
    im = cv2.cvtColor(im,cv2.COLOR_RGB2BGR)
    im_name = f'./{dirname}/image_{i}.png'
    cv2.imwrite(im_name, im)

viz_predicted_labels = []
misclassified_images = []
misclassified_labels = []
show_imlist = []

# Pick dumped images and predict
for i in range(100):
    image_name = f'./{dirname}/image_{i}.png'
    image = Image.open(image_name).convert('RGB')
    # Resize the image to match the input size expected by the model
    image = image.resize((32, 32))
    image_array = np.array(image).astype(np.float32)
    image_array = image_array/255
    # Reshape the array to match the input shape expected by the model
    image_array = np.transpose(image_array, (2, 0, 1))
    # Add a batch dimension to the input image
    input_data = np.expand_dims(image_array, axis=0)
    # Run the model
    outputs = session.run(None, {'input': input_data})
    # Process the outputs
    predicted_class = np.argmax(outputs[0])
    predicted_label = metadata['label_names'][predicted_class]
    viz_predicted_labels.append(predicted_class)
    label = metadata['label_names'][labels[i]]
    # print(f'Image {i}: Actual Label {label}, Predicted Label {predicted_label}')
    if (label != predicted_label):
        misclassified_images.append(i)
        misclassified_labels.append(predicted_label)
    show_imlist.append(cv2.cvtColor(images[i].transpose(1,2,0), cv2.COLOR_RGB2BGR))

fig = plt.figure(figsize=(10, 10))
grid = ImageGrid(fig, 111,            # similar to subplot(111)
                 nrows_ncols=(4, 5),  # creates 4x5 grid of axes
                 axes_pad=0.3,        # pad between axes in inches
                 )

for ax, image, label in zip(grid, show_imlist, viz_predicted_labels):
    ax.axis("off")
    ax.imshow(image)
    ax.set_title(f'Predicted label: {Cifar100Classes(label).name}', fontdict={'fontsize':4})

plt.show()
Then display the misclassifications:
show_imlist_mis = []
for i in misclassified_images:
    show_imlist_mis.append(cv2.cvtColor(images[i].transpose(1,2,0), cv2.COLOR_RGB2BGR))

varpltsize = len(misclassified_images)

fig = plt.figure(figsize=((1 * 2 * varpltsize), 1 * 2 * varpltsize))
grid = ImageGrid(fig, 111,  # similar to subplot(111)
                 nrows_ncols=(1, len(misclassified_images)),
                 axes_pad=0.3,  # pad between axes in inches
                 )

for ax, image, label in zip(grid, show_imlist_mis, misclassified_labels):
    ax.axis("off")
    ax.imshow(image)
    ax.set_title(f'Predicted label: {label}', fontdict={'fontsize':8})

plt.show()
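Optionally (my addition, not part of the original notebook), you can summarize this 100-image run with the accuracy_score helper imported in Step 1:

# Fraction of the first 100 test images classified correctly
print(f'Misclassified {len(misclassified_images)} of 100 images')
print(f'Accuracy: {accuracy_score(labels[:100], viz_predicted_labels):.2f}')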
Step 5: Inference for more test images
Note: the cell below may extract up to 5,000 images. You can delete the extracted images by following the instructions in Delete all Extracted Images.
The first 5,000 images are extracted from the CIFAR-100 test dataset and converted to the .png format.
The .png images are read, classified and visualized by running the quantized ResNet-50 model on the NPU.
max_images = len(images)//2 # 5000 test images

# Extract and dump all images in the test set
for i in range(max_images):
    im = images[i]
    im = im.transpose(1,2,0)
    im = cv2.cvtColor(im,cv2.COLOR_RGB2BGR)
    im_name = f'./{dirname}/image_{i}.png'
    cv2.imwrite(im_name, im)

cm_predicted_labels = []
cm_actual_labels = []

# Pick dumped images and predict
for i in range(max_images):
    image_name = f'./{dirname}/image_{i}.png'
    try:
        image = Image.open(image_name).convert('RGB')
    except:
        print(f"Warning: Image {image_name} may be locked, moving on to next image")
        continue
    # Resize the image to match the input size expected by the model
    image = image.resize((32, 32))
    image_array = np.array(image).astype(np.float32)
    image_array = image_array/255
    # Reshape the array to match the input shape expected by the model
    image_array = np.transpose(image_array, (2, 0, 1))
    # Add a batch dimension to the input image
    input_data = np.expand_dims(image_array, axis=0)
    # Run the model
    outputs = session.run(None, {'input': input_data})
    # Process the outputs
    predicted_class = np.argmax(outputs[0])
    predicted_label = metadata['label_names'][predicted_class]
    cm_predicted_labels.append(predicted_class)
    label = metadata['label_names'][labels[i]]
    cm_actual_labels.append(labels[i])
    if i%990 == 0:
        print(f'Status: Running Inference on image {i}... Actual Label: {label}, Predicted Label: {predicted_label}')
Status: Running Inference on image 0... Actual Label: cat, Predicted Label: cat
Status: Running Inference on image 990... Actual Label: automobile, Predicted Label: automobile
Status: Running Inference on image 1980... Actual Label: truck, Predicted Label: truck
Status: Running Inference on image 2970... Actual Label: dog, Predicted Label: dog
Status: Running Inference on image 3960... Actual Label: bird, Predicted Label: bird
Status: Running Inference on image 4950... Actual Label: bird, Predicted Label: bird
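Since Step 1 imported accuracy_score, confusion_matrix, seaborn, and pandas but the walkthrough has not used them yet, a natural follow-up is to summarize the full run. This is a minimal sketch of my own (not from the original notebook), using only the cm_actual_labels and cm_predicted_labels lists built above:

# Overall accuracy across the extracted test images
acc = accuracy_score(cm_actual_labels, cm_predicted_labels)
print(f'Accuracy on {len(cm_actual_labels)} test images: {acc:.4f}')

# Class-by-class confusion matrix rendered as a heatmap
cm = confusion_matrix(cm_actual_labels, cm_predicted_labels)
df_cm = pd.DataFrame(cm)
plt.figure(figsize=(10, 7))
sn.heatmap(df_cm, annot=False)
plt.xlabel('Predicted label')
plt.ylabel('Actual label')
plt.show()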
How this AI Workflow Worked
The AMD Ryzen AI Software provides tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs. It supports running applications on the Neural Processing Unit (NPU) integrated into the AMD XDNA™ architecture, which marks the introduction of dedicated AI processing silicon on a Windows x86 processor. This capability enables developers to build and deploy models trained in frameworks such as PyTorch and TensorFlow.
By utilizing the embedded NPU for AI tasks instead of relying solely on the CPU or GPU, Ryzen AI-powered laptops conserve battery life, allowing the CPU and GPU resources to handle other computing tasks.
How It Works:
1. Trained Models: Developers can create or train a model using the PyTorch/TensorFlow/Riallto framework.
2. Quantization: The AMD Vitis AI Quantizer quantizes the model to INT8 and saves it in ONNX format (see the sketch after this list). Support for Microsoft Olive is also available, with the Vitis AI quantizer as a plug-in.
3. Deployment: ONNX Runtime with the Vitis AI EP optimizes, partitions, compiles, and executes the quantized ONNX models efficiently on Ryzen AI.
By porting their models to utilize the NPU integrated into the Ryzen AI Processor, developers can enhance applications such as real-time voice transcription and translation, generative AI-based image generation, or chatbots powered by large language models to operate locally on their PCs.
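For readers who want to see what steps 2 and 3 look like in code, here is a minimal sketch assuming the vai_q_onnx package installed by the Ryzen AI installer (the exact API may differ between software versions; resnet.onnx is a hypothetical filename for a float model, and a real workflow would calibrate with actual CIFAR images rather than random tensors):

import numpy as np
import vai_q_onnx
from onnxruntime.quantization import CalibrationDataReader

class ToyCalibReader(CalibrationDataReader):
    # Feeds a few random tensors for calibration; replace with real images
    def __init__(self, n=16):
        self.batches = iter([{'input': np.random.rand(1, 3, 32, 32).astype(np.float32)}
                             for _ in range(n)])
    def get_next(self):
        return next(self.batches, None)

# Step 2 (sketch): quantize a float ONNX model to the INT8 QDQ format
vai_q_onnx.quantize_static(
    'resnet.onnx',            # hypothetical float input model
    'resnet.qdq.U8S8.onnx',   # quantized output, named as in this article
    ToyCalibReader(),
    quant_format=vai_q_onnx.QuantFormat.QDQ,
    activation_type=vai_q_onnx.QuantType.QUInt8,
    weight_type=vai_q_onnx.QuantType.QInt8,
)

Step 3 then deploys the resulting model exactly as shown earlier: create an ort.InferenceSession with the VitisAIExecutionProvider and the vaip_config.json provider options, and call session.run().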
References
- Riallto Framework for AMD Ryzen NPU
Conclusion
AMD’s latest Ryzen 7040 series processors, such as the Ryzen 9 7940HS used here, introduce a dedicated Neural Processing Unit (NPU) specifically designed for local AI inferencing on Windows 11. The NPU enhances performance by handling AI-specific workloads directly within the system, without relying on an internet connection. With co-engineering efforts between Microsoft and AMD, these processors seamlessly integrate with machine learning frameworks, enabling efficient training and inferencing of models.
Additionally, select Ryzen processors enhance Windows-based features that leverage machine learning algorithms. Overall, AMD’s NPUs represent a significant step toward empowering AI applications at the local level. I am looking forward to seeing more innovative software solutions utilizing this new NPU hardware.