At least 2.2 billion people worldwide live with vision impairment or blindness [1]. Although guide dogs are effective navigation assistants for blind and visually impaired (BVI) individuals, their training and upkeep are costly, making them inaccessible to most people in need. Additionally, individuals with visual impairments often have unique expectations for navigation systems, shaped by their preferences or by other disabilities such as hearing loss [2-5]. Unfortunately, most existing navigation systems overlook these expectations, forcing users to adapt to the system; only a few have begun incorporating customizable navigation options to address these needs [6-10].
To address these challenges, we propose an intelligent wearable vest built on the AMD Kria KR260, named "AI-llume" (a combination of "AI" and "illume"). It is designed to enhance the lives of BVI individuals by serving as a cost-effective and customizable alternative to guide dogs. Unlike current wearable navigation systems, AI-llume offers a high degree of customization, allowing users to personalize the vest according to their specific needs and preferences. Additionally, AI-llume is designed to be portable and affordable, facilitating easy use during outdoor activities and significantly reducing the costs associated with guide dog ownership.
AI-llume utilizes advanced AI and hardware components to provide an intelligent navigation solution. The system captures visual data through integrated cameras and processes it using an AI model powered by the AMD Kria KR260 board. The main features of AI-llume include:
- Environment Sensing and Recognition: AI-llume employs cameras and image processing algorithms to detect and recognize obstacles in the user's environment, providing real-time feedback and guidance.
- Multimodal Feedback: The system offers feedback through motor-driven vibrators encircling a belt and a buzzer, delivering tactile and acoustic feedback to the user. Distinctive vibration patterns and frequencies indicate different orientations and potential dangers, while spoken statements provide additional auditory cues.
The workflow of AI-llume's guiding process in an outdoor environment is as follows:
By integrating advanced AI technologies and providing customizable features, AI-llume aims to significantly improve the navigation and independence of BVI individuals, offering a viable and cost-effective alternative to guide dogs. This structured approach ensures a thorough and systematic development process, leveraging state-of-the-art AI and hardware capabilities to create a highly effective navigation assistant for BVI individuals.
Experimental Setup
1. KR260 Setup
To start with the KR260 Starter Kit, here are some prerequisites:
- The KR260 Starter Kit
- balenaEtcher
The following steps illustrate the setting up of the KR260 board based on the KR260 official open repository "Kria-RoboticsAI" [13].
1.1. Setting up the SD Card Image (Ubuntu)
The KR260 board has two boot devices, a primary and a secondary one, isolating the boot firmware from the run-time OS and application. We need to load the Ubuntu image onto the board using the microSD card provided with the starter kit and develop our project code on top of that image. Before booting, set up the microSD card as follows [12]:
- Download the Kria KR260 Robotics Starter Kit Image and save it on your computer.
- Download Balena Etcher (recommended; available for Windows, Linux, and macOS). Additional OS-specific tool options are listed in the official guide [12].
- Follow the instructions in the tool and select the downloaded image to flash onto your microSD card.
Now, we are able to turn on the board and log in with the default credentials:
- username: ubuntu
- password: ubuntu
To log in to the GNOME Desktop, you need to connect a DisplayPort monitor, a USB keyboard, and a mouse to the board. Turn on the Starter Kit by connecting the power supply to the AC plug. The power LEDs should light up, and within 10-15 seconds, you should see console output on the connected display. After about a minute, the desktop login screen should appear, featuring the traditional Jellyfish.
Please note that the Starter Kit powers up immediately when you connect the AC plug to a wall socket, as there is no ON/OFF switch on the board. If the heartbeat LED is active but there's no output on the monitor, ensure your monitor is powered on and the correct input is selected.
Once logged in, you should see the default Ubuntu 22.04 LTS GNOME 3 desktop. Open a terminal and set the date to today with the following command:
sudo date -s "YYYY-MM-DD HH:MM:SS"
Then verify internet connectivity with the command:
ping 8.8.8.8
If the packets are transmitted and received with no packet loss, your internet connectivity is working and active.
Please note that without internet connectivity, you will not be able to perform the ROS-AI application steps or install the necessary tools and packages.
1.4. Install the PYNQ DPU for future AI inference applications on the KR260 Board
Become a superuser
sudo su
Clone the repository
Log in to your KR260 board and clone the repository "Kria-RoboticsAI" [13].
cd ${YOUR/WORKSPACE/PATH}
git clone https://github.com/amd/Kria-RoboticsAI
Check shell scripts with dos2unix
Ensure all shell files are in Unix format by running the following commands:
sudo apt install dos2unix
cd ./Kria-RoboticsAI/files/scripts
for file in $(find . -name "*.sh"); do
echo ${file}
dos2unix ${file}
done
Install PYNQ
Run the installation script as a superuser
sudo su
cd ${YOUR/WORKSPACE/PATH}/Kria-RoboticsAI
cp files/scripts/install_update_kr260_to_vitisai35.sh ${YOUR/WORKSPACE/PATH}
cd ${YOUR/WORKSPACE/PATH}
source ./install_update_kr260_to_vitisai35.sh
This script installs the required Debian packages, creates a Python virtual environment named pynq_venv, and configures a Jupyter portal. The process takes about 30 minutes and updates packages from Vitis-AI 2.5 to Vitis-AI 3.5.
Possible Issues During Installation
- Kernel Mismatch Warning: If a window appears with a warning about a xilinx-zynqmp kernel mismatch, accept the maintainer's version and continue.
- Installation Errors: If the installation stops with an error, rerun the script:
source ./install_update_kr260_to_vitisai35.sh
Post-Installation
- Once the installation is complete, you should see all the packages listed in the Included Overlays reference document installed, including the DPU-PYNQ repository. Reboot the board as a superuser:
reboot
- Running Applications on the PYNQ DPU
- Before running any application, set the Python virtual environment:
sudo su
source /etc/profile.d/pynq_venv.sh
- To exit the virtual environment, use:
deactivate
1.5. Testing PYNQ DPU with Python or C++ VART APIs
You can run ML inference applications on the PYNQ DPU of KR260 in three ways:
- Jupyter Notebook:
cd $PYNQ_JUPYTER_NOTEBOOKS
pynq get-notebooks pynq-dpu -p .
- Plain Python Script: Run your Python scripts with the .py file extension.
- C++ Modules: Compile and run your C++ modules.
You can find example notebooks and scripts in the following directories:
(pynq-venv) root@kria:/home/root/jupyter_notebooks# ls -l pynq-dpu/
(pynq-venv) root@kria:/home/root/jupyter_notebooks# ls -l getting_started/
(pynq-venv) root@kria:/home/root/jupyter_notebooks# ls -l pynq-helloworld/
(pynq-venv) root@kria:/home/root/jupyter_notebooks# ls -l kv260
1.6. Notes
If you see a warning message like WARNING: Logging before InitGoogleLogging(), you can ignore it.
Following these steps, you can then use PYNQ on your KR260 board for AI inference applications.
2. Vitis-AI Setup
Under the Vitis-AI development environment, we employed the Vitis-AI Quantizer and Compiler from the Vitis-AI toolkit to perform post-training quantization and compilation of the model. Subsequently, we utilized a Vitis-AI DPU with an additional GPIO port for edge inference and prediction using the YOLOv5 deep learning model.
The following steps will guide you to set up the Vitis-AI development environment on your computer and train your model with CPU/GPU Docker images.
Notice: The following setup and building processes are implemented on Ubuntu 20.04 with two NVIDIA GeForce RTX4090 GPUs.
Prerequisite: docker
First, clone the Vitis-AI repository to your workspace and redirect to it by the following commands.
git clone https://github.com/Xilinx/Vitis-AI
cd ./Vitis-AI
Then, Vitis-AI Docker images are required. We can either pull the Vitis-AI Docker image in the CPU version or build a GPU version based on your own devices.
2.1. Pull the Vitis-AI docker image in CPU version
To obtain the Vitis-AI Docker image in the CPU version, we can directly pull the image from Docker Hub with the following command. If any internet connection issues occur, please refer to Appendix A for a possible solution.
sudo docker pull xilinx/vitis-ai:latest
2.2. Build the Vitis-AI docker image in GPU version
To use an NVIDIA GPU in Docker, we first need to install nvidia-container-toolkit with the following commands.
# Add sources for nvidia-docker
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# install nvidia-container-toolkit
sudo apt-get install -y nvidia-container-toolkit
# Restart docker
sudo systemctl restart docker
After successfully installing nvidia-container-toolkit, go to the Vitis-AI repo and run the docker_build.sh script under the docker directory. Here, we set the target framework to pytorch since we will train the YOLOv5 model, which is implemented in the PyTorch framework.
cd ${VITIS/AI/PATH}/Vitis-AI/docker
./docker_build.sh -t gpu -f pytorch
Now we can run the Docker container with the following command. Replace ${IMAGE/ID} with your target Docker image's ID.
./docker_run.sh xilinx/vitis-ai:${IMAGE/ID}
If logo "Vitis-AI" pops out in your terminal with docker declaration including "GPU" as Figure 8 shows, it means Vitis-ai docker in GPU version has been successfully built. To exit the docker, type exit.
After setting up the docker image, you can start model training, quantization, and compilation process inside this docker image.
AI Model Establishment
1. Yolov5 Training
1.1. COCO Dataset Establishment
In our project, we undertook a comprehensive data collection process within the school campus to establish a robust dataset for training object detection models. We focused on capturing images of nine distinct types of obstacles commonly found on campus. These obstacles included, but were not limited to, indoor chairs, tables, outdoor flower beds, roadblocks, and cars.
1.2. Dataset Collection and Labeling
To ensure the dataset's quality and relevance, we employed systematic image capturing techniques to cover various scenarios and perspectives. This approach was crucial for creating a diverse and representative dataset that would enhance the model's ability to generalize across different real-world situations.
Once the image collection was complete, we used LabelMe, an efficient annotation tool, to meticulously label each obstacle in the images. Labeling involved drawing precise rectangle bounding boxes around each instance of the obstacles and assigning them appropriate class labels. This step was critical in defining the spatial extents of the objects and providing accurate ground truth data for training the object detection model.
1.3. Dataset Transformation into COCO Format
Next, we converted the labeled data into the COCO format, a widely used standard for object detection tasks. The COCO format supports extensive metadata, making it ideal for complex datasets. This conversion facilitated the integration of our dataset with our YOLOv5 model.
First, we created a corresponding .txt file for each image's .json file (you can find the transfer script here). These files contain the metadata required for training and validation, namely the class labels and annotation coordinates. Subsequently, we converted the annotated data into the COCO format using Python, extracting the coordinates of the four corner points and the classifications, and making it compatible with our YOLOv5 model.
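For illustration, the conversion from a LabelMe annotation to a YOLO-style .txt line might look like the sketch below. This is a minimal, hypothetical version of the transfer script: it assumes LabelMe JSON files with rectangle shapes and writes normalized boxes in the form "class x_center y_center width height".

import json

def labelme_to_yolo(json_path, txt_path, class_names):
    # Convert one LabelMe JSON file (rectangle shapes) into a YOLO-format .txt file.
    with open(json_path) as f:
        ann = json.load(f)
    w, h = ann["imageWidth"], ann["imageHeight"]
    lines = []
    for shape in ann["shapes"]:
        if shape["shape_type"] != "rectangle":
            continue
        (x1, y1), (x2, y2) = shape["points"]
        xc = (x1 + x2) / 2.0 / w          # normalized box center
        yc = (y1 + y2) / 2.0 / h
        bw = abs(x2 - x1) / w             # normalized box size
        bh = abs(y2 - y1) / h
        cls = class_names.index(shape["label"])
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))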
Then we shuffled all the images and randomly split them into training, validation, and test sets in an 8:1:1 ratio. We stored them in a hierarchy with two top-level directories named "images" and "labels", each containing three subdirectories (train, val, and test) with the corresponding images (*.jpg) and labels (*.txt).
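A minimal sketch of that shuffle-and-split step, assuming the raw captures sit in hypothetical raw/images and raw/labels directories:

import random, shutil
from pathlib import Path

random.seed(0)
images = sorted(Path("raw/images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
splits = {"train": images[:int(0.8 * n)],
          "val":   images[int(0.8 * n):int(0.9 * n)],
          "test":  images[int(0.9 * n):]}

for split, files in splits.items():
    for sub in ("images", "labels"):
        (Path("dataset") / sub / split).mkdir(parents=True, exist_ok=True)
    for img in files:
        lbl = Path("raw/labels") / (img.stem + ".txt")
        shutil.copy(img, Path("dataset") / "images" / split / img.name)
        shutil.copy(lbl, Path("dataset") / "labels" / split / lbl.name)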
1.4. Model Training
With the COCO-formatted dataset ready, we proceeded to use it to train our YOLOv5 model. YOLO is renowned for its efficiency and accuracy in real-time object detection tasks. We chose YOLOv5 because of its improved performance, ease of use, and ability to detect objects with high precision and speed, making it ideal for our application needs.
Simultaneously, we fine-tuned the YOLOv5s pretrained model on our specific dataset. This choice was primarily due to YOLOv5s’s advantage of having a minimal number of parameters (only 7.2 million) and low GFLOPs (16.5), along with the fastest inference speed across most tasks, all without sacrificing much accuracy [16]. These features make it highly beneficial for deploying models on edge devices.
The following steps will guide you on how to train the YOLOv5 model in the Vitis-AI development environment.
Notice: The training process is conducted on Ubuntu 20.04 using two NVIDIA GeForce RTX4090 GPUs within the pre-configured Vitis-AI GPU Docker image.
First, clone the YOLOV5 repo to your Vitis-AI Workspace and redirect to it with the following commands.
git clone https://github.com/ultralytics/yolov5
cd yolov5
Then install the required packages.
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
Activate the Vitis-AI environment for our model training based on the PyTorch framework.
conda activate vitis-ai-pytorch
YOLOv5 changed its activation function from ReLU to SiLU, but SiLU is not supported by the DPU [15]. Therefore, before training, we need to change the activation function back to ReLU or LeakyReLU.
Then, modify the dataset and model configuration files based on your own workload directory and class separation. The example dataset configuration file is located at ${YOLOV5/PATH}/data/coco.yaml, and the model configuration file is located at ${YOLOV5/PATH}/models/yolov5s.yaml; both can be used as baselines for modification.
For the dataset configuration file, you should modify the dataset path, the number of classes, and the label names for each class.
For the model configuration file, you should set the activation function to ReLU or LeakyReLU by adding the following line:
act: nn.ReLU()
Finally, with all configurations and modifications finished, we can train our model with train.py. We can start from a pretrained checkpoint to simplify the training process, which can be downloaded from https://github.com/ultralytics/yolov5. We use yolov5s.pt here to keep the model lightweight.
python train.py --data ${YOLOV5/PATH}/data/${YOUR/DATASET/CONFIGURATION/YAML/FILE} --epochs 300 --weights yolov5s.pt --cfg ${YOLOV5/PATH}/models/${YOUR/MODEL/CONFIGURATION/YAML/FILE} --batch-size 16
If you encounter GLIBCXX version mismatch issues during the training process, you can refer to Appendix B for a possible solution.
2. Yolov5 Quantization & Compilation
After completing the model training, we used the Vitis-AI Quantizer and Compiler to compress and optimize our deep learning models for deployment on edge devices. The Vitis-AI Quantizer converts 32-bit floating-point weights and activations into fixed-point numbers like INT8, reducing computational complexity while maintaining accuracy. This conversion enhances speed and power efficiency due to lower memory bandwidth requirements. The Vitis-AI Compiler further optimizes the model by mapping it to an efficient instruction set and data flow, implementing layer fusion, instruction scheduling, and maximizing on-chip memory reuse, ultimately achieving optimal performance in edge environments.
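As a toy illustration of the fixed-point ("decimal point") scheme referred to throughout this section, the mapping between a float value and its INT8 code for a given fix point might look like this (a sketch for intuition only, not the quantizer's internal implementation):

import numpy as np

def quantize(x, fix_point):
    # q = clip(round(x * 2**fix_point), -128, 127)
    return np.clip(np.round(x * 2 ** fix_point), -128, 127).astype(np.int8)

def dequantize(q, fix_point):
    return q.astype(np.float32) / 2 ** fix_point

x = np.array([0.73, -1.2, 0.05], dtype=np.float32)
q = quantize(x, fix_point=6)
print(q, dequantize(q, fix_point=6))   # [ 47 -77   3] [ 0.734375 -1.203125  0.046875]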
The following steps provide detailed guidance on how to quantize and compile the model by utilizing Vitis-AI Quantizer and Vitis-AI Compiler.
2.1. Model quantization
First, we need to modify the forward function inside yolo.py.
Then, we write a quantization script, quantize.py, to quantize the YOLOv5 model. The quantize.py script can be found through this link: quantize.py [14].
Inside the script, we first need to load the model with the trained weight file (.pt or .pth).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DetectMultiBackend(weights=weights)
model = model.to(device)
Then, we perform the quantization with the torch_quantizer function from the pytorch_nndct library which is provided by Vitis AI.
rand_in = torch.randn(1, 3, 640, 640)
quantizer = torch_quantizer(quant_mode, model, rand_in, output_dir=quant_model)
quantized_model = quantizer.quant_model
quantized_model = quantized_model.to(device)
Subsequently, we load the dataset using a newly defined class, CustomImageDataset, as referenced in this guidance [14].
test_dataset = CustomImageDataset(os.path.join(dataset, 'labels/train2017'), os.path.join(dataset, 'images/train2017'), 640, 640)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False)
To enable quantization, a forward pass is required to propagate input data through the network's layers and compute an output for calibration. During this forward pass, input data flows through the model to produce a prediction, allowing us to verify the quantization process by applying post-processing to obtain the final output.
quantized_model.eval()
with torch.no_grad():
    for image, target in test_loader:
        print(f'Image {target["image_id"][0][0]}')
        output = quantized_model(image.to(device))
        pred = non_max_suppression(output)
        print(pred)
Finally, we can handle the quantization result based on the two different modes of quantization: calibration (quant_mode == 'calib') and testing (quant_mode == 'test').
if quant_mode == 'calib':
    quantizer.export_quant_config()
if quant_mode == 'test':
    quantizer.export_xmodel(deploy_check=False, output_dir=quant_model)
We first calibrate the quantizer with quant_mode == 'calib', and then the quantized xmodel can be generated with quant_mode == 'test'.
python quantize.py -q calib --weights ${YOUR/MODEL/WEIGHT/PATH} --dataset ${YOUR/QUANTIZATION/DATASET/PATH}
python quantize.py -q test --weights ${YOUR/MODEL/WEIGHT/PATH} --dataset ${YOUR/QUANTIZATION/DATASET/PATH}
After this step, we will obtain the quantized model at ./build/quant_model/DetectMultiBackend_int.xmodel.
Then, we can compile the quantized xmodel to generate a DPU-supported xmodel with additional hardware information. This can be accomplished using the vai_c_xir command provided by Vitis AI.
vai_c_xir -x ./build/quant_model/DetectMultiBackend_int.xmodel -a ${YOUR/ARCH/JSON/PATH} -o ./ -n ${OUTPUT/COMPILED/MODEL/NAME}
The final compiled model for DPU usage is then generated with the suffix .xmodel.
To accelerate the overall process, we need to implement the DPU on the KR260 board while also enabling GPIO connections with other hardware, including the motors. Fortunately, Xilinx provides a PYNQ overlay, DPU-PYNQ, which enables designers to implement a DPU on PYNQ-enabled platforms quickly and conveniently.
Unfortunately, the default DPU-PYNQ overlay only implements the DPU module and cannot access the GPIO pins from the PS. As a result, we need to rebuild the DPU block design based on our modified hardware project, following the official guidelines provided here.
First, clone the DPU-PYNQ repository and our modified design files. Then copy the related files into the DPU-PYNQ directory and enter its boards directory.
git clone https://github.com/Xilinx/DPU-PYNQ.git
git clone https://github.com/iCAS-SJTU/AI-llume.git
cp -r ./DPU-PYNQ/boards/kr260_som ./DPU-PYNQ/boards/kr260_som_gpio
cp -f ./AI-llume/dpu/* ./DPU-PYNQ/boards/kr260_som_gpio
cd ./DPU-PYNQ/boards
Make sure Vivado, Vitis, and XRT are installed with the correct versions and that their environments are sourced.
source <vivado-install-path>/Vivado/2022.1/settings64.sh
source <vitis-install-path>/Vitis/2022.1/settings64.sh
source <xrt-install-path>/xilinx/xrt/setup.sh
Then, clone the board files before rebuilding any designs.
git clone https://github.com/Xilinx/XilinxBoardStore -b 2022.1
Now, it's time to rebuild the design!
make BOARD=kr260_som_gpio
During the process, a modified basic hardware design will be generated by Vivado following the Tcl script gen_platform.tcl. In our modified hardware design, we enable access to PMOD1, which provides 8 GPIO ports on the KR260 board. The PMOD1 module location is shown in Figure 9.
After the basic hardware design is generated, the corresponding Vitis platform will be generated and combined with the official DPU module.
When this kind of output is printed on your screen, it means that you have successfully rebuilt the DPU overlay supporting GPIO access. You can find dpu.bit, dpu.hwh, and dpu.xclbin under ./DPU-PYNQ/boards/kr260_som_gpio/.
After model training, quantization, and compilation, we deployed the resulting DPU-ready model file onto the GPIO-enabled DPU overlay on the KR260 board for inference. The assembly on KR260 can be seen in Figure 10.
The implementation starts by connecting a webcam to capture video streams, utilizing a PYNQ script. The video stream is processed using the GPIO DPU overlay and the compiled model with the suffix .xmodel for inference. We periodically extract frames for image segmentation and recognition, feed them into the model for inference, and then dequantize the results. Post-processing is applied to obtain object locations and labels, as well as to determine the positions of obstacles in the video. This information is then processed further to generate control signals for the motors, triggering vibration alerts on the vest. This setup enables efficient object detection and responsive feedback on KR260 edge devices.
We can use the following command to activate the PYNQ environment for the top-level implementation of our project.
source /etc/profile.d/pynq_venv.sh
1.2. Run the Top Script
In the terminal of the KR260, clone our repo:
git clone https://github.com/iCAS-SJTU/AI-llume.git
Then, in the PYNQ environment, run the top script demo.py:
python ${REPO/ROOT}/pynq/demo.py
To check the details inside the top script, please refer to the following explanations:
*Camera scripts
To capture real-time images, we connect a Logitech C920 USB web camera (shown in Figure 11 below) to the board through a USB port on the KR260. Once the web camera is connected, we can access the video stream in Python through a library called imutils. A Python class called VideoStream is created, and each time we need a new frame from the camera, we can directly call the VideoStream.read() function, which returns the frame the camera has just captured.
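A minimal sketch of that capture step, assuming the camera enumerates as device 0 (the actual wrapper class lives in our repository's demo.py):

import time
from imutils.video import VideoStream

vs = VideoStream(src=0).start()   # open the USB camera (Logitech C920)
time.sleep(2.0)                   # let the camera warm up

frame = vs.read()                 # latest captured frame as a BGR numpy array
print(frame.shape)

vs.stop()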
*Checking the compiled model network
Before utilizing the quantized model with the DPU overlay for image segmentation and recognition, we should check the compiled model's network structure to understand how to handle inputs for the quantized model and perform post-processing to obtain the desired results. Luckily, an online visualizer for neural networks called Netron can be easily used to view the network structure.
To obtain the network structure information, simply visit the Netron website and upload the compiled model file with the suffix .xmodel.
First, check the input image tensor for this network from the input upload node.
Figure 12 indicates that the input image is a fixed-point number in xint8 format, with the decimal point at the sixth position, and the size is [1,640,640,3].
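To make the fixed-point input format concrete, the small worked example below shows how a decimal point at the sixth position relates to the 2**6 scaling used in the preprocessing code later in this section:

# A float pixel value p in [0, 1] is represented as round(p * 2**6) in xint8,
# so preprocessing multiplies the normalized image by 2**6 = 64 before casting to int8.
p = 0.5
q = round(p * 2 ** 6)   # 32 -> the int8 code fed to the DPU
print(q, q / 2 ** 6)    # 32 0.5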
Then, the output shape could be checked by finding the download node, as shown in Figure 13.
For our YOLOv5 model, as shown in Figure 13, there are three detection nodes with shapes [1,80,80,39], [1,40,40,39], and [1,20,20,39], respectively. Index 0 of the shape indicates a batch size of 1. The dimensions at indices 1 and 2 are determined by the three strides [8, 16, 32]; for example, the output size at the detection head corresponding to a stride of 8 is Shape_in[1]/stride = 640/8 = 80. Index 3 is determined by the number of classes nc, calculated as (nc+5)*3 = 39.
Besides, we know that the outputs of these three detection nodes are quantized with the decimal point at the third position. This means that during dequantization, we need to divide the outputs of each detection head by 2^3 = 8.
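A quick arithmetic check of the shapes and the dequantization divisor described above:

strides = [8, 16, 32]
channels = 39                                # = (nc + 5) * 3, per the network check above
for s in strides:
    print(640 // s, 640 // s, channels)      # 80 80 39 / 40 40 39 / 20 20 39
print(2 ** 3)                                # 8 -> divisor used when dequantizing the heads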
*Deploying the model on the DPU overlay for edge inference
Before loading the DPU overlay, we should first place dpu.bit, dpu.hwh, and dpu.xclbin in the same directory as your Python scripts.
Then we load the DPU overlay and define the input and output data types as int8, based on the previous check of the model's network inputs.
import numpy as np
from pynq_dpu import DpuOverlay

# Load the rebuilt DPU overlay, load the compiled model, and get a VART runner
overlay = DpuOverlay("../overlay/dpu.bit")
overlay.load_model("${OUTPUT/COMPILED/MODEL/NAME}.xmodel")  # compiled model generated by vai_c_xir
dpu = overlay.runner

inputTensors = dpu.get_input_tensors()
outputTensors = dpu.get_output_tensors()
shapeIn = tuple(inputTensors[0].dims)            # (1, 640, 640, 3)
shapeOut0 = tuple(outputTensors[0].dims)
shapeOut1 = tuple(outputTensors[1].dims)
shapeOut2 = tuple(outputTensors[2].dims)
outputSize0 = int(outputTensors[0].get_data_size() / shapeIn[0])
outputSize1 = int(outputTensors[1].get_data_size() / shapeIn[0])
outputSize2 = int(outputTensors[2].get_data_size() / shapeIn[0])

# Pre-allocate int8 input/output buffers matching the DPU tensor shapes
input_data = [np.empty(shapeIn, dtype=np.int8, order="C")]
output_data = [np.empty(shapeOut0, dtype=np.int8, order="C"),
               np.empty(shapeOut1, dtype=np.int8, order="C"),
               np.empty(shapeOut2, dtype=np.int8, order="C")]
image = input_data[0]
Subsequently, we use cv2.imread to read an image from the target video stream, then preprocess and reshape it following the preprocessing steps in the YOLOv5 source scripts.
im = preprocess_fn(cv2.imread(${YOUR/TARGET/IMAGE/PATH}))
im = im.transpose((2, 0, 1)) # HWC to CHW
im = np.ascontiguousarray(im) # contiguous
im = np.transpose(im,(1, 2, 0)).astype(np.float32) / 255 * (2**6) # norm & quant
if len(im.shape) == 3:
im = im[None] # expand for batch dim
# Reshape into DPU input shape
image[0,...] = im.reshape(shapeIn[1:])
By utilizing the dpu.execute_async function, we can obtain the output data from the compiled model running on the DPU.
job_id = dpu.execute_async(input_data, output_data) # image below is input_data[0]
dpu.wait(job_id)
Then, we perform post-processing on the model outputs to obtain the final detection box positions and corresponding class labels.
First, dequantize the outputs by dividing each detection head’s output by 2^3 = 8 based on the previous network check results.
# Dequantize (scale the int8 outputs back to float)
conv_out0 = np.transpose(output_data[0].astype(np.float32) / 8, (0, 3, 1, 2)).reshape(1, 3, 13, 80, 80).transpose(0, 1, 3, 4, 2)
conv_out1 = np.transpose(output_data[1].astype(np.float32) / 8, (0, 3, 1, 2)).reshape(1, 3, 13, 40, 40).transpose(0, 1, 3, 4, 2)
conv_out2 = np.transpose(output_data[2].astype(np.float32) / 8, (0, 3, 1, 2)).reshape(1, 3, 13, 20, 20).transpose(0, 1, 3, 4, 2)
pred = [conv_out0, conv_out1, conv_out2]
Then, we make slight adjustments to the post-processing part of the YOLOv5 code to suit our specific steps and use the non-max suppression algorithm from the YOLOv5 source code to obtain the desired bounding box positions and corresponding class labels.
# Apply post-processing
pred = postprocess(pred, anchors, stride, nc)
# Apply NMS
nms_results = non_max_suppression(pred)
The NMS (non-max suppression) output combines the results from the three detection nodes into a tensor of shape (N, 6).
Each detection box is a 6-dimensional tensor, with the first four indices indicating the box location [x_1, y_1, x_2, y_2], and the 5th and 6th indices representing the confidence value and class label [conf, label], respectively.
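For example, the detections can be unpacked like this before they are converted into motor signals (a small sketch; nms_results comes from the non_max_suppression call above):

# Each row is [x1, y1, x2, y2, conf, label]
for x1, y1, x2, y2, conf, label in nms_results[0]:
    x_center = (x1 + x2) / 2          # horizontal position used to pick a direction
    print(f"class {int(label)} at x_center={float(x_center):.0f}px, conf={float(conf):.2f}")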
*Post-processing for GPIO inputs
As mentioned above, the prediction results are represented by the dimensions of the predicted bounding boxes, the confidence values, and the class labels. These results must be converted into signals that can be consumed by the devices connected to the KR260. In this project, once a roadblock is detected in a specific direction, the user should be alerted by a vibrator mounted on the vest. To demonstrate the effect in a simplified way, we used a motor and an L298N motor driver to generate the signals; these devices are connected to the KR260 through its GPIO pins.
To activate the motor, we first need to control the L298N, as shown in Figure 14. Besides supplying power, the key is to output the correct control signals to IN1, IN2, IN3, and IN4. To drive Motor A/B, IN1/IN3 and IN2/IN4 must have a voltage difference (i.e., logic 0 and 1). We add direction-judging logic to route the output to the corresponding motor.
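A minimal sketch of that direction-to-motor logic, assuming the rebuilt overlay routes PS GPIO (EMIO) pins to PMOD1; the pin indices 0-3 for IN1-IN4 below are illustrative assumptions, not taken from our block design:

from pynq import GPIO

# PS EMIO GPIO pins assumed to be wired to the L298N inputs through PMOD1
IN1 = GPIO(GPIO.get_gpio_pin(0), 'out')
IN2 = GPIO(GPIO.get_gpio_pin(1), 'out')
IN3 = GPIO(GPIO.get_gpio_pin(2), 'out')
IN4 = GPIO(GPIO.get_gpio_pin(3), 'out')

def signal_obstacle(x_center, frame_width=640):
    # Vibrate the motor on the side where the obstacle was detected.
    if x_center < frame_width / 2:
        IN1.write(1); IN2.write(0)   # voltage difference drives Motor A (left side)
        IN3.write(0); IN4.write(0)
    else:
        IN3.write(1); IN4.write(0)   # voltage difference drives Motor B (right side)
        IN1.write(0); IN2.write(0)

def clear_signal():
    for pin in (IN1, IN2, IN3, IN4):
        pin.write(0)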
Appendix A
This appendix provides a potential solution for internet connectivity issues encountered during docker pull.
To pull the docker images successfully, some open-source mirrors could be added to daemon.json with the following commands.
sudo gedit /etc/docker/daemon.json
In daemon.json:
{
"registry-mirrors": ["https://pee6w651.mirror.aliyuncs.com/","http://hub-mirror.c.163.com", "https://registry.docker-cn.com"]
}
Reload daemon file and restart docker:
sudo systemctl daemon-reload
sudo systemctl restart docker
Check the docker status
sudo systemctl status docker.service
Appendix B
This appendix provides a potential solution for issues related to the GLIBCXX version.
There may be errors on first use due to the missing GLIBCXX_3.4.29 version required by libstdc++.so.6. To solve this issue, we need to obtain the newer version from PPA packages.
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
You can check the installed GLIBCXX versions with the following command.
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
If GLIBCXX_3.4.29 appears in the output, the upgrade was successful, and we can now train the YOLOv5 model.
Since we have modified the Docker container, we can save the modification as a new image, so that next time you can directly run your own modified image. First find the latest container ID, then commit it:
docker ps -l
docker commit ${CONTAINER/ID} xilinx/vitis-ai:${YOUR/TAG}
References:
[1] Nair, V. et al. (2020) ‘Assist: Evaluating the usability and performance of an indoor navigation assistant for blind and visually impaired people’, Assistive Technology, 34(3), pp. 289–299. doi:10.1080/10400435.2020.1809553.
[2] Dourado, A.M. and Pedrino, E.C. (2023) ‘Towards interactive customization of multimodal embedded navigation systems for visually impaired people’, International Journal of Human-Computer Studies, 176, p. 103046. doi:10.1016/j.ijhcs.2023.103046.
[3] R. Manduchi and J. Coughlan, “(computer) vision without sight,” Communications of the ACM, vol. 55, no. 1, pp. 96–104, Jan. 2012. doi:10.1145/2063176.2063200
[4] T. Pun, P. Roth, G. Bologna, K. Moustakas, and D. Tzovaras, “Image and video processing for visually handicapped people,” EURASIP Journal on Image and Video Processing, vol. 2007, pp. 1–12, 2007. doi:10.1155/2007/25214
[5] N. Amin and M. Borschbach, “Classification criteria for local navigation digital assistance techniques for the visually impaired,” 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), Dec. 2014. doi:10.1109/icarcv.2014.7064576
[6] P. E. Ponchillia, E. C. Rak, A. L. Freeland, and S. J. LaGrow, “Accessible GPS: Reorientation and target location among users with visual impairments,” Journal of Visual Impairment & Blindness, vol. 101, no. 7, pp. 389–401, Jul. 2007. doi:10.1177/0145482x0710100702
[7] L. Ran, S. Helal, and S. Moore, “Drishti: An integrated indoor/outdoor blind navigation system and service,” Second IEEE Annual Conference on Pervasive Computing and Communications, 2004. Proceedings of the, 2004. doi:10.1109/percom.2004.1276842
[8] M. Donati, F. Iacopetti, A. Celli, R. Roncella, and L. Fanucci, “An aid system for autonomous mobility of visually impaired people on the historical city walls in Lucca, Italy,” Technological Trends in Improved Mobility of the Visually Impaired, pp. 379–411, Jul. 2019. doi:10.1007/978-3-030-16450-8_16
[9] “Microsoft Soundscape,” Microsoft Research, https://www.microsoft.com/en-us/research/product/soundscape/ (accessed Jul. 29, 2024).
[10] “Nearby explorer (full) and nearby Explorer online for Android user guide¶,” Nearby Explorer (Full) and Nearby Explorer Online for Android User Guide, https://tech.aph.org/neandroid/ (accessed Jul. 29, 2024).
[11] “Guide dog,” Wikipedia, https://en.wikipedia.org/wiki/Guide_dog (accessed Jul. 29, 2024).
[12] Setting up the SD card image, https://www.amd.com/en/products/system-on-modules/kria/k26/kr260-robotics-starter-kit/getting-started/setting-up-the-sd-card-image.html (accessed Jul. 30, 2024).
[13] AMD, "Kria-RoboticsAI," GitHub repository, https://github.com/amd/Kria-RoboticsAI?tab=readme-ov-file#robotics-ai-with-kr260
[14] LogicTronix, "YOLOv5 Quantization & Compilation with Vitis AI 3.0 for Kria," Hackster.io, https://www.hackster.io/LogicTronix/yolov5-quantization-compilation-with-vitis-ai-3-0-for-kria-7b005d#toc-quantizing-yolov5-pytorch-with-vitis-ai-3-0-5
[15] Xilinx, "DPU-PYNQ," GitHub repository, https://github.com/Xilinx/DPU-PYNQ
[16] Ultralytics, "YOLOv5 pretrained checkpoints," GitHub repository, https://github.com/ultralytics/yolov5?tab=readme-ov-file#pretrained-checkpoints