We are a team of four Electronics and Telecommunication undergraduates at the University of Moratuwa, Sri Lanka. Despite some difficulties finding hardware resources, we enrolled in the AMD AI Robotic Challenge, and thanks to AMD we received an AMD Kria KR260 board for our project. From here on, we will explain our process step by step.
One of our members did a research internship at a hardware acceleration lab at NTU Singapore. During that internship he learned about visual SLAM systems, their lucrative applications in robotics, and the core theory behind them. Most importantly, he gained hands-on experience with the AMD Kria KR260 board and was fascinated by the power it offers for robotics applications.
Of the other three members, one works on deep learning projects, another on hardware acceleration, and the other on programming. After gathering our team of four, we shared our knowledge and held weekly meetings. During the competition we also had internships, lectures, assignments, and other academic evaluations to manage, but despite this workload we kept working on the competition.
Although we were engaged in this project for about three months, we could not get the expected results at the beginning. What follows is a description of the things we learnt and how we solved the numerous problems we faced along the way. Before continuing our story of implementing HFNet SLAM on the AMD Kria Robotics Starter Kit, let's first go through the basic concepts involved.
Overview
Our idea is to implement deep learning techniques to replace some parts of existing SLAM systems to enhance performance and robustness. Running deep learning models on embedded hardware is challenging due to limited resources compared to a laptop or PC. However, after making the necessary modifications, we are trying to implement these models on embedded hardware as well. Specifically, we have implemented HFNet on the AMD Kria KR260 Robotics Starter Kit.
VSLAM stands for Visual Simultaneous Localization and Mapping, which allows a computer to build a 3D map of a space and determine its own location within it. VSLAM by itself is not navigation, but having a map and knowing your position on it is of course a prerequisite for navigating from point A to point B.
In our project, we primarily used a monocular camera to achieve these objectives.
HFNet
HF-Net: Robust Hierarchical Localization at Large Scale
Robust and precise visual localization is crucial for numerous applications, such as autonomous driving, mobile robotics, and augmented reality. Despite its importance, it remains a challenging task, especially in large-scale environments with significant appearance changes. Current state-of-the-art methods often struggle with these scenarios and can be too resource-intensive for certain real-time applications.
The authors propose HF-Net, a hierarchical localization method utilizing a unified CNN to predict both local features and global descriptors for accurate 6-DoF localization. This method employs a coarse-to-fine localization strategy, initially retrieving global location hypotheses and then matching local features within these candidate locations. This hierarchical approach significantly reduces runtime, making the system suitable for real-time operation. By leveraging learned descriptors, HF-Net achieves remarkable localization robustness across large variations in appearance and sets a new state-of-the-art on two challenging large-scale localization benchmarks.
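To make the coarse-to-fine idea concrete, here is a small toy sketch (our own illustration, not the authors' code, written in plain NumPy): the global descriptor prunes the whole database down to a handful of candidate places, and only those candidates' local features are matched before pose estimation.

import numpy as np

def retrieve_candidates(query_global, db_globals, k=10):
    # Coarse step: with unit-norm global descriptors, a dot product is the
    # cosine similarity; keep only the k most similar database images.
    sims = db_globals @ query_global
    return np.argsort(-sims)[:k]

def match_local(query_desc, cand_desc):
    # Fine step (toy version): mutual nearest-neighbour matching of local
    # descriptors; the real system feeds these matches to PnP + RANSAC
    # to obtain the 6-DoF pose.
    scores = query_desc @ cand_desc.T
    nn12 = scores.argmax(axis=1)
    nn21 = scores.argmax(axis=0)
    mutual = nn21[nn12] == np.arange(len(nn12))
    return np.stack([np.nonzero(mutual)[0], nn12[mutual]], axis=1)

# Tiny synthetic example: the query is a noisy copy of database image 42,
# so image 42 should come back as the top retrieval candidate.
rng = np.random.default_rng(0)
db_globals = rng.normal(size=(100, 4096))
db_globals /= np.linalg.norm(db_globals, axis=1, keepdims=True)
query = db_globals[42] + 0.01 * rng.normal(size=4096)
query /= np.linalg.norm(query)
print(retrieve_candidates(query, db_globals, k=3))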
Problem: The model is coded using older versions of libraries.
Given Requirements: (See this documentation for the complete list)
- tensorflow-gpu==1.12
- torch==0.4.1
Since these versions are no longer in use and not supported by other libraries, running this code is quite challenging.
Solution: By using a Docker image, we are able to run HFNet on a PC.
Docker is an open-source containerization platform that packages applications and their dependencies into standardized containers. These containers are lightweight, portable, and isolated, simplifying the building, running, managing, and distributing of applications. Docker ensures consistency across different environments and accelerates application development. For more information, visit the official Docker website.
There are several Docker images available for Xilinx Vitis AI. For our case, we used the tensorflow:1.15-cpu
image, which meets our requirements. We followed the instructions provided for this installation.
After cloning the HFNet repository, there are some required assets to run the HFNet model. All the instructions are given in the repository. By following those and using the Vitis AI docker, we are able to run the HFNet model on a PC.
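As a rough sketch of what running the model looks like inside the container (assuming the exported HF-Net SavedModel from the repository; the tensor names below are assumptions, so check the repository's demo for the exact ones):

# Sketch only: loads the exported HF-Net SavedModel with the TF 1.x API
# available in the tensorflow:1.15-cpu Vitis AI container.
import cv2
import numpy as np
import tensorflow as tf

image = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
image = image[None, :, :, None].astype(np.float32)  # NHWC batch of one

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ["serve"], "path/to/hfnet_saved_model")
    # Tensor names are assumptions; HF-Net outputs a global descriptor plus
    # local keypoints, scores and descriptors.
    global_desc, keypoints, local_desc = sess.run(
        ["global_descriptor:0", "keypoints:0", "local_descriptors:0"],
        feed_dict={"image:0": image})
    print(global_desc.shape, keypoints.shape, local_desc.shape)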
HFNet-SLAM
An accurate and real-time monocular SLAM system with deep features
HFNet-SLAM integrates and extends the well-known ORB-SLAM3 framework with a unified CNN model known as HF-Net. By utilizing HF-Net’s image features to completely replace the hand-crafted ORB features and the Bag of Words (BoW) method in ORB-SLAM3, this innovative approach enhances tracking and loop closure performance, thereby improving the overall accuracy of HFNet-SLAM.
OK, now that we have an initial idea about HFNet and HFNet-SLAM, let's continue with how we approached our implementation, the problems we faced, and how we solved them.
Getting Started with the Board
Although one of our members had used the AMD Kria Starter Kit before, he too had no experience with the workflow of deploying DNNs on the FPGA part, so we had to learn everything from the beginning of the project. As the first step, we followed the Getting Started with the Kria KR260 guide.
Booting Ubuntu on the Board
We followed the exact procedure in the Getting Started with the Kria KR260 guide to boot Ubuntu on the board.
Setting Up VNC Server on Kria KR260
Problem:- Most of the team members weren't used to working with a command-line interface, and we didn't have a monitor that supports DisplayPort.
Solutions:-
1. We bought a DisplayPort adapter, which didn't seem to work; we had to buy a second one to confirm it wasn't just a faulty unit.
2. We enabled a VNC server on the board for remote desktop connections.
Visit this repository to see our process, provided step by step.
Accessing the Board Using Web
Problem:- One of our members had to work remotely, so we had to find a way to give him access to the board.
Solutions:-
1. Accessing the board over SSH with port forwarding on the LAN was not possible because the LAN's public IP address was not visible; we would have needed to contact the ISP to enable that.
2. Remote server connection
Installation of ROS2
This is a straightforward procedure.
- Set up your ROS2 environment:
source /opt/ros/humble/setup.bash
- Launch the turtle simulator:
ros2 run turtlesim turtlesim_node
- Control the turtle:
ros2 run turtlesim turtle_teleop_key
Setup Instructions
Create ROS 2 Workspace:
mkdir -p ~/ros2_ws/src
cd ~/ros2_ws
source /opt/ros/humble/setup.bash
Create Face Recognition Package:
cd ~/ros2_ws/src
ros2 pkg create --build-type ament_cmake face_recognition_pkg
Implement Face Recognition Node:
- Create the src directory and write the face recognition node:
cd ~/ros2_ws/src/face_recognition_pkg
mkdir src
cd src
code face_recognition_node.cpp
- Paste the face recognition node code into face_recognition_node.cpp (an illustrative sketch of what the node does is included at the end of this section).
- Download Haar Cascade XML file:
mkdir -p ~/ros2_ws/src/face_recognition_pkg/resources
cd ~/ros2_ws/src/face_recognition_pkg/resources
wget https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml
Update CMakeLists.txt and package.xml:
cd ~/ros2_ws/src/face_recognition_pkg
code CMakeLists.txt
code package.xml
Update both files from the repository.
Build the ROS 2 Workspace:
cd ~/ros2_ws
source /opt/ros/humble/setup.bash
colcon build --packages-select face_recognition_pkg
Run the Face Recognition Node:
source ~/ros2_ws/install/setup.bash
ros2 run face_recognition_pkg face_recognition_node
See the output:
ros2 run image_view image_view --ros-args -r image:=/webcam/face_recognition
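For reference, here is an illustrative Python sketch of what the node does (our actual node is the C++ code in face_recognition_node.cpp; the input topic /image_raw here is an assumption, while the output topic matches the one viewed above):

#!/usr/bin/env python3
# Illustrative sketch only; the real node is written in C++.
import os
import cv2
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

CASCADE = os.path.expanduser(
    "~/ros2_ws/src/face_recognition_pkg/resources/haarcascade_frontalface_default.xml")

class FaceRecognitionNode(Node):
    def __init__(self):
        super().__init__("face_recognition_node")
        self.bridge = CvBridge()
        self.detector = cv2.CascadeClassifier(CASCADE)
        self.pub = self.create_publisher(Image, "/webcam/face_recognition", 10)
        self.sub = self.create_subscription(Image, "/image_raw", self.on_image, 10)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Haar cascade detection; draw a rectangle around every detected face.
        for (x, y, w, h) in self.detector.detectMultiScale(gray, 1.3, 5):
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        self.pub.publish(self.bridge.cv2_to_imgmsg(frame, encoding="bgr8"))

def main():
    rclpy.init()
    rclpy.spin(FaceRecognitionNode())
    rclpy.shutdown()

if __name__ == "__main__":
    main()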
Our idea was to connect a camera to the board and run the SLAM process on its video feed.
Problems:-
1. Connecting the camera straight to the code was not scalable because people would have different types of cameras for their applications.
2. The images need to be normalized (compensated for the camera's intrinsic parameters) for accuracy.
Solution:-
We took a ROS wrapper initially used for ORB-SLAM 3 and updated it to work with HF-Net SLAM. The modified SLAM system now processes normalized images (compensated for the camera's intrinsic parameters) for the SLAM process. We have provided the code for both the ROS wrapper and the SLAM system. Additionally, we have included code to run a ROS 2 node for receiving video input over LAN and will also provide code for a node to receive video from a USB webcam.
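A minimal sketch of that normalization step follows; the intrinsic matrix K and distortion coefficients below are placeholders, and the real values come from the camera calibration used in the SLAM settings file.

import cv2
import numpy as np

# Placeholder calibration values; replace with your camera's calibration.
K = np.array([[458.654, 0.0, 367.215],
              [0.0, 457.296, 248.375],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.28, 0.07, 0.0002, 0.00002, 0.0])

cap = cv2.VideoCapture(0)  # USB webcam index is an assumption
ok, frame = cap.read()
if ok:
    # Undistort so the SLAM front end sees an image consistent with the
    # pinhole model described by K.
    rectified = cv2.undistort(frame, K, dist)
    cv2.imwrite("rectified.png", rectified)
cap.release()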
Visit this repository to see our process, provided step by step.
Running ORBSLAM3 on PC with and without ROS2
Visit this repository to see our process, provided step by step.
Deploying a Model in the DPU
In the Vitis AI flow, the DPU (Deep Learning Processor Unit) is a configurable accelerator implemented in the programmable logic of the Kria SoC. It is optimized for deep learning workloads such as convolutional neural networks: models are compiled into DPU instructions that run in the FPGA fabric, while the Arm processing system handles the remaining operations.
We followed several tutorials to get familiar with DPU implementations.
Compiling the HFNet Model to Run on Embedded Hardware
Compiling a model for embedded hardware is essential for several reasons. Performance optimization ensures efficient use of limited resources like CPU, memory, and power, while leveraging hardware accelerators to enhance execution speed. Compatibility is achieved by converting models to specific formats required by different platforms (e.g., TensorRT for NVIDIA, Xilinx-specific formats for Kria boards), preventing runtime errors. Optimized models also improve power efficiency, extending battery life and generating less heat, which reduces cooling needs and enhances device longevity. Deployment is simplified as compiling handles pre-processing steps, making the process less complex and reducing potential compatibility issues.
AMD Vitis™ AI is a development environment designed to accelerate AI inference on AMD platforms. It offers optimized IP, tools, libraries, and models, along with resources like example designs and tutorials. The toolchain is engineered for high efficiency and ease of use, maximizing AI acceleration on AMD Adaptable SoCs and Alveo Data Center accelerator cards. For more information, visit the Vitis AI webpage. In the AMD Vitis AI documentation, there are two releases: 3.0 and 3.5. We have used version 3.5.
The Vitis™ AI Model Zoo, part of the Vitis AI repository, offers optimized deep learning models to accelerate inference on AMD platforms. It includes models for various applications such as ADAS/AD, medical, video surveillance, robotics, and data centers. You can use these free pre-trained models to benefit from deep learning acceleration. For more details, visit the Vitis AI webpage. From the Vitis™ AI Model Zoo, we can download the quantized tf_HFNet_3.5 model, which is 0% pruned, to our local machine.
The next step is to compile this model. You can find all the instructions here.
We used the following command for the compilation:
vai_c_tensorflow -f /PATH/TO/quantize_eval_model.pb -a /PATH/TO/arch.json -o /OUTPUTPATH -n netname
Now we have the hfnet_model.xmodel.
We have implemented local features extraction (see the diagram below) to run on the DPU.
Problem:-
We couldn't load the model into the DPU with PYNQ from a Jupyter Notebook.
AssertionError Traceback (most recent call last)
Input In [5], in <cell line: 2>()
1 # Load the HFNet model
----> 2 overlay.load_model("/root/jupyter_notebooks/pynq-dpu/Hfnet/Hfnet_model.xmodel")
File /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq_dpu/dpu.py:178, in DpuOverlay.load_model(self, model)
176 self.graph = xir.Graph.deserialize(abs_model)
177 subgraphs = get_child_subgraph_dpu(self.graph)
--> 178 assert len(subgraphs) == 1
179 self.runner = vart.Runner.create_runner(subgraphs[0], "run")
AssertionError:
Reason for the problem:- The original HFNet model contains some DPU-unsupported layers in the middle of the network, so the compiled graph is broken into three subgraphs. The DPU-supported subgraphs have to be run as separate runners, while the unsupported middle layers must be processed on the CPU.
At that time we didn't know much about the reason for this, and since HFNet is a model supported by Vitis AI, the main solution we could think of was to use an already available tool to do inferencing with HFNet.
There were two possible options:
Vitis AI Library
We could find an example implementation of HFNet in the Vitis AI repository.
Problem: When trying to build the example, the build process failed with linking errors (specifically OpenCV library version mismatches). We tried changing the makefile and adding soft links in the shared library locations, but we could not resolve the issue. As another approach, we tried to cross-compile the Vitis AI Library on the host machine using the given resources, but when we ran the compiled executable on the board we again faced version mismatches in libraries such as OpenCV. (It seems the cross-compiling environment created on the host machine and the actual environment built on the board by following the tutorial are different.)
Possible Solutions:-
1. Use the prebuilt image that can be downloaded from the Vitis AI Library. (This method was not feasible in our case because we would then lose the support for installing ROS and other tools on the board.)
2. Copy the /usr folder to the host machine, then resolve the errors and build the code base there. (This is the method we followed for building the SLAM system, but we still faced issues when using the Makefile to build the HFNet code.)
VMSS 2.0
VMSS is a software application that functions as a Video Machine-learning Streaming Server. The latest version, VMSS 2.0, offers several features to create flexible ML pipelines.
VMSS 2.0 has a supported HFNet implementation and uses a graph-based method to manipulate the data flow of the system. At that time we were facing an issue with the OS dying when trying to run inference with the DPU (as described later, VMSS is not the cause of that issue), so we were not able to follow this method.
Since a SLAM system needs to be computationally efficient to reach good performance, we felt that running another tool as a workaround would not be a good option.
That said, we think this is a good option if you just want to run inference with a model that is already supported in the Vitis AI Model Zoo. And since it runs in a Docker container, you won't face issues with environment configuration (although it adds computational overhead).
There are VMSS webinars held for competitors.
OS Dying Issue
While we were trying to solve the issue of HFNet having multiple subgraphs, we hit this problem: when running the inferencing process on the DPU, the board would suddenly shut down (the heartbeat LED would also turn off). This happened with the example codes as well.
We first thought this was a hardware issue with the board, but we ruled that out because the example codes worked fine after installing a fresh Ubuntu image. We couldn't resolve this problem until the last week of the challenge. (Every time it happened we had to start the setup process from the beginning.)
Possible cause:- We always hit this issue after trying to run code from Jupyter Notebooks. Our suspicion is that the DPU overlay and runner are not properly released after inferencing and that the Jupyter kernel keeps holding those objects, even across reboots. Shutting down the Jupyter kernel manually proved to be a solution to this problem.
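With that suspicion in mind, we now release the DPU objects explicitly before shutting the kernel down; a minimal sketch of the cleanup cell (assuming the standard pynq_dpu DpuOverlay used above):

# Release the VART runner and the DPU overlay explicitly, then use
# Kernel -> Shut Down in Jupyter instead of leaving the kernel running.
if hasattr(overlay, "runner"):
    del overlay.runner
del overlay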
Solution for HFNet Subgraph issue
After we could not use any off-the-shelf method to run inference with HFNet directly, we did more research on the subgraph issue and found out that we had to implement the unsupported layers on the CPU.
How to see the graph: use the web app netron.app. You can upload the compiled model to it and see how the graph is partitioned. DPU-unsupported subgraphs show the device as CPU, and DPU-supported subgraphs show DPU. To our knowledge, you need to write your own code to process the data in the DPU-unsupported layers (you can also use HLS to process these unsupported layers in the FPGA fabric).
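Besides Netron, the same partition information can be read programmatically with the xir Python bindings that ship with the Vitis AI runtime; a small sketch (the file name matches the output of the compile step above):

# Lists each subgraph of the compiled model together with its target device,
# which is how the DPU/CPU split of HFNet shows up in code.
import xir

graph = xir.Graph.deserialize("hfnet_model.xmodel")
for sg in graph.get_root_subgraph().toposort_child_subgraph():
    device = sg.get_attr("device") if sg.has_attr("device") else "unknown"
    print(sg.get_name(), "->", device)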
We adapted the code from the Vitis AI Library implementation to process the data of the DPU-unsupported layers on the CPU (a conceptual sketch of this pattern follows the list below).
1. We converted the code to a CMake project (with this we could resolve and debug most of the linking issues).
2. We installed the HFNet inference code as a shared library.
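A conceptual Python sketch of the pattern (our actual implementation is the C++ code adapted from the Vitis AI Library; cpu_middle_layers below is a placeholder for whatever operations the compiler left on the CPU, and a single input/output tensor per subgraph is assumed):

import numpy as np
import xir
import vart

def cpu_middle_layers(x):
    # Placeholder for the DPU-unsupported operations between the two
    # DPU subgraphs; the real computation comes from the adapted
    # Vitis AI Library HFNet code.
    return x

graph = xir.Graph.deserialize("hfnet_model.xmodel")
dpu_subgraphs = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                 if s.has_attr("device") and s.get_attr("device").upper() == "DPU"]
runners = [vart.Runner.create_runner(s, "run") for s in dpu_subgraphs]

def run_dpu(runner, data):
    # One synchronous pass through a single DPU subgraph; dtype and scaling
    # of the quantized tensors are simplified here.
    out_shape = tuple(runner.get_output_tensors()[0].dims)
    out = [np.empty(out_shape, dtype=np.float32, order="C")]
    job = runner.execute_async([data], out)
    runner.wait(job)
    return out[0]

def extract_local_features(image_tensor):
    # image -> DPU subgraph 1 -> CPU layers -> DPU subgraph 2 -> features
    x = run_dpu(runners[0], image_tensor)
    x = cpu_middle_layers(x)
    return run_dpu(runners[1], x)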
HFNet SLAM Implementation
We forked the HFNet-SLAM code implementation and changed the code to run inference with Vitis AI. We did comprehensive research on how HFNet-SLAM is built and modified the code to do inference with the DPU.
Implementing ROS 2 Humble Wrappers for HFNet SLAM
We forked an ORB-SLAM3 ROS 2 Humble wrapper, and with minor modifications we were able to update it to work with HFNet SLAM. We first tested this with the HFNet SLAM implementation on our host machine and then deployed it on the board.
Cross-Compiling for AArch64
Problem:- The SLAM system build process is resource-intensive, so the board crashes after running out of memory.
Solutions:-
1. Cross-compile the SLAM system on the host machine and copy the files to the board. There are a few ways to cross-compile:
- Downloading a CMake toolchain for the AArch64 architecture
- Using the VAI KR260 guide to do cross-compiling
- Using a tool like QEMU on the host machine to run programs meant for the board (can't use the DPU)
- Using Docker images (not sure how to use the DPU with this method)
The method we followed:
if you follow the tuto
How to Recreate the Project
1. Follow the tutorial until the end. Make sure that you can run inference with the DPU by running the tutorials.
2. Clone the HFNet code repo and install the HFNet inference code as a shared library.
The HFNet SLAM code is given here.
git clone ...
cd <cloned-repo-directory>
mkdir build
cd build
cmake ..
make
make install
3. From the host machine's command line, rsync your board's /usr directory to a folder on the host machine.
cd path/to/folder
rsync -r ubuntu@<ip-address>:/usr .
4. Clone the updated HFNet SLAM repo and follow the process described in its README.md.
5. Clone the HFNet SLAM wrapper to the board and install it.
6. Run the HFNet SLAM wrapper and the IP camera node to run the SLAM system.
References
- Vitis AI Library API
- VMSS
- Sarlin, Paul-Edouard, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. "From Coarse to Fine: Robust Hierarchical Localization at Large Scale." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12716-12725. 2019.
- Liu, Liming, and Jonathan M. Aitken. "HFNet-SLAM: An Accurate and Real-Time Monocular SLAM System with Deep Features." Sensors 23, no. 4 (2023): 2113.
- Campos, Carlos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós. "ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial, and Multimap SLAM." IEEE Transactions on Robotics 37, no. 6 (2021): 1874-1890.