SLAM (Simultaneous Localization and Mapping) attracts attention in various applications such as self-driving cars, AGVs, and drones. Excellent SLAM projects are already available, but they are as complicated as they are capable, and they depend on many external libraries, which makes porting them to simpler platforms such as embedded systems difficult.
This project puts more weight on concise algorithms and fewer dependencies. Libraries with non-permissive licenses are also removed. On the other hand, FPGA acceleration is utilized to achieve practical processing speed.
Features include
- 10 FPS real-time operation,
- loop-closure detection,
- 3D occupancy grid map generation,
- and real-time monitoring via a USB 3.0 connection.
All design files for both software and hardware are available under a permissive open-source license.
GitHub Repository
All design files are contained in the following GitHub repository.
U96-SLAM (GitHub)
- bin --- Pre-built binary files
- doc --- Relevant documents
- src --- Source files
- vivado --- Working directory for Vivado
The system-level block diagram is shown below.
Sensor Board
A sensor board (U96-SVM) is attached to the Ultra96-V2 to capture stereo images. The board is published here under an open-source hardware license. It contains dual CMOS image sensors and two mikroBUS sites.
A module with a single push switch and a built-in LED (Button G Click) is mounted on one of the two mikroBUS sites to control the system in stand-alone mode. An IMU module is also mounted but not used in this project.
The image format is 640x480 pixels at 30 FPS. The frame rate is then reduced to the desired rate inside the FPGA.
■ Migration to other sensor boards
This sensor board is designed to be compliant with the 96Boards specifications. Migration to other sensor boards should be possible if they are also compliant with these specifications. The image sensors are configured for 640x480 resolution and 30 FPS. A push switch and an LED are connected to the FPGA.
FPGA
The image sensors are connected to the FPGA (Programmable Logic, PL). Image processing pipelines for stereo vision, such as stereo rectification and block matching, are implemented in the FPGA. The FPGA also serves as a hardware accelerator for some software functions.
Remote Application
A bare-metal application runs on one of the two R5 processors to control the FPGA. It is called the "remote application" in this article and works cooperatively with the Linux application. It also controls the USB 3.0 connection so that, when connected to a Windows PC, the system behaves as a USB web camera with some stereo-vision capabilities.
Linux Application
A Petalinux system is built on the four A53 processors. An application that handles the SLAM-related operations runs on this system and is called the "Linux application" in this article.
The Petalinux system runs in SMP (Symmetric Multiprocessing) mode, which means workloads are distributed to each processor by the Linux system.
Inter-Processor Communication (IPC)
These two applications communicate with each other through memory-mapped registers implemented in the FPGA. The registers consist of "Message" registers and "Parameter" registers. A processor writes a specific message ID into the "Message" register to notify the other; the other CPU polls the "Message" register and responds appropriately. Four 32-bit parameters can be sent along with a message if necessary.
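As an illustration, a minimal sketch of this protocol follows; the register addresses, message IDs, and names are hypothetical, not the actual design.

#include <cstdint>

// Hypothetical register map; the real addresses come from the FPGA design.
volatile uint32_t* const MSG_REG   = reinterpret_cast<volatile uint32_t*>(0xA0000100);
volatile uint32_t* const PARAM_REG = reinterpret_cast<volatile uint32_t*>(0xA0000104); // 4 x 32-bit

enum : uint32_t { MSG_NONE = 0, MSG_START_CAPTURE = 1 };

void sendMessage(uint32_t id, const uint32_t params[4]) {
    for (int i = 0; i < 4; ++i)
        PARAM_REG[i] = params[i];  // parameters first, so they are valid when the ID lands
    *MSG_REG = id;                 // writing the message ID notifies the other processor
}

void pollMessage() {
    uint32_t id = *MSG_REG;
    if (id == MSG_NONE) return;    // nothing pending
    // ... read PARAM_REG[0..3] and handle the message ...
    *MSG_REG = MSG_NONE;           // clear to acknowledge
}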
Debug PC
The debug PC is used to monitor the status of the board. In addition to the usual debug functions over UART, the stereo images in the video processing pipeline can be viewed in real-time through a USB 3.0 connection. The system appears as a UVC (USB Video Class) device when attached to a Windows PC, so no special device driver is necessary.
Memory Map
The Ultra96-V2 has 2 GB of physical memory. The first three quarters of this space are used by the Linux system; the rest is reserved for the remote application.
Development Environment
The main development is performed on a Windows PC, but a Linux environment is needed for Petalinux development, so VirtualBox is used to host a Linux environment on Windows 10.
There are two Vitis installations in this figure: the one on Windows is used for remote-application development, and the other is used for the Linux application.
Development Phase
Developing an embedded system is troublesome, so the work is divided into three phases.
Phase 1 is a pure software solution running on a Windows PC. This phase is the most efficient for software development, and the performance of the algorithm was also verified during this stage.
In phase 2, the software is ported to the Petalinux system running on the Ultra96-V2 board. During this transition, care is taken to keep the software source code identical. An onboard SD card is used for storing the data.
In the last phase, some of the functions are replaced by FPGA circuitry and a bare-metal application that controls the FPGA. Hardware acceleration is also applied to some functions to further reduce the processing time.
Algorithm (Sensor Data Acquisition)
Stereo Rectification
The stereo rectification process transforms the left and right images onto a common plane and makes them horizontally aligned. Stereo rectification is performed on-the-fly inside the FPGA, followed by bilinear interpolation before storing to DDR memory.
To keep the FPGA circuitry simple, supportive information is generated by the software beforehand. This information consists of the location and the length of the data to be processed, sorted in order of arrival time.
Stereo calibration parameters are obtained by an OpenCV function that implements Bouguet's algorithm, using a 7x5 chessboard pattern. These parameters are saved in YAML files and stored on the SD card.
The current implementation ignores lens distortion because the image sensors in use have almost no distortion, and trying to undistort them actually introduced more distortion into the image.
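For reference, a minimal sketch of generating and saving these parameters with OpenCV follows; the variable names and file-storage keys are illustrative, and the distortion coefficients are zeroed as described above.

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>

cv::Mat M1, M2, R, T;                      // intrinsics and extrinsics from cv::stereoCalibrate
cv::Mat D = cv::Mat::zeros(1, 5, CV_64F);  // lens distortion ignored in this project
cv::Mat R1, R2, P1, P2, Q;
cv::stereoRectify(M1, D, M2, D, cv::Size(640, 480), R, T,
                  R1, R2, P1, P2, Q, cv::CALIB_ZERO_DISPARITY);

cv::FileStorage fs("calib_left.yml", cv::FileStorage::WRITE);
fs << "M" << M1 << "R" << R1 << "P" << P1;  // key names are illustrative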
X-Sobel Filter
An X-Sobel (horizontal Sobel) filter is applied as a pre-process for block matching. The results are stored in DDR memory because they are used repeatedly in the subsequent block matching.
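A rough software equivalent using OpenCV is shown below; the clipping constants of the actual StereoBM pre-filter differ slightly.

#include <opencv2/imgproc.hpp>

cv::Mat gray;                               // rectified 8-bit grayscale input
cv::Mat sobelX, pre;
cv::Sobel(gray, sobelX, CV_16S, 1, 0, 3);   // dx=1, dy=0: horizontal gradient
sobelX.convertTo(pre, CV_8U, 0.25, 128.0);  // compress into 8 bits around mid-gray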
Block Matching
Block matching searches for visual correspondences between the stereo image pair. After stereo rectification, a feature in the left image appears farther to the left on the same row in the right image. These per-pixel differences (disparities) over the source image form the dense depth map.
Block matching is implemented in FPGA by porting OpenCV's StereoBM function. The amount of computation needed for block matching is very large, but it can be reduced by the "sliding window" technique as implemented in OpenCV. To further reduce the processing time, the FPGA computes 32 disparities in parallel.
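For reference, the software path that the FPGA port follows looks roughly like this; the parameter values are illustrative.

#include <opencv2/calib3d.hpp>

cv::Mat rectLeft, rectRight;   // rectified 8-bit pair from the previous stages
cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 21);  // disparity range, block size
bm->setPreFilterType(cv::StereoBM::PREFILTER_XSOBEL);     // the X-Sobel pre-filter above
cv::Mat disp16;
bm->compute(rectLeft, rectRight, disp16);   // CV_16S fixed-point disparities (x16)
// depth then follows as Z = f * B / d, with focal length f and baseline B from the calibration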
GFTT Detector
GFTT (Good Features To Track) is used to detect key points. Key points are distinctive parts of the image, usually containing a corner.
The algorithm is the same as OpenCV's goodFeaturesToTrack function but parts of the function are implemented in FPGA to reduce the software processing time.
This function consists of the following steps.
1. Apply XY-Sobel filter to extract edges
2. Calculate the eigenvalue to quantify the edginess of corners
3. Apply thresholding to eigenvalues and select good key points
Steps 1 and 2 are applied to every pixel in the image and carry a large computation load, so they are implemented in the FPGA.
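A software-only equivalent with OpenCV is shown below, with illustrative parameter values; in this project, steps 1 and 2 run in the FPGA instead.

#include <opencv2/imgproc.hpp>
#include <vector>

cv::Mat gray;                          // rectified 8-bit grayscale image
std::vector<cv::Point2f> corners;
cv::goodFeaturesToTrack(gray, corners,
                        500,           // maximum number of key points
                        0.01,          // quality level relative to the strongest corner
                        10.0);         // minimum distance between key points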
ORB Descriptor Generator
ORB (Oriented FAST and Rotated BRIEF) feature descriptors are used to quantify the visual uniqueness of the detected key points. Key points of the same object have similar descriptors, so key points of the same object can be found across different image frames even if the scale and the angle differ slightly.
The actual computation is performed by an OpenCV function. Each ORB descriptor is a 256-bit binary string.
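A sketch of this step with OpenCV, assuming "corners" holds the GFTT results from the previous section:

#include <opencv2/features2d.hpp>
#include <vector>

cv::Ptr<cv::ORB> orb = cv::ORB::create();
std::vector<cv::KeyPoint> kps;
for (const cv::Point2f& p : corners)     // 'corners' from the GFTT stage above
    kps.emplace_back(p, 31.0f);          // 31-pixel patch size, ORB's default
cv::Mat desc;                            // one 32-byte (256-bit) row per key point
orb->compute(gray, kps, desc);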
Algorithm (SLAM)
The SLAM algorithm follows the F2F (frame-to-frame) approach implemented in RTAB-Map.
Coordinates
There are two coordinate systems involved in this project: the image coordinate system and the world (or robot) coordinate system.
Both are right-handed, so a simple rotation matrix R converts between them. Source images are captured in the image coordinate system, and the computed camera poses are then converted to the world coordinate system.
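As an illustration only, assuming the common conventions (image frame: x right, y down, z forward; world frame: x forward, y left, z up; check the source for the project's actual definition), the conversion is a fixed rotation:

#include <opencv2/core.hpp>

// Assumed axis conventions, for illustration only.
const cv::Matx33d R_wc( 0,  0, 1,    // world x = image z (forward)
                       -1,  0, 0,    // world y = -image x (left)
                        0, -1, 0);   // world z = -image y (up)
cv::Vec3d t_image;                   // a translation expressed in the image frame
cv::Vec3d t_world = R_wc * t_image;  // the same translation in the world frame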
Visual Odometry
Visual odometry computes transitions of camera poses during consecutive image frames.
The algorithm consists of the following stages.
1. Key-frame Selection
The actual visual odometry is computed between the keyframe and the new image frame. Keyframes are used to reduce the odometry errors that would otherwise accumulate every frame, especially when the camera is nearly stationary. The keyframe is updated when the number of matched key points falls below a threshold.
2. Keypoint Matching
Key points are matched between the two image frames. Similarities are calculated by comparing the ORB descriptors of the key points. Previous camera poses, if available, are used to narrow the search range. The outputs of this stage are the IDs of the matched key points and their 2D/3D locations.
3. Motion Estimation
Solve a PnP (Perspective-n-Point) problem to compute the rotation and the translation of the camera that minimize the error between
- 3D locations of the key points in the reference frame projected onto the current image plane and
- 2D locations of the key points in the current frame.
The output is the relative motion of the camera. It is computed in the image coordinate system and then converted to the world coordinate system.
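A sketch of this stage using OpenCV's RANSAC PnP solver; the parameter values are illustrative.

#include <opencv2/calib3d.hpp>
#include <vector>

std::vector<cv::Point3f> objPts;   // 3D key points from the reference (key)frame
std::vector<cv::Point2f> imgPts;   // matched 2D key points in the current frame
cv::Mat K;                         // 3x3 camera matrix from the calibration
cv::Mat rvec, tvec;
std::vector<int> inliers;
cv::solvePnPRansac(objPts, imgPts, K, cv::noArray(),
                   rvec, tvec, false, 100, 8.0f, 0.99, inliers);
cv::Mat R;
cv::Rodrigues(rvec, R);            // relative rotation, still in the image coordinate system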
In this project, the camera poses are maintained in a graph: estimated camera poses and motions are added as nodes and links respectively.
Visual Word Dictionary
The visual word dictionary contains visual words, which are ORB descriptors assigned unique IDs. Every time a new image frame arrives, the ORB descriptors contained in that frame are matched against the existing visual words. If a descriptor matches an existing word, the reference counter of that word is incremented; if not, the descriptor is assigned a new ID and becomes a new visual word.
The number of visual words increases with time, and matching against all existing visual words is actually the most time-consuming process in this application. To keep the software running in real-time, this computation runs in a separate thread. This works well because loop-closure detection does not have to run on every frame.
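A sketch of the matching step by Hamming distance; the containers and the threshold are hypothetical.

#include <opencv2/core.hpp>
#include <climits>
#include <map>

// desc: one incoming ORB descriptor (1x32, CV_8U). Returns the word ID it was filed under.
int matchWord(const cv::Mat& desc, std::map<int, cv::Mat>& dict,
              std::map<int, int>& refCount, int& nextId)
{
    int bestId = -1, bestDist = INT_MAX;
    for (const auto& kv : dict) {
        int d = static_cast<int>(cv::norm(desc, kv.second, cv::NORM_HAMMING));
        if (d < bestDist) { bestDist = d; bestId = kv.first; }
    }
    if (bestId >= 0 && bestDist <= 50) {   // 50 is a hypothetical acceptance threshold
        refCount[bestId]++;                // matched an existing word
        return bestId;
    }
    dict[nextId] = desc.clone();           // becomes a new visual word
    return nextId++;
}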
Loop-Closure Detection
Loop-closure detection recognizes a previously visited scene and adds another link to the corresponding node.
Adding a loop-closure link to the graph can reduce graph errors in two ways.
1. The graph is reconstructed when a loop-closure link is added. During this process, the shortest path from the start node to the end node is connected, which eliminates the odometry errors accumulated around the loop.
2. A loop-closure link will add an extra constraint to the graph. By minimizing errors caused by such constraints, the accuracy of the estimated poses will be improved.
Loop-closure detection operates once every 5 frames by default, and the latest 30 frames are ignored. These decimations prevent adjacent nodes from being accepted as a closed loop, since such loops contribute almost nothing to graph optimization.
Every time a new image frame arrives, TF-IDF (Term Frequency-Inverse Document Frequency) scores are calculated against the other image frames by consulting the visual word dictionary. The score is higher when more words are shared, and when those shared words are rarer.
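A sketch of the scoring, with hypothetical containers standing in for the dictionary bookkeeping:

#include <cmath>
#include <map>
#include <set>

// Score a past frame against the current frame's visual words.
double tfidfScore(const std::set<int>& curWords,            // word IDs seen in the current frame
                  const std::map<int, int>& pastWords,      // word ID -> count in the past frame
                  const std::map<int, int>& framesWithWord, // word ID -> frames containing it
                  int totalFrames, int pastWordTotal)       // corpus size, words in past frame
{
    double score = 0.0;
    for (int id : curWords) {
        auto it = pastWords.find(id);
        if (it == pastWords.end()) continue;                       // not a shared word
        double tf  = double(it->second) / pastWordTotal;           // term frequency
        double idf = std::log(double(totalFrames) / framesWithWord.at(id)); // rarity
        score += tf * idf;                                         // rarer shared words score more
    }
    return score;
}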
The image frame with the highest TF-IDF score is selected as a loop-closure candidate. Then, a motion estimation similar to the one in the visual odometry is performed between the two image frames. When the reprojection error is below a threshold, the link is accepted as a loop-closure link and added to the graph.
Graph Optimization
When loop-closure links add extra constraints to the graph, discrepancies arise that cause errors in the graph. By minimizing these errors, the accuracy of the graph can be improved.
In this project, only camera poses are estimated. This is called "pose adjustment" as opposed to "bundle adjustment" which estimates both camera poses and 3D coordinates of observed points.
This kind of problem can be solved by minimizing a cost function F(x) of the following form (the same form used by g2o).
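F(x) = \sum_{\langle i,j \rangle} e_{ij}^{\top} \, \Omega_{ij} \, e_{ij}, \qquad e_{ij} = e(x_i, x_j, z_{ij})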
Here e_ij is the error vector between poses x_i and x_j, with z_ij being the constraint between them. Ω_ij is an information matrix, obtained as the inverse of the covariance of the reprojection error.
A first-order approximation of F(x) is obtained by Taylor series expansion around the initial value of x as follows.
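Writing e_{ij} for the error at the current estimate, the standard expansion is

e_{ij}(x + \Delta x) \approx e_{ij} + J_{ij} \, \Delta x

F(x + \Delta x) \approx F(x) + 2 b^{\top} \Delta x + \Delta x^{\top} H \, \Delta x,
\qquad b = \sum J_{ij}^{\top} \Omega_{ij} e_{ij},
\qquad H = \sum J_{ij}^{\top} \Omega_{ij} J_{ij}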
Here J_ij is the Jacobian of e_ij with respect to x_i and x_j.
F(x) is minimized by the Δx obtained by solving the following equation, with λ being a damping factor.
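(H + \lambda I) \, \Delta x = -b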
Add Δx to the initial value to get closer to the optimal solution.
The Jacobian computation follows the implementation of g2o. The original camera poses include rotation matrices, which cannot be put into the equation directly because they are an over-parameterized representation. In this project, the axis (vector part) of a normalized quaternion is used as the minimal representation of rotation. This also follows g2o's implementation, so the same computation can be used for the Jacobian matrices. Camera poses therefore become vectors of 6 elements.
Assume we have N nodes; then H is a 6N x 6N matrix and b is a vector of 6N elements. H can be very large, but its elements are mostly zeros, because a block of H is non-zero only where a constraint between the corresponding nodes exists. It is therefore advantageous to treat H as a sparse matrix. Building the sparse matrix and solving the equation are performed by Eigen, with SimplicialLDLT as the sparse linear solver.
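A minimal sketch of that solve with Eigen, with the per-constraint block filling elided:

#include <Eigen/Sparse>
#include <vector>

Eigen::VectorXd solveIncrement(int N, const std::vector<Eigen::Triplet<double>>& trip,
                               const Eigen::VectorXd& b)
{
    Eigen::SparseMatrix<double> H(6 * N, 6 * N);     // mostly zero: one 6x6 block per constraint
    H.setFromTriplets(trip.begin(), trip.end());
    Eigen::SimplicialLDLT<Eigen::SparseMatrix<double>> solver;
    solver.compute(H);                               // sparse LDLT factorization
    return solver.solve(-b);                         // the increment Δx applied to the poses
}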
Occupancy Grid Map
The 3D occupancy grid map is generated from the optimized pose graph and the dense depth map. The actual generation of the map is performed by Octomap. The result is in "binary tree (.bt)" format and stored on the SD card.
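A sketch of this step with the Octomap API; the resolution value is illustrative and the point-cloud assembly from the depth maps is elided.

#include <octomap/octomap.h>

void buildMap(const octomap::Pointcloud& cloud, const octomap::point3d& origin)
{
    octomap::OcTree tree(0.1);            // 10 cm voxels (illustrative resolution)
    tree.insertPointCloud(cloud, origin); // ray-cast free/occupied cells from the sensor origin
    tree.updateInnerOccupancy();
    tree.writeBinary("slam.bt");          // the "binary tree (.bt)" output
}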
Performance
The KITTI dataset is used as a reference while verifying the algorithm.
Accuracy
The figure below shows a bird's-eye view of the trajectory when simulating with KITTI dataset sequence 00.
In this simulation, closed loops were detected mainly in 5 areas, and the translation and rotation errors were 0.91% and 0.0038 deg/m respectively.
Note that this result only shows the performance of the algorithm because the images used in this simulation were captured by different image sensors.
3D Occupancy Grid Map
The figure below shows the 3D occupancy grid map generated in the above simulation, displayed by octovis. The resolution of the dense depth maps is reduced to 1/4 in both directions, and the points are then projected using the estimated camera poses before the voxel map is built by Octomap.
Processing Time
The figure below shows the processing time of the application's main thread and the FPGA while the image sensor inputs are processed. There is some slack against the 100 msec time slot of 10 FPS operation.
The visual word dictionary update and the loop-closure detection run in the application's sub-thread. Their processing time increases with the number of visual words, as shown below.
Their time slot is 500 msec because they run once every 5 frames. When the processing time exceeds this slot, the next execution is postponed until the previous thread completes, so the real-time operation of the visual odometry is not disturbed.
Memory Consumption
The figure below shows the memory consumption when KITTI dataset sequence 00 is processed on Windows (only the first 1700 frames are shown). Memory consumption increases over time, with most of it taken by dense depth maps and visual words. When the application runs on the Ultra96-V2, this memory occupies the Linux-controlled memory space and limits the continuous operating time.
FPGA Utilization
The table below shows the FPGA resource utilization. The target device is the XCZU3EG-SBVA484-1-I.
Prerequisites
- Xilinx Tools 2020.2 must be installed on both platforms.
- Petalinux 2020.2 must be installed on Ubuntu.
- It is assumed that Xilinx Tools are installed to [XILINX_DIR] on Ubuntu.
- It is assumed that the necessary files in the git repository are copied to both platforms.
- Download Eigen 3.4.0 and place it under the "slam/include" directory as in the directory structure below.
slam
└─include
└─Eigen
Hardware
The extension interface of the sensor board is an open-drain circuit and cannot drive switches and LEDs directly, so the modification below was needed on the Button G Click board.
Build FPGA Project (on Windows)
* Board files for ultra96v2 must be installed.
* FPGA projects are necessary only when you modify the FPGA design. Resultant files are already included in the git repository.
■ "dvp" project
Launch Vivado, and type the following commands in "Tcl Console".
cd [WORK_DIR]/U96-SLAM/vivado
source create_dvp.tcl
⇒ A project named "dvp" will be created.
Click "Tools→Create and Package New IP..." to open the "Create and Package New IP" dialog. Proceed with the following settings.
[Create Peripheral, Package IP or Package a Block Design]
Packaging Options: Package your current project
[Package Your Current Project]
IP location: [WORK_DIR]/U96-SLAM/src/ip_repo/dvp
When prompted choose "Yes", then click "Finish".
"Package IP - dvp" will appear.
Select "Review and Package", and click "Package IP".
⇒ IP sources for "dvp" will be exported to "ip_repo/dvp" directory.
Close "dvp" project.
■ "fpga_top" project
Launch Vivado, and type the following commands in "Tcl Console".
cd [WORK_DIR]/U96-SLAM/vivado
source create_fpga_top.tcl
⇒ A project named "fpga_top" will be created.
Double-click "Design Sources→design_1_wrapper→design_1_i" in the "Sources" pane to open the "design_1.bd" block diagram.
If "/dvp_0 block in this design should be upgraded." is displayed at the top of the window, click "Report IP Status", then click "Upgrade Selected". This happens only when you have modified the "dvp" module.
Click "Generate Bitstream" in the Flow Navigator.
⇒ "design_1.bit" will be created in "fpga_top.runs/impl_1/" directory.
Click "File→Export→Export Hardware" to open the "Export Hardware Platform" dialog. Proceed with the following settings.
[Output]
Include bitstream: Selected
[Files]
XSA file name: design_1_wrapper
Export to: [WORK_DIR]/U96-SLAM/vivado/fpga_top
⇒ "design_1_wrapper.xsa" will be created in the specified directory.
Build Bare-metal Application (on Windows)
* This project is necessary only when you modify the bare-metal application. "StereoBM.elf" is already included in the git repository.
Create a directory named "vitis" under "[WORK_DIR]/U96-SLAM". Launch Vitis, set this directory as Vitis Workspace, and click "Launch".
[WORK_DIR]/U96-SLAM/vitis
Click "Create Application Project" to open the “New Application Project” dialog. Proceed with the following settings. Notice to choose the R5 processor.
[Platform]
Create a new platform from hardware (XSA)
XSA File: [WORK_DIR]\U96-SLAM\vivado\fpga_top\design_1_wrapper.xsa
Target processor to create FSBL: psu_cortexr5_0
[Application Project Details]
Application project name: StereoBM
Target processor: psu_cortexr5_0
[Domain]
Remain as default
[Templates]
SW development templates: Empty Application
Click "Finish".
⇒ "StereoBM_system" project will be created.
In "Application Project Settings", click "Navigate to BSP Settings".
Click "Modify BSP Settings..." in "Board Support Package". The "Board Support Package Settings" dialog will open.
Click "Overview→standalone" and make the following modification.
stdin: psu_uart_1
stdout: psu_uart_1
This is necessary because uart_1 is used for stdin/stdout on the Ultra96-V2 board.
Right-click "StereoBM_system→StereoBM→src" in the "Explorer" and click "Import Sources..." from the menu to open the "Import Sources" dialog. Proceed with the following settings.
From directory: [WORK_DIR]/U96-SLAM/src/StereoBM/src
Select All: click
Click "Finish".
Select "StereoBM" in Explorer pane, and choose "Release" build by clicking the Arrow icon next to the Hammer icon.
⇒ "StereoBM.elf" will be generated in the "Release" directory.
* Building "StereoBM_system" instead of "StereoBM" will also generate ROM boot files.
Build Petalinux System (on Ubuntu)
■ Configure the System
Source the Petalinux environment.
source [XILINX_DIR]/petaLinux-2020.2/bin/settings.sh
Create "petalinux" project by typing the following commands.
cd [WORK_DIR]/U96-SLAM
petalinux-create --type project --template zynqMP --name petalinux
cd petalinux/
⇒ "petalinux" directory will be created under [WORK_DIR]/U96-SLAM/".
Copy "design_1_wrapper.xsa" to "U96-SLAM/vivado" directory. If you haven't changed the FPGA design, the pre-built file is provided in "U96-SLAM/bin" directory in the git repository.
Configure the Petalinux system as follows.
* The following "petalinux-xxxx" commands must be issued inside "petalinux" directory.
petalinux-config --get-hw-description ../vivado
The "misc/config System Configuration" dialog will open. Proceed with the following settings, then "Exit".
Subsystem AUTO Hardware Settings → Serial Settings →
PMUFW Serial stdin/stdout : psu_uart_1
FSBL Serial stdin/stdout: psu_uart_1
ATF Serial stdin/stdout: psu_uart_1
DTG Serial stdin/stdout: psu_uart_1
DTG Settings → MACHINE_NAME: avnet-ultra96-rev1
Image Packaging Configuration → Root filesystem type: EXT4 (SD/eMMC/SATA/USB)
* The following command may take a long time for the first run.
petalinux-config -c kernel
The "Linux/arm64 5.4.0 Kernel Configuration" dialog will open. Proceed with the following settings, then "Exit".
Enable loadable module support [*] (default)
Networking support → Bluetooth subsystem support < >
Device Drivers → Remoteproc drivers →
Support for Remote Processor subsystem [*] (default)
ZynqMP_r5 remoteproc support <M> (default)
Lastly, type the following config command.
petalinux-config -c rootfs
The "Configuration" dialog will open. Proceed with the following settings, then "Exit".
Filesystem Packages →
libs → libmetal → libmetal [*]
misc → gdb [*] (for debug purpose)
→ sysfsutils → libsysfs [*]
Petalinux Package Groups →
packagegroup-petalinux-openamp → packagegroup-petalinux-openamp [*]
packagegroup-petalinux-opencv → packagegroup-petalinux-opencv [*]
Image Features → auto-login [*]
Open "system-user.dtsi" in "[WORK_DIR]/U96-SLAM/petalinux/project-spec/meta-user/recipes-bsp/device-tree/files/" and copy & paste the following text.
/include/ "system-conf.dtsi"
/ {
reserved-memory {
#address-cells = <2>;
#size-cells = <2>;
ranges;
rproc_0_dma: rproc@0x6ed00000 {
no-map;
compatible = "shared-dma-pool";
reg = <0x0 0x6ed00000 0x0 0x00100000>;
};
rproc_0_reserved: rproc@0x5ed00000 {
no-map;
reg = <0x0 0x5ed00000 0x0 0x10000000>;
};
};
zynqmp-rpu {
compatible = "xlnx,zynqmp-r5-remoteproc-1.0";
#address-cells = <2>;
#size-cells = <2>;
ranges;
core_conf = "split";
r5_0: r5@0 {
#address-cells = <2>;
#size-cells = <2>;
ranges;
memory-region = <&rproc_0_reserved>, <&rproc_0_dma>;
pnode-id = <0x7>;
mboxes = <&ipi_mailbox_rpu0 0>, <&ipi_mailbox_rpu0 1>;
mbox-names = "tx", "rx";
tcm_0_a: tcm_0@0 {
reg = <0x0 0xFFE00000 0x0 0x10000>;
pnode-id = <0xf>;
};
tcm_0_b: tcm_0@1 {
reg = <0x0 0xFFE20000 0x0 0x10000>;
pnode-id = <0x10>;
};
};
};
zynqmp_ipi1 {
compatible = "xlnx,zynqmp-ipi-mailbox";
interrupt-parent = <&gic>;
interrupts = <0 29 4>;
xlnx,ipi-id = <7>;
#address-cells = <1>;
#size-cells = <1>;
ranges;
/* APU<->RPU0 IPI mailbox controller */
ipi_mailbox_rpu0: mailbox@ff90000 {
reg = <0xff990600 0x20>,
<0xff990620 0x20>,
<0xff9900c0 0x20>,
<0xff9900e0 0x20>;
reg-names = "local_request_region",
"local_response_region",
"remote_request_region",
"remote_response_region";
#mbox-cells = <1>;
xlnx,ipi-id = <1>;
};
};
chosen {
bootargs = "console=ttyPS0,115200 root=/dev/mmcblk0p2 rw earlyprintk rootfstype=ext4 rootwait uio_pdrv_genirq.of_id=generic-uio devtmpfs.mount=1 earlycon";
};
};
&dvp_0 {
compatible = "generic-uio";
};
This file declares the use of a remote processor and reserves its memory space. It also declares the "dvp" module inside the FPGA as a "generic-uio" device so that it can be accessed with the built-in generic-uio device driver.
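For illustration, user-space access through generic-uio looks roughly like this; the device node name and the map size are assumptions.

#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>

int main()
{
    int fd = open("/dev/uio0", O_RDWR);       // node name depends on probe order
    volatile uint32_t* regs = static_cast<volatile uint32_t*>(
        mmap(nullptr, 0x10000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    uint32_t value = regs[0];                 // offsets depend on the dvp register map
    (void)value;
    return 0;
}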
■ Auto-Run Application
The following procedures will make our application run automatically on system boot.
petalinux-create -t apps --template install -n myinit --enable
⇒ "myinit" directory will be created under "project-spec/meta-user/recipes-apps".
Copy and paste the following text to "myinit.bb" and "files/myinit".
【myinit.bb】
#
# This file is the myapp-init recipe.
#
SUMMARY = "Simple myinit application"
SECTION = "PETALINUX/apps"
LICENSE = "MIT"
LIC_FILES_CHKSUM = "file://${COMMON_LICENSE_DIR}/MIT;md5=0835ade698e0bcf8506ecda2f7b4f302"
SRC_URI = "file://myinit \
"
S = "${WORKDIR}"
FILESEXTRAPATHS_prepend := "${THISDIR}/files:"
inherit update-rc.d
INITSCRIPT_NAME = "myinit"
INITSCRIPT_PARAMS = "start 99 S ."
do_install() {
install -d ${D}${sysconfdir}/init.d
install -m 0755 ${S}/myinit ${D}${sysconfdir}/init.d/myinit
}
FILES_${PN} += "${sysconfdir}/*"
【files/myinit】
#!/bin/sh
cd /home/root
./run
These files make "myinit" a start-up application that executes the "run" script in the "/home/root" directory.
■ Build Petalinux
Now we can build the Petalinux system by the following command.
* The "petalinux-build" command may take a long time.
petalinux-build
The first build will fail with error messages saying there are errors in the device tree. Open "pl.dtsi" in the following directory and delete the entries for mipi_csi_rx_subsyst_0 and mipi_csi_rx_subsyst_1.
"[WORK_DIR]/U96-SLAM/petalinux/components/plnx_workspace/device-tree/device-tree/pl.dtsi"
The resulting file should look like this.
【pl.dtsi】
/ {
amba_pl: amba_pl@0 {
#address-cells = <2>;
#size-cells = <2>;
compatible = "simple-bus";
ranges ;
dvp_0: dvp@a0000000 {
clock-names = "s00_axi_aclk", "m00_axi_aclk";
clocks = <&zynqmp_clk 71>, <&zynqmp_clk 71>;
compatible = "xlnx,dvp-1.0";
interrupt-names = "intr";
interrupt-parent = <&gic>;
interrupts = <0 89 4>;
reg = <0x0 0xa0000000 0x0 0x10000>;
xlnx,m00-axi-addr-width = <0x20>;
xlnx,m00-axi-aruser-width = <0x0>;
xlnx,m00-axi-awuser-width = <0x0>;
xlnx,m00-axi-burst-len = <0x10>;
xlnx,m00-axi-buser-width = <0x0>;
xlnx,m00-axi-data-width = <0x20>;
xlnx,m00-axi-id-width = <0x1>;
xlnx,m00-axi-ruser-width = <0x0>;
xlnx,m00-axi-target-slave-base-addr = <0x40000000>;
xlnx,m00-axi-wuser-width = <0x0>;
};
misc_clk_0: misc_clk_0 {
#clock-cells = <0>;
clock-frequency = <200000000>;
compatible = "fixed-clock";
};
misc_clk_1: misc_clk_1 {
#clock-cells = <0>;
clock-frequency = <1500000000>;
compatible = "fixed-clock";
};
};
};
The CSI interfaces seem to be added to the device tree automatically, but we don't need them here because they are controlled by the bare-metal application.
* This file is auto-generated and not supposed to be edited manually, but I couldn't find any other way. The problem recurs every time "system-user.dtsi" is edited.
Then build the Petalinux system again.
petalinux-build
The project should be built successfully this time.
■ Create SDK
Type the following command to create SDK for the platform project.
petalinux-build --sdk
This will generate "sdk.sh" at "/images/linux/" directory.
Then type the following commands to unpack "sdk.sh" at its current location.
cd images/linux
petalinux-package --sysroot
Build Platform Project (on Ubuntu)
Create "linux.bif" file under "petalinux/images/linux/" directory. Then copy and paste the below text.
【linux.bif】
/* linux */
the_ROM_image:
{
[fsbl_config] a53_x64
[bootloader] <zynqmp_fsbl.elf>
[pmufw_image] <pmufw.elf>
[destination_device=pl] <bitstream>
[destination_cpu=a53-0, exception_level=el-3, trustzone] <bl31.elf>
[destination_cpu=a53-0, exception_level=el-2] <u-boot.elf>
}
Launch Vitis, and choose "[WORK_DIR]/U96-SLAM/vitis" as its workspace.
Click "File→New→Platform Project..." to open the "New platform project" dialog. Proceed with the following settings.
[Create new platform project]
Platform project name: platform
[Platform]
Choose "Create a new platform from hardware (XSA)".
XSA File: [WORK_DIR]/U96-SLAM/vivado/fpga_top/design_1_wrapper.xsa
Operating system: linux
Processor: psu_cortexa53
Architecture: 64-bit
Generate boot components: checked
Target processor to create FSBL: psu_cortexa53_0
Click "Finish".
Select "platform→psu_cortexa53→linux on psu_cortexa53" in the left pane.
Fill in the following information in the "Domain: linux_domain" dialog.
Bif File : [WORK_DIR]/U96-SLAM/petalinux/images/linux/linux.bif
Boot Components Directory: [WORK_DIR]/U96-SLAM/petalinux/images/linux/
Linux Image Directory : [WORK_DIR]/U96-SLAM/petalinux/images/linux/
Linux Rootfs : [WORK_DIR]/U96-SLAM/petalinux/images/linux/rootfs.tar.gz
Sysroot Directory : [WORK_DIR]/U96-SLAM/petalinux/images/linux/sdk/sysroots/aarch64-xilinx-linux
Select "platform→psu_cortexa53_0→zynqmp_fsbl→Board Support Package"
Click "Modify BSP Settings...".
Select "Overview → standalone" and change as follows.
stdin : psu_uart_1
stdout : psu_uart_1
Click "OK".
Build the project by clicking the hammer icon.
⇒ A project named "platform" will be created in the Vitis workspace.
Now we are ready to build a Linux application that runs on this platform.
Build SLAM Application (on Ubuntu)
Launch Vitis, and choose "[WORK_DIR]/U96-SLAM/vitis" as its workspace.
Click "File→New→Application Project..." to open "New Application Project" dialog.
Proceed with the following settings.
[Platform]
Select a platform from repository: platform [custom]
[Application Project Details]
Application project name: slam
[Domain]
Remain as default.
[Templates]
SW development templates: Empty Application (C++)
* The platform project may not be recognized if the Vitis workspace is created inside the git-controlled directory. If this happens try to create the Vitis workspace somewhere outside of the git-controlled directory.
Click "Finish".
⇒ A project named "slam" will be created.
Copy "src" and "include" directories in "vitis/slam" in the git repository to [WORK_DIR]/U96-SLAM/vitis/slam/" directory.
Click the arrow icon beside the hammer icon and choose "Release".
Right-click "slam" in the "Explorer" pain and choose "C/C++ Build Settings". Set as follows.
[ARM v8 Linux g++ compiler]
├─Directories
│ └─Include Paths
│ [WORK_DIR]/U96-SLAM/petalinux/images/linux/sdk/sysroots/aarch64-xilinx-linux/usr/include
│ [WORK_DIR]/U96-SLAM/vitis/slam/include
└─Miscellaneous
-c -fmessage-length=0 -MT"$@" -ftemplate-backtrace-limit=0
[ARM v8 Linux g++ linker]
└─Libraries
└─Libraries
opencv_core
opencv_photo
opencv_video
opencv_videoio
opencv_optflow
opencv_tracking
opencv_features2d
opencv_imgcodecs
opencv_highgui
opencv_imgproc
opencv_calib3d
pthread
Right-click "slam" in the "Explorer" pain and click "Clean Project".
Right-click "slam" again and click "Build Project".
⇒ "Release/slam.elf" will be generated.
Prepare SD Card (on Ubuntu)
The SD card is formatted using GParted as shown in the figure below.
■ Boot Files
If you have changed the FPGA design, copy "design_1_wrapper.bit" from Windows to Ubuntu. The command below assumes the ".bit" file is located in the "vivado" directory.
Type the following commands to create "BOOT.BIN".
cd [WORK_DIR]/U96-SLAM/petalinux
petalinux-package --boot --force --fsbl images/linux/zynqmp_fsbl.elf --fpga ../vivado/design_1_wrapper.bit --u-boot
⇒ "BOOT.BIN" will be generated in the "/petalinux/images/linux/" directory.
Copy the following 3 files to the BOOT directory on the SD card.
[WORK_DIR]/U96-SLAM/petalinux/images/linux/boot.scr
BOOT.BIN
image.ub
■ System Files
Extract "rootfs.tar.gz" under the "root" directory on the SD card by the following command.
sudo tar xzvf [WORK_DIR]/U96-SLAM/petalinux/images/linux/rootfs.tar.gz -C [SD_CARD_DIR]/root
Create "firmware" directory under "[SD_CARD_DIR]/root/lib/" directory.
Copy "StereoBM.elf" and "slam.elf" to the above directory.
Run Applications
According to the auto-run settings, "run" script in "root/home/root/" directory on the SD card will be executed automatically.
Create a file named "run" and give it execute permission.
chmod 774 run
Then, copy and paste the following content and move the file to "root/home/root/" directory on the SD card.
【home/root/run】
rm *.csv
rm *.bmp
rm *.png
rm *.jpg
rm *.txt
rm -rf work
echo StereoBM.elf > /sys/class/remoteproc/remoteproc0/firmware
echo start > /sys/class/remoteproc/remoteproc0/state
#/lib/firmware/slam.elf -app "STEREO_CAPTURE" -lc "calib_left.yml" -rc "calib_right.yml"
#/lib/firmware/slam.elf -app "FRAME_GRABBER"
#/lib/firmware/slam.elf -app "SLAM_BATCH" -dir "kitti/sequences/00" -l "image_0" -r "image_1" -t "times.txt" -gt "../../poses/00.txt" -lc "calib.txt" -n 100
/lib/firmware/slam.elf -app "SLAM_REALTIME" -lc "calib_left.yml" -rc "calib_right.yml"
shutdown -h now
The "echo" commands are related to OpenAMP that will start "StereoBM" application on the remote processor. "StereoBM.elf" must be located in "lib/firmware".
The next lines launch the SLAM application with some argument parameters. This file contains examples of every application type. Uncomment one of these and modify it appropriately.
The last line will shut down the OS. The output files may not be generated if the OS isn't shut down appropriately.
You may also need calibration files and test data in this directory depending on the application type.
Utility Programs
The git repository contains some utility programs.
They are written for Visual C++ Express 2015 on Windows. Source files are in their respective directories, except for the "slam" project, whose source files are identical to those of the "slam" project on Petalinux that we have already built. Create the Visual C++ projects and add the source files to them.
Listed below are the other settings necessary for a successful build. They are meant for the "Release/x64" build.
OpenCV 3.x is necessary for all of these programs. Install it and add the directory of its DLLs to the execution path. In this article, it is assumed that OpenCV 3.2.0 is installed in the following directory structure.
opencv-3.2.0
└─build
├─include
│ └─opencv2
└─x64
└─vc14
├─bin
│ ├─opencv_world320.dll
│ └─opencv_world320d.dll
└─lib
├─opencv_world320.lib
└─opencv_world320d.lib
■ capture_video
This program captures images from USB Video Class devices.
When the "Enter" key is hit, a received image will be split horizontally in half before being written to files. If used with U96-SLAM that runs in "Frame Grabber" mode, this program will split images to the left and right appropriately. Press "ESC" to quit the program.
"DEVICE_ID" in "main.cpp" determines which device to open. These indices are assigned automatically by the system in incremental order. You may need to change the value depending on the number of UVC devices already attached to your PC.
[Configuration Properties]
C/C++ → General → Additional Include Directories:
[OPENCV_DIR]\opencv-3.2.0\build\include
Linker → General → Additional Library Directories:
[OPENCV_DIR]\opencv-3.2.0\build\x64\vc14\lib
Input → Additional Dependencies: opencv_world320.lib
[Argument parameters]
None
■ stereo_calib
This program reads stereo image pairs of chessboard patterns, calculates stereo calibration parameters using OpenCV functions, then stores them into files.
[Configuration Properties]
C/C++ → General → Additional Include Directories:
[OPENCV_DIR]\opencv-3.2.0\build\include
Linker → General → Additional Library Directories:
[OPENCV_DIR]\opencv-3.2.0\build\x64\vc14\lib
Input → Additional Dependencies: opencv_world320.lib
[Argument parameters]
-w, -h : The number of 'inner' intersections of the chessboard pattern.
-s : The size of a grid square in meters. The unit of this parameter is important as it determines all subsequent units, including the pose-graph output of the SLAM application.
[Example]
-w=7 -h=5 -s=0.03 [FILE_PATH]/dataset.xml
■ slam
This is the Windows version of the SLAM application. The source files are the same as those of the SLAM application on Petalinux. Add all files under the "src" directory to the project. Only the batch-process mode without FPGA acceleration is available on Windows.
[Configuration Properties]
C/C++ → General → Additional Include Directories:
[OPENCV_DIR]\opencv-3.2.0\build\include
[WORK_DIR]\U96-SLAM\vc\slam\include
Advanced → Disable Specific Warnings: 4996;4819
Linker → General → Additional Library Directories:
[OPENCV_DIR]\opencv-3.2.0\build\x64\vc14\lib
Input → Additional Dependencies: opencv_world320.lib
[Argument parameters]
-app : Application type, only "SLAM_BATCH" is available.
-dir : Base directory path. All paths below are relative to this directory.
-l/-r : Image file paths.
-lc/-rc : Calibration file paths.
-t : Path to timestamp file.
-gt : Path to ground truth file.
-n : Number of files to be processed; a negative value means all files.
[Example]
-app "SLAM_BATCH" -dir "KITTI/odometry/dataset/sequences/00" -l "image_0" -r "image_1" -t "times.txt" -gt "../../poses/00.txt" -lc "calib.txt" -n -1
How to Operate
This chapter describes how to operate the "slam" application in its various application types. Some modes rely on the sensor board hardware to operate.
【Argument Parameters】
- app: Application type. Details are given in the examples below.
- dir: Base directory path. All paths below are relative to this directory.
- l/-r: Image file paths for the left and right cameras.
- lc/-rc: Calibration file paths for the left and right cameras. For the KITTI dataset, use "-lc" to specify the calibration file; "-rc" is optional in this case.
- t: Path to the timestamp file. Optional; when omitted, the current system time is used.
- gt: Path to the ground truth file. Optional; the benchmark errors are computed only when the ground truth file is provided. Only the KITTI format is supported.
- n: Number of files to be processed; a negative value means all files.
- quiet: Log messages are not displayed when this argument is set.
- memory: Memory consumption is recorded to a file when this argument is set.
【Output Files】
optimized_poses.csv: The optimized pose graph. The 3 x 4 transform matrices that represent camera poses are stored in row-major order.
slam.bt: The 3D occupancy grid map in binary tree format.
perf_time.csv: The processing times required for major functions are recorded in this file. Used for statistical purposes.
perf_memory.csv: The memory consumption of large data structures is recorded in this file. Used for statistical purposes.
■ Batch-process SLAM
In this mode, rectified stereo image pairs are fed from the SD card and the SLAM operation is performed on them. The input images must be stereo-rectified beforehand. Such data can be obtained from benchmark datasets such as KITTI, or captured yourself using this hardware in "Stereo Capture" mode.
This mode is a pure software solution and doesn't rely on the sensor board or FPGA.
[Example]
-app "SLAM_BATCH" -dir "kitti/sequences/00" -l "image_0" -r "image_1" -t "times.txt" -gt "../../poses/00.txt" -lc "calib.txt" -n 1700
■ Realtime SLAM
This mode performs SLAM operation for the image sensor inputs in real-time.
After the system boots, the LED blinks when the system is ready. Pressing the push switch in this state starts the operation.
Press the switch again to stop the application; the results are stored in files, and the Linux system then shuts down.
[Example]
-app "SLAM_REALTIME" -lc "calib_left.yml" -rc "calib_right.yml"
■ Frame Grabber mode
This mode captures raw stereo image pairs and sends them over the USB 3.0 connection. The stereo image pairs are concatenated horizontally.
When connected to a Windows PC, the device is recognized as a UVC device as shown below, so the images can be viewed with a generic camera application.
[Example]
-app "FRAME_GRABBER"
■ Stereo Capture mode (stand-alone)
This mode captures stereo-rectified image pairs onto the SD card.
After the system boots, the LED blinks when the system is ready. Pressing the push switch in this state starts the operation.
Press the switch again to stop the application; the Linux system then shuts down.
This mode is designed to operate without a PC connection and is suited to use with a mobile battery. Captured images can be processed later in "Batch-process SLAM" mode.
[Example]
-app "STEREO_CAPTURE" -lc "calib_left.yml" -rc "calib_right.yml"
Stereo Calibration
This chapter describes how I obtained the stereo calibration files. This is just one example; there are various other ways.
This repository contains a Visual C++ project that calculates the stereo calibration parameters. The program utilizes the chessboard-pattern method contained in OpenCV. Such a chessboard pattern is found in "misc/chessboard.pdf" in the git repository. The numbers of inner intersections are 7 and 5 in this pattern, and the grid size is 3 cm when printed on A4 paper.
To capture stereo images, connect the hardware to the Windows PC with a USB3 cable, and launch the application in "Frame Grabber" mode with the following argument parameters.
/lib/firmware/slam.elf -app "FRAME_GRABBER"
Launch the "capture_video" program on Windows.
Press "Enter" to capture the images into ".jpg" files. The images are split horizontally, and the left images are stored in the "image_0" directory, and the right images in "image_1".
Press "ESC" to end the program.
Create "dataset.xml" to contain the captured images like below.
【dataset.xml】
<?xml version="1.0"?>
<opencv_storage>
<imagelist>
"data/image_0/000000.png"
"data/image_1/000000.png"
"data/image_0/000001.png"
"data/image_1/000001.png"
...
"data/image_0/000010.png"
"data/image_1/000010.png"
</imagelist>
</opencv_storage>
Then, launch the "stereo_calib" program and specify the above "dataset.xml" like this.
-w=7 -h=5 -s=0.03 [FILE_PATH]/dataset.xml
The stereo calibration parameters will be calculated and stored in the files "calib_left.yml" and "calib_right.yml".
These files can be used to operate the SLAM application in the previous example.
Limitations
- Visual odometry is lost rather easily when the camera rotates, especially when objects are near. This is due to the characteristics of the image sensors in use. I have tried changing the parameters of the image sensors but have been unable to fix the problem so far.
- Real-time mode is not sufficiently tested because it is difficult for me to acquire reliable test data. If you try to test this project, please do so with caution.
Future Work
- Replace with computer-vision-oriented image sensors
- Auto-calibration
- Save/load features for multi-map session
- Integration with other sensors such as IMU and GNSS
- Migration to a smaller device such as this
Licenses
The design files are contained in the "src" directory of the git repository. They are distributed under the following licenses.
src
├─slam
│ └─src/include
│ ├─core ┄┄ MIT
│ ├─octomap ┄┄ New BSD
│ ├─flann ┄┄ BSD
│ ├─rtabmap ┄┄ BSD
│ └─opencv ┄┄ New BSD
├─StereoBM ┄┄ MIT
├─dvp ┄┄ MIT
├─capture_video ┄┄ MIT
└─stereo_calib ┄┄ MIT
The road scene image in the title is taken from the KITTI Dataset.