In many situations, we only need a car to complete a few specific tasks in a new scenario. In traditional self-driving setups, a new driving or patrolling task usually requires coordinating LIDAR, cameras and other sensors. Such a solution requires re-tuning parameters for each new environment and is not flexible enough. To let the car adapt to new scenarios quickly, we took inspiration from Learning from Demonstrations and perform imitation learning with flexible, easy-to-collect visual data and a neural network. With this approach, we only need to drive the car manually to collect image data, together with the driving parameters as labels, to build the dataset and train the network model. At deployment time, the car reads the camera feed and the model outputs the steering angle to complete the task.
Here's what the final result looks like:
Just to clarify: you may notice the car shown in later pictures has a LiDAR mounted. Our model car happens to come with one, but it is unrelated to this project; we removed it when shooting the final demonstration video.
Prerequisite: familiarity with Linux, a deep-learning framework of your choice, and ROS (Robot Operating System)
All the code used in this project can be found in the code section or this bitbucket repo.
Part 1: Collect the Training Data
The model car we used was originally equipped with a FriendlyElec NanoPC-T4 SBC, so we collected the dataset with that and trained the network before equipping the car with a KV260 board. If your car doesn't come with an SBC, you can skip ahead to part 3 and set up the KV260 with ROS first.
We are using a supervised learning approach, and the most important ingredient is a labeled dataset.
For this project, we'll drive the car manually, record the images captured by the camera at 640x480 resolution, and label each of them with the steering angle we input.
Step 1: prepare the ROS workspace
We assume that ROS is already installed on the SBC controlling the car. You can follow part 3 in this tutorial to create a KV260 image with ROS or follow the installation guide on wiki.ros.org if you are using a different SBC for this step.
Connect to the car via SSH and create our workspace:
mkdir -p ~/ros_ws/src
The ROS driver for our car accepts the standard AckermannDriveStamped message as its control input. In this part, we need three nodes: the car driver, a keyboard teleoperation node, and a USB camera node.
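For reference, everything that steers the car, the keyboard teleop now and our neural network later, ends up publishing this same message type. Below is a minimal rospy sketch of such a publisher; the topic name /ackermann_cmd is an assumption, so check which topic your car driver actually subscribes to:
#!/usr/bin/env python3
# Minimal example of commanding an Ackermann-steered car in ROS.
# NOTE: the topic name /ackermann_cmd is an assumption; match it to your driver.
import rospy
from ackermann_msgs.msg import AckermannDriveStamped

rospy.init_node("steer_example")
pub = rospy.Publisher("/ackermann_cmd", AckermannDriveStamped, queue_size=1)
rate = rospy.Rate(10)
while not rospy.is_shutdown():
    msg = AckermannDriveStamped()
    msg.header.stamp = rospy.Time.now()
    msg.drive.speed = 0.3            # forward speed in m/s
    msg.drive.steering_angle = 0.2   # steering angle in radians
    pub.publish(msg)
    rate.sleep()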
Copy the car driver to ros_ws/src via scp or another tool you are familiar with, and download the other two by executing the following commands on the SBC:
cd ~/ros_ws/src
# download the ackermann-drive-teleop project:
git clone https://github.com/gkouros/ackermann-drive-teleop ackermann_drive_teleop
# install the usb_cam from ROS with apt. Skip the following two commands if you are using KV260.
sudo apt update
sudo apt install ros-noetic-usb-cam
Now build the workspace:
source /opt/ros/noetic/setup.bash
cd ~/ros_ws
catkin_make
Step 2: drive the car and record some data
We first used a black carpet with a white lane drawn on it as an example scene for a proof of concept:
That's where the pictures in this tutorial come from. We later drove the car around our lab to gather a different dataset for the final result in our cover video.
In ROS, all nodes communicate with each other using messages. ROS comes with a utility called rosbag that records all the messages sent by nodes in chronological order, and we'll use it for this step.
Open four SSH sessions to the car, one for each of the following.
The first node is the car driver:
cd ~/ros_ws
source ./devel/setup.bash
# the exact package and node name in the command below depends on the actual car used
# consult your car manual for the actual command.
roslaunch base_control base_control.launch
The second one is for USB camera:
cd ~/ros_ws
source ./devel/setup.bash
# This launch file comes with the official ROS package and is conveniently configured to capture 640x480 on /dev/video0. We'll just use this one.
roslaunch usb_cam usb_cam-test.launch
The third one is for recording a rosbag:
cd ~/ros_ws
source ./devel/setup.bash
# record all topics; you can instead list just the camera and steering topics to keep the bag small
rosbag record -a
Launch the keyboard control node in the final one:
cd ~/ros_ws
source ./devel/setup.bash
rosrun ackermann_drive_teleop keyop.py
Now you can control your car with the arrow keys in the fourth terminal. Put your car on the field and drive it around on the desired path:
After that, terminate the rosbag command with Ctrl+C and you'll find a {date}-{time}.bag file in ~/ros_ws. Copy this file back to your Linux desktop and shut down the car.
Step 3: extract the images and labels
Now let's go back to our Linux PC and extract our dataset. We'll label each image with the last recorded steering angle using a Python script.
Open a terminal and create a ros workspace:
source /opt/ros/noetic/setup.bash
mkdir -p ~/ros_ws/src
cd ~/ros_ws/src
# create our package
catkin_create_pkg data_extractor std_msgs ackermann_msgs cv_bridge rospy
cd data_extractor
mkdir scripts
Create ~/ros_ws/src/data_extractor/scripts/extractor.py with the source code provided in the code section and add executable permission to it:
chmod +x extractor.py
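If you are curious what the extractor roughly does, here is a minimal sketch that pairs each camera frame with the most recent steering angle seen in the bag. The topic names /usb_cam/image_raw and /ackermann_cmd are assumptions; the extractor.py provided in the code section is the reference implementation:
#!/usr/bin/env python3
# Sketch of a rosbag extractor: writes each frame as a .jpg and labels it with
# the last steering angle received before that frame.
# Usage: extractor.py <labels.csv> <input.bag> <output_dir>
import csv
import os
import sys

import cv2
import rosbag
from cv_bridge import CvBridge

def main(labels_csv, bag_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    bridge = CvBridge()
    steering = 0.0  # last steering angle seen so far
    with rosbag.Bag(bag_path) as bag, open(labels_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for topic, msg, t in bag.read_messages(
                topics=["/usb_cam/image_raw", "/ackermann_cmd"]):
            if topic == "/ackermann_cmd":
                steering = msg.drive.steering_angle
            else:
                img = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
                name = "%d.%09d.jpg" % (t.secs, t.nsecs)
                cv2.imwrite(os.path.join(out_dir, name), img)
                writer.writerow([name, steering])

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2], sys.argv[3])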
Build the workspace:
source /opt/ros/noetic/setup.bash
cd ~/ros_ws
catkin_make
source ./devel/setup.bash
Copy the previously recorded rosbag to ~/ros_ws and extract our images:
mkdir dataset
# replace {date}-{time}.bag with the filename of your rosbag
rosrun data_extractor extractor.py labels.csv {date}-{time}.bag dataset
The dataset folder now contains the extracted images, and the corresponding labels are stored in labels.csv. We'll use these to train our neural network in the next part.
Part 2: Build and Train Our Network
In this part, we'll build the neural network that steers the car. As a demonstration, we'll use PyTorch to build a simple CNN for the lane carpet scene. You can pick any deep-learning framework supported by Vitis-AI and follow the corresponding tutorial from Xilinx to create your quantized model.
My teammate later trained a ResNet18 for the lab scene in the cover video, but the procedure is the same so we won't duplicate it here.
Step 1: prepare the Vitis-AI docker environment
We'll use Vitis-AI 2.0 in this tutorial. Let's first download the source code:
git clone --recurse-submodules https://github.com/Xilinx/Vitis-AI
cd Vitis-AI
git checkout v2.0
You can train on the CPU only, but it will be much faster with an NVIDIA GPU.
Unfortunately the GPU Docker image is too big for Xilinx to provide prebuilt, so we need to build it ourselves following the README.md in the Vitis-AI repository:
cd setup/docker
./docker_build_gpu.sh
Step 2: code and train our network
The network used in this part is based on the PyTorch CIFAR10 Tutorial, and the custom dataset is created following the PyTorch Data Loading Tutorial. I highly recommend reading through both tutorials to understand what's going on in the code provided.
There are three scripts used in this step (a rough sketch of the dataset and model follows the list):
- dataset.py: the utility to load our dataset.
- model_reg.py: the neural network we used.
- train.py: the training script
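As a rough illustration only (the scripts in the code section are the reference), dataset.py and model_reg.py could look like the sketch below, assuming the 640x480 frames are resized to 160x120 and the network regresses a single steering angle:
# dataset.py / model_reg.py sketch: a CIFAR10-style CNN turned into a regressor.
import csv
import os

import torch
import torch.nn as nn
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class LaneDataset(Dataset):
    """Loads (image, steering_angle) pairs written by the extractor."""
    def __init__(self, labels_csv, image_dir):
        with open(labels_csv) as f:
            self.samples = [(row[0], float(row[1])) for row in csv.reader(f)]
        self.image_dir = image_dir
        self.transform = transforms.Compose([
            transforms.Resize((120, 160)),   # (height, width)
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        name, angle = self.samples[idx]
        img = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        return self.transform(img), torch.tensor([angle], dtype=torch.float32)

class Net(nn.Module):
    """Small CNN with a single regression output (the steering angle)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 27 * 37, 120), nn.ReLU(),  # 27x37 feature map for 120x160 input
            nn.Linear(120, 1),
        )

    def forward(self, x):
        return self.regressor(self.features(x))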
Copy these created files and the dataset into the Vitis-AI directory and enter our container:
cd Vitis-AI
./docker_run.sh xilinx/vitis-ai-gpu:latest
We'll be greeted with the banner below:
Activate the PyTorch environment as shown in the picture, and execute our training script:
conda activate vitis-ai-pytorch
python3 training/train_reg.py
After the 'Finished Training' output, we'll get the trained network state dictionary file named state_dict_rgb.pth.
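For reference, the core of such a training script is just a standard PyTorch regression loop that ends with torch.save; here is a minimal sketch with illustrative hyper-parameters (the training script in the code section is the reference):
# Sketch of the training loop; hyper-parameters here are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

from dataset import LaneDataset   # the dataset class sketched above
from model_reg import Net

loader = DataLoader(LaneDataset("labels.csv", "dataset"),
                    batch_size=32, shuffle=True)
net = Net()
criterion = nn.MSELoss()   # regression on the steering angle
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(20):
    for images, angles in loader:
        optimizer.zero_grad()
        loss = criterion(net(images), angles)
        loss.backward()
        optimizer.step()

print("Finished Training")
torch.save(net.state_dict(), "state_dict_rgb.pth")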
Step 3: Quantizing the network
Vitis-AI runs inference in int8, while our neural network is trained in floating point. We need to convert the model using tools from the Xilinx Vitis-AI Docker container. The quantizer first converts the model, then determines various model parameters from supplied calibration data, and saves the result for the Vitis-AI compiler.
The script for quantizing the model is provided as quantize_xlnx.py. It's written based on the Xilinx example.
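Roughly speaking, the script wraps the trained model with the Vitis-AI PyTorch quantizer and feeds it a few calibration batches. Here is a minimal sketch of that flow; the file paths and the calibration loader are assumptions, and quantize_xlnx.py in the code section is the reference:
# Sketch of the Vitis-AI PyTorch quantization flow (calib + test modes).
import sys

import torch
from pytorch_nndct.apis import torch_quantizer
from torch.utils.data import DataLoader

from dataset import LaneDataset   # same dataset sketched earlier
from model_reg import Net

quant_mode = sys.argv[1]           # "calib" or "test"
model = Net()
model.load_state_dict(torch.load("state_dict_rgb.pth", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 3, 120, 160)
quantizer = torch_quantizer(quant_mode, model, (dummy,))
quant_model = quantizer.quant_model

# Run some forward passes so the quantizer can collect activation statistics.
loader = DataLoader(LaneDataset("labels.csv", "dataset"), batch_size=1)
with torch.no_grad():
    for i, (img, _) in enumerate(loader):
        quant_model(img)
        if i >= 100:
            break

if quant_mode == "calib":
    quantizer.export_quant_config()              # writes the calibration results
else:
    quantizer.export_xmodel(deploy_check=False)  # writes quantize_result/*.xmodel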
Call the script to get the quantized model:
python3 training/quantize_xlnx.py calib
python3 training/quantize_xlnx.py test
We now have the quantized model in quantize_result/Net_int.xmodel. The exact file name will differ depending on the deep-learning framework you chose. This model still needs to be compiled for the specific DPU we deploy on the KV260, which we'll do after we obtain the arch.json describing our DPU.
Part 3: Setup KV260
In this part, we'll be building a custom PL bitstream with Vivado and Xilinx provided tcl scripts, as well as a PetaLinux image with all the software packages we need in later steps.
Step 1: create the base hardware platform
Our model car is controlled over UART, so I need to wire a UART to the Pmod interface on the KV260. Xilinx has a tutorial on building a custom hardware design: Using Vivado to Build the Hardware Design. I'll just list the exact customizations I made instead of describing the whole process.
At the time of writing, the kv260-vitis repository only works with Vivado 2021.1. To use Vivado 2021.2, we need to change the board names for the KV260 SOM and carrier board: rename "*:kv260:*" to "*:kv260_som:*" and change "xilinx.com:som240:som240_1_connector:1.0" to "xilinx.com:kv260_carrier:som240_1_connector:1.2".
You can change it manually after cloning the kv260-vitis repository, or you can apply the provided patch on top of release-2021.1 branch.
Generate an Extensible XSA using these commands:
source {your_vitis_install_dir}/Vitis/2021.2/settings64.sh
cd platforms/vivado/kv260_ispMipiRx_vcu_DP
make xsa
Now let's open the project in Vivado. First, open the Vivado GUI, then run the following command from the Vivado tcl console:
open_project ./project/kv260_ispMipiRx_vcu_DP.xpr
Here's a part the Xilinx tutorial doesn't mention: we need to disable incremental synthesis, otherwise Vitis will complain about missing files when building the overlay later.
In the Flow Navigator panel on the left-hand side, click Settings under Project Manager. In the pop-up window, click Synthesis under Project Settings:
Click the three dots on the right of Incremental synthesis and choose Disable incremental synthesis:
Click OK to close the settings window. Under IP integrator, click on Open Block Design. An IP integrator block design becomes visible that contains the Processing System (PS) IP and other PL IPs.
Double click the ZynqMP in the center of the block diagram:
Click UART0 as shown in the picture above. You'll be brought to the screen below:
Check the UART0 box and select EMIO in the IO field. Then click OK. You'll notice there's a new UART_0 in the ZynqMP block:
Right click on it and select Make External. I also decided to rename the external connection to uart_ps0 here. While I was at it, I removed the PL I2C and connected the PS I2C0 for the onboard camera, added a second PL UART for future use, and incorporated this awesome temperature-controlled fan. Here's my final block diagram:
Now click Run Synthesis to complete the synthesis and click Open Elaborated design:
Click the '19 I/O Ports' entry shown in the picture and you'll find a port list below the main window:
tomverbeure on GitHub organized a package pin list here: kv260_pinout.py. We can find the pins we need and assign the ports to them, as shown in the picture above.
Now our hardware platform is ready. Click "Generate Bitstream" in the Flow Navigator and complete the steps, then click Export Platform to get our new .xsa file.
Step 2: build the overlay
First, let's replace overlays/dpu_ip in kv260-vitis with dsa/DPU-TRD/dpu_ip from Vitis-AI repository to use the latest DPU.
Then, follow the Xilinx tutorial to Create a Vitis Platform and Integrate the overlay into the Platform. I'll use OpenCV for image preprocessing later, so I built overlays/examples/benchmark to incorporate a larger DPU in the PL. After building, you'll find arch.json, dpu.xclbin and kv260_ispMipiRx_vcu_DP_wrapper.bit in overlays/examples/benchmark/binary_container_1/sd_card/.
Step 3: write a device tree for our overlay
Since I'm using the smartcam base platform, I can modify kv260-smartcam.dtsi for our use. Enable uart0, add the PL UART we added before, and move the I2C definition from the PL node into the PS i2c0. You can find vai20-2uart.dtsi in the provided source code.
Step 4: Build PetaLinux
First, download PetaLinux 2021.1 and the K26 Starter Kit BSP from Xilinx download page.
Install petalinux and create our base project:
source {your_petalinux_install}/PetaLinux/2021.1/settings.sh
# update petalinux for kv260
petalinux-upgrade -u http://petalinux.xilinx.com/sswreleases/rel-v2021/sdkupdate/2021.1_update1/ -p "aarch64" --wget-args "--wait 1 -nH --cut-dirs=4"
petalinux-create -t project -s /<path to>/xilinx-k26-starterkit-v2021.1-final.bsp -n xilinx-k26-starterkit-2021.1
cd xilinx-k26-starterkit-2021.1
# import our xsa from step 1
petalinux-config --get-hw-description /path/to/xsa
# build it first. This will take a while to finish:
petalinux-build
# set board variant for KV260
echo 'BOARD_VARIANT = "kv"' >> project-spec/meta-user/conf/petalinuxbsp.conf
Create a package for our overlay using files from step 2 and 3 following Add New FPGA Firmware:
# I've renamed the files from previous steps to vai20-2uart.* here.
petalinux-create -t apps --template fpgamanager -n user-firmware --enable --srcuri "vai20-2uart.bit vai20-2uart.dtsi vai20-2uart.xclbin shell.json"
Add ROS to our image following ROS 2 in Kria kv260 with Petalinux 2021.2. In step 2 of that tutorial, instead of adding ROS 2 Rolling, we add ROS Noetic, which means adding the following layers:
${PROOT}/project-spec/meta-ros/meta-ros-backports-hardknott
${PROOT}/project-spec/meta-ros/meta-ros-common
${PROOT}/project-spec/meta-ros/meta-ros1
${PROOT}/project-spec/meta-ros/meta-ros1-noetic
And in step 3 we create project-spec/meta-user/recipes-core/images/petalinux-image-minimal.bbappend with the following content:
inherit ros_distro_${ROS_DISTRO}
inherit ${ROS_DISTRO_TYPE}_image
To make other ROS packages available for selection, we add the packages we need into project-spec/meta-user/conf/user-rootfsconfig:
CONFIG_ros-core
CONFIG_usb-cam
CONFIG_ackermann-msgs
CONFIG_cv-bridge
CONFIG_cv-bridge-dev
CONFIG_catkin-dev
CONFIG_ackermann-msgs-dev
CONFIG_sensor-msgs-dev
CONFIG_roscpp-dev
CONFIG_vai20-2uart
CONFIG_tf2
CONFIG_tf2-dev
Add Vitis-AI 2.0 libraries following How to use recipes-vitis-ai.
After that, open petalinux rootfs config:
petalinux-config -c rootfs
select the packages we need in the user packages menu:
We'll be building the ROS packages on the KV260 board itself, so we need to select all the -dev packages as well as packagegroup-petalinux-self-hosted from the Petalinux Package Groups menu.
Finally, let's build our images:
# compile everything
petalinux-build
# package BOOT.BIN. Add --force if you've previously generated it.
petalinux-package --boot --u-boot --dtb images/linux/u-boot.dtb
# package the sdcard image
petalinux-package --wic --bootfiles "ramdisk.cpio.gz.u-boot boot.scr Image system.dtb"
Step 5: Flash the images
Write the generated PetaLinux image to the SD card and power on the board. After that, upgrade the BOOT.BIN following Boot Firmware Updates.
We now have our KV260 board ready. Install it on the model car and continue following the next part of the tutorial.
Part 4: Deploy the model
In this part, we'll finally get KV260 to steer the model car.
Step 1: compile the network
Copy the arch.json from part 3 and Net_int.xmodel from part 2 to the Vitis-AI directory and enter the Vitis-AI Docker container:
cd Vitis-AI
./docker_run.sh xilinx/vitis-ai-gpu:latest
conda activate vitis-ai-pytorch
Run the following command to compile the network:
vai_c_xir -x /PATH/TO/quantized.xmodel -a /PATH/TO/arch.json -o /OUTPUTPATH -n netname
The output files are in the specified output path, as shown in the picture above.
Step 2: write the deployment code
If you haven't already, prepare a ROS workspace on the KV260 as demonstrated in step 1 of part 1.
Create our package for the deployment code:
cd ~/ros_ws/src
source ./devel/setup.bash
# create our package
catkin_create_pkg lane_dl_xlnx std_msgs ackermann_msgs cv_bridge roscpp
cd lane_dl_xlnx
mkdir launch model
And now it's time for more code. We'll be running the model using the VART API. You can find the Xilinx sample code for the VART API in resnet50.cpp in the Vitis-AI repository.
lane_dl_xlnx.cpp is our code for using VART API with ROS. Save it as ~/ros_ws/src/lane_dl_xlnx/src/main.cpp.
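Although our node is written in C++, the VART flow it follows is easiest to see in the Python bindings. Below is a rough, standalone sketch of that flow based on the Xilinx VART examples; the model path, preprocessing and scaling details are illustrative assumptions, not a copy of main.cpp:
# Sketch of running the compiled xmodel with VART (Python bindings).
import cv2
import numpy as np
import vart
import xir

def get_dpu_subgraph(graph):
    # The compiled xmodel contains one subgraph mapped to the DPU.
    subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
    return [s for s in subgraphs
            if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]

graph = xir.Graph.deserialize("lane.xmodel")
runner = vart.Runner.create_runner(get_dpu_subgraph(graph), "run")

in_tensor = runner.get_input_tensors()[0]
out_tensor = runner.get_output_tensors()[0]
in_scale = 2 ** in_tensor.get_attr("fix_point")     # float -> int8
out_scale = 2 ** -out_tensor.get_attr("fix_point")  # int8 -> float

# Preprocess one frame to the DPU input shape (N, H, W, C) as int8.
# NOTE: the normalization must match whatever was used during training.
frame = cv2.imread("sample.jpg")
h, w = in_tensor.dims[1], in_tensor.dims[2]
img = cv2.resize(frame, (w, h)).astype(np.float32) / 255.0
input_data = [np.expand_dims((img * in_scale).astype(np.int8), 0)]
output_data = [np.empty(tuple(out_tensor.dims), dtype=np.int8)]

job = runner.execute_async(input_data, output_data)
runner.wait(job)
steering_angle = float(output_data[0].flatten()[0]) * out_scale
print("predicted steering angle:", steering_angle)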
To build our code, we need to add it to the CMakeLists.txt in the package. The finished script is provided in the source code.
We also need a launch file to start the node. Save lane_dl_xlnx.launch into ~/ros_ws/src/lane_dl_xlnx/launch/. Copy the compiled model from step 1 to ~/ros_ws/src/lane_dl_xlnx/model/lane.xmodel and build our workspace:
cd ~/ros_ws
catkin_make
Here comes the most exciting part. Put your car in the previously trained scene and connect two ssh sessions.
In the first session, load the PL bitstream and start the neural network node:
# unload the default firmware
sudo xmutil unloadapp
# load the firmware we created
sudo xmutil loadapp vai20-2uart
# start the node
cd ~/ros_ws
source ./devel/setup.bash
roslaunch lane_dl_xlnx lane_dl_xlnx.launch
Start the car driver in the second session:
cd ~/ros_ws
source ./devel/setup.bash
# the exact package and node name in the command below depends on the actual car used
# consult your car manual for the actual command.
roslaunch base_control base_control.launch
The car should now follow our previous driving path on its own!
Result
Here's our car running on the lane carpet:
And you can try different scenes and neural networks, like we did in the cover video.