In this project, I will document the build of a Donkey Car enhanced with an advanced vision system using FPGA-accelerated stereo vision and LiDAR.
The project consists of the following main components:
- Raspberry Pi 3 featuring 4 x Cortex-A53 cores - runs the AI that drives the car
- Avnet / Xilinx Ultra96 Board - used for video acquisition and processing. The high speed MIPI CSI-2 outputs of two OV5647 cameras are captured using two video pipelines implemented in the programmable logic (FPGA) part of the Xilinx Zynq UltraScale+ MPSoC. The captured images are processed by an OpenCV / xfOpenCV application: a hardware accelerated stereo matching algorithm (xfOpenCV StereoBM) is applied on the input images and the depth of the scene is computed in real-time
- Donkey Car - an RC car used for PoC
- 2 x OV5647 5 MegaPixel NoIR Camera
- TFMini LiDAR Sensor
The build of a Donkey Car is pretty well documented.
We can start with the hardware part:
- First, we need to remove the decorative plastic shell and some parts of the casing from our RC car. For RC cars from the kits provided by ARM we can follow this video.
- Then we can continue with the standard build process.
- The result should look like:
The software part consists of:
- Installing software to the SD card.
- Calibrating the PWM values for the throttle and steering.
- Starting the web interface and getting a test drive.
The first thing I tried with the Avnet Ultra96 board (of course, without reading the getting started guide :D) was to plug a cable into its micro USB port and see if it powers up. It didn't.
It turned out the Ultra96 must be powered through the jack connector with a 12V 2A power supply. The jack connector is an EIAJ-03 (with 4.75mm outside and 1.7mm inside diameter), which apparently is not a very common one. As I didn't find such a connector in the house, and (it being August) the local electronics store was closed, I decided to "temporarily" just solder two wires onto the connector pins from the bottom of the board.
After double checking that nothing was shorted out, I was able to power the board from a bench supply (set to 12V with a 2.10A current limit):
After powering it up, the Ultra96 board boots in about 30-40 seconds and creates a WiFi network with the name Ultra96_<Mac_Address>.
To access the Ultra96 we need to connect to this network and access 192.168.2.1 from a browser. A user interface like the one below should be shown. This allows:
- Running examples - for example, controlling the on-board LEDs
- Changing settings - for example, changing the WiFi settings and connecting to my home network:
But, the Ultra96 has more interesting features than this. It has an FPGA!
Vivado Design Suite
To create an FPGA-based project, we need to install the Xilinx Vivado Design Suite. This takes some time, but the result should look like:
Now we can start creating a sample project. I followed this video tutorial, which shows how to create the hardware version of the classic "blink an LED" example.
The Xilinx SDK is used to build the Boot image:
The result will be a BOOT.bin file, which should be copied to the SD card. The Ultra96 will boot from the SD card and will program the FPGA with the provided image.
The Blinky LED example should output a roughly 2-3 Hz square wave on pin 3 of the low speed header.
As the output is at 1.8V logic level and I didn't have an extension board with level shifter yet, I used my multi-meter and oscilloscope to inspect the signal.
The LiDAR(Light Detection and Ranging) sensor I used is a TFMini from Benewake.
The sensor measures distance by measuring the time-of-flight of a light signal reflected by the measured object. The distance to the object is then calculated using the speed of light.
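As a quick illustration (this snippet is mine, not from the sensor documentation), the light travels to the object and back, so the measured travel time has to be halved:
SPEED_OF_LIGHT = 299792458  # m/s

def tof_distance(travel_time_s):
    # distance = speed of light * round-trip time / 2
    return SPEED_OF_LIGHT * travel_time_s / 2

print(tof_distance(6.7e-9))  # a ~6.7 ns round trip corresponds to about 1 m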
Benewake has a demo app that can be used to test the sensor:
Also, it is recommended to update the firmware to the latest version (if not already):
The sensor measures distance in a single direction. This is not so useful on its own, so I decided to build a simple rotating platform that allows measuring distance over 180°. The sensor is mounted on a servo motor that allows 180° rotation. In this way the LiDAR sensor is able to measure the distance to the objects surrounding the Donkey Car. Using the collected data a floor map can be built.
The TFmini sensor communicates over a serial connection (115200 baud) using a custom protocol. The VCC (5V), GND, TX and RX (3.3V) cables of the sensor are connected to the appropriate pins of the Raspberry Pi.
After the hardware serial port of the Raspberry Pi is enabled from raspi-config, we should be able to receive data over the /dev/ttyS0 port.
The protocol and data format used by the TFmini is a custom one. The readings come in packets of 8 bytes with the following format:
There is an example code on this instructable that can be used to test the sensor.
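For reference, here is a minimal reading sketch of my own (not the instructable's code). It assumes the commonly documented TFMini frame layout - two 0x59 header bytes followed by 16-bit little-endian distance and signal strength values and a trailing checksum - which may differ slightly from the table above:
import serial

def read_frame(port):
    """Block until a valid TFMini frame arrives and return (distance_cm, strength)."""
    while True:
        if port.read(1) != b'\x59':           # first header byte
            continue
        if port.read(1) != b'\x59':           # second header byte
            continue
        payload = port.read(7)                # rest of the frame
        if len(payload) < 7:
            continue
        distance = payload[0] | (payload[1] << 8)
        strength = payload[2] | (payload[3] << 8)
        checksum = (0x59 + 0x59 + sum(payload[:6])) & 0xFF
        if checksum == payload[6]:
            return distance, strength

if __name__ == '__main__':
    with serial.Serial('/dev/ttyS0', 115200, timeout=1) as tfmini:
        print(read_frame(tfmini))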
The servo motor is connected to channel 15 of the servo controller board. The donkey calibrate command can be used to test the servo and find out the min and max pulse values:
$ donkey calibrate --channel 15
Having the min and max values, I created a small script that moves the servo and prints out the values read from the TFMini sensor.
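The script looked roughly like the following sketch (the pulse values and the PCA9685 helper from donkeycar.parts.actuator are assumptions here - use the values from your own calibration):
import time
import serial
from donkeycar.parts.actuator import PCA9685   # servo driver helper (assumed API)

MIN_PULSE, MAX_PULSE = 220, 540   # replace with your calibrated values
STEPS = 90                        # one step per 2 degrees

servo = PCA9685(channel=15)
tfmini = serial.Serial('/dev/ttyS0', 115200, timeout=1)

for step in range(STEPS + 1):
    pulse = MIN_PULSE + (MAX_PULSE - MIN_PULSE) * step // STEPS
    servo.set_pulse(pulse)
    time.sleep(0.02)                              # let the servo settle
    distance_cm, strength = read_frame(tfmini)    # helper from the TFMini sketch above
    print('angle %3d deg  distance %d cm' % (step * 2, distance_cm))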
5. Creating a Donkey Car Software Part
The next step is to integrate our new sensor into the Donkey Car software. To do this we need to implement a new Vehicle part (a threaded one), which is basically a Python class, TFMiniLidar in my case, with the following methods (a minimal sketch of such a part follows the list):
- update(self) - this method is called at start and it will be the entry point of the sensor's thread. It is basically an infinite loop that controls the servo motor and collects the sensor data. The sensor data is stored in an array of 90 elements called frame. Each value represents the distance reading for a 2° slice.
- run_threaded(self) - this method is called periodically and should return the output of the part. In our case we return a copy of the full frame array. With this technique the Donkey Car software always gets values for the full 180° viewport, regardless of the current rotation of the sensor.
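A minimal sketch of such a part (the servo and sensor interfaces are hypothetical placeholders, not the actual implementation):
import time

class TFMiniLidar:
    def __init__(self, servo, sensor):
        # servo: object that can move to a given angle (hypothetical interface)
        # sensor: object returning the latest TFMini distance reading in cm
        self.servo = servo
        self.sensor = sensor
        self.frame = [0] * 90      # one distance value per 2 degree slice
        self.running = True

    def update(self):
        # entry point of the part's thread: sweep the servo back and forth
        # and fill the frame array with distance readings
        position, direction = 0, 1
        while self.running:
            self.servo.move_to(position * 2)                  # angle in degrees
            self.frame[position] = self.sensor.read_distance()
            position += direction
            if position in (0, 89):
                direction = -direction                        # reverse at the ends
            time.sleep(0.01)

    def run_threaded(self):
        # called by the vehicle loop: return a snapshot of the full 180 degree view
        return list(self.frame)

    def shutdown(self):
        self.running = False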
Having our TFMiniLidar part, we can now update manage.py to use the new sensor:
lidar = TFMiniLidar()
V.add(lidar, outputs=['lidar/dist_array'], threaded=True)
The output of the LiDAR sensor can then be added as an input for other Vehicle parts:
- the Web Controller
- the Auto Pilot
# Run the pilot if the mode is not user.
kl = KerasCategorical()
if model_path:
    kl.load(model_path)

V.add(kl, inputs=['cam/image_array', 'lidar/dist_array'],
      outputs=['pilot/angle', 'pilot/throttle'],
      run_condition='run_pilot')
- the Tub writer - (note: the TubWriter needs to be enhanced to be able to work with the ndarray data type)
# add tub to save data
inputs = ['cam/image_array', 'lidar/dist_array', 'user/angle', 'user/throttle', 'user/mode', 'timestamp']
types = ['image_array', 'ndarray', 'float', 'float', 'str', 'str']
# single tub
tub = TubWriter(path=cfg.TUB_PATH, inputs=inputs, types=types)
V.add(tub, inputs=inputs, run_condition='recording')
The Web Controller was also updated to show the LiDAR sensor data:
- the WebController part was updated to take lidar/dist_array as input
- a LidarHandler was added to serve the /lidar path - this is implemented as a WebSocket handler which periodically sends the sensor data as a JSON array (a rough sketch of such a handler is shown after this list)
- vehicle.html and style.css were modified to add a transparent HTML5 canvas over the camera image
- main.js was updated to open a WebSocket connection to /lidar and consume the sensor data. The sensor data is then plotted on the HTML5 canvas in a semicircular pattern
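The handler looked roughly like this sketch (the vehicle_state lookup is a placeholder for however the latest lidar/dist_array value is shared with the web server in your setup):
import json
import tornado.ioloop
import tornado.websocket

class LidarHandler(tornado.websocket.WebSocketHandler):
    def initialize(self, vehicle_state):
        # vehicle_state is assumed to hold the latest 'lidar/dist_array' value
        self.vehicle_state = vehicle_state
        self.timer = None

    def open(self):
        # push the LiDAR frame a few times per second
        self.timer = tornado.ioloop.PeriodicCallback(self.send_frame, 200)
        self.timer.start()

    def send_frame(self):
        dist_array = self.vehicle_state.get('lidar/dist_array', [])
        self.write_message(json.dumps(list(dist_array)))

    def on_close(self):
        if self.timer:
            self.timer.stop()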
Now we can start driving to record some data:
6. Training an Auto Pilot
Now we want to create an auto-pilot that uses the data collected by our LiDAR sensor. We have the sensor data saved in the tub files, but we also need to adjust the training model to use the new data.
The default model used by the Donkey Car is the default_categorical. We will add a new input to this model. The default_categorical consists of:
- an input layer, the camera image
- four 2D convolution layers - these do the pattern recognition in the images
- a flatten layer - converts the output of the previous layers to 1D
- two densely-connected NN layers, each combined with a dropout (10%) layer
We can add a LiDAR input layer after the flatten layer. We can use the concatenate() function to append the LiDAR input to the output of the previous layers:
...
x = Flatten(name='flattened')(x)   # Flatten to 1D (Fully connected)
lidar_in = Input(shape=(90,), name='lidar_in')
x = concatenate([x, lidar_in])
...
model = Model(inputs=[img_in, lidar_in], outputs=[angle_out, throttle_out])
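For reference, here is a rough end-to-end sketch of the modified model, loosely following the structure described above (layer sizes, loss settings and the lidar_categorical name are illustrative, not copied from the Donkey source):
from keras.layers import Input, Convolution2D, Flatten, Dense, Dropout, concatenate
from keras.models import Model

def lidar_categorical():
    img_in = Input(shape=(120, 160, 3), name='img_in')
    lidar_in = Input(shape=(90,), name='lidar_in')

    # convolution layers do the pattern recognition in the camera image
    x = Convolution2D(24, (5, 5), strides=(2, 2), activation='relu')(img_in)
    x = Convolution2D(32, (5, 5), strides=(2, 2), activation='relu')(x)
    x = Convolution2D(64, (5, 5), strides=(2, 2), activation='relu')(x)
    x = Convolution2D(64, (3, 3), strides=(2, 2), activation='relu')(x)
    x = Flatten(name='flattened')(x)

    # append the 90-element LiDAR distance array to the flattened image features
    x = concatenate([x, lidar_in])

    x = Dense(100, activation='relu')(x)
    x = Dropout(0.1)(x)
    x = Dense(50, activation='relu')(x)
    x = Dropout(0.1)(x)

    angle_out = Dense(15, activation='softmax', name='angle_out')(x)
    throttle_out = Dense(1, activation='relu', name='throttle_out')(x)

    model = Model(inputs=[img_in, lidar_in], outputs=[angle_out, throttle_out])
    model.compile(optimizer='adam',
                  loss={'angle_out': 'categorical_crossentropy',
                        'throttle_out': 'mean_absolute_error'},
                  loss_weights={'angle_out': 0.9, 'throttle_out': 0.001})
    return model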
Now we can include the LiDAR data in the input of the training algorithm:
def train(cfg, tub_names, new_model_path, base_model_path=None ):
"""
use the specified data in tub_names to train an artificial neural network
saves the output trained model as model_name
"""
X_keys = ['cam/image_array', 'lidar/dist_array'] ...
After this, we can run the training on the recorded data:
$ python ./mycar/manage.py train --tub mycar/tub/ --model ./mycar/models/lidar_pilot
using donkey v2.5.1 ...
loading config file: /home/bluetiger/dev/DonkeyCar/mycar/config.py
config loaded
tub_names mycar/tub/
train: 8896, validation: 2224
steps_per_epoch 69
Epoch 1/100
...
69/69 [==============================] - 65s 938ms/step - loss: 11.5231 - angle_out_loss: 12.7967 - throttle_out_loss: 0.6057 - val_loss: 11.6602 - val_angle_out_loss: 12.9490 - val_throttle_out_loss: 0.6021
Epoch 00007: early stopping
PART 3: Stereo Vision
7. Second Camera Module
The Donkey Car Kit comes with a NoIR Wide Angle FOV160° 5-Megapixel Camera Module, a camera module based on the Omnivision OV5647 sensor.
For stereo vision, we need two cameras, so I purchased one more module from eBay.
The OV5647 sensor uses the MIPI Camera Serial Interface (version 2) for communication. The Ultra96 board supports up to two CSI-2 cameras. This seems to be confirmed by Avnet's Ultra96 Hardware User's Guide too.
To connect the two cameras, I used the AISTARVISION MIPI Adapter (v2.1), a 96Boards adapter board found on eBay.
The board allows connecting two CSI-2 cameras. It accepts cameras with different types of connectors, including the flex cable type used by the Raspberry Pi cameras.
For stereo vision, two cameras need to be placed side by side. For a first test, I decided to try a 50 mm distance between them.
To mount two cameras on the Donkey Car, I designed and printed a dual camera mount that fits onto the original camera mount.
MIPI CSI-2 is an industry standard for camera devices used in mobile devices. Its physical interface, the so-called D-PHY, uses high speed differential signalling. A clock lane and up to 4 data lanes (the OV5647 uses two) are used to transmit the image frames. The configuration of the camera is done using an I2C compatible interface.
The Xilinx UltraScale+ MPSoCs, including the ZU3EG used in the Ultra96 board, have I/O pins capable of differential signaling. On the Ultra96 these are routed to the High Speed connector's MIPI CSI-2 and DSI pins, as defined in the 96Boards CE specs.
To be able to receive images we need to implement a video pipeline in the Programmable Logic. To build the video pipeline we will use some pre-built Video IPs from Xilinx. The image data is transmitted between the components using AXI Stream links, while for configuration each component has an AXI Lite slave interface.
The video pipeline has the following components:
- MIPI CSI-2 Rx Subsystem - a full MIPI CSI-2 receiver implementation that provides the D-PHY signal processing and manages all the MIPI related details - it outputs the raw image data (RAW8 format in the case of the OV5647) received from the image sensor
- Sensor Demosaic - does the debayering on the RAW8 data and outputs a 24-bit RGB image - the image sensor has pixels detecting red, green or blue light, arranged in a so-called Bayer (chessboard like) pattern. The sensor outputs raw data in RAW8 format, which means each pixel has information about only a single color (red, green or blue). This is not too convenient for image processing, so we want to interpolate the image data to have all three colors for each pixel. This process is called debayering.
- Gamma LUT - does gamma correction based on a look-up-table - uses 24-bit RGB for both input and output
- a Video Processing Subsystem doing Color Space Conversion - capable of color correction tasks including contrast, brightness, and red/green/blue gain control - also uses 24-bit RGB for both input and output
- a Video Processing Subsystem in Scale Only configuration - provides scaling, color space conversion, and chroma re-sampling functionality - it also converts the pixels from 24-bit RGB to the more compact YUV 4:2:2 format (16 vs. 24 bits per pixel)
- Frame Buffer Write - streams the image to the DDR memory
- additionally, the reset pins of the video pipeline components are connected to EMIO GPIO pins
The components are configured as follows:
Note: the UG1221 - Zynq UltraScale+ MPSoC Base Targeted Reference Design was used as reference to implement the video pipeline.
11. Building PetaLinux
To be able to use the video pipeline we need to use PetaLinux, and we need to build it with our custom hardware.
We will use the V4L2 (Video for Linux) driver infrastructure to control our video pipeline and expose the captured image in a standard interface as a /dev/video0 node.
To create the custom hardware, I started with the official Ultra96 BSP and along with some other changes, added the above video pipeline. The design looks like:
After this, we need to open the Elaborated Design and, in the I/O Ports panel, configure the MIPI D-PHY pins as clk_lane = N2, data_lane_0 = N5, data_lane_1 = M2.
After this we can run the Synthesis and Implementation. The result should look like:
If these were successful, we can generate a Bitstream and export the hardware platform (including the bitstream) for PetaLinux.
The next step is to import the new hardware platform in the Ultra96 PetaLinux project:
$ petalinux-config --get-hw-description=../ultra96-mipi/ultra96-mipi.sdk
To be recognized by Linux, we need to add our video pipeline components to the Device Tree. In a PetaLinux build we do this by editing system-user.dtsi:
/*
Notes:
- EMIO = &gpio0 + 78
- AXI Clock = clocking_wizard_clk2
*/
/{
cam_clk: cam_clk {
#clock-cells = <0>;
compatible = "fixed-clock";
clock-frequency = <25000000>;
};
clocking_wizard_clk2: clocking_wizard_clk2@0 {
#clock-cells = <0>;
compatible = "fixed-factor-clock";
clocks = <&clk 71>; /* fclk0 */
clock-div = <6>;
clock-mult = <12>;
};
};
&fclk0 {
status = "okay";
};
&i2csw_2 {
ov5647_0: camera@36 {
compatible = "ovti,ov5647";
reg = <0x36>;
clocks = <&cam_clk>;
status = "okay";
port {
ov5647_0_to_mipi_csi2_rx_0: endpoint {
remote-endpoint = <&mipi_csi2_rx_0_from_ov5647_0>;
clock-lanes = <0>;
data-lanes = <1 2>;
};
};
};
};
&mipi_csi2_rx0_mipi_csi2_rx_subsyst_0 {
compatible = "xlnx,mipi-csi2-rx-subsystem-3.0";
reset-gpios = <&gpio 90 GPIO_ACTIVE_LOW>;
xlnx,max-lanes = <0x2>;
xlnx,vc = <0x4>;
xlnx,csi-pxl-format = "RAW8";
xlnx,vfb;
xlnx,dphy-present;
xlnx,ppc = <0x2>;
xlnx,axis-tdata-width = <0x20>;
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
xlnx,video-format = <XVIP_VF_RBG>;
xlnx,video-width = <8>;
mipi_csi2_rx_0_to_demosaic_0: endpoint {
remote-endpoint = <&demosaic_0_from_mipi_csi2_rx_0>;
};
};
port@1 {
reg = <1>;
xlnx,video-format = <XVIP_VF_RBG>;
xlnx,video-width = <8>;
mipi_csi2_rx_0_from_ov5647_0: endpoint {
data-lanes = <1 2>;
remote-endpoint = <&ov5647_0_to_mipi_csi2_rx_0>;
};
};
};
};
&mipi_csi2_rx0_v_demosaic_0 {
compatible = "xlnx,v-demosaic";
clocks = <&clocking_wizard_clk2>;
reset-gpios = <&gpio 85 GPIO_ACTIVE_LOW>;
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
xlnx,video-width = <8>;
demosaic_0_from_mipi_csi2_rx_0: endpoint {
remote-endpoint = <&mipi_csi2_rx_0_to_demosaic_0>;
};
};
port@1 {
reg = <1>;
xlnx,video-width = <8>;
demosaic_0_to_gamma_lut_0: endpoint {
remote-endpoint = <&gamma_lut_0_from_demosaic_0>;
};
};
};
};
&mipi_csi2_rx0_v_gamma_lut_0 {
compatible = "xlnx,v-gamma-lut";
clocks = <&clocking_wizard_clk2>;
reset-gpios = <&gpio 86 GPIO_ACTIVE_LOW>;
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
xlnx,video-width = <8>;
gamma_lut_0_from_demosaic_0: endpoint {
remote-endpoint = <&demosaic_0_to_gamma_lut_0>;
};
};
port@1 {
reg = <1>;
xlnx,video-width = <8>;
gamma_lut_0_to_csc_0: endpoint {
remote-endpoint = <&csc_0_from_gamma_lut_0>;
};
};
};
};
&mipi_csi2_rx0_v_proc_ss_csc_0 {
compatible = "xlnx,v-vpss-csc";
clocks = <&clocking_wizard_clk2>;
reset-gpios = <&gpio 87 GPIO_ACTIVE_LOW>;
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
xlnx,video-format = <XVIP_VF_RBG>;
xlnx,video-width = <8>;
csc_0_from_gamma_lut_0: endpoint {
remote-endpoint = <&gamma_lut_0_to_csc_0>;
};
};
port@1 {
reg = <1>;
xlnx,video-format = <XVIP_VF_RBG>;
xlnx,video-width = <8>;
csc_0_to_scaler_0: endpoint {
remote-endpoint = <&scaler_0_from_csc_0>;
};
};
};
};
&mipi_csi2_rx0_v_proc_scaler_0 {
compatible = "xlnx,v-vpss-scaler";
clocks = <&clocking_wizard_clk2>;
reset-gpios = <&gpio 88 GPIO_ACTIVE_LOW>;
xlnx,num-hori-taps = <8>;
xlnx,num-vert-taps = <8>;
xlnx,pix-per-clk = <2>;
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
xlnx,video-format = <XVIP_VF_RBG>;
xlnx,video-width = <8>;
scaler_0_from_csc_0: endpoint {
remote-endpoint = <&csc_0_to_scaler_0>;
};
};
port@1 {
reg = <1>;
xlnx,video-format = <XVIP_VF_YUV_422>;
xlnx,video-width = <8>;
scaler_0_to_vcap_0: endpoint {
remote-endpoint = <&vcap_0_from_scaler_0>;
};
};
};
};
&mipi_csi2_rx0_v_frmbuf_wr_0 {
#dma-cells = <1>;
compatible = "xlnx,axi-frmbuf-wr-v2.1";
reset-gpios = <&gpio 89 GPIO_ACTIVE_LOW>;
xlnx,dma-addr-width = <32>;
xlnx,vid-formats = "yuyv","uyvy","y8";
xlnx,pixels-per-clock = <2>;
};
&amba_pl {
vcap0: video_cap {
compatible = "xlnx,video";
dmas = <&mipi_csi2_rx0_v_frmbuf_wr_0 0>;
dma-names = "port0";
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
direction = "input";
vcap_0_from_scaler_0: endpoint {
remote-endpoint = <&scaler_0_to_vcap_0>;
};
};
};
};
};
Based on the Device Tree, Linux will load the appropriate kernel modules for the OV5647 and the Xilinx Video IP components. We can build PetaLinux using:
$ petalinux-build
and we can create a bootable image with:
$ petalinux-package --boot --fsbl components/plnx_workspace/fsbl/fsbl/Release/fsbl.elf --fpga ./project-spec/hw-description/design_1_wrapper.bit --pmufw components/plnx_workspace/pmu-firmware/pmu-firmware/Release/pmu-firmware.elf --u-boot --force
A successful build generates the following files in the images/linux folder:
- BOOT.BIN and image.ub are the U-Boot boot loader and the Linux kernel image - these should be copied to the boot partition of the SD card
- rootfs.ext4 is the root file system image - we should write it to the root partition of the SD card with the sudo dd if=images/linux/rootfs.ext4 of=/dev/mmcblk0p2 command
Now we can boot up the Ultra96. If all went OK, we should see a /dev/video0, a /dev/media0 and some /dev/v4l-subdev-* device files.
Using media-ctl we can check our video pipeline:
root@Ultra96:~# media-ctl -p
Media controller API version 4.14.0
Media device information
------------------------
driver xilinx-video
model Xilinx Video Composite Device
serial
bus info
hw revision 0x0
driver version 4.14.0
Device topology
- entity 1: video_cap output 0 (1 pad, 1 link)
type Node subtype V4L flags 0
device node name /dev/video0
pad0: Sink
<- "b0000000.v_proc_ss":1 [ENABLED]
- entity 5: ov5647 4-0036 (1 pad, 1 link)
type V4L2 subdev subtype Sensor flags 0
device node name /dev/v4l-subdev0
pad0: Source
-> "80120000.mipi_csi2_rx_subsystem":1 [ENABLED]
- entity 7: 80120000.mipi_csi2_rx_subsystem (2 pads, 2 links)
type V4L2 subdev subtype Unknown flags 0
device node name /dev/v4l-subdev1
pad0: Source
[fmt:RBG24/1920x1080 field:none]
-> "b0050000.v_demosaic":0 [ENABLED]
pad1: Sink
[fmt:RBG24/1920x1080 field:none]
<- "ov5647 4-0036":0 [ENABLED]
- entity 10: b0050000.v_demosaic (2 pads, 2 links)
type V4L2 subdev subtype Unknown flags 0
device node name /dev/v4l-subdev2
pad0: Sink
[fmt:SRGGB8/1280x720 field:none]
<- "80120000.mipi_csi2_rx_subsystem":0 [ENABLED]
pad1: Source
[fmt:RBG24/1280x720 field:none]
-> "b0070000.v_gamma_lut":0 [ENABLED]
- entity 13: b0070000.v_gamma_lut (2 pads, 2 links)
type V4L2 subdev subtype Unknown flags 0
device node name /dev/v4l-subdev3
pad0: Sink
[fmt:RBG24/1280x720 field:none]
<- "b0050000.v_demosaic":1 [ENABLED]
pad1: Source
[fmt:RBG24/1280x720 field:none]
-> "b0040000.v_proc_ss":0 [ENABLED]
- entity 16: b0040000.v_proc_ss (2 pads, 2 links)
type V4L2 subdev subtype Unknown flags 0
device node name /dev/v4l-subdev4
pad0: Sink
[fmt:RBG24/1280x720 field:none]
<- "b0070000.v_gamma_lut":1 [ENABLED]
pad1: Source
[fmt:RBG24/1280x720 field:none]
-> "b0000000.v_proc_ss":0 [ENABLED]
- entity 19: b0000000.v_proc_ss (2 pads, 2 links)
type V4L2 subdev subtype Unknown flags 0
device node name /dev/v4l-subdev5
pad0: Sink
[fmt:RBG24/1280x720 field:none]
<- "b0040000.v_proc_ss":1 [ENABLED]
pad1: Source
[fmt:UYVY/1920x1080 field:none]
-> "video_cap output 0":0 [ENABLED]
The V4L2 devices are successfully initialized. Initially they have the wrong formats / resolutions configured, so we need to set the correct ones:
root@Ultra96:~# # MIPI RX:
root@Ultra96:~# media-ctl -v -d /dev/media0 -V '"80120000.mipi_csi2_rx_subsystem":0 [fmt:SBGGR8/640x480]'
Opening media device /dev/media0
Enumerating entities
Found 7 entities
Enumerating pads and links
Setting up format SBGGR8 640x480 on pad 80120000.mipi_csi2_rx_subsystem/0
Format set: SBGGR8 640x480
Setting up format SBGGR8 640x480 on pad b0050000.v_demosaic/0
Format set: SBGGR8 640x480
root@Ultra96:~# media-ctl -v -d /dev/media0 -V '"80120000.mipi_csi2_rx_subsystem":1 [fmt:SBGGR8/640x480]'
Opening media device /dev/media0
Enumerating entities
Found 7 entities
Enumerating pads and links
Setting up format SBGGR8 640x480 on pad 80120000.mipi_csi2_rx_subsystem/1
Format set: SBGGR8 640x480
root@Ultra96:~#
root@Ultra96:~# # Demosaic
root@Ultra96:~# media-ctl -v -d /dev/media0 -V '"b0050000.v_demosaic":1 [fmt:RBG24/640x480]'
Opening media device /dev/media0
Enumerating entities
Found 7 entities
Enumerating pads and links
Setting up format RBG24 640x480 on pad b0050000.v_demosaic/1
Format set: RBG24 640x480
Setting up format RBG24 640x480 on pad b0070000.v_gamma_lut/0
Format set: RBG24 640x480
root@Ultra96:~#
root@Ultra96:~# # Gamma LUT
root@Ultra96:~# media-ctl -v -d /dev/media0 -V '"b0070000.v_gamma_lut":1 [fmt:RBG24/640x480]'
Opening media device /dev/media0
Enumerating entities
Found 7 entities
Enumerating pads and links
Setting up format RBG24 640x480 on pad b0070000.v_gamma_lut/1
Format set: RBG24 640x480
Setting up format RBG24 640x480 on pad b0040000.v_proc_ss/0
Format set: RBG24 640x480
root@Ultra96:~#
root@Ultra96:~# # SS CSC
root@Ultra96:~# media-ctl -v -d /dev/media0 -V '"b0040000.v_proc_ss":1 [fmt:RBG24/640x480]'
Opening media device /dev/media0
Enumerating entities
Found 7 entities
Enumerating pads and links
Setting up format RBG24 640x480 on pad b0040000.v_proc_ss/1
Format set: RBG24 640x480
Setting up format RBG24 640x480 on pad b0000000.v_proc_ss/0
Format set: RBG24 640x480
root@Ultra96:~#
root@Ultra96:~# # SS SCALER
root@Ultra96:~# media-ctl -v -d /dev/media0 -V '"b0000000.v_proc_ss":1 [fmt:UYVY/640x480]'
Opening media device /dev/media0
Enumerating entities
Found 7 entities
Enumerating pads and links
Setting up format UYVY 640x480 on pad b0000000.v_proc_ss/1
Format set: UYVY 640x480
Now we can try to capture some frames using the yavta utility:
root@Ultra96:~# width=640
root@Ultra96:~# height=480
root@Ultra96:~# size=${width}x${height}
root@Ultra96:~# frames=8
root@Ultra96:~# skip=0
root@Ultra96:~#
root@Ultra96:~# yavta -c$frames -p -F --skip $skip -f UYVY -s $size /dev/video0
Device /dev/video0 opened.
Device `video_cap output 0' on `platform:video_cap:0' is a video output (without mplanes) device.
Video format set: UYVY (59565955) 640x480 field none, 1 planes:
* Stride 1280, buffer size 614400
Video format: UYVY (59565955) 640x480 field none, 1 planes:
* Stride 1280, buffer size 614400
8 buffers requested.
length: 1 offset: 4278322640 timestamp type/source: mono/EoF
Buffer 0/0 mapped at address 0x7f9e00b000.
length: 1 offset: 4278322640 timestamp type/source: mono/EoF
Buffer 1/0 mapped at address 0x7f9df75000.
length: 1 offset: 4278322640 timestamp type/source: mono/EoF
Buffer 2/0 mapped at address 0x7f9dedf000.
length: 1 offset: 4278322640 timestamp type/source: mono/EoF
Buffer 3/0 mapped at address 0x7f9de49000.
length: 1 offset: 4278322640 timestamp type/source: mono/EoF
Buffer 4/0 mapped at address 0x7f9ddb3000.
length: 1 offset: 4278322640 timestamp type/source: mono/EoF
Buffer 5/0 mapped at address 0x7f9dd1d000.
length: 1 offset: 4278322640 timestamp type/source: mono/EoF
Buffer 6/0 mapped at address 0x7f9dc87000.
length: 1 offset: 4278322640 timestamp type/source: mono/EoF
Buffer 7/0 mapped at address 0x7f9dbf1000.
Press enter to start capture
0 (0) [-] none 0 0 B 209.798257 209.798333 23.472 fps ts mono/EoF
1 (1) [-] none 1 0 B 209.823763 209.823780 39.206 fps ts mono/EoF
2 (2) [-] none 2 0 B 209.849348 209.849361 39.085 fps ts mono/EoF
3 (3) [-] none 3 0 B 209.874935 209.874947 39.082 fps ts mono/EoF
4 (4) [-] none 4 0 B 209.900520 209.900532 39.085 fps ts mono/EoF
5 (5) [-] none 5 0 B 209.926106 209.926185 39.084 fps ts mono/EoF
6 (6) [-] none 6 0 B 209.951693 209.951764 39.082 fps ts mono/EoF
7 (7) [-] none 7 0 B 209.977279 209.977293 39.084 fps ts mono/EoF
Captured 8 frames in 0.221639 seconds (36.094572 fps, 0.000000 B/s).
8 buffers released.
This saves raw (UYVY) files with frame-000xx.bin names. I checked their content using an online tool:
Status:
- WORKING :)
- some kernel patches were also needed to get this working
- more details about how we got this working can be found in the following forum post: MIPI CSI-2 RX Subsystem + OV5647 problem on Ultra96 (ZU3EG)
12. Adding a 2nd Video Pipeline
To be able to use the second camera, we need to add a second MIPI CSI-2 interface to our Vivado design. To do this:
- copy and paste the mipi_phy_if_0 input port (an mipi_phy_if_1 port will be automatically added)
- copy and paste the mipi_csi2_rx0 block (an mipi_csi2_rx1 block will automatically be added)
- connect mipi_phy_if_1 to the mipi_phy_if pin of mipi_csi2_rx1
- connect the clock, reset, AXI, interrupt and GPIO ports of mipi_csi2_rx1, similar to the mipi_csi2_rx0 ones
- open mipi_csi2_rx1, edit the MIPI CSI-2 Rx Subsystem and change the D-PHY pins to clk_lane = T3, data_lane_0 = P3, data_lane_1 = U2
- change the Slice blocks for the reset pins to use GPIO pins 12-16 (instead of 7-11)
After the changes the design should look like:
In the Address editor tab, we need to assign address ranges for the new entries.
After double checking that I/O ports are correct, we can run Synthesis and Implementation.
If these are successful we need to Generate a Bitstream and Export the Hardware.
Then we need to update the PetaLinux project with the new hardware definition file. In the Device Tree we will add the necessary nodes for the second video pipeline. After this we can build the project.
Status:
- the Vivado and PetaLinux builds are both successful
- PetaLinux boots and one of the video pipelines works, but the other one does NOT - it's unclear why. I'm still trying to figure this out on the Xilinx forums: PetaLinux hangs on custom Ultra96 platform, How to keep a hardware platform stable between changes?
Analyzing some RAW frames (UYVY format) shows that the noise is present in the Chroma (U, V) channels, but the Luma (Y) channel seems to be clean:
As the noise was present only in the Chroma (U, V) channels, we should be able to get some clean gray-scale images by keeping just the Luma (Y) channel. To do this, I wrote a V4L2 image capture module that captures UYVY frames from the camera, drops the U and V channels and outputs gray-scale images:
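The same channel-dropping can be illustrated offline on the raw frames saved by yavta earlier (this snippet is only an illustration of the UYVY layout, not the capture module itself):
import numpy as np

def uyvy_to_gray(raw_bytes, width, height):
    # UYVY packs pixels as U0 Y0 V0 Y1, so the luma (Y) bytes are
    # every second byte, starting at offset 1
    frame = np.frombuffer(raw_bytes, dtype=np.uint8)
    frame = frame.reshape(height, width * 2)
    return frame[:, 1::2]

# file name follows the frame-000xx.bin naming used by yavta above
with open('frame-000001.bin', 'rb') as f:
    gray = uyvy_to_gray(f.read(), 640, 480)   # 640x480 gray-scale image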
So, this way I got some images to work with.
13. OpenCV
To do the image processing we will use OpenCV. To install it on the Ultra96 we can use the smart package manager:
$ smart install opencv
Then we can use the following Python snippet (source) to capture some frames from the camera:
import cv2

camera = cv2.VideoCapture(0)
for i in range(10):
    return_value, image = camera.read()
    cv2.imwrite('capture' + str(i) + '.png', image)
del(camera)
This saves 10 frames in PNG format:
We can also do image processing on the captured images. For example, we can apply a Sobel filter pretty easily:
return_value, image = camera.read()
sobel_x = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=5)
cv2.imwrite('sobel.png', sobel_x)
Doing some tests with OpenCV, I observed that the output image from the OV5647 was not aligned with the axis of the lens. It felt like the camera always looks up and to the right.
The OV5647 kernel driver turned out to be a little bit rudimentary. The only supported resolution was 640x480, and it used just the upper left corner of the sensor area. This means the center of the output image and the lens axis were offset from each other, giving the impression that the camera always looks up and to the right.
(Note: when connected to the Raspberry Pi, the OV5647 camera is controlled by the Broadcom SoC's video core. Unfortunately, the video core runs proprietary code, so we can't take inspiration from it.)
To fix this, I tried to extend the OV5647 kernel driver with additional configurations. The output resolution and the used sensor area of the OV5647 are controlled by a set of registers:
and a couple of other registers controlling things like sub-sampling, clock speeds and others.
I tried multiple resolutions, but not all of them worked, mainly because of the stability problems with the video pipeline:
- 1920x1080 (Full HD) - frames are received by the MIPI RX, but won't go through the video pipeline
- 2560x1920, scaled down to 1280x960 by the VPSS Scaler - same problem as above
- 640x480, with the imaging window moved to the center (blue box) - this worked after a couple of tries (note: the resolution is sub-sampled in the sensor from 1280x960)
- 1280x960, sub-sampled from 2560x1920, scaled down to 640x480 by the VPSS Scaler - this works
The following diagram shows the difference between the above configurations:
The view port difference between the initial 640x480 resolution and the 640x480 resolution (down-scaled from 1280x960) looks like this:
In the last configuration the output image and the lens axis are aligned, and almost the full sensor area is used. This means the Donkey Car has a large view port (a little bit better than the Raspberry Pi setup).
Note: the kernel patches are present in the PetaLinux files.
15. Stereo Vision
In image processing, stereo vision is the process of extracting 3D information from two 2D images. Usually, two horizontally displaced cameras are used to obtain two views of the scene.
After this, a stereo matching algorithm will try to match the corresponding points from the two images. Using the image coordinates from the left (x1, y1) and right (x2, y2) images, the real world coordinates of the point (x, y, z) are calculated. The output of a stereo matching algorithm is a disparity map, representing the difference in horizontal coordinates of the corresponding image pixels. The disparity values are inversely proportional to the distance of the objects in the scene.
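For rectified cameras the relation is simply Z = f * B / d, so distance can be recovered from disparity once the focal length and baseline are known (the 300 px focal length below is just an assumed value for illustration):
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    # depth is inversely proportional to disparity: Z = f * B / d
    if disparity_px <= 0:
        return float('inf')   # no match / infinitely far
    return focal_length_px * baseline_m / disparity_px

# with the ~50 mm baseline used above and an assumed 300 px focal length,
# a 10 px disparity corresponds to roughly 1.5 m
print(disparity_to_depth(10, 300, 0.05))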
16. Camera Calibration
To do stereo vision we first need to do a camera calibration. The scope of the camera calibration is to determine the intrinsic (scene independent) and extrinsic (real world vs. camera coordinate system) parameters of the two cameras, as well as the relative position and rotation of the two cameras.
The calibration is done by taking a set of photos of a printed out chessboard pattern in different positions and orientations:
For a successful calibration about 40-60 image pairs (left + right) are needed:
The calibration is done in three main steps:
- identify the chessboard pattern's corners in each image - cv::findChessboardCorners can be used to do this
- calibrate the two cameras and calculate the transformation between them - done using the cv::stereoCalibrate function
- calculate the rotation and projection matrices for the two cameras - done by calling the cv::stereoRectify function with the result of the cv::stereoCalibrate function
The output of cv::stereoRectify is two rotation matrices R1, R2 and two projection matrices P1, P2. Applying these to the input images makes the epipolar lines parallel, simplifying the stereo correspondence problem.
The calibration functions from the cv:: namespace do not always work well with the fisheye style wide angle lenses used in the Donkey Car. Fortunately, there is a cv::fisheye:: namespace with fisheye lens optimized variants of the above functions.
The stereo calibration process is very well demonstrated in the following GitHub repositories by sourishg:
- https://github.com/sourishg/stereo-calibration
- https://github.com/sourishg/fisheye-stereo-calibration
I used the code from these to calibrate the cameras. The parameters resulting from the calibration are saved in a cam_stereo.yml file.
These parameters can be used to rectify the input images before doing the stereo matching:
- first, the calibration parameters are loaded from the cam_stereo.yml file, and cv::fisheye::initUndistortRectifyMap is called for each camera. The output of the function is, for each camera, a pair of X and Y transformation matrices that can be used as parameters for the remap function:
cv::fisheye::initUndistortRectifyMap( K1, D1, R1, P1, imgSize, CV_32F, lmapx, lmapy);
cv::fisheye::initUndistortRectifyMap( K2, D2, R2, P2, imgSize, CV_32F, rmapx, rmapy);
- then the remap() function is called for the left and right input images:
cv::remap(img1, imgU1, lmapx, lmapy, cv::INTER_LINEAR);
cv::remap(img2, imgU2, rmapx, rmapy, cv::INTER_LINEAR);
The result of the rectification is something like:
The rectified images can be used to calculate the disparity map. There are multiple stereo matching algorithms available in OpenCV. I tried out the StereoBM and StereoSGBM algorithms:
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

int stereo_run(int num_imgs, char* img_dir, char* leftimg_filename, char* rightimg_filename)
{
string calib_file = "cam_stereo.yml";
Mat R1, R2, P1, P2, Q;
Mat K1, K2, R;
Vec3d T;
Mat D1, D2;
Size imgSize(640, 480);
cv::FileStorage fs1(calib_file, cv::FileStorage::READ);
cout << "K1" << endl;
fs1["K1"] >> K1;
cout << "K2" << endl;
fs1["K2"] >> K2;
cout << "D1" << endl;
fs1["D1"] >> D1;
cout << "D2" << endl;
fs1["D2"] >> D2;
cout << "R" << endl;
fs1["R"] >> R;
cout << "T" << endl;
fs1["T"] >> T;
cout << "R1" << endl;
fs1["R1"] >> R1;
cout << "R2" << endl;
fs1["R2"] >> R2;
cout << "P1" << endl;
fs1["P1"] >> P1;
cout << "P2" << endl;
fs1["P2"] >> P2;
cout << "Q" << endl;
fs1["Q"] >> Q;
cv::Mat lmapx, lmapy, rmapx, rmapy;
cv::Mat imgU1, imgU2;
cv::Mat r;
cv::fisheye::initUndistortRectifyMap(K1, D1, R1, P1, imgSize, CV_32F,
lmapx, lmapy);
cv::fisheye::initUndistortRectifyMap(K2, D2, R2, P2, imgSize, CV_32F,
rmapx, rmapy);
Ptr<StereoBM> stereoBM = StereoBM::create(128, 21);
Ptr<StereoSGBM> stereoSGBM = StereoSGBM::create(0, //int minDisparity
96, //int numDisparities
21, //int SADWindowSize
600, //int P1 = 0
2400, //int P2 = 0
10, //int disp12MaxDiff = 0
16, //int preFilterCap = 0
2, //int uniquenessRatio = 0
20, //int speckleWindowSize = 0
30, //int speckleRange = 0
true); //bool fullDP = false
Mat img1, img2, dispOut, dispNorm;
for (int i = 72; i <= 78; i++) {
char left_img[100], right_img[100];
sprintf(left_img, "%s%s%s%d.png", img_dir, "stereo/", leftimg_filename, i);
sprintf(right_img, "%s%s%s%d.png", img_dir, "stereo/", rightimg_filename, i);
img1 = imread(left_img, CV_LOAD_IMAGE_GRAYSCALE);
img2 = imread(right_img, CV_LOAD_IMAGE_GRAYSCALE);
cv::remap(img1, imgU1, lmapx, lmapy, cv::INTER_LINEAR);
cv::remap(img2, imgU2, rmapx, rmapy, cv::INTER_LINEAR);
sprintf(left_img, "%s%s%s%d.png", img_dir, "stereo/rect_", leftimg_filename, i);
sprintf(right_img, "%s%s%s%d.png", img_dir, "stereo/rect_", rightimg_filename, i);
imwrite(left_img, imgU1);
imwrite(right_img, imgU2);
double minVal; double maxVal;
stereoBM->compute(imgU1, imgU2, dispOut);
minMaxLoc( dispOut, &minVal, &maxVal );
dispOut.convertTo(dispNorm, CV_8UC1, 255/(maxVal - minVal));
sprintf(left_img, "%s%s%s%d.png", img_dir, "stereo/disp_BM_", leftimg_filename, i);
imwrite(left_img, dispNorm);
stereoBM->compute(imgU1, imgU2, dispOut);
minMaxLoc( dispOut, &minVal, &maxVal );
dispOut.convertTo(dispNorm, CV_8UC1, 255/(maxVal - minVal));
stereoSGBM->compute(imgU1, imgU2, dispOut);
minMaxLoc( dispOut, &minVal, &maxVal );
dispOut.convertTo(dispNorm, CV_8UC1, 255/(maxVal - minVal));
sprintf(left_img, "%s%s%s%d.png", img_dir, "stereo/disp_SGBM_", leftimg_filename, i);
imwrite(left_img, dispNorm);
}
return 0;
}
Original input images:
The input images after rectification:
The outputs of the StereoBM and StereoSGBM algorithms look like:
Brighter values represent smaller distances, while darker ones represent larger distances.
The output is a little bit noisy, but the result could be enhanced by tuning the parameters and applying filtering to the input images.
18. Xilinx reVISION and xfOpenCV
Xilinx reVISION is a framework, a collection of development resources for platform, algorithm and application development. It has resources for many hardware accelerated machine learning and computer vision algorithms.
xfOpenCV is Xilinx's extension of the OpenCV library. Based on key OpenCV functions, it allows you to easily compose and accelerate computer vision functions in the FPGA fabric through the SDx or HLx environments.
It also has hardware accelerated support for the StereoBM algorithm used above.
19. Creating the SDSoC Hardware Platform
To be able to use the new hardware design in SDx projects, we first need to create an SDSoC platform. To do this we can follow the SDSoC Environment Tutorial - Platform Creation (UG1236) tutorial:
The tutorial has 3 main parts:
1. Exporting a project as DSA file from Vivado - this includes:
- setting some PFM properties on different components - basically we need to specify a platform name and the clocks, master and slave AXI interfaces and interrupts that will be available in the exported platform
- generating HDL output products with the Global Synthesis option
- exporting and validating a DSA file
- exporting the hardware and launching Xilinx SDK
2. Creating the FSBL project and Linker Script with Xilinx SDK - this includes:
- creating an FSBL project and generating a .bif file for it
- creating an empty application and generating a Linker Script
- preparing a folder with some files that will be used in the Xilinx SDx platform
3. Creating the custom platform in Xilinx SDx - this includes:
- creating a Platform Project using the DSA file exported from Vivado
- setting the .bif, the Linker Script and the BSP setting files on the platform project
- generating a new Platform and uploading it to the Custom Repositories
After this we should be able to create an Application Project with our newly created platform:
To test the platform, I used the Array Partitioning example. If we build the project, there will be an sd_card folder created. We can run the app by copying its content to the SD card:
Xilinx Zynq MP First Stage Boot Loader
Release 2018.2 Nov 23 2018 - 21:30:47
PMU-FW is not running, certain applications may not be supported.
Number of CPU cycles running application in software: 19504320
Number of CPU cycles running application in hardware: 242951
Speed up: 80.2809
Note: Speed up is meaningful for real hardware execution only, not for emulation.
TEST PASSED < < < It works! :)
(Note: my first try at building the Array Partitioning example failed, as it needed more DSP units than the ones available in the device. The Vivado design, especially the VPSS Scaler components, used a relatively high number of DSPs, so there were not enough DSPs left for the SDSoC part. To solve this, I reconfigured the video pipelines from 2 pixels / clock to 1 pixel / clock processing. This reduced the number of DSPs used by about 50%. After this, I was able to create a new SDSoC platform and build the Array Partitioning example.)
20. SDSoC Applications on PetaLinux
To be able to do hardware accelerated video processing on the camera, we need Linux's V4L2 framework, so we need to run SDSoC applications on PetaLinux.
To be able to do this we need to add a new Linux System Configuration to our platform project:
After this we should be able to create applications targeting Linux with our custom hardware platform.
The build generates some files that should be copied to the SD card, along with an .elf executable file. Running it, we should see something like:
root@Ultra96:~# ./array_part.elf
Number of CPU cycles running application in software: 19550092
Number of CPU cycles running application in hardware: 152666
Speed up: 128.058
Note: Speed up is meaningful for real hardware execution only, not for emulation.
TEST PASSED
21. Image Processing with SDSoC, xfOpenCV and reVISION
After running the SDSoC examples I tried out some hardware accelerated xfOpenCV examples.
First I tried the Harris corner detection example. The project compiles successfully with the hardware accelerated function. The software version of the corner detection works OK, but with hardware acceleration enabled the Ultra96 hangs.
The StereoBM algorithm also has an xfOpenCV version, so I tried to compile a project with hardware acceleration enabled for xf::StereoBM. After reducing the NO_OF_DISPARITIES and PARALLEL_UNITS parameters a little bit, the project compiles successfully.
SDx project with the hardware accelerated xfOpenCV StereoBM function:
Project Summary of the Vivado project:
Unfortunately, when I try to run the hardware accelerated example on the Ultra96, the app hangs. The problem is probably related to the instability of the programmable logic / video pipelines.
Cheers!