In this project I will present a framework enabling rapid prototyping of hardware accelerated video pipelines on Xilinx Zynq UltraScale+ devices.
The project is based on GStreamer, Vitis AI and DeepLib and comes with a pre-built SD card image for the ZCU104 evaluation board. This can be used to implement hardware accelerated intelligent video analytics applications with little effort. Deep learning models from the Vitis AI Model Zoo, as well as custom models, can be used with the framework.
Concept and Introduction
FPGA based devices offer great flexibility in terms of design possibilities. From communication interfaces and video processing elements to deep learning accelerators, many powerful features can be implemented on FPGA based devices.
On the other hand, designing FPGA based systems can be time consuming. The process involves designing hardware for the programmable logic, building custom embedded OS platforms, and generating hardware accelerators for different tasks. On top of that, things can go wrong, and debugging hardware designs is not easy.
For this reason developers can be reluctant to get into designing FPGA based systems. Instead they may prefer more readily available solutions like the PYNQ (Python productivity for Zynq) framework.
> PYNQ & GStreamer
PYNQ is a Python (and Jupyter notebooks) based framework that allows easily creating hardware accelerated applications on Xilinx Zynq UltraScale+ devices, without the need to do FPGA design yourself.
PYNQ consists of a pre-built PetaLinux based embedded system, with a base FPGA image providing the minimal functionality for specific devices. On top of the FPGA base image, PYNQ ships with different FPGA overlays providing hardware accelerated functionality for various tasks.
PYNQ is distributed in the form of SD card images for a good number of Xilinx Zynq and Zynq UltraScale+ devices (PYNQ-Z1, PYNQ-Z2, Ultra96, ZCU104, etc).
PYNQ provides some powerful overlays for image processing applications. However, PYNQ may not be the best choice for video processing applications, as it operates on a fairly low level, frame-by-frame basis.
For non-trivial video processing applications, pipeline based media processing frameworks like GStreamer are usually preferred. Unfortunately, the hardware accelerators from the PYNQ base images and overlays cannot be readily used with GStreamer.
To fully take advantage of features like the Video Codec Unit (VCU), HDMI Input and Output, or Deep Learning Processor Unit (DPU) accelerated object detection and classification, a custom PetaLinux distribution must be built.
> GStreamer & DeepLib support for Zynq UltraScale+ Devices
GStreamer is a popular pipeline-based multimedia framework. It allows building multimedia (audio & video) processing pipelines from elements fulfilling different tasks. The pipeline elements can be sources (ex. file, v4l2), sinks (ex. egl, file, hdmi) or processing elements (ex. video convert, H.264 encode / decode).
DeepLib is a Python library I created earlier this year, which allows easy creation of GStreamer video processing pipelines. It is based on the GStreamer Python bindings, and works by grouping and auto-configuring GStreamer pipeline elements into higher level DeepLib elements fulfilling common tasks like handling different video inputs (V4L2, MIPI, HDMI, etc) and outputs (EGL, HDMI, DP, RTSP, etc), or implementing hardware accelerated object detection and classification.
DeepLib was originally built for NVidia Jetson based devices. In this project I extended it to Xilinx Zynq UltraScale+ platforms. Now, building pipelines with DeepLib is as straightforward as the sketch below:
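(The snippet below is a minimal illustrative sketch rather than the library's confirmed API; the exact class names, such as Pipeline, HDMIInput, XilinxFaceDetect and HDMIOutput, are assumptions based on the elements described later in this article.)
# DeepLib-style pipeline sketch: HDMI input -> Vitis AI face detection -> HDMI output
# (class names are illustrative assumptions, not the confirmed DeepLib API)
from deeplib import Pipeline
from deeplib.inputs import HDMIInput
from deeplib.processors import XilinxFaceDetect
from deeplib.outputs import HDMIOutput

pipeline = Pipeline()
pipeline.add(HDMIInput(width=1920, height=1080))   # capture from the HDMI RX port
pipeline.add(XilinxFaceDetect())                   # DPU accelerated face detection
pipeline.add(HDMIOutput())                         # display on the HDMI TX port
pipeline.run()                                     # build and start the underlying GStreamer pipeline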
Along with the standard ones, the following features are supported with DeepLib on Xilinx Zynq UltraScale+ devices:
- HDMI Input and Output
- DisplayPort Output
- Hardware Accelerated Face Detection (Vitis AI DPU based)
- Hardware Accelerated Person Detection (Vitis AI DPU based)
- Hardware Accelerated Single Shot Detector (SSD) (Vitis AI DPU based)
with more features coming.
To make it easy to use, the framework is delivered in the form of pre-built SD card images.
A pre-built SD card image (preview*) for the ZCU104 with DeepLib installed can be downloaded from here.
The SD Card image consists of a custom built PetaLinux based platform with the following components:
- Video Input and Output pipelines implemented in Vivado
- custom PetaLinux platform with HDMI, VCU and other Drivers
- Vitis AI and Vitis Vision accelerators added using Vitis
- Xilinx Runtime Library (XRT) and Vitis AI (pre-installed)
- GStreamer and DeepLib (pre-installed)
Note: some of the features are not yet fully working
As DeepLib is Python based, it could be used with PYNQ too.
In the following sections I will present in detail how the framework was built, and how each feature works.
Getting Started with the Xilinx ZCU104 Evaluation Kit
The Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit is a development kit offered by Xilinx, meant for evaluating Zynq UltraScale+ MPSoC EV series based systems.
The ZCU104 features:
- Zynq UltraScale+ XCZU7EV-2FFVC1156 MPSoC
- 2GB DDR4 memory
- HDMI Input and Output
- DisplayPort Output
- USB3 and Ethernet ports
- MicroSD Card slot
- FMC LPC and 3 PMOD connectors
It comes with a basic demo app, which tests that features like memory, boot and the user switches work.
After going through that, I decided to continue with two guides using the Vitis Platform:
- Vitis Software Platform: Embedded Vision Reference Platforms User Guide 2019.2 (UG1265)
- Getting Started with Xilinx Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit and See3CAM_CU30_CHL_TC_BX
Next, I decided to also try the official PYNQ overlays for the ZCU104, and I did some tests with the HDMI inputs and a USB camera:
On the PYNQ images, I also installed GStreamer:
$ sudo apt-get install libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav gstreamer1.0-doc gstreamer1.0-tools gstreamer1.0-x gstreamer1.0-alsa gstreamer1.0-gl gstreamer1.0-gtk3 gstreamer1.0-qt5 gstreamer1.0-pulseaudio gir1.2-gst-rtsp-server-1.0
We can check that it works with a basic pipeline:
$ gst-launch-1.0 -v videotestsrc pattern=snow ! fakesink
After this I also installed DeepLib, and checked that it works with an example streaming a USB camera image over RTSP (sketched below):
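The example looked roughly like the following; as above, the exact DeepLib class names (USBCameraInput, RTSPOutput) are illustrative assumptions:
# DeepLib sketch: stream a USB camera over RTSP (class names are illustrative assumptions)
from deeplib import Pipeline
from deeplib.inputs import USBCameraInput   # wraps a v4l2src based GStreamer bin
from deeplib.outputs import RTSPOutput      # wraps a GStreamer RTSP server sink

pipeline = Pipeline()
pipeline.add(USBCameraInput(device='/dev/video0', width=1280, height=720))
pipeline.add(RTSPOutput(port=8554, path='/camera'))   # stream reachable at rtsp://<board-ip>:8554/camera
pipeline.run()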
When creating more complex or custom applications for Xilinx Zynq UltraScale+ devices, at some point we will need to build PetaLinux from scratch.
PetaLinux is a custom Linux distribution from Xilinx designed to work with the Xilinx Zynq UltraScale+ devices.
There are multiple ways to build a PetaLinux image.
> PYNQ PetaLinux Images
To build a PYNQ based image from scratch, the following steps need to be taken:
- clone the PYNQ GitHub repository from https://github.com/Xilinx/PYNQ
- download the ZCU104 2020.1 BSP to the `boards/ZCU104` folder
- run the make command:
$ make BOARDS=ZCU104
- the resulting SD card image can be found in the `sdbuild/output` folder
The main advantage of the PYNQ based build is that the resulting RootFS is based on Ubuntu. This means the APT package manager is installed by default, and can be used to install AArch64 packages easily.
The disadvantage of the PYNQ based build is that the RootFS based features of PetaLinux, like custom / proprietary software packages, drivers or kernel modules, are not supported, so they need to be implemented manually.
> Bare PetaLinux
Building PetaLinux from custom hardware designs may be easier using the standard build procedure.
To build PetaLinux, the hardware design needs to be packaged in a BSP (Board Support Package) or XSA (Xilinx Support Archive) file.
The first step is to create a PetaLinux project from the hardware description file:
$ petalinux-create -t project -n zcu104-vcu -s ~/Downloads/xilinx-zcu104-v2020.1-final.bsp
If the hardware design changes, the project can be updated using:
$ petalinux-config --get-hw-description <folder with BSP / XSA file>
PetaLinux comes with a MenuConfig based configuration tool. This can be used to configure the project settings, the RootFS and the kernel:
$ petalinux-config
$ petalinux-config -c kernel
$ petalinux-config -c rootfs
The resulting configuration is stored in the `project-spec` folder. This folder contains all the project specific settings, and its content can also be edited manually.
To build the project we need to run:
$ petalinux-build
After the build is successful, we can prepare the content of the SD Card using:
$ petalinux-package --boot --force --fsbl images/linux/zynqmp_fsbl.elf --fpga <bitstream>.bit --u-boot
The `rootfs.tar.gz` archive contains the content of the root file system.
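For reference, preparing the SD card typically involves copying the boot files to the FAT32 boot partition and extracting the root file system to the EXT4 partition. A sketch of the steps, assuming the partitions are mounted at /media/boot and /media/rootfs (mount points and file names are assumptions for a typical 2020.1 build):
# Copy the boot files to the FAT32 boot partition (mount point is an assumption)
$ cp images/linux/BOOT.BIN images/linux/image.ub images/linux/boot.scr /media/boot/
# Extract the root file system to the EXT4 partition (mount point is an assumption)
$ sudo tar -xzf images/linux/rootfs.tar.gz -C /media/rootfs/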
The ZCU104 has two HDMI ports that can be used for HDMI input and output.
As the base BSP (Board Support Package) for the ZCU104 does not contain the components needed for the HDMI features, in order to get HDMI working, some changes in the hardware design are needed.
This hardware design for HDMI can be implemented in Vivado. As a base I used the BSP for the ZCU104, and added the HDMI components based on the Vitis Single Sensor demo Platform for the zcu104 Board example project.
The HDMI hardware design consists of:
- a HDMI PHY Controller
- a HDMI Input Video Pipeline
- a HDMI Output Video Pipeline
- an AXI IIC for the HDMI Control interface
As the BSP for the ZCU104 is already quite complicated, I decided to implement the HDMI Input and Output in a sub-block:
Apart from the external pins going to the HDMI ports, the communication with the rest of the system is done using AXI and AXI Lite interfaces.
> HDMI Input
The HDMI Input video pipeline consists of:
- a HDMI 1.4/2.0 Receiver Subsystem
- a Scaler Video Processing Subsystem (VPS) with Color Conversion support
- a Video Frame Buffer Write
- slicers for the Reset pins
- a custom IP for the HDMI RX Heartbeat LED
> HDMI Output
The HDMI Output video pipeline consists of:
- a HDMI 1.4/2.0 Transmitter Subsystem
- a Video Mixer
- slicers for the Reset pins
- a custom IP for the HDMI TX Heartbeat LED
In order to get the HDMI Input and Output working, there are a couple of things that need to be configured in our PetaLinux project.
In the Device Tree we need to add the following nodes:
/* HDMI V-PHY */
&hdmi_in_out_vid_phy_controller_0 {
compatible = "xlnx,vid-phy-controller-2.2";
clock-names = "vid_phy_axi4lite_aclk", "dru-clk";
clocks = <&misc_clk_0>, <&hdmi_dru_clk>;
};
/* HDMI RX */
&amba_pl {
vcap_hdmi {
compatible = "xlnx,video";
dmas = <&hdmi_in_out_hdmi_input_v_frmbuf_wr_0 0>;
dma-names = "port0";
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
direction = "input";
vcap_hdmi_in: endpoint {
remote-endpoint = <&hdmi_in_out_hdmi_input_v_proc_ss_scaler_out>;
};
};
};
};
};
&hdmi_in_out_hdmi_input_v_hdmi_rx_ss_0 {
phys = <&vphy_lane0 0 1 1 0>, <&vphy_lane1 0 1 1 0>, <&vphy_lane2 0 1 1 0>;
phy-names = "hdmi-phy0", "hdmi-phy1", "hdmi-phy2";
xlnx,input-pixels-per-clock = <2>;
xlnx,max-bits-per-component = <8>;
xlnx,edid-ram-size = <0x100>;
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
xlnx,video-format = <XVIP_VF_RBG>;
xlnx,video-width = <8>;
hdmi_in_out_hdmi_input_v_hdmi_rx_ss_0_out: endpoint {
remote-endpoint = <&hdmi_in_out_hdmi_input_v_proc_ss_scaler_in>;
};
};
};
};
&hdmi_in_out_hdmi_input_v_proc_ss_scaler {
compatible = "xlnx,v-vpss-scaler-2.2";
reset-gpios = <&gpio 87 1>;
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
xlnx,video-format = <XVIP_VF_RBG>;
xlnx,video-width = <8>;
hdmi_in_out_hdmi_input_v_proc_ss_scaler_in: endpoint {
remote-endpoint = <&hdmi_in_out_hdmi_input_v_hdmi_rx_ss_0_out>;
};
};
port@1 {
reg = <1>;
xlnx,video-format = <XVIP_VF_YUV_422>;
xlnx,video-width = <8>;
hdmi_in_out_hdmi_input_v_proc_ss_scaler_out: endpoint {
remote-endpoint = <&vcap_hdmi_in>;
};
};
};
};
&hdmi_in_out_hdmi_input_v_frmbuf_wr_0 {
compatible = "xlnx,axi-frmbuf-wr-v2.1";
reset-gpios = <&gpio 88 1>;
xlnx,dma-addr-width = <32>;
xlnx,pixels-per-clock = <2>;
xlnx,vid-formats = "yuyv", "uyvy", "y8", "rgb888";
};
/* HDMI TX */
&hdmi_in_out_axi_iic_0 {
/* idt8t49n241 i2c clock generator */
idt8t49n24x: clock-generator@6c {
status = "okay";
compatible = "idt,idt8t49n24x";
#clock-cells = <1>;
reg = <0x6c>;
/* input clock(s); the XTAL is hard-wired on the ZCU104 board */
clocks = <&refhdmi>;
clock-names = "input-xtal";
settings = [
09 50 00 60 67 c5 6c 01 03 00 31 00 01 40 00 01 40 00 74 04 00 74 04 77 6d 00 00 00 00 00 00 ff
ff ff ff 01 3f 00 2e 00 0d 00 00 00 01 00 00 d0 08 00 00 00 00 00 08 00 00 00 00 00 00 44 44 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 e9 0a 2b 20 00 00 00 0f 00 00 00 0e 00 00 0e 00 00 00 27 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e3 00 08 01 00 00 00 00 00 00 00 00 00 b0 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 85 00 00 9c 01 d4 02 71 07 00 00 00 00 83 00 10 02 08 8c
];
};
/* DP159 exposes a virtual CCF clock. Upon .set_rate(), it adapts its retiming/driving behaviour */
dp159: hdmi-retimer@5e {
status = "okay";
compatible = "ti,dp159";
reg = <0x5e>;
#address-cells = <1>;
#size-cells = <0>;
#clock-cells = <0>;
};
};
&hdmi_in_out_hdmi_output_v_mix_0 {
reset-gpios = <&gpio 83 1>;
xlnx,dma-addr-width = <32>;
xlnx,bpc = <8>;
xlnx,ppc = <2>;
xlnx,num-layers = <5>;
crtc_mixer_port: port@0 {
reg = <0>;
mixer_crtc: endpoint {
remote-endpoint = <&hdmi_encoder>;
};
};
mixer_master_layer: layer_0 {
xlnx,layer-id = <0>;
xlnx,vformat = "BG24";
xlnx,layer-max-width = <3840>;
xlnx,layer-max-height = <2160>;
};
mixer_overlay_1: layer_1 {
xlnx,layer-id = <1>;
xlnx,vformat = "YUYV";
xlnx,layer-alpha;
xlnx,layer-max-width = <3840>;
};
mixer_overlay_2: layer_2 {
xlnx,layer-id = <2>;
xlnx,vformat = "YUYV";
xlnx,layer-alpha;
xlnx,layer-max-width = <3840>;
};
mixer_overlay_3: layer_3 {
xlnx,layer-id = <3>;
xlnx,vformat = "UYVY";
xlnx,layer-alpha;
xlnx,layer-max-width = <3840>;
};
mixer_overlay_4: layer_4 {
xlnx,layer-id = <4>;
xlnx,vformat = "AR24";
xlnx,layer-alpha;
xlnx,layer-max-width = <3840>;
xlnx,layer-primary;
};
mixer_logo: logo {
xlnx,layer-id = <5>;
xlnx,logo-height = <64>;
xlnx,logo-width = <64>;
};
};
&hdmi_in_out_hdmi_output_v_hdmi_tx_ss_0 {
reg-names = "hdmi-txss";
phys = <&vphy_lane0 0 1 1 1>, <&vphy_lane1 0 1 1 1>, <&vphy_lane2 0 1 1 1>;
phy-names = "hdmi-phy0", "hdmi-phy1", "hdmi-phy2";
clock-names = "s_axi_cpu_aclk", "link_clk", "s_axis_audio_aclk", "video_clk", "s_axis_video_aclk", "txref-clk", "retimer-clk";
clocks = <&misc_clk_0>, <&misc_clk_2>, <&misc_clk_0>, <&misc_clk_3>, <&misc_clk_1>, <&idt8t49n24x 2>, <&dp159>;
xlnx,input-pixels-per-clock = <2>;
xlnx,max-bits-per-component = <8>;
xlnx,output-fmt = "rgb";
ports {
#address-cells = <1>;
#size-cells = <0>;
encoder_hdmi_port: port@0 {
reg = <0>;
hdmi_encoder: endpoint {
remote-endpoint = <&mixer_crtc>;
};
};
};
};
In the Kernel config:
CONFIG_VIDEO_XILINX_DEMOSAIC=y
CONFIG_VIDEO_XILINX_GAMMA=y
CONFIG_VIDEO_XILINX_VPSS_CSC=y
CONFIG_VIDEO_XILINX_VPSS_SCALER=y
CONFIG_XILINX_FRMBUF=y
CONFIG_VIDEO_XILINX_HDMI_RX=y
CONFIG_DRM_XILINX_HDMI=y
CONFIG_PHY_XILINX_VPHY=y
CONFIG_COMMON_CLK_SI5324=y
CONFIG_RETIMER_DP159=y
CONFIG_DRM_XILINX_XVMIXER=y
CONFIG_COMMON_CLK_IDT8T49N24X=y
CONFIG_FONT_AUTOSELECT=y
The RootFS config needs to be completed with:
CONFIG_gstreamer1.0-plugins-bad=y
CONFIG_gstreamer1.0-plugins-base=y
CONFIG_gstreamer1.0-plugins-base-apps=y
CONFIG_gstreamer1.0-plugins-good=y
CONFIG_kernel-module-hdmi=y
CONFIG_gstreamer1.0=y
CONFIG_gstreamer1.0-bash-completion=y
CONFIG_gstreamer1.0-omx=y
CONFIG_gstreamer1.0-rtsp-server=y
CONFIG_libdrm=y
CONFIG_libdrm-tests=y
CONFIG_libdrm-kms=y
Note: the above configs are included in the attached GitHub repository
> Testing
As I encountered some problems with the PetaLinux image I built myself, I decided to test the HDMI functionality with the pre-built Vitis Single Sensor demo Platform for the zcu104 Board images.
To test the HDMI output, we need to connect a display to the upper HDMI port of the ZCU104. Then, the `modetest` command can be used to check that the display was detected:
root@xilinx-zcu104-2019_2:~# modetest -D b00c0000.v_mix -c
Connectors:
id encoder status name size (mm) modes encoders
37 0 connected HDMI-A-1 470x300 18 36
modes:
name refresh (Hz) hdisp hss hse htot vdisp vss vse vtot)
1680x1050 60 1680 1728 1760 1840 1050 1053 1059 1080 119000 flags: phsync, nvsync; type: preferred, driver
1280x1024 75 1280 1296 1440 1688 1024 1025 1028 1066 135000 flags: phsync, pvsync; type: driver
1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 108000 flags: phsync, pvsync; type: driver
1280x960 60 1280 1376 1488 1800 960 961 964 1000 108000 flags: phsync, pvsync; type: driver
1152x864 75 1152 1216 1344 1600 864 865 868 900 108000 flags: phsync, pvsync; type: driver
1024x768 75 1024 1040 1136 1312 768 769 772 800 78750 flags: phsync, pvsync; type: driver
1024x768 70 1024 1048 1184 1328 768 771 777 806 75000 flags: nhsync, nvsync; type: driver
1024x768 60 1024 1048 1184 1344 768 771 777 806 65000 flags: nhsync, nvsync; type: driver
832x624 75 832 864 928 1152 624 625 628 667 57284 flags: nhsync, nvsync; type: driver
800x600 75 800 816 896 1056 600 601 604 625 49500 flags: phsync, pvsync; type: driver
800x600 72 800 856 976 1040 600 637 643 666 50000 flags: phsync, pvsync; type: driver
800x600 60 800 840 968 1056 600 601 605 628 40000 flags: phsync, pvsync; type: driver
800x600 56 800 824 896 1024 600 601 603 625 36000 flags: phsync, pvsync; type: driver
640x480 75 640 656 720 840 480 481 484 500 31500 flags: nhsync, nvsync; type: driver
640x480 73 640 664 704 832 480 489 492 520 31500 flags: nhsync, nvsync; type: driver
640x480 67 640 704 768 864 480 483 486 525 30240 flags: nhsync, nvsync; type: driver
640x480 60 640 656 752 800 480 490 492 525 25175 flags: nhsync, nvsync; type: driver
720x400 70 720 738 846 900 400 412 414 449 28320 flags: nhsync, pvsync; type: driver
props:
1 EDID:
flags: immutable blob
blobs:
value:
00ffffffffffff004c2d4c0432324d43
1e130103802f1e782aee91a3544c9926
0f5054bfef80b30081808140714f0101
0101010101017c2e90a0601a1e403020
3600da281100001a000000fd00384b1e
510f000a202020202020000000fc0053
796e634d61737465720a2020000000ff
00484d43533730373336380a202000d7
2 DPMS:
flags: enum
enums: On=0 Standby=1 Suspend=2 Off=3
value: 0
5 link-status:
flags: enum
enums: Good=0 Bad=1
value: 0
6 non-desktop:
flags: immutable range
values: 0 1
value: 0
19 CRTC_ID:
flags: object
value: 0
Next, we can use the following command to set a resolution, a pixel format and display a test pattern on the screen:
$ modetest -D b0050000.v_mix -s 37:1680x1050@AR24
As the `modetest` commands worked, I decided to check out a little animation using GStreamer:
$ gst-launch-1.0 videotestsrc pattern=ball ! video/x-raw,width=1680,height=1050 ! xlnxvideosink sink-type="hdmi" sync=false
To test the HDMI input capabilities, an HDMI source, like the output of a notebook, needs to be connected to the lower HDMI port of the ZCU104.
We can use `media-ctl` to check the input device:
root@xilinx-zcu104-2019_2:~# media-ctl -d /dev/media0 -p
Media controller API version 4.19.0
Media device information
------------------------
driver xilinx-video
model Xilinx Video Composite Device
serial
bus info
hw revision 0x0
driver version 4.19.0
Device topology
- entity 1: vcap_hdmi output 0 (1 pad, 1 link)
type Node subtype V4L flags 0
device node name /dev/video2
pad0: Sink
<- "b0100000.scaler":1 [ENABLED]
- entity 5: b0100000.scaler (2 pads, 2 links)
type V4L2 subdev subtype Unknown flags 0
device node name /dev/v4l-subdev0
pad0: Sink
[fmt:RBG888_1X24/1920x1080 field:none colorspace:srgb]
<- "a1000000.hdmi_rxss":0 [ENABLED]
pad1: Source
[fmt:UYVY8_1X16/1920x1080 field:none colorspace:srgb]
-> "vcap_hdmi output 0":0 [ENABLED]
- entity 8: a1000000.hdmi_rxss (1 pad, 1 link)
type V4L2 subdev subtype Unknown flags 0
device node name /dev/v4l-subdev1
pad0: Source
[fmt:RBG888_1X24/1920x1080 field:none colorspace:srgb]
[dv.caps:BT.656/1120 min:0x0@25000000 max:4096x2160@297000000 stds:CEA-861,DMT,CVT,GTF caps:progressive,reduced-blanking,custom]
[dv.detect:BT.656/1120 1920x1080p60 (2200x1125) stds:CEA-861 flags:CE-video]
-> "b0100000.scaler":0 [ENABLED]
Then we set the correct formats:
$ media-ctl -v -d /dev/media0 -V '"b0100000.scaler":0 [fmt:RBG888_1X24/1920x1080 field:none colorspace:srgb]'
And use the `yavta` utility to capture some frames:
root@xilinx-zcu104-2019_2:~# yavta -n 3 -c10 -f UYVY -s 1920x1080 --skip 7 -F /dev/video2
Device /dev/video2 opened.
Device `vcap_hdmi output 0' on `platform:vcap_hdmi:0' is a video output (without mplanes) device.
Video format set: UYVY (59565955) 1920x1080 field none, 1 planes:
* Stride 3840, buffer size 4147200
Video format: UYVY (59565955) 1920x1080 field none, 1 planes:
* Stride 3840, buffer size 4147200
3 buffers requested.
length: 1 offset: 3942314464 timestamp type/source: mono/EoF
Buffer 0/0 mapped at address 0x7fb6b85000.
length: 1 offset: 3942314464 timestamp type/source: mono/EoF
Buffer 1/0 mapped at address 0x7fb6790000.
length: 1 offset: 3942314464 timestamp type/source: mono/EoF
Buffer 2/0 mapped at address 0x7fb639b000.
0 (0) [-] none 0 0 B 149.766923 149.766937 28.083 fps ts mono/EoF
1 (1) [-] none 1 0 B 149.783555 149.783565 60.125 fps ts mono/EoF
2 (2) [-] none 2 0 B 149.800187 149.800197 60.125 fps ts mono/EoF
3 (0) [-] none 3 0 B 149.816819 149.816829 60.125 fps ts mono/EoF
4 (1) [-] none 4 0 B 149.833451 149.833460 60.125 fps ts mono/EoF
5 (2) [-] none 5 0 B 149.850083 149.850092 60.125 fps ts mono/EoF
6 (0) [-] none 6 0 B 149.866716 149.866725 60.121 fps ts mono/EoF
7 (1) [-] none 7 0 B 149.883347 149.883356 60.129 fps ts mono/EoF
8 (2) [-] none 8 0 B 149.899978 149.917370 60.129 fps ts mono/EoF
9 (0) [-] none 9 0 B 149.916610 149.951932 60.125 fps ts mono/EoF
Captured 10 frames in 0.220618 seconds (45.327217 fps, 0.000000 B/s).
3 buffers released.
The saved frames are in raw format, and can be opened with the rawpixels.net utility:
After this I also tried mirroring the HDMI Input to the HDMI Output port:
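The mirroring can be done with a GStreamer pipeline along the lines of the sketch below (the device node and caps are assumptions based on the capture test above):
$ gst-launch-1.0 v4l2src device=/dev/video2 ! video/x-raw,format=UYVY,width=1920,height=1080 ! xlnxvideosink sink-type="hdmi" sync=false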
The ZCU104 board features the Zynq UltraScale+ XCZU7EV MPSoC. The EV suffix means the MPSoC has a hardware based Video Codec Unit (VCU), capable of encoding and decoding H.264 / H.265 video content.
The hardware parts needed to take advantage of the hardware accelerated video encoding and decoding are already built in the ZCU104 BSP.
The components used are:
- ZYNQ UltraScale+ VCU
- Video Frame Buffer Read
- Video Frame Buffer Write
When importing the hardware design, the PetaLinux tools are capable of automatically generating most of the device tree configuration, but some settings still need to be added manually.
/* VCU */
&v_frmbuf_rd_0 {
reset-gpios = <&gpio 79 1>;
};
&v_frmbuf_wr_0 {
reset-gpios = <&gpio 80 1>;
};
The kernel options to be included are:
CONFIG_XILINX_FRMBUF=y
CONFIG_DRM_XILINX_HDMI=y
while on the RootFS the following packages need to be included:
CONFIG_gstreamer1.0-omx=y
CONFIG_gstreamer-vcu-examples=y
Building PetaLinux with these options should result in four new GStreamer elements, which can be used for hardware accelerated H.264 / H.265 encoding and decoding.
$ gst-inspect-1.0 | grep h26
...
omx: omxh265dec: OpenMAX H.265 Video Decoder
omx: omxh265enc: OpenMAX H.265 Video Encoder
omx: omxh264dec: OpenMAX H.264 Video Decoder
omx: omxh264enc: OpenMAX H.264 Video Encoder
...
We can test the encoding feature with a simple pipeline like the one below:
$ gst-launch-1.0 videotestsrc ! omxh264enc ! mpegtsmux ! filesink location=test1.ts
The above pipeline uses a `videotestsrc` as the video source, encodes it with H.264, and writes the result into a media file. The same method can be used to efficiently save any video content to disk.
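For example, recording a USB camera stream to disk could look something like the sketch below (the device node and resolution are assumptions depending on the connected camera):
$ gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw,width=1280,height=720 ! videoconvert ! omxh264enc ! mpegtsmux ! filesink location=camera.ts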
We can play back the resulting media file on a PC:
We can decode H.264 / H.265 content using the `decodebin` element:
$ gst-launch-1.0 filesrc location=test1.ts ! decodebin ! fakesink
(Note: the hardware accelerated `omxh264dec` and `omxh265dec` elements are automatically selected when available)
In this section we will follow the ZCU104 FMC Quad-Camera + ML Example tutorial from Avnet. The guide goes through the steps needed to build a custom platform with Vitis AI based hardware acceleration.
The first part goes through the steps needed to build the Vivado hardware platform and a PetaLinux project based on it. At the end, a Vitis hardware platform is exported.
The Vitis Hardware Project contains the following components:
- 4 x MIPI CSI-2 capture pipelines
- HDMI Output
It was originally built for a quad MIPI camera setup, but we will use it with USB cameras.
The PetaLinux project is built from the exported XSA hardware definition, and along with the standard components it also contains the Vitis-AI Runtime and Vitis-AI Library packages.
In the final step, the resulting Vitis PFM platform definition file is built from the Vivado and PetaLinux projects.
The second part adds a Vitis-AI Deep Learning Processor Unit (DPU) IP to the platform.
The Deep Learning Processor Unit (DPU) is a configurable IP core from Xilinx that allows accelerating convolutional neural networks. It consists of a register configuration module, a data controller module, and a convolution computing module, with an instruction set designed for convolutional neural networks.
The convolutional neural networks supported by the DPU include VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, FPN and others.
The DPU is implemented in the Programmable Logic.
The third part covers the software components of the project.
There are two components to build:
- two hardware accelerated GStreamer plugins for face and traffic detection
- a Face Detection AI model
The final part covers the steps needed to run the application on the ZCU104.
The application is first copied to an SD Card, and then there are scripts to setup the environment and run the example.
As I did not use the platform with a quad camera setup, a couple of changes were needed.
To run the example:
$ gst-launch-1.0 v4l2src device=/dev/video6 ! videoconvert ! sdxfacedetect ! queue ! fpsdisplaysink video-sink="kmssink bus-id=b00c0000.v_mix plane-id=30" sync=false fullscreen-overlay=false
We can also do this with multiple streams at once:
$ gst-launch-1.0 \
> v4l2src device=/dev/video6 ! videoconvert ! sdxfacedetect ! queue ! fpsdisplaysink video-sink="kmssink bus-id=b00c0000.v_mix render-rectangle=\"<640,360,640,360>\" plane-id=30" sync=false fullscreen-overlay=false \
> v4l2src device=/dev/video8 ! videoconvert ! sdxfacedetect ! queue ! fpsdisplaysink video-sink="kmssink bus-id=b00c0000.v_mix render-rectangle=\"<0,0,640,360>\" plane-id=31" sync=false fullscreen-overlay=false
The next step was to build a custom hardware accelerated GStreamer plugin, using Vitis AI. For this I followed the Creating a Vitis-AI GStreamer Plugin tutorial from Avnet.
The tutorial uses the ZCU104 Quad-Camera + ML Platform built in the previous section, and uses the Deep Learning Processor Unit (DPU) to build two hardware accelerated GStreamer plugins: one for face detection and one for person detection.
The first section describes the steps needed to download the Vitis AI packages, and set up cross compilation for the PetaLinux platform we built before.
In the second part we use the GStreamer development tools to generate two GStreamer plugin templates. The templates are then completed with Vitis AI specific code implementing face detection and person detection.
We need to add / modify a couple of things in the GStreamer plugins:
- input and output capabilities (format + resolution) - BGR, 640 x 360 px
/* Output (source pad) format */
#define VIDEO_SRC_CAPS \
GST_VIDEO_CAPS_MAKE("{ BGR }")
/* Input (sink pad) format */
#define VIDEO_SINK_CAPS \
GST_VIDEO_CAPS_MAKE("{ BGR }")
gst_element_class_add_pad_template (GST_ELEMENT_CLASS(klass),
gst_pad_template_new ("src", GST_PAD_SRC, GST_PAD_ALWAYS,
gst_caps_from_string (VIDEO_SRC_CAPS ",width = (int) [1, 640], height = (int) [1, 360]")));
gst_element_class_add_pad_template (GST_ELEMENT_CLASS(klass),
gst_pad_template_new ("sink", GST_PAD_SINK, GST_PAD_ALWAYS,
gst_caps_from_string (VIDEO_SINK_CAPS ", width = (int) [1, 640], height = (int) [1, 360]")));
- plugin metadata
/* Face detection */
gst_element_class_set_static_metadata (GST_ELEMENT_CLASS(klass),
"Face detection using the Vitis-AI-Library",
"Video Filter",
"Face Detection",
"FIXME <fixme@example.com>");
/* Person detection */
gst_element_class_set_static_metadata (GST_ELEMENT_CLASS(klass),
"Person detection using the Vitis-AI-Library",
"Video Filter",
"Person Detection",
"FIXME <fixme@example.com>");
- and the actual image processing logic
The Face Detection and Person Detection plugins use the Vitis AI libraries for hardware acceleration.
Namely, the Face Detection plugin uses the `vitis::ai::FaceDetect` class, while the Person Detection plugin uses the more generic `vitis::ai::SSD` Single Shot Detector AI processor.
The resulting matches are plotted as rectangles onto the original images using OpenCV.
After the modifications are made, the plugins can simply be compiled with the provided Makefiles.
The third section describes the steps needed to install the Vitis AI Runtime libraries on the target hardware, the ZCU104.
The SD card images built in the previous sections are used as a base. To install the Vitis AI Runtime libraries, a couple of .deb and .tar.gz packages need to be downloaded from the internet.
The final section helps us install and run the GStreamer plugins we built on the ZCU104.
The plugins are built as shared libraries (.so) and can be installed by copying them to the GStreamer plugin folder (and clearing the GStreamer caches).
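On the ZCU104 this typically looks something like the sketch below (the library name and the plugin / cache directories are assumptions for an AArch64 PetaLinux image):
# Copy the plugin library to the GStreamer plugin directory (path is an assumption)
$ cp libgstvaifacedetect.so /usr/lib/gstreamer-1.0/
# Clear the GStreamer registry cache so the new plugin gets re-scanned
$ rm -rf ~/.cache/gstreamer-1.0/
# Verify that the new element is available
$ gst-inspect-1.0 vaifacedetect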
We can run the plugins with GStreamer pipelines similar to the ones used in the previous sections:
- Face Detection
$ gst-launch-1.0 v4l2src device=/dev/video6 ! videoconvert ! vaifacedetect ! queue ! fpsdisplaysink video-sink="kmssink bus-id=b00c0000.v_mix plane-id=30" sync=false fullscreen-overlay=false
- Person Detection
$ gst-launch-1.0 v4l2src device=/dev/video6 ! videoconvert ! vaipersondetect ! queue ! fpsdisplaysink video-sink="kmssink bus-id=b00c0000.v_mix plane-id=30" sync=false fullscreen-overlay=false
The two GStreamer plugins we built before implement two very specific tasks: face and person detection, using two specific deep learning models.
But we can do better: we can create GStreamer plugins that are customizable and can be used with user-given deep learning models.
The Single Shot Detector (SSD) class we used for the Person Detection plugin is initialized with a user-defined model. We will use this to create a generic GStreamer plugin implementing single shot detection.
The new `vaissd` plugin can be used with the SSD models from the Vitis AI Model Zoo, as well as with custom models (after they are installed on the platform).
We can create the plugin as in the previous section, but we also define a parameter for the plugin named `model`. The parameter represents the name of the model we want to use. For example, with `vaissd model=ssd_pedestrain_pruned_0_97` the plugin will behave like the Person Detection plugin.
Adding the property is fairly simple. In `gstvaissd.cpp` we need to add:
enum
{
PROP_0,
PROP_MODEL_NAME
};
static void
gst_vaissd_class_init (GstVaissdClass * klass)
{
...
/* define properties */
g_object_class_install_property (gobject_class, PROP_MODEL_NAME,
g_param_spec_string ("model", "Model Name", "Model Name",
"ssd_pedestrain_pruned_0_97", (GParamFlags) (G_PARAM_READWRITE | G_PARAM_STATIC_STRINGS)));
}
void
gst_vaissd_set_property (GObject * object, guint property_id,
const GValue * value, GParamSpec * pspec)
{
...
case PROP_MODEL_NAME:
vaissd->model_name = g_value_dup_string(value);
g_print ("Vitis AI SSD Model Name: %s\n", vaissd->model_name);
break;
...
}
void
gst_vaissd_get_property (GObject * object, guint property_id,
GValue * value, GParamSpec * pspec)
{
...
case PROP_MODEL_NAME:
g_value_set_string(value, vaissd->model_name);
break;
...
}
Then we can instantiate the `vitis::ai::SSD` class with the value of the parameter, instead of a hard-coded value:
/* Create Single Shot Detector object */
thread_local auto ssd = vitis::ai::SSD::create((char*) vaissd->model_name, "true");
After building the plugin and installing it on the ZCU104, the plugin can be run as follows:
- model `ssd_traffic_pruned_0_9`:
$ gst-launch-1.0 v4l2src device=/dev/video6 ! videoconvert ! vaissd model=ssd_traffic_pruned_0_9 ! queue ! fpsdisplaysink video-sink="kmssink bus-id=b00c0000.v_mix plane-id=30" sync=false fullscreen-overlay=false
- model `mlperf_ssd_resnet34_tf`:
$ gst-launch-1.0 v4l2src device=/dev/video6 ! videoconvert ! vaissd model=mlperf_ssd_resnet34_tf ! queue ! fpsdisplaysink video-sink="kmssink bus-id=b00c0000.v_mix plane-id=30" sync=false fullscreen-overlay=false
DeepLib is an easy to use Python library that allows creating GStreamer based video processing pipelines. I created it at the start of this year, and initially it supported NVidia Jetson based devices.
The library uses the GStreamer Python bindings to create and run GStreamer based video pipelines. It works by grouping and auto-configuring GStreamer pipeline elements into higher level DeepLib elements fulfilling common tasks like handling different video inputs (V4L2, MIPI, HDMI, etc) and outputs (EGL, HDMI, DP, RTSP, etc), or implementing hardware accelerated object detection and classification.
In order to make the library work on platforms other than the NVidia Jetson family, a couple of changes were needed. A new concept, the Platform, was introduced. Its purpose is to separate features / GStreamer elements that are available only on certain platforms.
Currently three platforms are supported:
- Generic Platform - supported on any platform where GStreamer is installed
- NVidia Platform - for the NVidia Jetson devices and PC with Nvidia GPU-s
- Xilinx Platform - for Xilinx Zynq UltraScale+ devices
The Generic platform uses standard GStreamer elements shipped with every installation. These are usually CPU based processing elements.
The vendor specific platforms (Xilinx, NVidia) implement certain tasks using hardware acceleration.
The hardware accelerated elements can be:
- replacements for standard GStreamer elements, which are then used in different DeepLib input, output and processing elements - examples include hardware accelerated video encoding / decoding (H.264 / H.265), video format conversion and specific implementations for video input and output elements like HDMI, MIPI, etc.
- DeepLib processing elements, implementing higher level features like object detection and classification
The library is implemented in such a way that it is capable of automatically detecting when it is used on a vendor specific platform. The hardware accelerated elements are automatically selected when available.
Implementing the Xilinx platform for Zynq UltraScale+ devices meant creating a new platform, and adding a couple of new elements.
A new input element:
- HDMI Input - capture an HDMI input stream
and two new output elements were added:
- HDMI Output - send content over an HDMI output port
- DisplayPort Output - display content on a DisplayPort capable monitor
The Xilinx specific implementation for these uses the `kmssink` and `v4l2src` elements.
A couple of Vitis AI based processing elements were also added:
- Xilinx Face Detect - face detection using the Vitis AI based GStreamer plugin implemented in the previous section
- Xilinx Person Detect - person / pedestrian detection using the Vitis AI based GStreamer plugin
- Xilinx SingleShotDetector - configurable single shot detector based on the `vitis::ai::SSD` class
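Putting it together, a Xilinx accelerated DeepLib pipeline could look like the sketch below (as with the earlier sketches, the class names are illustrative assumptions rather than the confirmed DeepLib API):
# DeepLib sketch on the Xilinx platform: USB camera -> Vitis AI SSD -> HDMI output
from deeplib import Pipeline
from deeplib.inputs import USBCameraInput
from deeplib.processors import XilinxSingleShotDetector   # wraps the vaissd GStreamer plugin
from deeplib.outputs import HDMIOutput                     # kmssink based HDMI output

pipeline = Pipeline()
pipeline.add(USBCameraInput(device='/dev/video0'))
pipeline.add(XilinxSingleShotDetector(model='ssd_traffic_pruned_0_9'))  # model from the Vitis AI Model Zoo
pipeline.add(HDMIOutput())
pipeline.run()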
This project is far from being complete. There is room and potential for a lot of improvements and interesting new features.
First, I would like to cover more Vitis AI based features. I would also like to implement some features using the Vitis Vision library.
The framework is currently supported on the ZCU104 board, but in the near future I plan to provide pre-built SD card images for the Ultra96 (V1) and PYNQ-Z2 boards as well.
Lastly, I want to migrate from bare PetaLinux based SD card images to PYNQ based ones. This would add support for using DeepLib from PYNQ. The PYNQ based images also have the advantage of being Ubuntu based, so installing packages is much easier.
Resources
The Vivado design, PetaLinux configs and source code for the project can be found in the attached GitHub repositories.
Below are some useful articles, product and wiki pages I used for this project:
Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit - Product page (Xilinx)
Vitis Software Platform: Embedded Vision Reference Platforms User Guide 2019.2 (UG1265) (Xilinx)
Getting Started with Xilinx Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit and See3CAM_CU30_CHL_TC_BX (e-con Systems)
PYNQ - Python productivity for Zynq (Xilinx)
ZCU104 2020.1 Board Support Package (BSP) (Xilinx)
HDMI FrameBuffer Example Design 2019.2 (Xilinx)
HDMI FrameBuffer Example Design 2020.1 (Xilinx)
Xilinx V4L2 hdmirx driver (Xilinx)
MPSoC PYNQ framework integrates VCU-1.VCU to run on ZCU104 (www.programmersought.com)
Vitis Single Sensor demo Platform for the zcu104 Board (Xilinx)
ZCU104 FMC Quad-Camera + ML Example (Avnet)
Creating a Vitis-AI GStreamer Plugin (Avnet)
Creating a Vitis-AI GStreamer Plugin for the Ultra96-V2 (Tom Simpson, Avnet)
Hope you enjoyed this project! :)