Determinism, responsiveness and performance are requirements that drive the architecture of most embedded applications at the edge, from autonomous driving to robotics and advanced vision systems.
Xilinx heterogeneous SoCs such as the Zynq MPSoC give engineers the ability to implement solutions which achieve these determinism, responsiveness and performance targets. They do so by combining a high-performance processing system, which contains quad-core 64-bit Arm Cortex-A53 processors, with programmable logic.
This combination of processing system and programmable logic enables each algorithm and function to be implemented using the most appropriate technology. For example, network communications and the man-machine interface can be implemented in the processing system, while high-performance image and signal processing or neural network acceleration can leverage the highly parallel nature of programmable logic.
Applications for the processing system can be developed using bare-metal, real-time operating system or embedded Linux solutions, while solutions for the programmable logic have traditionally been developed using register transfer level (RTL) languages such as VHDL and Verilog.
Implementing programmable logic solutions using RTL comes with longer development and verification times, due to the lower level of design capture. However, when developing for Xilinx heterogeneous SoC devices we can accelerate algorithms from the processor into programmable logic using OpenCL.
OpenCL is an open, royalty-free standard from the Khronos Group designed for heterogeneous systems. At its core is the concept of a host and kernels. The host is typically an x86-based system, although it does not have to be, while the kernels run on a device which can be anything from another CPU to a GPU, FPGA or ASIC. The aim of OpenCL is to enable portability across platforms without changing the source code. As such, host applications are commonly created using languages such as C or C++ combined with the OpenCL APIs. Kernels are developed using OpenCL C, which is derived from ISO C99 with the limitations and changes necessary to enable cross-platform support. For example, standard headers such as stdlib.h and stdio.h are not allowed, while scalar types all have a defined size, unlike in C/C++ where they are compiler and architecture dependent.
This allows developers of heterogeneous systems using OpenCL to use standard compilers such as GCC for the host, while kernels use custom compilers supplied by the device manufacturer.
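To make the host and kernel split more concrete, the following is a minimal sketch in plain C++ rather than real OpenCL. It assumes a vector addition workload, and the names `vadd_work_item` and `host_run_vadd` are hypothetical. The kernel function computes a single element of the result, as an OpenCL work-item would, and the host function stands in for the runtime by looping over the index space on the CPU. Fixed-width integer types are used to mirror the defined scalar sizes of OpenCL C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Kernel body: computes ONE element of the result. In OpenCL C this would be
// a `__kernel` function and `gid` would be obtained from get_global_id(0).
// std::int32_t mirrors OpenCL C, where `int` is always 32 bits wide.
void vadd_work_item(const std::int32_t* a, const std::int32_t* b,
                    std::int32_t* out, std::size_t gid) {
    out[gid] = a[gid] + b[gid];
}

// Host side: owns the buffers and launches one work-item per element.
// A real OpenCL runtime would dispatch the work-items to the device; this
// hypothetical stand-in simply iterates over the index space on the CPU.
std::vector<std::int32_t> host_run_vadd(const std::vector<std::int32_t>& a,
                                        const std::vector<std::int32_t>& b) {
    assert(a.size() == b.size());  // both inputs must cover the same index space
    std::vector<std::int32_t> out(a.size());
    for (std::size_t gid = 0; gid < out.size(); ++gid) {
        vadd_work_item(a.data(), b.data(), out.data(), gid);
    }
    return out;
}
```

Calling `host_run_vadd({1, 2, 3}, {10, 20, 30})` returns a vector holding 11, 22 and 33. In a real OpenCL application the loop would be replaced by runtime API calls which create buffers, set the kernel arguments and enqueue the kernel on the device.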
When developing for Xilinx heterogeneous SoCs such as the Zynq and Zynq MPSoC, the OpenCL model can be used for development of both the processing system (host) and the programmable logic (kernel). This approach is supported by Vitis, the Xilinx unified software development tool, which makes it possible to accelerate algorithms from the processing system into the programmable logic. This opens the performance of programmable logic to non-traditional developers, as knowledge of VHDL or Verilog is no longer required to implement solutions in the programmable logic.
Accelerating developments using Vitis and Genesys ZU
Developing embedded systems is challenging. Throughout the development cycle, capabilities must be demonstrated and risks retired before the final application can be deployed to the field. One of the main ways risks can be retired and the technology readiness level increased is to demonstrate key capabilities and algorithms running on hardware early in the development cycle. This is where development boards such as the Genesys ZU come into their own. With its wide range of interfacing capabilities, the Genesys ZU enables rapid prototyping and risk reduction on target hardware.
To get started developing using the Vitis OpenCL acceleration flow on the Genesys ZU we need a Genesys ZU Vitis platform. Creating and testing this platform is straightforward and uses the following Xilinx tools:
- Vivado - Used to create a base platform of hardware configuration with the necessary resources made available to the Vitis compiler
- PetaLinux – Used to create the embedded Linux operating system, which contains the OpenCL APIs along with support for contiguous memory allocation and direct memory access drivers. PetaLinux is also used to create the sysroot used to support the Vitis acceleration platform.
- Vitis – Used to create the Vitis acceleration platform and the resulting accelerated application.
To get started we need a base Vivado platform. This platform can contain interfaces and processing elements, along with making resources available for the Vitis compiler. As a bare minimum this platform needs to define:
- Processor Configuration – The configuration of the processing system: clocks, available DDR and its configuration, and the multiplexed IO configuration for the PS peripherals.
- Clocks – Several different clocks, provided from a clocking wizard, which can be used by the Vitis compiler.
- Interrupts – A single interrupt is provided to the processing system from an AXI Interrupt controller. The interrupts connected to the AXI Interrupt controller are then made available to the Vitis compiler.
- Processor Reset blocks – One processor reset block needs to be provided to the Vitis compiler for each of the available clocks.
- PS / PL interface – At least one PS AXI Master and one PS AXI Slave need to be defined. This allows configuration and control of the created acceleration core, along with high speed DMA transfer from the PL to PS if required.
Once added into the design, resources are made available to the Vitis compiler using the platform interface view within Vivado.
The Vivado platform I created for the Genesys ZU includes support for the MIPI camera interface. The camera image can then be output over DisplayPort if required by the application software in PetaLinux.
The interfaces made available to the Vitis compiler in this design are two clocks (150 MHz and 300 MHz), one AXI master and three Slave AXI Interfaces along with eight interrupts.
To use this base hardware design with downstream tools we need to export an XSA which defines the hardware configuration. This XSA can be exported before a bit file is created as the Vitis compiler will generate the necessary bit files for the application.
PetaLinux Configuration
The exported XSA can be used to create and configure a new PetaLinux project targeting the Genesys ZU. Once the project is created and configured, we need to make some customisations to enable support for Vitis and acceleration.
These changes are straightforward and include:
- Adding OpenCL and the Xilinx Runtime (XRT) into the PetaLinux meta-user layer
- Configuring the kernel to support contiguous memory allocation and DMA drivers
- Building the PetaLinux OS including the necessary boot files
- Creating and installing a sysroot which can be used by Vitis
Along with the sysroot, a Vitis platform needs the following elements created by PetaLinux:
- FSBL.elf – First Stage Boot loader
- Image.ub – The kernel image itself
- PMUFW.elf – The platform management unit firmware
- Bl31.elf – Arm Trusted Firmware
- U-Boot.elf – Second Stage Bootloader which loads in the kernel image
With these elements available at the end of the build sequence we are able to start working with Vitis to create an acceleration platform.
Creating the Vitis Platform
We are now in possession of everything needed to create a Vitis acceleration platform. Helpfully, Vitis provides a multi-step wizard which supports the creation of the platform. This wizard is launched from the Vitis welcome page, which is shown when Vitis is started.
There are two elements needed for an acceleration platform: hardware and software. The hardware element is defined by the XSA previously exported from Vivado.
The definition of the software element of the platform is similar to that of the hardware element. This time we need to identify the various elements produced during the PetaLinux build.
Once this is completed the platform can be built, and it is then available for use in new acceleration projects.
My First Vitis Acceleration Project
To complete the testing of the acceleration platform, the easiest way is to build one of the existing example applications. To get started we need to create a new system application targeting the Genesys ZU platform. This platform should show support for both embedded development and acceleration.
Walking through the new project creation wizard will allow you to create the vector addition example application.
This vector addition example will contain one OpenCL kernel which is accelerated into the programmable logic.
Within the kernel, several pragmas are used to control loop unrolling and interfacing in order to optimise performance for implementation in programmable logic. Remember, this accelerated block needs to be able to connect to the AXI interfaces which were made available in the Vivado platform.
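As an illustration of what such pragmas can look like, below is a minimal sketch of a vector addition kernel written in the C++ style accepted by Vitis HLS. It is not the code generated by the example wizard. The function name `vadd`, the argument names, the bundle names `gmem0`, `gmem1` and `control`, and the unroll factor of 4 are all assumptions chosen for illustration. A standard compiler such as GCC ignores the HLS pragmas, so the same function can also be run in software.

```cpp
#include <cassert>
#include <cstdint>

// extern "C" keeps the kernel name unmangled so the tools can find it by name.
extern "C" void vadd(const std::uint32_t* in1, const std::uint32_t* in2,
                     std::uint32_t* out, int size) {
// Interface pragmas: pointer arguments become AXI4 memory-mapped masters,
// which connect to the AXI slave ports exposed by the Vivado platform.
#pragma HLS INTERFACE m_axi port = in1 offset = slave bundle = gmem0
#pragma HLS INTERFACE m_axi port = in2 offset = slave bundle = gmem1
#pragma HLS INTERFACE m_axi port = out offset = slave bundle = gmem0
// Scalar arguments and block control are reached over an AXI4-Lite slave.
#pragma HLS INTERFACE s_axilite port = size bundle = control
#pragma HLS INTERFACE s_axilite port = return bundle = control

    assert(size >= 0);  // sanity check, useful when running in software

    for (int i = 0; i < size; ++i) {
// Unroll pragma: asks the tool to build 4 copies of the loop body so that
// several additions can be performed in parallel (factor is an assumption).
#pragma HLS UNROLL factor = 4
        out[i] = in1[i] + in2[i];
    }
}
```

Because the pragmas only guide hardware generation, the functional behaviour is unchanged: calling `vadd(a, b, r, n)` on the host fills `r` with the element-wise sum of the first `n` entries of `a` and `b`. The best unroll factor and bundle assignment depend on the memory bandwidth and the number of AXI ports available in the platform.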
Building the project will take a little while. However, once completed, Vitis will provide everything which needs to be copied onto the SD card.
When the SD Card is inserted into the Genesys ZU, we can test the application over the command line.
Running the application will show the Kernel being loaded into the programmable logic and the steps associated with execution of the program.
Of course, you should also see the test reporting as passed!
This article has introduced the concepts and benefits of OpenCL acceleration for Zynq MPSoC based heterogeneous SoC devices. It has demonstrated the different elements required to create an acceleration platform, before creating a simple test application to demonstrate the validity of that platform.
If you want to get started yourself, the acceleration platform is available here.