----------------------------------------------------------------------------------------------------------
Link
Schematic: https://www.avnet.com/opasdata/d120001/medias/docus/193/Ultra96-V2%20Rev1%20Schematic.pdf
Board files: https://github.com/Avnet/bdf (Avnet Board Definition Files)
----------------------------------------------------------------------------------------------------------
The objective of this module is to integrate an acceleration platform overlay into the PYNQ framework. The hardware design (Vivado) provides the connections and hooks, and the software flow (Vitis) completes those connections into a full design. The result is then transported to PYNQ.
The integration is in two parts. The first step is setting up the hardware to instantiate the hooks for the acceleration platform (Vivado sees this design as incomplete - that is normal). Think of it as a foundation, or as a wall socket waiting for a plug.
With those connections created, the second part is the software attaching to the hardware foundation: the application plugs into the hooks (the wall socket). The software flow completes the hardware design through automated processes, starting from a template that acts as a placeholder for your application.
Once we have a completed design with both the hardware in Vivado and the software in Vitis, we migrate the project as hwh/tcl/bit files. We can then interface with the instantiated block/template and use its function. In this case it is an addition function: A + B = C.
HW - Creating Block Diagram
When creating the hardware module, we will leverage an existing project flow which explains in detail how to set up our hardware so the software can connect to it. Refer to the sections accordingly.
At a high level we are accomplishing a vector addition algorithm: A + B = C
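As a point of reference before diving into the tools, here is a minimal software-only sketch (plain NumPy; the example values are the same 4-element buffers used in the PYNQ notebook later) of the behaviour we expect the accelerator to reproduce:
import numpy as np
# Software-only reference of the vector addition the accelerator will perform.
A = np.array([0, 1, 2, 3], dtype=np.uint32)
B = np.array([1, 2, 3, 4], dtype=np.uint32)
C = A + B
print(C)  # expected: [1 3 5 7]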
The flow diagram runs through the hardware process chronologically. This allows the PS to interface with the PL nodes we have placed, with five different resets that we can hook up.
Requirements for making an Acceleration Platform - NOTE - Follow the specific instructions. You must have the following in the design or the acceleration platform will not operate correctly:
- Memory & Controller Interface - MPSoC
- Clock
- Interrupt
We build the Vivado platform, then create our rootfs, FSBL, PMU firmware, U-Boot, sysroot, system, and DTB. These are then pathed and registered in the Vitis platform project so a new application with custom code can run on the MPSoC build produced by PetaLinux. The flow below describes the process in simple terms.
To find out more about PetaLinux and Vitis, refer to the resources (UG1144 | UG1393)
------------------------------------------------------------------------------------------------------
Follow Adam Taylor's flow (https://www.hackster.io/adam-taylor)
Creating Vivado XSA
Once in the Vivado directory, we can start Vivado and create a new project; make sure we target the Ultra96 V2 board.
With the project open, the next step is to create the block diagram.
Project Open and Ready for Creation
When the create block diagram dialog appears, leave the name as design_1 and click OK.
Creating the block diagram
In the block diagram, we need to add the MPSoC processing system and configure it for the Ultra96 V2 by running the block automation.
Adding in the MPSoC IP Core
Running the block automation
Now that we have the processing system configured for a base Ultra96 V2 project, we can configure it as needed for an acceleration XSA by re-customizing the MPSoC IP.
The first step is to turn off the AXI HPM0 and HPM1 FPD interfaces.
Disabling the interfaces on the Ultra96 default settings
With the MPSoC configured as we desire, the next step is to implement the clocking and reset structures.
Let's start by adding in a Clocking Wizard IP block and re-configuring it to provide five outputs, increasing in frequency from 100 MHz to 400 MHz, with the fifth clock output at 600 MHz.
Setting clock frequencies
To ensure the reset on the clocking wizard is compatible with the MPSoC IP block reset, we need to set the clocking wizard reset to be active low.
Setting the clock wizard reset to be active low
To function correctly in the Vitis acceleration flow, each clock needs an associated reset. So now let's address this.
Adding in the reset
Next, add in a Processor System Reset IP; we need one of these for each of the clock outputs on the clocking wizard. Copy and paste it four times so we have a total of five in the block diagram.
At this stage our block diagram should look as below:
All IP blocks within the block diagram
Run the connection automation and associate each processor system reset block's slowest sync clock with one of the clocking wizard clocks.
Setting the slowest clocks in the run connection automation
Set each of the ext_reset_in signals to the pl_resetn0 output from the MPSoC.
setting the connection automation configuration
Once the automation has completed, the final stage is to connect the dcm_locked inputs on the processor reset systems to the locked output on the clocking wizard.
Finally, add a Concat IP block and connect it to the pl_ps_irq0 input on the MPSoC. Ensure there is only one input on the Concat block.
Completed block diagram
Having completed the base platform, the next stage is to declare the capabilities which will (or will not) be made available to the v++ compiler.
To do this, first we need to enable the platform view — this is enabled by selecting:
Window -> Platform Interfaces
Opening the platform interface
This will create a new Platform Interfaces window within the block diagram.
Click on the Enable Platform Interfaces, and you will see a list of the available interfaces under each of the elements in the design.
These can be enabled or disabled by right clicking on the interface. For the Ultra96 V2 MPSoC, ensure the interfaces below are enabled.
Enabling platform interfaces
Select clk_out3 and in the options below set the ID to 0 and make it the default clock.
Setting the clock 3 as the default
Finally, enable In0 to in7 on the Concat block.
We also need to set the design intent to show where we are intending to deploy the design. Enter the commands below in the TCL console.
set_property platform.design_intent.embedded true [current_project]
set_property platform.design_intent.server_managed false [current_project]
set_property platform.design_intent.external_host false [current_project]
set_property platform.design_intent.datacenter false [current_project]
set_property platform.default_output_type "sd_card" [current_project]
Save the project, validate the block diagram, and generate an HDL wrapper for the block diagram.
Successful validation
Once we have the HDL wrapper, implement the design and generate the bitstream.
Wait until the bitstream is available, then export and validate the XSA with the following TCL commands:
write_hw_platform -include_bit ultra96_min.xsa
validate_hw_platform ./ultra96_min.xsa
Validation of the XSA
Now that we have the hardware element of the platform created, we can start looking at the software element next.
------------------------------------------------------------------------------------------------------
SW - Building PetaLinux (Follow Adam Taylor's flow)
Before we can install PetaLinux, we need to ensure we have the necessary prerequisites. For the VM we created a few weeks ago, we can install them using the command:
sudo apt-get install -y gcc git make net-tools libncurses5-dev tftpd zlib1g-dev libssl-dev flex bison libselinux1 gnupg wget diffstat chrpath socat xterm autoconf libtool tar unzip texinfo zlib1g-dev gcc-multilib build-essential libsdl1.2-dev libglib2.0-dev zlib1g:i386 screen pax gzip gawk
Once the required packages have been installed, we can download and install PetaLinux.
Installing PetaLinux
Completion of installation
With PetaLinux available, under the directory we created for the platform previously we need to create three new directories: pfm, wksp1, and boot.
cd ultra96_min_pkg
mkdir pfm
cd pfm
mkdir wksp1
mkdir boot
cd ..
In the same terminal window, we need to source the following files:
- /settings64.sh
- /settings.sh
We are now ready to create the PetaLinux project. Make sure the project name is the same as the hardware project, in this case ultra96_min.
Creating the new PetaLinux project
petalinux-create -t project --template zynqMP -n ultra96_min
We then need to configure the new project for the hardware design using the command:
cd ultra96_min
petalinux-config --get-hw-description=../vivado
This will open a configuration dialog; set the boot args to
earlycon clk_ignore_unused root=/dev/ram rw
and stdin/stdout to psu_uart_1.
Configuring the hardware
Setting stdin / stdout
Once this is completed, save the changes and exit the dialog.
We need to make some changes to the meta-user Yocto layer, under the directory:
/project-spec/meta-user
Open the conf file (user-rootfsconfig) and add in the required OpenCL package entries:
CONFIG_xrt
CONFIG_xrt-dev
CONFIG_zocl
CONFIG_opencl-clhpp-dev
CONFIG_opencl-headers-dev
CONFIG_packagegroup-petalinux-opencv
Editing the conf file
We also need to make changes to the user device tree, under:
/project-spec/recipes-bsp/device-tree/files
Edit the file system-user.dtsi and add in the following:
/include/ "system-conf.dtsi"
/ {
amba {
mmc@ff160000 {
u-boot,dm-pre-reloc;
compatible = "xlnx,zynqmp-8.9a", "arasan,sdhci-8.9a";
status = "okay";
interrupt-parent = <0x4>;
interrupts = <0x0 0x30 0x4>;
reg = <0x0 0xff160000 0x0 0x1000>;
clock-names = "clk_xin", "clk_ahb";
xlnx,device_id = <0x0>;
#stream-id-cells = <0x1>;
iommus = <0xd 0x870>;
power-domains = <0xc 0x27>;
clocks = <0x3 0x36 0x3 0x1f>;
clock-frequency = <0xb2d0529>;
xlnx,mio_bank = <0x0>;
no-1-8-v;
disable-wp;
};
};
};
&amba {
zyxclmm_drm {
compatible = "xlnx,zocl";
status = "okay";
reg = <0x0 0xA0000000 0x0 0x10000>;
};
};
Updating the device tree
Once these edits have been made, open the rootfs configuration and enable the user packages.
petalinux-config -c rootfs
Enabling the user packages
To be able to use the platform for acceleration, we need to make a few changes to the kernel configuration:
petalinux-config -c kernel
Make the following changes:
- Device Drivers -> Generic Driver Options -> DMA Contiguous Memory Allocator -> Size in Mega Bytes: change the size from 256 to 1024 MB
- Device Drivers -> Staging drivers -> Xilinx APF Accelerator driver
- Device Drivers -> Staging drivers -> Xilinx APF Accelerator driver -> Xilinx APF DMA engines support
Once this has been completed, we are then ready to build the PetaLinux image.
petalinux-build
Building the PetaLinux Image
Wait until the build completes; we can then create the sysroot. Change directory into the images/linux directory and run:
petalinux-build --sdk
We will use this to install the sysroot in the pfm directory. Run the command:
./sdk.sh
When prompted, enter the full path to the pfm directory.
Sysroot installer
Finally, copy the following files from the /images/linux directory to the boot directory.
- image.ub
- zynqmp_fsbl.elf
- pmufw.elf
- bl31.elf
- u-boot.elf
Copying the boot files
We also need to create a linux.bif file under the boot directory; this should contain the following.
/* linux */
the_ROM_image:
{
	[fsbl_config] a53_x64
	[bootloader] <zynqmp_fsbl.elf>
	[pmufw_image] <pmufw.elf>
	[destination_device=pl] <bitstream>
	[destination_cpu=a53-0, exception_level=el-3, trustzone] <bl31.elf>
	[destination_cpu=a53-0, exception_level=el-2] <u-boot.elf>
}
------------------------------------------------------------------------------------------------------
Creating Vitis Platform Project
With all of this completed, we are now in a position to open Vitis and begin to create the platform. From within the pfm directory, run the following command:
vitis -workspace wksp1
This will open the Vitis GUI. From under the project column, select Create Platform Project.
Vitis welcome screen
This will open a new platform project dialog, enter the project name and click next.
Dialog for project creation
On the next dialog, select the create from hardware specification.
Selecting the platform definition source
On the next dialog, select the XSA which is under the Vivado directory.
Selecting the XSA
Select the operating system as Linux and the processor as psu_cortexa53.
Defining the SW solution
Completing the dialog will open a platform project in Vitis. To be able to build the application, we need to provide the location of the BIF, boot directory, Linux image and sysroot — all of which are available under the PFM directory.
With the information provided, we can build the platform project. This might take a minute or two.
Selecting the new platform
For the application, select the demo example, change the target to hardware, and build the application.
This took about 30 minutes on my system.
Acceleration application build completed
We now have a platform which we can use to accelerate our OpenCL applications on for the Ultra96 V2.
------------------------------------------------------------------------------------------------------
SW - Application Addition Vector
Create a new application project and select the new platform that was built in the last step. Now that the hardware foundation has been constructed, we have to hook in our kernel linker and edit the source code.
Getting started - Welcome Screen
Select your exported XSA acceleration platform
Select the template - Vector Addition Example
Edit the template so that you can strip out the content you do not need, and change the pointers to integers.
Now it is time to link the Vector Add kernel into the project. The Compute Units field next to the name is how many copies of the kernel you want to drop in. Since we are only using one kernel, we can leave Compute Units = 1.
Configuration time! Here is where things can become messy. We now have to create a .cfg file with the system link options so the compiler can stitch in the new kernel that has been brought in.
Create a new file --> vector_add-link.cfg
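The exact contents depend on your design, but as a minimal sketch (assuming the template kernel is named krnl_vadd and we want a single instance called krnl_vadd_1, which matches the register map we see later in PYNQ), vector_add-link.cfg could contain:
[connectivity]
# one compute unit of the vadd kernel, instance name krnl_vadd_1
nk=krnl_vadd:1:krnl_vadd_1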
Now, when the v++ compiler runs, we can see this kernel being brought into our design.
Build the project - this may take a while. Give it 10-25 minutes depending on the machine you are working on. NOTE - Let it think if it appears to get "stuck".
Files that are generated:
- TCL File
- HWH File
- Bit File
- XCLBIN File
Find the Link_Summary and double click it to verify that your hardware is behaving as you expect.
At a high level, this is what we expected! There are 2 inputs and 1 output, matching the source code. We can now go further and open Vivado, which is living under the hood in Vitis. Let me show you how to do this.
Once you see design_1_bd.tcl, double click it! This will open a new Vivado window which shows you the complete platform built from the combination of hardware and software. The software automates this step.
This is awesome! The best part is that you configured your project such that the tool could work its magic.
Vitis to PYNQ
Since we have a working project and we can see that the correct files exist, let's use WinSCP to copy the files over to the Ultra96.
Now let's transfer the files you need - here are their locations:
TCL File
- vitis_accel_add_sw_v3/U96_platform_system_hw_link/Hardware/binary_container_1.build/link/vivado/vpl/prj/prj.runs/impl_1
HWH File
- vitis_accel_add_sw_v3/U96_platform_system_hw_link/Hardware/binary_container_1.build/link/vivado/vpl/prj/prj.runs/impl_1
Bit File
- vitis_accel_add_sw_v3/U96_platform_system_hw_link/Hardware/binary_container_1.build/link/vivado/vpl/prj/prj.runs/impl_1
XCLBIN File
- vitis_accel_add_sw_v3/U96_accel_app_system/Hardware/
We are accomplishing the following: A + B = C. With PYNQ we have to allocate buffers for the kernel's inputs and output; here we create simple arrays of 4 data points each. We expect the kernel to apply the add function to the two input arrays A and B and produce the output array C.
SW Code
from pynq import Overlay
import pynq
import time
from pynq import allocate
from pynq.lib.dma import DMA
import numpy as np
# from PIL import Image
from IPython.display import Image
import matplotlib.pyplot as plt
# Import libs
from pynq import Overlay
from pynq import Device
from pynq import DefaultIP
for i in range(len(pynq.Device.devices)):
    print("{}) {}".format(i, pynq.Device.devices[i].name))
0) Ultra96
1) Ultra96
overlay_1 = Overlay("bitfile_hwh_files/vitis_accel_add.xclbin", device=pynq.Device.devices[1])
overlay_1?
#-----------------------
#Bring the bit file in [Do Not NEED]
#-----------------------
# overlay_1 = pynq.Overlay('my_overlay1.bit', device=pynq.Device.devices[0])
# overlay_1 = Overlay("bitfile_hwh_files/test_2.bit")
# overlay_1?
# overlay_2 = Overlay("bitfile_hwh_files/test_3.bit")
# help(overlay_2)
# overlay_3 = Overlay("bitfile_hwh_files/vitis_accel_add.bit")
# overlay_3?
# overlay = Overlay("test_2.xclbin", device=pynq.Device.devices[1])
# overlay?
# Creating a driver for the new kernel function vadd
# Verify you can run this IP
add_ip = overlay_1.krnl_vadd_1
# add_ip?
add_ip.register_map
RegisterMap {
CTRL = Register(AP_START=0, AP_DONE=0, AP_IDLE=1, AP_READY=0, AUTO_RESTART=0, AP_CONTINUE=0),
in1 = Register(value=1263337472),
in2 = Register(value=1924894720),
out_r = Register(value=967557120),
size = Register(value=4)
}
Load in the allocation lib
# Running xrt accelerator code
import pynq
from pynq import allocate
# INITIALIZE - Allocate input arrays of 32-bit unsigned integers (4 elements each)
input_buf_1 = pynq.allocate(shape=(4,), dtype='u4')
input_buf_2 = pynq.allocate(shape=(4,), dtype='u4')
#output
outbuf_1 = pynq.allocate(shape=(4,), dtype='u4')
# outbuf_2 = pynq.allocate(shape=(4,), dtype='u4')
#initialize the buffers
for i in range(4):
    input_buf_1[i] = i
    input_buf_2[i] = i + 1
# Write to size
# add_ip.register_map.size = 4
print(input_buf_1)
print(input_buf_2)
[0 1 2 3]
[1 2 3 4]
add_ip.register_map
RegisterMap {
CTRL = Register(AP_START=0, AP_DONE=0, AP_IDLE=1, AP_READY=0, AUTO_RESTART=0, AP_CONTINUE=0),
in1 = Register(value=1263337472),
in2 = Register(value=1924894720),
out_r = Register(value=967557120),
size = Register(value=4)
}
Sync to Device & Start Kernel Call#XRT Framework
input_buf_1.sync_to_device()
input_buf_2.sync_to_device()
# Input 1 | input 2 | output | size
# .call() runs the kernel and blocks until it completes
overlay_1.krnl_vadd_1.call(input_buf_1, input_buf_2, outbuf_1, 4)
# Send the data XRT - START: .start() is the non-blocking alternative and returns a handle we can wait on
handle = overlay_1.krnl_vadd_1.start(input_buf_1, input_buf_2, outbuf_1, 4)
handle.wait()
outbuf_1.sync_from_device()
outbuf_1
PynqBuffer([1, 3, 5, 7], dtype=uint32)
Output Vitis Acceleration Overlay
Array_A = [0, 1, 2, 3]
Array_B = [ 1, 2, 3, 4]
Addition_Array_C= [1, 3, 5, 7]
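As a quick sanity check back in the notebook (a sketch assuming the buffers from the cells above are still allocated), we can compare the hardware result against a plain NumPy addition:
import numpy as np
# Compare the accelerator output with a software reference computation.
expected = np.asarray(input_buf_1) + np.asarray(input_buf_2)
print("Expected:", expected)                                  # [1 3 5 7]
print("Match:", np.array_equal(np.asarray(outbuf_1), expected))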