The Fast Fourier Transform (FFT) is one of the fundamental building blocks of Digital Signal Processing (DSP) and signal analysis. Because it is used so frequently, many device manufacturers offer code libraries and intellectual property (IP) optimized for their architectures to achieve the highest possible performance. Xilinx, for example, offers a customizable FFT IP core optimized for its programmable devices. For general-purpose computers there are several open-source FFT libraries. One such library is the Fastest Fourier Transform in the West (FFTW), which can be obtained from www.fftw.org.
This project compares the performance of two single-precision floating-point FFT implementations using the Avnet UltraZed-EG Starter Kit and the Vitis development environment. The first implementation uses version 3.3.8 of the FFTW library compiled for the ARM® Cortex®-A53 processing system (PS) within the Xilinx MPSoC device on the UltraZed-EG SoM. The second implementation is an accelerator that uses the Xilinx LogiCORE FFT IP version 9.1 (XFFT) running in the programmable logic (PL) of the MPSoC device.
Project directory structure
Vitis acceleration is only supported on Linux development hosts. This project uses commands and references based on a development host running Ubuntu 18.04 LTS. Use the following commands to create the project directory structure:
mkdir ~/uz3eg_fft
cd ~/uz3eg_fft
mkdir app kernels platform xclbin
After executing the commands above you should see a directory structure that looks like the image below.
The 2020.1 Vitis platform for the UltraZed-EG Starter Kit can be downloaded here. After the Avnet SharePoint site loads, click on the 2020.1 folder icon, then click on Vitis_Platform, and then on UZ3EG_IOCC_VITIS_2020_1.tar.gz. You should now be at the download page and be able to click on a "Download" button (see image below).
After the platform is downloaded, extract it to the platform directory of the project. The example command below can be used to extract the downloaded platform tarball.
tar -xvzf ~/Downloads/UZ3EG_IOCC_VITIS_2020_1.tar.gz -C ~/uz3eg_fft/platform
Source the Vitis environment script
Before starting compilation, initialize the Vitis environment by sourcing the settings script from the Vitis install directory. Vitis is typically installed in either the /opt or /tools directory. The following command sets up the Vitis environment:
source /tools/Xilinx/Vitis/2020.1/settings64.sh
The command shown above assumes that Vitis is installed in the /tools directory. Modify the command as necessary if Vitis is installed in a different location on your machine.
The Xilinx LogiCORE Fast Fourier Transform IP is used for the FFT acceleration kernel. The IP needs to be packaged using the Vivado IP packager before it can be used as an accelerator. Once the IP is packaged it can be compiled into a Xilinx object file (.xo) and used with the Vitis linker. The high-level steps are:
- Package the Xilinx LogiCORE FFT IP using the Vivado IP Packager
- Compile the packaged IP into an .xo file using the Vivado package_xo Tcl command
Note: the two-step methodology listed above applies to the creation of RTL kernels per UG1393, Chapter 8.
This project uses a scripted flow to package the FFT IP and create the .xo file. The following Linux commands download the necessary scripts and generate the FFT accelerator Xilinx object file. The FFT IP is configured to support single-precision floating-point and power-of-2 FFT sizes from 8 to 16384.
mkdir -p ~/uz3eg_fft/kernels/xfft
cd ~/uz3eg_fft/kernels/xfft
wget https://www.hackster.io/code_files/507565/download -O package_ip.tcl
wget https://www.hackster.io/code_files/507564/download -O package_xo.tcl
wget https://www.hackster.io/code_files/507566/download -O create_ip_xo.sh
wget https://www.hackster.io/code_files/507560/download -O kernel.xml
dos2unix *
bash ./create_ip_xo.sh
If the commands above execute successfully, you will see an fft.xo file in the ~/uz3eg_fft/kernels/xfft directory.
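The IP accepts power-of-2 sizes from 8 to 16384. The transform size is presumably selected at run time through the FFT configuration channel, which fft_infc_1.config drives into fft_1.S_AXIS_CONFIG in connections.cfg. The sketch below illustrates how a size could be validated and encoded for that channel. It is not the project's code, and the helper names are hypothetical. The config-word layout is an assumption based on the LogiCORE FFT product guide for a run-time configurable transform length with no cyclic prefix, a single channel, and floating-point data (so no scaling schedule):

- NFFT (log2 of the size) occupies bits [4:0], padded to a byte.
- FWD_INV occupies bit 8.

Check this layout against PG109 and the attached kernel source before relying on it.

```cpp
#include <cassert>
#include <cstdint>

// FFT sizes supported by the IP as configured in this project.
constexpr unsigned kMinFftSize = 8;
constexpr unsigned kMaxFftSize = 16384;

// True if n is a power of two inside the configured range.
bool is_supported_fft_size(unsigned n) {
    const bool power_of_two = n != 0 && (n & (n - 1)) == 0;
    return power_of_two && n >= kMinFftSize && n <= kMaxFftSize;
}

// log2 of a power of two, e.g. 1024 -> 10 (the NFFT value).
unsigned log2_pow2(unsigned n) {
    assert(n > 0);
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}

// ASSUMED config-word layout (run-time configurable length, no cyclic
// prefix, one channel, floating point so no scaling schedule):
//   bits [4:0] NFFT, bits [7:5] padding, bit 8 FWD_INV (1 = forward).
// Returns 0 for unsupported sizes, since NFFT = 0 is never valid here.
std::uint16_t make_fft_config(unsigned n, bool forward) {
    if (!is_supported_fft_size(n)) {
        return 0;
    }
    const unsigned nfft = log2_pow2(n) & 0x1Fu;
    const unsigned fwd_inv = forward ? 1u : 0u;
    return static_cast<std::uint16_t>((fwd_inv << 8) | nfft);
}
```

For example, a forward 1024-point transform encodes to 0x10A under this assumed layout: NFFT = 10 in the low bits and FWD_INV = 1 in bit 8.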
The purpose of the DMA kernel is to read data from PS memory via an AXI4 memory-mapped interface and convert it to an AXI4-Stream interface for the FFT accelerator to consume, and to write the FFT results from the stream back to PS memory. The HLS DMA code takes the form of simple for-loops, shown in the code snippet below.
The code shown above has two for-loops: rd_loop and wr_loop. The rd_loop for-loop reads data from a 128-bit input port and writes it to an AXI4-Stream interface as two consecutive 64-bit words. Similarly, the wr_loop for-loop reads two consecutive 64-bit samples from an AXI4-Stream interface and combines them into a 128-bit word that is written over the AXI4-MM interface. The DATAFLOW pragma in the code shown above tells the Vitis kernel compiler that the loops can execute concurrently instead of sequentially.
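You can model the width conversion performed by rd_loop and wr_loop in plain C++ to check the packing before synthesis. In this sketch:

- A 128-bit memory word is represented as two std::uint64_t halves.
- Each 64-bit stream word is assumed to hold one complex single-precision sample.
- The low half is assumed to go first; this ordering and all the names are illustrative assumptions.

The attached fft_infc.cpp remains the authoritative version. Note also that this model runs the two loops one after the other, whereas DATAFLOW lets the hardware run them concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 128-bit AXI4-MM word modeled as two 64-bit halves.
struct Word128 {
    std::uint64_t lo;  // bits [63:0]
    std::uint64_t hi;  // bits [127:64]
};

// Model of rd_loop: each 128-bit memory word becomes two consecutive
// 64-bit stream words (ASSUMPTION: low half first).
std::vector<std::uint64_t> rd_loop_model(const std::vector<Word128>& mem) {
    std::vector<std::uint64_t> stream;
    stream.reserve(mem.size() * 2);
    for (const Word128& w : mem) {
        stream.push_back(w.lo);
        stream.push_back(w.hi);
    }
    return stream;
}

// Model of wr_loop: each pair of consecutive 64-bit stream words is
// combined into one 128-bit memory word (first word -> low half).
std::vector<Word128> wr_loop_model(const std::vector<std::uint64_t>& stream) {
    assert(stream.size() % 2 == 0);  // every supported FFT size is even
    std::vector<Word128> mem;
    mem.reserve(stream.size() / 2);
    for (std::size_t i = 0; i < stream.size(); i += 2) {
        mem.push_back(Word128{stream[i], stream[i + 1]});
    }
    return mem;
}
```

Running rd_loop_model followed by wr_loop_model should return the original memory words. This gives a quick round-trip test of whichever ordering convention the kernel actually uses.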
The figure below helps to visualize how the HLS DMA connects between the PS and the FFT accelerator.
A top-level wrapper function named fft_infc is also created to define the interface types. A code snippet is shown below.
The complete code for the HLS DMA is located in the attached fft_infc.cpp file. The commands shown below download and compile the HLS DMA accelerator kernel.
mkdir -p ~/uz3eg_fft/kernels/fft_infc/build
cd ~/uz3eg_fft/kernels/fft_infc
wget https://www.hackster.io/code_files/506248/download -O fft_infc.cpp
cd build
v++ \
-t hw \
-c \
--platform ~/uz3eg_fft/platform/UZ3EG_IOCC/UZ3EG_IOCC.xpfm \
--kernel_frequency 300 \
-k fft_infc \
../fft_infc.cpp \
--save-temps \
-o ../fft_infc.xo
cd ..
When kernel compilation completes, an fft_infc.xo file will be located in ~/uz3eg_fft/kernels/fft_infc.
The Vitis linker handles inserting the FFT accelerator and DMA kernels into the PL. The output of the linking phase will be a binary container (xclbin) used to configure the programmable logic portion of the UltraZed-EG Xilinx MPSoC device. The connections between the host (PS) and the accelerator kernels (PL) are specified with a configuration file.
The connections.cfg file attached to this project defines the connections shown in the Kernel Connection Diagram. A portion of the connections.cfg file is shown below. Notice that the FFT (fft_1) and DMA (fft_infc_1) kernels are connected to clock ID 1 (300 MHz). This matches the --kernel_frequency 300 setting used to compile the DMA kernel.
[clock]
#########################
# clock id 0 = 150 MHz #
# clock id 1 = 300 MHz #
# clock id 2 = 75 MHz #
# clock id 3 = 100 MHz #
# clock id 4 = 200 MHz #
# clock id 5 = 400 MHz #
# clock id 6 = 600 MHz #
#########################
id=1:fft_infc_1.ap_clk
id=1:fft_1.ap_clk
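As a rough sanity check on what the 300 MHz clock allows, the arithmetic below computes ideal peak rates for the 64-bit stream and the 128-bit memory port. It assumes at most one transfer per clock cycle and ignores AXI overhead, DMA setup, and host latency. These are theoretical bounds derived from the data widths described earlier, not measurements from this project, and the names are illustrative.

```cpp
#include <cassert>

// Values taken from this project's configuration.
constexpr double kKernelClockHz = 300e6;     // clock id 1 in connections.cfg
constexpr double kStreamBytesPerBeat = 8.0;  // 64-bit AXI4-Stream word
constexpr double kMemBytesPerBeat = 16.0;    // 128-bit AXI4-MM word

// Ideal peak rate when one beat is transferred every clock cycle.
double peak_bytes_per_sec(double clock_hz, double bytes_per_beat) {
    assert(clock_hz > 0.0 && bytes_per_beat > 0.0);
    return clock_hz * bytes_per_beat;
}

// Lower bound on the time to stream n complex samples, one per cycle at best.
double min_stream_time_sec(unsigned n_samples, double clock_hz) {
    assert(clock_hz > 0.0);
    return static_cast<double>(n_samples) / clock_hz;
}
```

For example:

- The 64-bit stream peaks at 2.4 GB/s and the 128-bit port at 4.8 GB/s.
- Moving the 16384 input samples of a 16384-point FFT over the stream takes at least 16384 / 300 MHz, or roughly 55 µs.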
The portion of the connections.cfg file that defines the interconnect between the PS and the PL accelerators, as well as the connections between the accelerators, is shown below.
[connectivity]
################################
# AXI-MM Interfaces #
################################
sp=fft_infc_1.m_axi_gmem1:HP0
sp=fft_infc_1.m_axi_gmem2:HP1
######################################
# Kernel-to-kernel Stream Interfaces #
######################################
sc=fft_infc_1.config:fft_1.S_AXIS_CONFIG
sc=fft_infc_1.strm_out:fft_1.S_AXIS_DATA
sc=fft_1.M_AXIS_DATA:fft_infc_1.strm_in
The sp tag maps a kernel's AXI memory-mapped port to a system port (here the HP0 and HP1 memory ports exposed by the platform), and the sc tag defines a streaming connection. Notice that the connections between the kernels are streaming connections, so data passes directly from kernel to kernel rather than through external memory.
The following commands are used to download the connections.cfg file and run the Vitis linker:
mkdir -p ~/uz3eg_fft/xclbin/build
cd ~/uz3eg_fft/xclbin/build
wget https://www.hackster.io/code_files/507575/download -O ../connections.cfg
v++ \
-t hw \
-j $(nproc) \
--link \
--platform ~/uz3eg_fft/platform/UZ3EG_IOCC/UZ3EG_IOCC.xpfm \
--config ../connections.cfg \
../../kernels/xfft/fft.xo \
../../kernels/fft_infc/fft_infc.xo \
--save-temps \
-o fft.xclbin
Download and compile the FFTW library
The FFTW software library is an open-source code base that supports multiple FFT configurations. This project compiles the FFTW library for single-threaded, single-precision floating-point operation.
The get_compile_fftw.sh script collects the commands that download and compile version 3.3.8 of the FFTW library for the A53 processing system. You can download and run it with the following Linux commands:
cd ~/uz3eg_fft/app
wget https://www.hackster.io/code_files/507578/download -O get_compile_fftw.sh
dos2unix get_compile_fftw.sh
bash ./get_compile_fftw.sh
The compiled library files are located in ~/uz3eg_fft/app/fftw/fftw-3.3.8/build/install/lib and the header files are in ~/uz3eg_fft/app/fftw/fftw-3.3.8/build/install/include. These files are needed to compile and link the application.
Several code files are attached to this project to facilitate running the FFTW library on the ARM processing system as well as running the accelerated Xilinx FFT in the programmable logic. The table below summarizes the files.
The test application generates complex exponential test data for validating the FFTW results against the Xilinx FFT accelerator. The test application is depicted in the figure below.
The following commands are used to download source code and compile the application:
mkdir -p ~/uz3eg_fft/app/src
cd ~/uz3eg_fft/app/src
wget https://www.hackster.io/code_files/507650/download -O fft_test.cpp
wget https://www.hackster.io/code_files/507651/download -O fft_test.h
wget https://www.hackster.io/code_files/507652/download -O lnx_time.h
wget https://www.hackster.io/code_files/507653/download -O main.cpp
wget https://www.hackster.io/code_files/507654/download -O xcl2.cpp
wget https://www.hackster.io/code_files/507655/download -O xcl2.hpp
wget https://www.hackster.io/code_files/507656/download -O xfft.hpp
dos2unix *
cd ..
wget https://www.hackster.io/code_files/507579/download -O compile_app.sh
dos2unix compile_app.sh
bash ./compile_app.sh
An fft_test.exe file will be created in the ~/uz3eg_fft/app directory upon successful compilation.
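The test application validates results using complex exponential test data. A complex exponential is a convenient test signal because its spectrum is known in advance: x[n] = exp(j·2π·k0·n/N) produces a single peak of magnitude N in bin k0 of a forward DFT.

The sketch below does three things:

- It generates such a tone.
- It computes a naive O(N²) DFT as an independent reference. This is a swapped-in check, not FFTW or the accelerator.
- It measures the largest difference between two spectra.

The function names are hypothetical, and the attached fft_test.cpp may generate and compare data differently.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// x[n] = exp(j*2*pi*k0*n/N): a single tone that should land in bin k0.
std::vector<cfloat> make_complex_exponential(std::size_t n_points, std::size_t k0) {
    assert(n_points > 0);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cfloat> x(n_points);
    for (std::size_t n = 0; n < n_points; ++n) {
        // Reduce k0*n modulo N to keep the phase argument small.
        const double phase = two_pi * static_cast<double>((k0 * n) % n_points) /
                             static_cast<double>(n_points);
        x[n] = cfloat(static_cast<float>(std::cos(phase)),
                      static_cast<float>(std::sin(phase)));
    }
    return x;
}

// Reference O(N^2) forward DFT using the exp(-j*2*pi*k*n/N) sign convention
// (the same convention as FFTW_FORWARD). Accumulates in double precision.
std::vector<cfloat> naive_dft(const std::vector<cfloat>& x) {
    const std::size_t N = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cfloat> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double phase = -two_pi * static_cast<double>((k * n) % N) /
                                 static_cast<double>(N);
            acc += std::complex<double>(x[n]) * std::polar(1.0, phase);
        }
        X[k] = cfloat(acc);
    }
    return X;
}

// Largest magnitude of the element-wise difference between two spectra.
float max_abs_error(const std::vector<cfloat>& a, const std::vector<cfloat>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::fmax(worst, std::abs(a[i] - b[i]));
    }
    return worst;
}
```

In a setup like this, max_abs_error could be applied to the FFTW output and the accelerator output for the same input. A small tolerance is expected rather than an exact match, because both paths use single-precision arithmetic with different operation orders.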
The final step in the process is to generate the SD card image file. The following commands will generate the SD card image.
PLATFORM=~/uz3eg_fft/platform/UZ3EG_IOCC/UZ3EG_IOCC.xpfm
ROOTFS=~/uz3eg_fft/platform/UZ3EG_IOCC/sw/UZ3EG_IOCC/PetaLinux/rootfs/rootfs.ext4
cd ~/uz3eg_fft/xclbin/build
v++ -t hw --platform $PLATFORM \
--package fft.xclbin \
--package.out_dir ./ \
--package.boot_mode sd \
--package.image_format ext4 \
--package.rootfs $ROOTFS \
--package.sd_file ~/uz3eg_fft/app/fft_test.exe
The SD card image file is written to ~/uz3eg_fft/xclbin/build/sd_card.img. Use the dd command to write this image file to a blank SD card. The example command below assumes that the SD card is enumerated as /dev/sdX on the Linux host system. Modify it for your system.
cd ~/uz3eg_fft/xclbin/build
sudo dd if=sd_card.img of=/dev/sdX status=progress
sudo eject /dev/sdX
Run the benchmarking application
After inserting the SD card into the UltraZed-EG IOCC SD card slot, power on the board and log in as the root user (the root password is root).
After logging in, navigate to the /mnt/sd-mmcblk1p1 directory and source the init.sh script to set up the environment variables.
cd /mnt/sd-mmcblk1p1
source ./init.sh
It's helpful to lower the Linux kernel console log level to reduce non-application console output. You can do this with the dmesg command shown below.
dmesg -n 1
The next step is to launch the application and choose the execution mode (single FFT size execution, or benchmark). The benchmark option will execute every power-of-2 FFT size from 8 to 16384. The figure below shows the execution of a 1024-point FFT.
The figure below shows example output for the 16384-point FFT size.
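The benchmark option sweeps every power-of-2 size from 8 to 16384, which is 12 sizes. A minimal sketch of such a sweep is shown below. It makes two assumptions:

- A hypothetical run_one_fft callback stands in for either the FFTW path or the accelerator path.
- Timing uses std::chrono::steady_clock. The project's own timing helpers live in lnx_time.h and may work differently.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Power-of-two sizes exercised by a benchmark sweep: 8, 16, ..., 16384.
std::vector<unsigned> benchmark_sizes(unsigned min_size = 8, unsigned max_size = 16384) {
    assert(min_size > 0 && min_size <= max_size);
    std::vector<unsigned> sizes;
    for (unsigned n = min_size; n <= max_size; n *= 2) {
        sizes.push_back(n);
    }
    return sizes;
}

// Average wall-clock time in microseconds of `iterations` calls to
// run_one_fft(n). run_one_fft is a hypothetical stand-in for the FFTW
// path or the accelerator path.
double average_time_us(const std::function<void(unsigned)>& run_one_fft,
                       unsigned n, unsigned iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i) {
        run_one_fft(n);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Timing many back-to-back calls and dividing by the count smooths out timer resolution and per-call jitter. This matches the averaging described for the results table.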
This project presented two options for implementing the Fast Fourier Transform in the Xilinx Zynq UltraScale+ MPSoC device family. The first option was a software-only implementation using the open-source FFTW library running on the ARM Cortex-A53 processor. The second option used the Xilinx LogiCORE FFT IP to implement the FFT in programmable logic as an accelerator.
The FFT accelerator delivered significant gains, reducing execution time by a factor of 22 for the 16384-point FFT size. The table below summarizes average processing times for each power-of-2 FFT size when processing a 128 MB buffer of data.
Note: the execution time is the average time for one FFT of the given size, not the total time to process the buffer. For example, each 16384-point FFT consumes 16384 complex single-precision samples × 8 bytes = 128 KB, so a 128 MB buffer holds enough data for 1024 16384-point FFTs. The execution time in that example is the average time to perform one 16384-point FFT.
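The buffer arithmetic, the per-FFT average, and the speedup ratio can be expressed as small helpers. This is a minimal sketch with illustrative names, assuming 8-byte complex single-precision samples and a 128 MB (2^27-byte) buffer.

```cpp
#include <cassert>
#include <cstddef>

// One complex single-precision sample = 32-bit real + 32-bit imaginary.
constexpr std::size_t kBytesPerSample = 8;
constexpr std::size_t kBufferBytes = 128u * 1024u * 1024u;  // 128 MB test buffer

// Number of complete n-point FFTs whose input fits in the buffer.
std::size_t ffts_per_buffer(std::size_t buffer_bytes, std::size_t n_points) {
    assert(n_points > 0);
    return buffer_bytes / (n_points * kBytesPerSample);
}

// Average per-FFT time from the total time spent processing the buffer.
double average_fft_time(double total_time, std::size_t n_ffts) {
    assert(n_ffts > 0);
    return total_time / static_cast<double>(n_ffts);
}

// Speedup of the accelerator relative to software (FFTW on the A53).
double speedup(double software_time, double accelerator_time) {
    assert(accelerator_time > 0.0);
    return software_time / accelerator_time;
}
```

For instance, ffts_per_buffer(kBufferBytes, 16384) gives 1024 and ffts_per_buffer(kBufferBytes, 8) gives 2097152. The factor of 22 quoted above corresponds to speedup() applied to the 16384-point software and accelerator averages.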
Additional Notes
- This project targeted the UltraZed-EG, but it can easily be ported to any board that has a Vitis platform.
- (Added 7/16/2021) For Vitis 2020.2, the following updates must be made manually:
1 - Change the last line of the connections.cfg file to:
prop=run.impl_1.strategy=Performance_ExploreWithRemap
2 - The SD card packaging command renames the fft.xclbin file to a.xclbin. Once the board is booted, and before running the application on hardware, manually rename /mnt/sd-mmcblk1p1/a.xclbin to fft.xclbin.