With Vitis High-Level Synthesis (HLS), the development time for FPGA designs can be shortened considerably.
This project shows how to accelerate a FIR filter on an FPGA using HLS.
A previous blog post about running a simple neural network with HLS covered the setup procedure for the KV260 with Pynq.
All data and pre-built hardware are in the attached GitHub repository.
Fundamentals
In digital signal processing, a Finite Impulse Response (FIR) filter has an impulse response of finite duration, so its response to any finite-length input signal is also finite. A FIR filter is built around a tapped delay line that delays the input signal across a given number of taps (N). Here z^{-1} is the unit-delay operator from the Z-transform.
The filter coefficients can be arranged in an impulse response vector.
The output signal can be computed with
or short
This is the same as the convolution of the input signal with the impulse response.
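The relations described above can be written out as follows. This is a reconstruction from the text, not the original formulas: it assumes the taps are indexed from 0 to N-1 (so that N is the number of taps) and that x[n] = 0 for n < 0.

```latex
\begin{align}
\mathbf{h} &= \begin{bmatrix} h_0 & h_1 & \cdots & h_{N-1} \end{bmatrix} \\
y[n] &= h_0\,x[n] + h_1\,x[n-1] + \cdots + h_{N-1}\,x[n-N+1] \\
     &= \sum_{i=0}^{N-1} h_i\,x[n-i] \\
     &= (h * x)[n]
\end{align}
```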
For the filter design, the SciPy Cookbook recipe for low-pass FIR filter design with Python was used.
The filter was designed with a Kaiser window with the following properties:
- Cutoff frequency (f_c) of 10 Hz
- Transition width (∆f) of 5 Hz
- Stopband attenuation (A_stop) of 60 dB
The cookbook has been adapted for this project in fir.py.
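To see how these properties translate into a filter length, here is a small sketch of the standard Kaiser design estimate in plain Python. It is not taken from fir.py. The sample rate of 100 Hz is an assumption (it is the value used in the SciPy Cookbook recipe and is not stated here), and the function name `kaiser_estimate` is made up for illustration. To my understanding these are the same formulas that `scipy.signal.kaiserord` evaluates.

```python
import math

def kaiser_estimate(ripple_db, width_hz, sample_rate_hz):
    """Estimate tap count and Kaiser beta from attenuation and transition width."""
    nyquist = sample_rate_hz / 2.0
    width = width_hz / nyquist  # transition width as a fraction of Nyquist

    # Kaiser's empirical formula for the window shape parameter
    if ripple_db > 50:
        beta = 0.1102 * (ripple_db - 8.7)
    elif ripple_db > 21:
        beta = 0.5842 * (ripple_db - 21) ** 0.4 + 0.07886 * (ripple_db - 21)
    else:
        beta = 0.0

    # Kaiser's empirical formula for the required number of taps
    numtaps = (ripple_db - 7.95) / (2.285 * math.pi * width) + 1
    return math.ceil(numtaps), beta

# ASSUMPTION: 100 Hz sample rate, as in the SciPy Cookbook example
numtaps, beta = kaiser_estimate(ripple_db=60.0, width_hz=5.0, sample_rate_hz=100.0)
print(numtaps, round(beta, 3))  # 74 5.653
```

Under that sample-rate assumption, the estimate gives 74 taps, which matches the filter length used in the HLS part.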
[Figures: filter coefficients, frequency response, filtered signal]
In the cookbook, the SciPy function scipy.signal.lfilter() is used for filtering a signal. A plain, unoptimized Python (with NumPy) implementation would look like:
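The original listing is not reproduced here. As a stand-in, this is a minimal sketch of a direct-form FIR filter. It uses plain lists instead of NumPy to stay dependency-free, and the name `fir_filter` is an assumption, not necessarily the one used in the project.

```python
def fir_filter(taps, signal):
    """Direct-form FIR: y[n] = sum_i taps[i] * signal[n - i], zero initial state."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for i, h in enumerate(taps):
            if n - i >= 0:  # samples before the start of the signal count as zero
                acc += h * signal[n - i]
        output.append(acc)
    return output

# Two-tap averaging filter applied to a unit step
print(fir_filter([0.5, 0.5], [1.0, 1.0, 1.0]))  # [0.5, 1.0, 1.0]
```

With zero initial conditions, this computes the same output as `lfilter(taps, 1.0, signal)`, just with two nested Python loops, which is why it is so slow.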
For the HLS part, a signal of length 1024 is filtered with a 74-tap filter. Without parallelism, this takes about 75k cycles (1024 × 74 = 75,776 multiply-accumulate steps, at one per cycle).
The C++ code in fir.cpp for the HLS looks very similar to the Python code. With code hoisting (moving the code for the i = 0 case out of the for loop), HLS can pipeline the outermost loop. If a pipelined loop contains nested loops, they are automatically unrolled.
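To illustrate the hoisting idea, here is a Python model of the loop structure. It is not the code from fir.cpp. It assumes the common shift-register formulation of a streaming FIR filter, and the name `fir_shift_register` is made up for illustration.

```python
def fir_shift_register(taps, signal):
    """Streaming FIR with a shift register; the i == 0 case sits outside the inner loop."""
    n_taps = len(taps)
    shift_reg = [0.0] * n_taps
    output = []
    for x in signal:  # outer loop: one input sample per iteration
        acc = 0.0
        # Inner loop: fixed bounds and no branch, so every iteration has the same shape
        for i in range(n_taps - 1, 0, -1):
            shift_reg[i] = shift_reg[i - 1]
            acc += shift_reg[i] * taps[i]
        # Hoisted i == 0 case: store the new sample and add its product
        shift_reg[0] = x
        acc += x * taps[0]
        output.append(acc)
    return output

# Feeding an impulse returns the taps themselves
print(fir_shift_register([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 2.0, 3.0, 0.0]
```

Without the hoisting, the inner loop would need an `if i == 0` branch in its body. Pulling that case out leaves a uniform loop body, which is the property that matters when the inner loop is unrolled inside a pipelined outer loop.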
The Python script fir.py writes the computed tap coefficients into a C++ header file. For debugging purposes, a test signal and the expected response are also written into the fir.h header file.
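Generating such a header from Python could look roughly like the sketch below. The array name `fir_taps`, the macro `N_TAPS`, and the include guard `FIR_H` are assumptions for illustration. The actual layout of fir.h may differ.

```python
def format_header(taps, array_name="fir_taps", guard="FIR_H"):
    """Render tap coefficients as the text of a C++ header (all names are hypothetical)."""
    lines = [
        f"#ifndef {guard}",
        f"#define {guard}",
        "",
        f"#define N_TAPS {len(taps)}",
        "",
        f"const float {array_name}[N_TAPS] = {{",
    ]
    # repr() keeps enough digits to round-trip each Python float
    lines += [f"    {float(t)!r}," for t in taps]
    lines += ["};", "", "#endif"]
    return "\n".join(lines) + "\n"

header_text = format_header([0.25, 0.5, 0.25])
print(header_text)
# To write the file: pathlib.Path("fir.h").write_text(header_text)
```

A test signal and its expected response can be emitted the same way as additional arrays.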
In the post-synthesis report, we see a rather large overhead: even though the loop is pipelined, it takes 1345 cycles to filter a signal of length 1024. This is due to the expensive floating-point operations.
To avoid floating-point operations, the fixed-point package from Vitis HLS can be used. So that Python does not have to deal with fixed-point values (for communication with Pynq), the input and output of the function remain float and must be typecast accordingly. In this project, a word width of 32 bits and an integer width of 1 bit are used.
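To get a feeling for what this format can represent, here is a small Python model of a 32-bit fixed-point value with 1 integer bit. It assumes the type is `ap_fixed<32, 1>` with the default quantization and overflow behavior (truncation and wrap-around, as far as I know the defaults). The text above only states the widths. The helper names are made up for illustration.

```python
import math

WORD_BITS = 32                      # total width W
INT_BITS = 1                        # integer bits I (the sign bit counts here)
FRAC_BITS = WORD_BITS - INT_BITS    # 31 fractional bits

def to_fixed(value):
    """Quantize a float to the raw signed 32-bit integer: truncate, then wrap."""
    raw = math.floor(value * (1 << FRAC_BITS))  # truncate toward minus infinity
    raw &= (1 << WORD_BITS) - 1                 # keep the low 32 bits (wrap-around)
    if raw >= 1 << (WORD_BITS - 1):             # reinterpret as two's complement
        raw -= 1 << WORD_BITS
    return raw

def to_float(raw):
    """Convert the raw integer back to its real value."""
    return raw / (1 << FRAC_BITS)

print(to_float(to_fixed(0.5)))                         # 0.5
print(abs(to_float(to_fixed(0.1)) - 0.1) < 2 ** -31)   # True
print(to_float(to_fixed(1.0)))                         # -1.0 (1.0 does not fit and wraps)
```

The representable range is [-1, 1) with a resolution of 2^-31, so the signal, the coefficients, and the accumulated result all have to stay inside that range.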
As the report shows, the overhead is nearly gone: at 1058 cycles, the latency is very close to the optimal latency of 1024 cycles.
As in the previous blog post, generate the hardware with Vitis HLS and Vivado. Use a clock frequency of 100 MHz; it can be overclocked afterwards in Pynq.
Pynq
The Pynq code (fir.ipynb) is very similar to the code in the previous blog post. The system can be overclocked up to 250 MHz.
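As a rough back-of-the-envelope check, the cycle counts from the synthesis reports can be converted into pipeline overhead and kernel latency. This sketch assumes the cycle count stays the same after overclocking and ignores the time for moving data to and from the accelerator, so the numbers are lower bounds rather than measurements.

```python
SIGNAL_LENGTH = 1024  # samples per filter call

def overhead_percent(cycles, samples=SIGNAL_LENGTH):
    """Extra cycles relative to the ideal of one sample per cycle."""
    return 100.0 * (cycles - samples) / samples

def latency_us(cycles, clock_hz):
    """Kernel latency in microseconds at the given clock frequency."""
    return cycles / clock_hz * 1e6

for label, cycles in [("float", 1345), ("fixed point", 1058)]:
    print(f"{label}: {overhead_percent(cycles):.1f} % overhead")

# Fixed-point version at the synthesis clock and at the overclocked setting
for clock_mhz in (100, 250):
    print(f"{clock_mhz} MHz: {latency_us(1058, clock_mhz * 1e6):.2f} us")
```

The float version spends about 31 % more cycles than the ideal 1024, while the fixed-point version is within about 3 %.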
Compared with the plain Python implementation, a huge speedup of 3160 times has been achieved. Compared with lfilter() from the SciPy library, the speedup is 6.7 times.