Published July 26, 2022 © Apache-2.0

FPGA FIR-Filter | HLS | Kria KV260 | Pynq

Designing a fully pipelined and parallel FIR filter with floating-point and fixed-point data types, using the Kria KV260 FPGA, HLS and Pynq.

Intermediate | Full instructions provided | 3 hours | 3,089

Things used in this project

Hardware components

AMD Kria KV260 Vision AI Starter Kit

AMD-Xilinx - Kria KV260 Basic Accessory Pack

Software apps and online services

AMD Vivado Design Suite

AMD PYNQ Framework

AMD-Xilinx - Vitis HLS

Story

Introduction

Vitis High-Level Synthesis (HLS) can shorten FPGA development time considerably.

This project shows how to accelerate an FIR filter on an FPGA using HLS.

The setup procedure for the KV260 with Pynq was shown in a previous blog post about running a simple neural network with HLS.

All data and pre-built hardware are in the attached GitHub repository.

Fundamentals

In digital signal processing, a Finite Impulse Response (FIR) filter is a filter whose impulse response has finite duration, so its response to any finite input signal is also finite. An FIR filter is built from a tapped delay line that delays the input signal across a given number of taps (N). Here z^{-1} is the one-sample delay operator from the Z-transform.

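The filter coefficients can be arranged in an impulse response vector:

h = [h_0, h_1, ..., h_{N-1}]

The output signal can be computed with

y[n] = h_0 x[n] + h_1 x[n-1] + ... + h_{N-1} x[n-N+1]

or, in short,

y[n] = \sum_{k=0}^{N-1} h_k x[n-k]

which is the same as the convolution of the input signal with the impulse response:

y[n] = (h * x)[n]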

The filter design follows the SciPy Cookbook recipe for lowpass FIR filter design with Python.

The filter has been designed with a Kaiser window with the following properties:

Cutoff frequency (f_c) of 10 Hz
Transition width (∆f) of 5 Hz
Stopband attenuation (A_stop) of 60 dB
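As a rough cross-check of these specifications, the sketch below estimates the number of taps and the Kaiser window parameter beta from Kaiser's empirical formulas, the same kind of estimate that scipy.signal.kaiserord provides. The sample rate of 100 Hz is an assumption, since the text does not state it, and the function name kaiser_design_params is made up for this illustration. Under that assumption the estimate comes out at 74 taps, which matches the 74-tap filter used in the HLS section.

```python
import math

def kaiser_design_params(stopband_atten_db, transition_width_hz, sample_rate_hz):
    """Estimate tap count and Kaiser beta from Kaiser's empirical formulas."""
    nyquist = sample_rate_hz / 2.0
    # Transition width expressed as a fraction of the Nyquist frequency.
    width = transition_width_hz / nyquist
    # Empirical estimate of the filter length needed for the attenuation.
    num_taps = math.ceil((stopband_atten_db - 7.95) / (2.285 * math.pi * width) + 1)
    # Empirical formula for the window shape parameter beta.
    a = stopband_atten_db
    if a > 50:
        beta = 0.1102 * (a - 8.7)
    elif a > 21:
        beta = 0.5842 * (a - 21) ** 0.4 + 0.07886 * (a - 21)
    else:
        beta = 0.0
    return num_taps, beta

# ASSUMPTION: 100 Hz sample rate (not stated in the text).
num_taps, beta = kaiser_design_params(60.0, 5.0, 100.0)
print(num_taps, round(beta, 5))
```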

Filter design with Kaiser window. Own illustration, inspired by Orfanidis, Sophocles J., Introduction to Signal Processing, Prentice Hall, 1998.

The cookbook code has been adapted for this project in fir.py.

Coefficients:

Frequency Response:

Filtered Signal:

The final plot shows the original signal (thin blue line), the filtered signal (shifted by the appropriate phase delay to align with the original signal; thin red line), and the "good" part of the filtered signal (heavy green line). The "good" part is the part of the signal that is not affected by the initial conditions.

In the cookbook, the SciPy function scipy.signal.lfilter() is used to filter a signal. The same filter can also be written as a pure, non-optimized Python (with NumPy) implementation.
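A minimal sketch of such a direct implementation is given here. It is not the author's code from fir.py; it uses plain Python lists instead of NumPy so that it runs without dependencies, and the name fir_filter is chosen for this illustration only. It models the tapped delay line explicitly and starts from a zeroed delay line.

```python
def fir_filter(signal, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k]."""
    # Tapped delay line, newest sample at index 0, initial state all zeros.
    delay_line = [0.0] * len(taps)
    output = []
    for sample in signal:
        # Shift the delay line by one position and insert the new sample.
        delay_line = [sample] + delay_line[:-1]
        # Multiply-accumulate over all taps.
        acc = 0.0
        for coeff, delayed in zip(taps, delay_line):
            acc += coeff * delayed
        output.append(acc)
    return output

# Feeding a unit impulse returns the taps themselves (the impulse response).
impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25])
print(impulse_response)
```

The two nested loops make the cost visible: every output sample needs one multiply-accumulate per tap.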

High Level Synthesis

For the HLS part we filter a signal of length 1024 with a 74-tap filter. Without parallelism, at one multiply-accumulate per cycle, this takes about 1024 × 74 ≈ 75k cycles.

The C++ code in fir.cpp for the HLS looks very similar to the Python code. With code hoisting (moving the code for the i = 0 case out of the for-loop), HLS can pipeline the outermost loop. When a loop is pipelined, any loops nested inside it are automatically unrolled.
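The hoisting idea can be illustrated in Python, kept in the same language as fir.py. The actual fir.cpp is C++ and is not reproduced here, so both function names and the exact loop layout are assumptions for the sake of the example. The first version decides inside the loop whether it is at i = 0; the second moves that case out of the loop, which removes the branch from the loop body while producing the same result.

```python
def fir_step_branch(shift_reg, taps, sample):
    """One output sample; the i == 0 case is tested inside the loop."""
    acc = 0.0
    for i in range(len(taps) - 1, -1, -1):
        if i == 0:
            shift_reg[0] = sample            # insert the newest sample
        else:
            shift_reg[i] = shift_reg[i - 1]  # shift older samples along
        acc += shift_reg[i] * taps[i]
    return acc

def fir_step_hoisted(shift_reg, taps, sample):
    """Same computation with the i == 0 case hoisted out of the loop."""
    acc = 0.0
    for i in range(len(taps) - 1, 0, -1):
        shift_reg[i] = shift_reg[i - 1]
        acc += shift_reg[i] * taps[i]
    # Hoisted part: handle the newest sample once, after the loop.
    shift_reg[0] = sample
    acc += shift_reg[0] * taps[0]
    return acc
```

Both functions update shift_reg in place, so each needs its own register list when they are compared side by side.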

The Python script fir.py writes the computed tap coefficients into a C++ header file. For debugging purposes, a test signal and the expected response are also written into the fir.h header file.
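How such a header could be generated is sketched below. The layout of the real fir.h is not shown in the text, so the macro name NUM_TAPS, the array name fir_taps and the function make_header are all hypothetical. The function only builds the header text; writing it to disk is left to the caller.

```python
def make_header(taps, name="fir_taps"):
    """Return C++ header text holding the taps as a float array.

    NUM_TAPS and the array name are illustrative, not taken from fir.h.
    """
    lines = [
        "#pragma once",
        "",
        f"#define NUM_TAPS {len(taps)}",
        "",
        f"const float {name}[NUM_TAPS] = {{",
    ]
    # Scientific notation with 9 decimals keeps full float32 precision,
    # and the trailing 'f' marks each value as a C++ float literal.
    lines += [f"    {float(t):.9e}f," for t in taps]
    lines.append("};")
    return "\n".join(lines) + "\n"

header = make_header([0.5, -0.25])
print(header)
```

A test signal and its expected response could be emitted the same way, as additional arrays in the same file.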

The post-synthesis report shows a rather large overhead: even though the loop is pipelined, filtering a signal of length 1024 takes 1345 cycles. This is due to the expensive floating-point operations.

To avoid floating-point operations, the fixed-point package from Vitis HLS can be used. To avoid working with fixed-point numbers in Python (for communication with Pynq), the input and output of the function remain float and are cast to and from fixed point inside the function. In this project a word width of 32 bits and an integer width of 1 bit are used.
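To get a feeling for what a 32-bit word with 1 integer bit can represent, the sketch below models the format in Python. It assumes a signed format in which the single integer bit is the sign bit, leaving 31 fractional bits and a range of [-1, 1), and it models truncation when converting from float. Out-of-range values raise an error here instead of wrapping, which is a simplification. The helper names to_fixed and to_float are made up for this example.

```python
import math

WORD_BITS = 32
INT_BITS = 1                       # assumed to be the sign bit
FRAC_BITS = WORD_BITS - INT_BITS   # 31 fractional bits

def to_fixed(x):
    """Convert a float to the raw integer of the fixed-point value (truncating)."""
    raw = math.floor(x * (1 << FRAC_BITS))
    lo = -(1 << (WORD_BITS - 1))
    hi = (1 << (WORD_BITS - 1)) - 1
    if not lo <= raw <= hi:
        # Simplification: reject instead of modelling wrap-around.
        raise OverflowError(f"{x} is outside [-1, 1)")
    return raw

def to_float(raw):
    """Convert the raw fixed-point integer back to a float."""
    return raw / (1 << FRAC_BITS)

# The quantization error is below one least significant bit, 2**-31.
error = abs(to_float(to_fixed(0.1)) - 0.1)
print(error < 2.0 ** -FRAC_BITS)
```

With this resolution the quantization error is far below the precision of a 32-bit float input, as long as all values stay within the representable range.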

As the report shows, the overhead is nearly gone: with 1058 cycles, the latency is very close to the optimal 1024 cycles.
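The cycle counts quoted in this section can be put side by side with a few lines of arithmetic. The sketch uses only numbers from the text and assumes that the sequential version performs one multiply-accumulate per cycle; the helper name overhead_percent is made up for this example.

```python
SIGNAL_LENGTH = 1024   # samples per filtered block
NUM_TAPS = 74          # filter length

def overhead_percent(cycles, ideal=SIGNAL_LENGTH):
    """Extra cycles relative to the ideal of one output sample per cycle."""
    return 100.0 * (cycles - ideal) / ideal

# Sequential estimate: one multiply-accumulate per cycle (assumption).
sequential_cycles = SIGNAL_LENGTH * NUM_TAPS

float_overhead = overhead_percent(1345)   # pipelined, floating point
fixed_overhead = overhead_percent(1058)   # pipelined, fixed point

print(sequential_cycles)
print(round(float_overhead, 1), round(fixed_overhead, 1))
```

The floating-point design spends roughly a third more cycles than the ideal, while the fixed-point design stays within a few percent of it.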

Vitis HLS & Vivado

As in the previous blog post, generate the hardware with Vitis HLS and Vivado. Use a clock frequency of 100 MHz; the design can be overclocked afterwards in Pynq.

Pynq

The Pynq code (fir.ipynb) is very similar to the code in the previous blog post. The system can be overclocked up to 250 MHz.

Compared with the plain Python implementation, the FPGA achieves a performance gain of about 3160 times. Compared with lfilter() from SciPy, the gain is about 6.7 times.

Credits

Michael Schmid

Thanks to IMES Institute of Microelectronics and Embedded Systems and HLS Book.

Code

FIR-FIlter_HLS
