To create our HLx image processing block we will be using the Eclipse-based Vivado HLS. Once Vivado HLS is open, the first thing to do is create a new project and select the correct target device.
In this case, as we are targeting the Zybo Z7, the target device is the XC7Z020-1CLG400C.
In this project, we are going to create a simple function that converts a color image into a grayscale image. Converting an image to grayscale is a common step in many image processing applications. Working with grayscale images reduces processing complexity when, for example, we want to detect features such as edges.
Directories

With the project created, we are going to do the following:
- Under the source code directory, create the files cvt_colour.cpp and cvt_colour.hpp. These are the files which will be used to create the RTL IP core. The cpp file will contain the actual function, while the header file will provide the function definition, along with other common definitions and type definitions.
- Under the test bench directory, create a new cpp file called cvt_colour_tb.cpp. This will be the test bench, which uses OpenCV functions to verify the accelerated function's behaviour.
To be able to drop the accelerated IP core into the Zybo Z7 design, we need to be able to interface with an AXI Stream.
Therefore we need the input and output images to be AXI Streams. We can do this in the code using pragmas to ensure the HLS compiler instantiates our desired interface. To provide flexibility, we also need the function to be aware of the image size it is working with.
As such, the function definition becomes:
void image_filter(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM, int rows, int cols)
The AXI_STREAM input is a type definition which implements an AXI Stream interface with sideband signals. These sideband signals provide the start-of-frame (TUser) and end-of-line (TLast) indications.
To define the AXI Stream interface with side band signals, we include the following type definition within the cvt_colour.hpp file:
typedef hls::stream<ap_axiu<32, 1, 1, 1> > AXI_STREAM;
The ap_axiu structure is defined within ap_axi_sdata.h, which supports signed (ap_axis) and unsigned (ap_axiu) AXI Streams with sideband signals.
The above example creates a 32-bit-wide data bus with one-bit-wide TUser, TID, and TDest signals; TLast is included by default.
Color Conversion & Configuration

With the images being received and output as AXI Streams, we need to convert the AXI Stream to and from the hls::Mat format required by the hls::CvtColor function.
We do this using the AXIvideo2Mat and Mat2AXIvideo functions at the necessary points in the code.
For the conversion between AXI Stream and hls::Mat to work correctly, we need to have previously defined the size and type of the hls::Mat.
Again this is done within the cvt_colour header file, which defines the maximum width and height, along with the number of channels and the depth of each channel.
As such, the RGB hls::Mat is defined as type HLS_8UC3, an 8-bit, unsigned, 3-channel structure, while the gray hls::Mat is defined as HLS_8UC1: 8-bit, unsigned, 1-channel.
When converting from color to gray, we have to accommodate a range of pixel formats. Using the hls::CvtColor conversion function, we can convert pixels formatted in either Red, Green, Blue or Blue, Green, Red order.
In this application, as we are working with a BMP input, the pixel ordering is Blue, Green, Red.
hls::CvtColor<HLS_BGR2GRAY>(img_0, img_1);
Final Code

We have now explained the AXI Stream I/O, the conversion between the I/O types and the types compatible with the color conversion function, and the configuration of the conversion function itself. The final code to be accelerated is shown below:
#include "cvt_colour.hpp"

void image_filter(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM, int rows, int cols)
{
#pragma HLS INTERFACE axis port=INPUT_STREAM
#pragma HLS INTERFACE axis port=OUTPUT_STREAM
    RGB_IMAGE img_0(rows, cols);
    GRAY_IMAGE img_1(rows, cols);
    RGB_IMAGE img_2(rows, cols);
#pragma HLS dataflow
    hls::AXIvideo2Mat(INPUT_STREAM, img_0);
    hls::CvtColor<HLS_BGR2GRAY>(img_0, img_1);
    hls::CvtColor<HLS_GRAY2RGB>(img_1, img_2);
    hls::Mat2AXIvideo(img_2, OUTPUT_STREAM);
}
While the header file contains the following:
#include "hls_video.h"
#include <ap_fixed.h>
#define MAX_WIDTH 2000
#define MAX_HEIGHT 2000
typedef hls::stream<ap_axiu<32,1,1,1> > AXI_STREAM;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> RGB_IMAGE;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> GRAY_IMAGE;
void image_filter(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM, int rows, int cols);
Create Test Bench

All we need to do now is create a test bench.
Within the test bench file, we need to be able to read in a 24-bit BMP file, apply it to the grayscale conversion function, and write out the resulting image.
To be able to work with OpenCV we can use the hls_opencv.h header. This provides all the functions we need to open, close, and save images, and to convert them into types which are compatible with the AXI Streaming types.
The simple test bench code can be seen below:
#include <hls_opencv.h>
#include "cvt_colour.hpp"
#include <iostream>

using namespace std;

int main(int argc, char** argv) {
    IplImage* src;
    IplImage* dst;
    AXI_STREAM src_axi, dst_axi;
    src = cvLoadImage("test.bmp");
    dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage2AXIvideo(src, src_axi);
    image_filter(src_axi, dst_axi, src->height, src->width);
    AXIvideo2IplImage(dst_axi, dst);
    cvSaveImage("op.bmp", dst);
    cvReleaseImage(&src);
    cvReleaseImage(&dst);
}
With both the test bench and source code written the next step is to perform the C simulation.
C Simulation

C simulation is much faster than a corresponding RTL simulation and is the first step in the HLS verification process, as it enables us to ensure there are no issues in the design before we undertake more time-consuming design stages.
Before we can run the C simulation, we need to ensure the BMP file we wish to convert is present under the project's source code directory. Once it is present, click the C simulation button.
The C simulation will run quickly and its results will be available under the directory Solution 1/CSim/Build. In this directory you will notice the file op.bmp; this is the output file from the C simulation and should show the input image converted to grayscale.
Once you are happy with the C simulation results, the next step is to run C synthesis, which converts the C source code into a VHDL or Verilog IP block.
This may take a little while and, on completion, provides a simple report on device utilization and latency.
Following C synthesis, the next stage is cosimulation. This enables the C test bench to stimulate the generated RTL model and the results to be captured again by the test bench.
This ensures we get the same behaviour from the RTL as we do from the C simulation. Again, you can find the cosimulation results under the path solution 1/Sim/Wrapc.
When I ran the cosimulation and examined the output image file I obtained the following results.
All that remains now is to export the IP core and add it into our Vivado design as we would any other IP core.
I will create another project soon showing how we do that.