Fractals are never ending complex patterns, made from repeating simple processes, which appear the same at different levels as they are zoomed in, this is called self-similarity
Fractals occur in significantly in nature, from DNA to frost and snow flakes if examined you will find fractal patterns.
In this project we are going to examine how we can use Vivado HLS to create a Mandelbrot set fractal which can be explored and zoomed in under software control.
We an then integrate this core on any Xilinx development board and display the fractal patterns.
The Mandelbrot SetThe Mandelbrot set is one of the most recognized fractal patterns, along with the Julia Set and the Koch Snow flake.
The Mandelbrot set is a based on a complex number plane and can be created using the description
While it looks complicated this can be pretty easily implemented in software, or even a FPGA as we are about to see.
Creating the IP Core in Vivado HLSTo implement the Mandelbrot Set we first need to create a new Vivado HLS project. It is in this project that we will first implement and test the algorithm using a standard C based flow before performing HLS Synthesis and Co-Simulation to demonstrate the algorithm still works.
We will also be creating a test bench which works allows us to capture the generated image as a BMP so we can view the algorithm is correct.
As such we will be creating three files
- mandelbrot.cpp - the algorithm itself for implementation
- mandelbrot.hpp - header file containing definitions of the types required and the Mandelbrot function
- testbench.cpp - test bench for the algorithm
I want the HLS block to output the image over a AXI Stream as such we need to define the following in the header file.
#include "hls_video.h"
#include <ap_fixed.h>
#define MAX_WIDTH 1280
#define MAX_HEIGHT 720
typedef hls::stream<ap_axiu<24,1,1,1> > AXI_STREAM;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> RGB_IMAGE;
void mandel_brot( AXI_STREAM& OUTPUT_STREAM);
Within the header file we can define a HLS::stream type which is 24 bits wide for data and contains the necessary sideband signals for carrying image data. Including TUser and TLast.
We also will be creating a MAT to store the image, this will also be defined in the header file.
In the main body of the file we will be implementing the algorithm itself, we also need to implement a coloring scheme for coloring in the pixels.
To show the flexibility that comes with HLS I am going to use quite a complex coloring scheme to demonstrate how flexible HLS is.
#include "hls_video.h"
#include <ap_fixed.h>
#define MAX_WIDTH 1280
#define MAX_HEIGHT 720
typedef hls::stream<ap_axiu<24,1,1,1> > AXI_STREAM;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> RGB_IMAGE;
void mandel_brot( AXI_STREAM& OUTPUT_STREAM, int width, int height);
#include "mandel_brot.hpp"
void mandel_brot(AXI_STREAM& OUTPUT_STREAM)
{
#pragma HLS INTERFACE axis port=OUTPUT_STREAM
#define maxiter = 1000;
int row, col;
RGB_IMAGE img_0(height, width);
typedef hls::Scalar<3, unsigned char> pix;
pix op_pix;
int i;
double real, imag;
double newRe, newIm, oldRe, oldIm;
double zoom = 1, moveX = 0, moveY = 0;
int maxIterations = 1000;
for(row = 0; row < height; row ++ ){
for(col = 0; col < width; col ++){
real = 1.5 * (col - width / 1.3) / (0.5 * zoom * width) + moveX;
imag = (row - height / 2) / (0.5 * zoom * height) + moveY;
newRe = newIm = oldRe = oldIm = 0;
for(i = 0; i < maxIterations; i++)
{
oldRe = newRe;
oldIm = newIm;
newRe = oldRe * oldRe - oldIm * oldIm + real;
newIm = 2 * oldRe * oldIm + imag;
if((newRe * newRe + newIm * newIm) > 4) break;
}
if(i == maxIterations){
op_pix.val[0] = 0;
op_pix.val[1] = 0;
op_pix.val[2] = 0;
}
else
{
double z = sqrt(newRe * newRe + newIm * newIm);
int brightness = 256. * log2(1.75 + i - log2(log2(z))) / log2(double(maxIterations));
op_pix.val[0] = 255;
op_pix.val[1] = brightness;
op_pix.val[2] = brightness;
}
img_0.write(op_pix);
}
}
hls::Mat2AXIvideo(img_0, OUTPUT_STREAM);
}
For each iteration I assign a pixel color based on the number of iteration. As the matrix has three channels one each for RGB I declare a scalar pixel and then write that to the overall image.
Finally the image outputs the frame over the AXIStream - Which is created by the HLS Interface Pragma.
The test bench receives the image and save the received image as a BMP so we can check the algorithm.
#include "mandel_brot.hpp"
#include <hls_opencv.h>
using namespace std;
int main (int argc, char** argv) {
IplImage* src;
IplImage* dst;
AXI_STREAM dst_axi;
src = cvLoadImage("test.bmp");
dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
mandel_brot( dst_axi, 1280, 720);
AXIvideo2IplImage(dst_axi, dst);
cvSaveImage("op.bmp", dst);
cvReleaseImage(&dst);
}
Once these files have been created we are able to run a C Simulation, the resulting image will be available under the directory <project>/Solution1/csim/build
Opening the file op.bmp should show the initial Mandelbrot set colored inline with our algorithm.
One of the great things with the Mandelbrot Set is the ability to zoom in on areas of interest. Setting the algorithm to zoom in with a factor of 625 around X -0.761574 and Y -0.00847596 will show the fractal below.
Increasing the Zoom to 78125 at the same position will also generate the fractal shown below.
Of course these are just C simulations, what we want to do is run HLS and then run co simulation to demonstrate the algorithm is still correct following synthesis to gates.
However, the performance of the solution has a little to be desired as it can take up to 200 seconds to generate a image. Which equates to a frame rate significantly below one frame per second which is unacceptable
HLS OptimizationOf course, this implementation uses a floating point representation as it uses doubles. To be able to efficiently implement the design and ensure the design is achieved in realistic latency we need to implement the algorithm using a fixed point number system.
To be able to represent fractional numbers updated the algorithm to use the ap_(u)fixed type.
The ap_(u)fixed libraries help us work with a fixed point representation as the compiler automatically aligns the decimal points.
We also need to make a few changes to the functions called, the main one being we need to switch out the SQRT function as that is not supported for ap_(u)fixed types. Instead we can instead use the POW function to implement the square root.
To be able to switch between the fixed and floating point implementations a type definition is used so we can see the differences between the implementations.
#include "mandel_brot.hpp"
#include "hls_math.h"
void mandel_brot(AXI_STREAM& OUTPUT_STREAM, fixed_point moveX, fixed_point moveY, fixed_point zoom ) //int width, int height)
{
#pragma HLS INTERFACE s_axilite port=return bundle=cmd
#pragma HLS INTERFACE s_axilite port=moveX bundle=cmd
#pragma HLS INTERFACE s_axilite port=moveY bundle=cmd
#pragma HLS INTERFACE s_axilite port=zoom bundle=cmd
#pragma HLS INTERFACE axis port=OUTPUT_STREAM bundle=VIDEO_OUT
#pragma HLS DATAFLOW
#define maxiter 50
int width = 1280;
int height = 720;
int row= 0, col= 0;
RGB_IMAGE img_0((int)height, (int) width);
typedef hls::Scalar<3, unsigned char> pix;
//typedef float fixed_point;
pix op_pix;
int i;
int maxIterations = maxiter;//after how much iterations the function should stop
fixed_point real_top,real_btm,real, imag,imag_top,imag_btm, newRe, newIm, oldRe, oldIm;
fixed_point brightness, bright_top,log_max_it, z;
mandel_brot_label2:for(row = 0; row < height; row ++ ){
mandel_brot_label1:for(col = 0; col < width; col ++){
real_top = (col - width / (fixed_point) 1.3);
real_btm = ( (fixed_point) 0.5 * zoom * width) ;
real = (fixed_point) 1.5 * (real_top / real_btm) + moveX;
imag_top = (row - height / (fixed_point) 2.0);
imag_btm = ( (fixed_point) 0.5 * zoom * height);
imag = (imag_top / imag_btm) + moveY;
newRe = newIm = oldRe = oldIm = 0;
mandel_brot_label0:for(i = 0; i < maxIterations; i++)
{
oldRe = newRe;
oldIm = newIm;
newRe = oldRe * oldRe - oldIm * oldIm + real;
newIm = 2 * oldRe * oldIm + imag;
if((newRe * newRe + newIm * newIm) > 4) break;
}
if(i == maxIterations){
op_pix.val[0] = 0;
op_pix.val[1] = 0;
op_pix.val[2] = 0;
}
else
{
z= pow((newRe * newRe + newIm * newIm),0.5);
brightness = 256. * log2(1.75 + i)/ log2((maxIterations));
op_pix.val[0] = 255;
op_pix.val[1] = brightness;
op_pix.val[2] = brightness;
}
img_0.write(op_pix);
}
}
hls::Mat2AXIvideo(img_0, OUTPUT_STREAM);
}
Once we have the HLS code optimized the next step is to set the synthesis properties.
We define these options in the Solution Setting dialog window.
The first thing we are going to do is to partition any BRAMS which will be used in the implementation. BRAMS can act as bottle necks in the implementation as they only allow one read and write per access. By partitioning the BRAMS we can perform multiple read and writes in parallel.
The next step is to increase the effort undertake during the binding process, I set this effort to high.
The final setting is to update the scheduling effort, again we set the scheduling effort to high.
We also want to be able to control the X, Y and Zoom position of the Mandelbrot as it runs so we can explore the entire space.
We want to be able to control this over a AXI Lite interface so we can control the position and zoom using register access.
To do this we can use the HLS Interface Pragma and set the type to be s_axilite.
#pragma HLS INTERFACE s_axilite port=return bundle=cmd
#pragma HLS INTERFACE s_axilite port=moveX bundle=cmd
#pragma HLS INTERFACE s_axilite port=moveY bundle=cmd
#pragma HLS INTERFACE s_axilite port=zoom bundle=cmd
When we re-synthesize the HLS optimized source code, we see a significant performance increase in the frame rate.
With a maximum latency of 18.457 mS we can achieve a frame rate of 54 Frames per second, quite an improvement over the original 200 plus seconds between images.
Co SimulationOnce we have the optimization completed and the synthesis completed the next step is to perform a co simulation which will run the test bench against the HLS RTL.
When we run this what we want to see is that the HLS image looks similar to the C simulation one even though we have used a fixed point number system in place of floating point.
With Co Simulation complete and the image acceptable we are able to export the core and implement it within a FPGA or SoC of our choice along with a video output processing chain.
Exporting the IPTo be able to use the IP core in our selected device in Vivado we first need to export the RTL core.
Once we have exported the core we can easily add it to the Vivado IP library and get started on our overall design implementation on any chosen Xilinx FPGA or SoC.
Wrap UpThis project is a little unusual as it is purely based on HLS though it shows how we can use HLS to implement complex algorithms quickly and easily.
We can add in the IP core we have created to any Xilinx FPGA device, and see the Mandelbrot run at the high frames something which requires significant computational power in a processor based solution.
Comments