Published December 14, 2019 © GPL3+

Fun with Fractals

Fractals are great patterns to recreate in an FPGA. Let's look at what they are and how to implement them.

Beginner • Full instructions provided • 3 hours • 3,543

Things used in this project

Hardware components

AMD PYNQ-Z2 board

Digilent Zybo Z7: Zynq-7000 ARM/FPGA SoC Development Board

Software apps and online services

AMD Vivado Design Suite

Story

Introduction

Fractals are never-ending complex patterns made by repeating a simple process. They look the same at different levels of zoom, a property called self-similarity.

Fractals occur widely in nature. From DNA to frost and snowflakes, if you look closely you will find fractal patterns.

In this project we are going to examine how we can use Vivado HLS to create a Mandelbrot set fractal which can be explored and zoomed in under software control.

We can then integrate this core on any Xilinx development board and display the fractal patterns.

The Mandelbrot Set

The Mandelbrot set is one of the most recognized fractal patterns, along with the Julia set and the Koch snowflake.

The Mandelbrot set is defined on the complex number plane. A point c belongs to the set if repeatedly applying z = z^2 + c, starting from z = 0, never lets the magnitude of z grow without bound.

While it looks complicated, this can be implemented fairly easily in software, or even in an FPGA, as we are about to see.
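
The escape-time test behind the set fits in a few lines of host-side C++. This is a minimal sketch using std::complex rather than the HLS types used later; escape_time and max_iter are illustrative names, not taken from the project.

```cpp
#include <complex>

// Returns the iteration at which |z| first exceeds 2, or max_iter if the
// point never escapes within the cap (treated as "inside the set").
// escape_time and max_iter are illustrative names, not from the project.
int escape_time(std::complex<double> c, int max_iter)
{
    std::complex<double> z(0.0, 0.0);
    for (int i = 0; i < max_iter; ++i) {
        z = z * z + c;                     // z_{n+1} = z_n^2 + c
        if (std::norm(z) > 4.0) return i;  // norm() is |z|^2, so this tests |z| > 2
    }
    return max_iter;
}
```

Points such as c = 0 and c = -1 never escape and return max_iter, while c = 1 escapes after a few steps.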

Creating the IP Core in Vivado HLS

To implement the Mandelbrot Set we first need to create a new Vivado HLS project. It is in this project that we will first implement and test the algorithm using a standard C based flow before performing HLS Synthesis and Co-Simulation to demonstrate the algorithm still works.

We will also be creating a test bench which allows us to capture the generated image as a BMP, so we can check that the algorithm is correct.

As such we will be creating three files:

mandel_brot.cpp - the algorithm itself for implementation
mandel_brot.hpp - header file containing definitions of the types required and the Mandelbrot function
testbench.cpp - test bench for the algorithm

I want the HLS block to output the image over an AXI Stream, so we need to define the following in the header file.

#include  "hls_video.h"
#include <ap_fixed.h>
#define MAX_WIDTH  1280
#define MAX_HEIGHT 720
typedef hls::stream<ap_axiu<24,1,1,1> >     AXI_STREAM;
typedef hls::Mat<MAX_HEIGHT,   MAX_WIDTH,   HLS_8UC3> RGB_IMAGE;
void mandel_brot( AXI_STREAM& OUTPUT_STREAM);

Within the header file we define an hls::stream type which is 24 bits wide for data and contains the sideband signals needed to carry image data, including TUSER and TLAST.

We will also create an hls::Mat to store the image. Its type is defined in the header file as well.

In the main body of the file we implement the algorithm itself. We also need a coloring scheme for the pixels.

To show the flexibility that comes with HLS, I am going to use quite a complex coloring scheme.

// mandel_brot.hpp, with width and height added to the prototype
#include  "hls_video.h"
#include <ap_fixed.h>
#define MAX_WIDTH  1280
#define MAX_HEIGHT 720
typedef hls::stream<ap_axiu<24,1,1,1> >           AXI_STREAM;
typedef hls::Mat<MAX_HEIGHT,   MAX_WIDTH,   HLS_8UC3> RGB_IMAGE;
void mandel_brot( AXI_STREAM& OUTPUT_STREAM, int width, int height);

// mandel_brot.cpp
#include "mandel_brot.hpp"
#include <cmath>   // sqrt, log2

void mandel_brot(AXI_STREAM& OUTPUT_STREAM, int width, int height)
{
#pragma HLS INTERFACE axis port=OUTPUT_STREAM
    RGB_IMAGE img_0(height, width);
    typedef hls::Scalar<3, unsigned char> pix;
    pix op_pix;
    int row, col, i;
    double real, imag;                      // point c for the current pixel
    double newRe, newIm, oldRe, oldIm;      // current and previous z
    double zoom = 1, moveX = 0, moveY = 0;  // view window
    const int maxIterations = 1000;         // iteration cap per pixel

    for (row = 0; row < height; row++) {
        for (col = 0; col < width; col++) {
            // map the pixel to a point on the complex plane
            real = 1.5 * (col - width / 1.3) / (0.5 * zoom * width) + moveX;
            imag = (row - height / 2) / (0.5 * zoom * height) + moveY;
            newRe = newIm = oldRe = oldIm = 0;
            // iterate z = z^2 + c until it escapes or the cap is reached
            for (i = 0; i < maxIterations; i++) {
                oldRe = newRe;
                oldIm = newIm;
                newRe = oldRe * oldRe - oldIm * oldIm + real;
                newIm = 2 * oldRe * oldIm + imag;
                if ((newRe * newRe + newIm * newIm) > 4) break;
            }
            if (i == maxIterations) {
                // never escaped: inside the set, paint black
                op_pix.val[0] = 0;
                op_pix.val[1] = 0;
                op_pix.val[2] = 0;
            } else {
                // escaped: smooth brightness from the iteration count
                double z = sqrt(newRe * newRe + newIm * newIm);
                int brightness = 256. * log2(1.75 + i - log2(log2(z))) / log2(double(maxIterations));
                op_pix.val[0] = 255;
                op_pix.val[1] = brightness;
                op_pix.val[2] = brightness;
            }
            img_0.write(op_pix);
        }
    }
    hls::Mat2AXIvideo(img_0, OUTPUT_STREAM);
}

For each pixel I assign a color based on the number of iterations. As the matrix has three channels, one each for R, G and B, I declare a scalar pixel and then write that to the overall image.

Finally, the function outputs the frame over the AXI Stream interface, which is created by the HLS INTERFACE pragma.

The test bench receives the image and saves it as a BMP so we can check the algorithm.
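
To experiment with the coloring on its own, the brightness expression from the floating point kernel can be pulled out into a host-side function. This is a sketch only; smooth_brightness is a hypothetical helper name, and z is assumed to be the magnitude of the point at escape (greater than 2).

```cpp
#include <cmath>

// Smooth brightness for an escaped point, using the same expression as the
// floating point kernel. i is the escape iteration and z is |z| at escape.
// smooth_brightness is a hypothetical helper name, not part of the project.
int smooth_brightness(int i, double z, int max_iterations)
{
    double v = 256.0 * std::log2(1.75 + i - std::log2(std::log2(z)))
               / std::log2(static_cast<double>(max_iterations));
    return static_cast<int>(v);  // truncates toward zero
}
```

Later escapes give brighter green and blue channels. For escapes very close to the iteration cap the expression can reach 256, so clamping to 255 before storing it in an 8-bit channel may be worth considering.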

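#include "mandel_brot.hpp"
#include <hls_opencv.h>

using namespace std;

int main (int argc, char** argv) {
    IplImage* src;
    IplImage* dst;
    AXI_STREAM dst_axi;

    // test.bmp only supplies the size and format for dst,
    // so it should match the 1280x720 frame requested below
    src = cvLoadImage("test.bmp");
    dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);

    // run the core and convert the AXI Stream back into an image
    mandel_brot(dst_axi, 1280, 720);
    AXIvideo2IplImage(dst_axi, dst);
    cvSaveImage("op.bmp", dst);

    cvReleaseImage(&src);
    cvReleaseImage(&dst);
    return 0;
}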

Once these files have been created we are able to run a C simulation. The resulting image will be available under the directory <project>/Solution1/csim/build

Opening the file op.bmp should show the initial Mandelbrot set, colored in line with our algorithm.

C Simulation Output

One of the great things about the Mandelbrot set is the ability to zoom in on areas of interest. Setting the algorithm to zoom in by a factor of 625 around X = -0.761574 and Y = -0.0847596 will show the fractal below.

zoom = 625, moveX = -0.761574, moveY = -0.0847596;

Increasing the zoom to 78125 at the same position generates the fractal shown below.

zoom = 78125, moveX = -0.761574, moveY = -0.0847596;

Of course, these are just C simulations. What we want to do is run HLS synthesis and then co-simulation to demonstrate that the algorithm is still correct after synthesis to gates.

HLS Utilization Figures

However, the performance of the solution leaves a lot to be desired, as it can take up to 200 seconds to generate an image. That equates to a frame rate far below one frame per second (about 0.005), which is unacceptable.

HLS Optimization

Of course, this implementation uses floating point, as it works with doubles. To implement the design efficiently and with a realistic latency, we need to implement the algorithm using a fixed point number system.

To represent fractional numbers, I updated the algorithm to use the ap_(u)fixed types.

Fixed Point Representation in HLS

The ap_(u)fixed libraries help us work with a fixed point representation, as the compiler automatically aligns the binary points.

We also need to make a few changes to the functions called. The main one is to switch out the sqrt function, as that is not supported for ap_(u)fixed types. Instead, we can use the pow function with an exponent of 0.5 to implement the square root.

A type definition (fixed_point) is used to switch between the fixed and floating point implementations, so we can see the differences between them.

#include "mandel_brot.hpp"
#include "hls_math.h"

// moveX, moveY and zoom are now arguments so they can be set at run time
void mandel_brot(AXI_STREAM& OUTPUT_STREAM, fixed_point moveX, fixed_point moveY, fixed_point zoom)
{
#pragma HLS INTERFACE s_axilite port=return bundle=cmd
#pragma HLS INTERFACE s_axilite port=moveX bundle=cmd
#pragma HLS INTERFACE s_axilite port=moveY bundle=cmd
#pragma HLS INTERFACE s_axilite port=zoom bundle=cmd
#pragma HLS INTERFACE axis port=OUTPUT_STREAM bundle=VIDEO_OUT
#pragma HLS DATAFLOW
#define maxiter 50
    int width = 1280;
    int height = 720;
    int row = 0, col = 0;
    RGB_IMAGE img_0((int)height, (int)width);
    typedef hls::Scalar<3, unsigned char> pix;
    //typedef float fixed_point;  // uncomment to switch back to floating point
    pix op_pix;
    int i;
    int maxIterations = maxiter;  // number of iterations after which the loop stops
    fixed_point real_top, real_btm, real, imag, imag_top, imag_btm, newRe, newIm, oldRe, oldIm;
    fixed_point brightness, bright_top, log_max_it, z;

    mandel_brot_label2: for (row = 0; row < height; row++) {
        mandel_brot_label1: for (col = 0; col < width; col++) {
            // map the pixel to a point on the complex plane
            real_top = (col - width / (fixed_point) 1.3);
            real_btm = ((fixed_point) 0.5 * zoom * width);
            real = (fixed_point) 1.5 * (real_top / real_btm) + moveX;
            imag_top = (row - height / (fixed_point) 2.0);
            imag_btm = ((fixed_point) 0.5 * zoom * height);
            imag = (imag_top / imag_btm) + moveY;
            newRe = newIm = oldRe = oldIm = 0;
            // iterate z = z^2 + c until it escapes or the cap is reached
            mandel_brot_label0: for (i = 0; i < maxIterations; i++) {
                oldRe = newRe;
                oldIm = newIm;
                newRe = oldRe * oldRe - oldIm * oldIm + real;
                newIm = 2 * oldRe * oldIm + imag;
                if ((newRe * newRe + newIm * newIm) > 4) break;
            }
            if (i == maxIterations) {
                // never escaped: inside the set, paint black
                op_pix.val[0] = 0;
                op_pix.val[1] = 0;
                op_pix.val[2] = 0;
            } else {
                // pow(x, 0.5) replaces sqrt for the fixed point type
                z = pow((newRe * newRe + newIm * newIm), 0.5);
                brightness = 256. * log2(1.75 + i) / log2((maxIterations));
                op_pix.val[0] = 255;
                op_pix.val[1] = brightness;
                op_pix.val[2] = brightness;
            }
            img_0.write(op_pix);
        }
    }
    hls::Mat2AXIvideo(img_0, OUTPUT_STREAM);
}

Once we have the HLS code optimized the next step is to set the synthesis properties.

We define these options in the Solution Settings dialog window.

The first thing we are going to do is to partition any BRAMs which will be used in the implementation. BRAMs can act as bottlenecks in the implementation, as they only allow one read and one write per access. By partitioning the BRAMs we can perform multiple reads and writes in parallel.
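
The article applies partitioning through the solution settings. The same idea can also be expressed per array with the ARRAY_PARTITION pragma. The sketch below is not taken from the project: sum8 and buf are hypothetical names, and a standard C++ compiler simply ignores the HLS pragmas, so it can be checked in software first.

```cpp
// Doubles and sums 8 values. With buf kept in a single memory, the number of
// accesses per cycle is limited by the memory ports. Partitioning it
// completely turns each element into its own register, so the unrolled loops
// can touch all 8 elements in parallel. sum8 and buf are hypothetical names.
int sum8(const int in[8])
{
    int buf[8];
#pragma HLS ARRAY_PARTITION variable=buf complete dim=1
    for (int i = 0; i < 8; ++i) {
#pragma HLS UNROLL
        buf[i] = in[i] * 2;
    }
    int total = 0;
    for (int i = 0; i < 8; ++i) {
#pragma HLS UNROLL
        total += buf[i];
    }
    return total;
}
```

Complete partitioning only makes sense for small arrays. For larger ones, a cyclic or block partition with a modest factor trades fewer parallel ports for lower resource use.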

Setting the Array Partition

The next step is to increase the effort undertaken during the binding process. I set this effort to high.

Setting the Binding Effort

The final setting is the scheduling effort, which we again set to high.

Setting the schedule effort to high

We also want to be able to control the X, Y and Zoom position of the Mandelbrot as it runs so we can explore the entire space.

We want to be able to control this over an AXI Lite interface, so we can set the position and zoom using register accesses.

To do this we can use the HLS Interface Pragma and set the type to be s_axilite.

#pragma HLS INTERFACE s_axilite port=return bundle=cmd
#pragma HLS INTERFACE s_axilite port=moveX bundle=cmd
#pragma HLS INTERFACE s_axilite port=moveY bundle=cmd
#pragma HLS INTERFACE s_axilite port=zoom bundle=cmd

Interfaces on the completed block diagram

Updated Performance Following Optimization

When we re-synthesize the HLS optimized source code, we see a significant performance increase in the frame rate.

With a maximum latency of 18.457 ms we can achieve a frame rate of 54 frames per second, quite an improvement over the original 200-plus seconds per image.
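
As a quick check of the arithmetic, taking the reported maximum latency as the time per frame:

```latex
f = \frac{1}{t_{\text{frame}}}
  = \frac{1}{18.457 \times 10^{-3}\,\text{s}}
  \approx 54.2\ \text{frames/s},
\qquad
\frac{200\,\text{s}}{18.457 \times 10^{-3}\,\text{s}} \approx 1.08 \times 10^{4}
```

That is a speed-up of roughly four orders of magnitude. Note that the optimized listing also lowers the iteration cap from 1000 to 50, so not all of the gain comes from the switch to fixed point.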

Co Simulation

Once optimization and synthesis are complete, the next step is to perform a co-simulation, which runs the test bench against the RTL generated by HLS.

Co Simulation Results In Vivado

When we run this, we want to see that the image from the HLS RTL looks similar to the one from C simulation, even though we have used a fixed point number system in place of floating point.

Co Simulation Results

With co-simulation complete and the image acceptable, we are able to export the core and implement it within an FPGA or SoC of our choice, along with a video output processing chain.

Exporting the IP

To be able to use the IP core in our selected device in Vivado we first need to export the RTL core.

Once we have exported the core we can easily add it to the Vivado IP library and get started on our overall design implementation on any chosen Xilinx FPGA or SoC.

Wrap Up

This project is a little unusual as it is based purely on HLS, but it shows how we can use HLS to implement complex algorithms quickly and easily.

We can add the IP core we have created to any Xilinx FPGA device and see the Mandelbrot set run at a high frame rate, something which requires significant computational power in a processor-based solution.

Adam Taylor


Adam Taylor is an expert in the design and development of embedded systems and FPGAs for several end applications (Space, Defense, Automotive).
