MicroZed Chronicles: HLS Overlay Generation
Creating image overlays with Vivado HLS.
One of the great things about image processing is that we can layer video streams on top of each other. This gives us the ability to do picture in picture and to overlay text and graphics on the screen.
The information we want to display typically comes from sensor data gathered by a processor; for example, temperature, pressure, altitude, navigation information, etc.
Displaying this information can be achieved in several different ways. If the processor is capable enough, it can read the sensors, process the information, and then create its own frame buffer in DDR memory which can be applied as an overlay to the output image.
In the Xilinx ecosystem, we can use a Video Mixer IP to merge live video with additional video layers. Using the Video Mixer IP, input video layers can be supplied either as a live AXI Stream or from AXI memory-mapped frame buffers in DDR memory.
If the processor is not able to complete this processing within the required deadlines, we can leverage Vivado HLS to create an overlay generation IP block.
In this case, all the processor needs to supply is the information to be displayed and the HLS IP core will create the overlay. This frees up the processor significantly to focus on other tasks.
The remainder of this blog explores how to do this. The example displays two characters on the screen, demonstrating how easily it can be implemented. The solution is also efficient in its use of block RAM and other resources.
The first thing we need to do is determine the interfacing of the Overlay Creation IP block. In this instance, the module is going to have the following interfaces:
- Number of pixels on a line
- Number of lines in an image
- Character 1 to display
- Character 2 to display
- Control and status interface
- AXI stream image output
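void hud_gen(axis& op, int row, int column, int char_1, int char_2) {
// AXI4-Lite control and status interface (block-level start, done, idle)
#pragma HLS INTERFACE s_axilite port=return
// AXI4-Lite registers for the two characters and the frame dimensions
#pragma HLS INTERFACE s_axilite port=char_1
#pragma HLS INTERFACE s_axilite port=char_2
#pragma HLS INTERFACE s_axilite port=column
#pragma HLS INTERFACE s_axilite port=row
// AXI Stream image output
#pragma HLS INTERFACE axis register both port=op
    // ... function body (the main pixel loop) is not shown in this excerpt ...
}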
Displaying the characters is the challenging part, as we are targeting hardware. While developing the solution, I kept in mind that the end target would be an FPGA. As such, I wanted to use block RAM to store the characters for display on the screen.
To keep the block RAM utilization sensible, I used small C arrays of 10 pixels by 11 lines to contain each character to be output.
Each of these arrays contained the RGB code for the pixel along with an Alpha value for each pixel.
These characters are defined within their own header file to simplify the file structure.
The HLS IP core is then able to output each pixel several times to make the characters readable on the overlay. This provides an efficient and scalable character display solution.
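As a rough illustration of the pixel replication described above, the sketch below looks up the RGBA word for a position inside a scaled character. It is plain C++ rather than the actual HLS source; the names `Glyph`, `glyph_pixel` and the `SCALE` factor of 4 are assumptions made for the example, and only the 10 by 11 glyph size comes from the text.

```cpp
#include <cstdint>

// Glyph geometry from the text: 10 pixels wide, 11 lines tall.
constexpr int GLYPH_W = 10;
constexpr int GLYPH_H = 11;
// Hypothetical replication factor: each stored pixel covers SCALE x SCALE screen pixels.
constexpr int SCALE = 4;

// One RGBA word per stored pixel (an assumed representation of the character arrays).
using Glyph = uint32_t[GLYPH_H][GLYPH_W];

// Returns the RGBA word for screen position (x, y), measured from the
// top-left corner of the scaled character. Positions outside the character
// return 0, which this sketch treats as fully transparent.
inline uint32_t glyph_pixel(const Glyph& g, int x, int y) {
    if (x < 0 || y < 0 || x >= GLYPH_W * SCALE || y >= GLYPH_H * SCALE) {
        return 0;
    }
    // Integer division maps SCALE adjacent screen pixels onto one stored pixel.
    return g[y / SCALE][x / SCALE];
}
```

Because only the small 10 by 11 array is stored, making the characters larger on screen changes the divisor but not the amount of memory needed per character.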
The AXI Stream output from the IP module carries pixel data as RGBA. This means that, along with the RGB channels, we also have a pixel-by-pixel alpha channel.
This enables the video mixer to mix the live video and the overlay on a pixel by pixel basis.
The output AXI stream is therefore 32 bits, which is important to remember when we create the test bench.
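To make the 32-bit pixel concrete, here is a small sketch of packing four 8-bit channels into one word. The byte order used (alpha in the top byte, red in the bottom byte) is an assumption for illustration only; the order expected by the Video Mixer should be checked against its product guide. `Rgba`, `pack_rgba` and `unpack_rgba` are hypothetical names.

```cpp
#include <cstdint>

// Four 8-bit channels for one overlay pixel.
struct Rgba {
    uint8_t r, g, b, a;
};

// Pack into a single 32-bit word.
// ASSUMED layout: A in bits 31:24, B in 23:16, G in 15:8, R in 7:0.
inline uint32_t pack_rgba(Rgba p) {
    return (uint32_t(p.a) << 24) | (uint32_t(p.b) << 16) |
           (uint32_t(p.g) << 8)  |  uint32_t(p.r);
}

// Inverse of pack_rgba, using the same assumed layout.
inline Rgba unpack_rgba(uint32_t w) {
    return Rgba{ uint8_t(w & 0xFF), uint8_t((w >> 8) & 0xFF),
                 uint8_t((w >> 16) & 0xFF), uint8_t(w >> 24) };
}
```

With this layout, an alpha byte of 0 makes a pixel fully transparent regardless of its color, so the background of the overlay can be all zeros.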
The main loop in the body of the code uses the techniques I have demonstrated previously to generate and output an AXI Stream (example one and two).
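For readers who have not seen those earlier examples, the sketch below models the general shape of such a loop in plain C++, without the HLS stream types. It follows the AXI4-Stream video convention of asserting TUSER on the first pixel of a frame and TLAST on the last pixel of each line. The `Beat` struct, `border_pixel` and `generate_frame` are hypothetical stand-ins, and the one-pixel border is simply an example pattern, not the actual overlay content.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ stand-in for one AXI4-Stream video transfer.
struct Beat {
    uint32_t data;  // RGBA pixel
    bool user;      // TUSER: start of frame
    bool last;      // TLAST: end of line
};

// Example pattern: opaque white one-pixel border, transparent elsewhere.
inline uint32_t border_pixel(int x, int y, int rows, int cols) {
    bool edge = (x == 0 || y == 0 || x == cols - 1 || y == rows - 1);
    return edge ? 0xFFFFFFFFu : 0x00000000u;
}

// Walk the frame in raster order and emit one beat per pixel.
inline std::vector<Beat> generate_frame(int rows, int cols) {
    std::vector<Beat> out;
    out.reserve(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            Beat b;
            b.data = border_pixel(x, y, rows, cols);
            b.user = (x == 0 && y == 0);   // only the very first pixel
            b.last = (x == cols - 1);      // final pixel of every line
            out.push_back(b);
        }
    }
    return out;
}
```

In an HLS implementation the inner loop would typically be pipelined and each beat written to the stream port instead of being collected in a vector.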
When it comes to creating a test bench, we want it to capture the output image and save it as a BMP so we can check that the IP is outputting data correctly.
As we are using 32-bit pixels, I used the IPL color depth of 32-bit signed (IPL_DEPTH_32S) with a single channel. This means the test pattern image will be black and white, but it will still show that the characters and border are output correctly.
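#include "hud.h"
#include <hls_opencv.h>

int main (int argc, char** argv) {
    IplImage* dst;  // image that receives the generated overlay
    axis dst_axi;   // AXI Stream written by the IP under test

    // Single-channel, 32-bit signed image so each 32-bit RGBA word maps to one pixel
    dst = cvCreateImage(cvSize(1920,1080), IPL_DEPTH_32S, 1);
    // 1080 rows, 1920 columns, char_1 = 3, char_2 = 7
    hud_gen(dst_axi, 1080, 1920, 3, 7);
    // Convert the stream back into an image and save it for inspection
    AXIvideo2IplImage(dst_axi, dst);
    cvSaveImage("op.bmp", dst);
    cvReleaseImage(&dst);
    return 0;
}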
The C simulation result can be seen below; the co-simulation results are the same.
Once the test bench demonstrates the desired performance, the next stage is to synthesize the design so we can include it in our Vivado design.
When this was integrated into an image processing project, the Video Mixer IP was able to mix the live video stream with the HLS-generated overlay.
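To give a feel for what per-pixel mixing means, the sketch below shows a common textbook form of alpha blending for a single 8-bit channel. This is not taken from the Video Mixer documentation and its internal arithmetic may differ; `blend_channel` is a hypothetical helper for illustration.

```cpp
#include <cstdint>

// Blend one 8-bit channel of the overlay over the live video.
// alpha = 255 shows only the overlay, alpha = 0 shows only the video.
// Common formulation with rounding; an assumption, not the mixer's exact math.
inline uint8_t blend_channel(uint8_t overlay, uint8_t video, uint8_t alpha) {
    unsigned sum = unsigned(overlay) * alpha + unsigned(video) * (255u - alpha);
    return uint8_t((sum + 127u) / 255u);  // divide by 255 with rounding
}
```

Applying this to the R, G and B channels of every pixel means transparent areas of the overlay leave the live video untouched, while opaque character pixels replace it.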
A MicroBlaze processor was used to read in temperature values and pass these to the Overlay Generation IP block.
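On the processor side, a two-character display needs the reading split into two values before they are written to the IP. The sketch below assumes, based only on the test bench call using 3 and 7, that `char_1` and `char_2` take the digit values 0 to 9; that mapping is not confirmed by the text. `Digits` and `split_two_digits` are hypothetical names, and writing the results to the AXI4-Lite registers is left out.

```cpp
// Two decimal digits for a two-character display.
struct Digits {
    int tens;   // candidate value for char_1 (assumed mapping)
    int units;  // candidate value for char_2 (assumed mapping)
};

// Clamp the reading to 0..99 and split it into tens and units.
inline Digits split_two_digits(int value) {
    if (value < 0)  value = 0;
    if (value > 99) value = 99;
    return Digits{ value / 10, value % 10 };
}
```

For example, a reading of 37 would give 3 and 7, which matches the values passed to `hud_gen` in the test bench.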
The output video can be seen below. As I covered and released the temperature sensor, the display on screen updated in real time.
The code for the HLS Overlay Generation IP core can be found on my GitHub.
See My FPGA / SoC Projects: Adam Taylor on Hackster.io
Access the MicroZed Chronicles Archives, with over 300 articles on the FPGA / Zynq / Zynq MPSoC and updated weekly, at MicroZed Chronicles.