- Part 1: Straight to the Finish Line
- Part 2: Customize our Core
- Part 3: Creating an IP Core Manually
- Part 4: Raw AXI Streams
- Part 5: AXI Video Streams (you are here)
In the previous part, we built an IP core that sent and received raw AXI Stream data. This time, we are going to create a core that outputs video over an AXI Stream. This is helpful because video frames in AXI Streams can be used by existing IP, including stream-to-frame generators, video mixers, HLS image-processing cores, VDMAs, etc.
I was inspired to create this entire series of projects by Project F, an excellent site discussing various aspects of FPGA development, and in particular this post about generating graphics on an FPGA. The writeup is well written and I enjoyed creating a graphical core, but I felt the core could be improved by adding an AXI Lite interface to configure it and an AXI Stream output to write out the video frames.
In this post, we'll review how the code for the graphics project generates a video AXI Stream. We'll look through the simulations and then generate a Vivado IP core. We'll then incorporate the IP into a Pynq Z2 project that outputs video to an HDMI display, bring it over to the Pynq Z2 board, and exercise it with a Jupyter notebook.
This post builds on the previous posts; if you haven't gone through them, I suggest doing that before continuing.
At a high level
In this post we will do the following:
- Review the AXI graphics core
- Simulate the core
- Generate an IP core
- Incorporate the IP core in a Pynq Z2 Project
- Upload the project to Pynq
- Demonstrate the image
In the IP-core repo, move to the 'axi_graphics' directory.
cd <ip-cores>/cores/axi_graphics
In the 'hdl' directory we'll find the 'axi_graphics.v' core along with the boilerplate AXI support files:
axi_defines.v axi_graphics.v axi_lite_slave.v
The 'axi_defines.v' and 'axi_lite_slave.v' files are what you would expect; the big difference is the 'axi_graphics.v' core.
axi_graphics.v
As mentioned above, this core was influenced by the Project F post, so if you want to learn more about generating graphics (and even expand on this) you will find a much better writeup there. I'll only be going over the changes that were needed to make the graphics core compatible with AXI Lite and AXI Streams.
Defines
...
`define DEFAULT_WIDTH 640
`define DEFAULT_HEIGHT 480
//Given a 100MHz reference clock, 60Hz Output
`define DEFAULT_INTERVAL 1666666
`define CTL_BIT_ENABLE 0
`define CTL_BIT_RGBA_FMT 1
...
There are other defines related to graphics, but the ones most relevant to the video output are the image size and the timer for the 60Hz output video. These values can be changed to suit your board; you can also turn them into parameters that can be configured from Vivado itself.
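As a quick sanity check of DEFAULT_INTERVAL (plain Python, using the 100MHz reference clock from the comment above):

CLOCK_HZ = 100_000_000  # reference clock
FPS = 60                # desired output frame rate
print(CLOCK_HZ // FPS)  # 1666666, matching DEFAULT_INTERVAL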
Ports
...
//AXI Stream Video Output
output reg o_axis_out_tuser,
output o_axis_out_tvalid,
input i_axis_out_tready,
output [AXIS_DATA_WIDTH - 1: 0]o_axis_out_tdata,
output o_axis_out_tlast
);
...
The only other addition to the top-level ports is the video output stream; this time we make use of the 'tuser' and 'tlast' signals (a small sketch of their behavior follows the list):
- 'tuser': indicates the start of a new frame. This signal is high when 'tvalid' is high along with the first piece of data in 'tdata'. Once the receiver asserts 'tready', 'tuser' goes low and stays low until the sender starts a new frame.
- 'tlast': goes high at the end of every row, not just at the end of the frame.
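Here is a minimal sketch (plain Python) of where these two signals assert for a small frame; the 4x3 size is arbitrary:

WIDTH, HEIGHT = 4, 3

for y in range(HEIGHT):
    for x in range(WIDTH):
        tuser = (x == 0 and y == 0)  # only the very first pixel of the frame
        tlast = (x == WIDTH - 1)     # the last pixel of *every* row
        print(f"({x},{y}) tuser={int(tuser)} tlast={int(tlast)}")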
Parameters
//Address Map
localparam REG_CONTROL = 0 << 2;
localparam REG_STATUS = 1 << 2;
localparam REG_WIDTH = 2 << 2;
localparam REG_HEIGHT = 3 << 2;
localparam REG_INTERVAL = 4 << 2;
localparam REG_MODE_SEL = 5 << 2;
localparam REG_XY_REF0 = 6 << 2;
localparam REG_XY_REF1 = 7 << 2;
localparam REG_FG_COLOR_REF = 8 << 2;
localparam REG_BG_COLOR_REF = 9 << 2;
localparam REG_ALPHA = 10 << 2;
localparam REG_ANIMATE = 11 << 2;
localparam REG_VERSION = 20 << 2;
localparam MAX_ADDR = REG_VERSION;
//State Machine
localparam IDLE = 0;
localparam DRAW = 1;
localparam END_LINE = 2;
//Graphics modes
localparam MODE_BLACK = 0;
localparam MODE_WHITE = 1;
localparam MODE_RED = 2;
localparam MODE_GREEN = 3;
localparam MODE_BLUE = 4;
localparam MODE_CB = 5;
localparam MODE_SQUARE = 6;
localparam MODE_RAMP = 7;
//localparam MODE_ANIMATE = 8;
We use local parameters to define the user-configurable register addresses, our state-machine states, and a set of modes for the graphics core:
- MODE_BLACK, MODE_WHITE... MODE_BLUE will fill the screen with that particular color
- MODE_CB: Color Bars
- MODE_SQUARE: Draw a square to the screen
- MODE_RAMP: Draw an incrementing horizontal ramp
- MODE_ANIMATE: An unimplemented Animate mode
AXI Register Interface
Normally I don't like posting a wall of code, but I'll make an exception in this case: although there are a lot of user-configurable registers, the code is still easy to grasp. For both the host->core and core->host directions there is a case statement that decodes the register address and puts/gets the data to/from the right place.
...
if (w_reg_in_rdy) begin
//From master
case (w_reg_address)
REG_CONTROL: begin
r_enable <= w_reg_in_data[`CTL_BIT_ENABLE];
r_rgba_format <= w_reg_in_data[`CTL_BIT_RGBA_FMT];
end
REG_WIDTH: begin
r_width <= w_reg_in_data[WIDTH_SIZE - 1: 0];
end
REG_HEIGHT: begin
r_height <= w_reg_in_data[HEIGHT_SIZE - 1: 0];
end
REG_INTERVAL: begin
r_interval <= w_reg_in_data[INTERVAL_SIZE - 1: 0];
end
REG_MODE_SEL: begin
r_mode <= w_reg_in_data;
end
REG_XY_REF0: begin
r_ref_x0 <= w_reg_in_data[`BM_REF_X];
r_ref_y0 <= w_reg_in_data[`BM_REF_Y];
end
REG_XY_REF1: begin
r_ref_x1 <= w_reg_in_data[`BM_REF_X];
r_ref_y1 <= w_reg_in_data[`BM_REF_Y];
end
REG_FG_COLOR_REF: begin
r_ref_fg_color <= w_reg_in_data;
end
REG_BG_COLOR_REF: begin
r_ref_bg_color <= w_reg_in_data;
end
REG_ALPHA: begin
r_alpha <= w_reg_in_data[7:0];
end
// REG_ANIMATE: begin
// r_animate_en <= w_reg_in_data[`ANMT_BIT_ENABLE];
// r_animate_x_dir <= w_reg_in_data[`ANMT_BIT_X_DIR];
// r_animate_y_dir <= w_reg_in_data[`ANMT_BIT_Y_DIR];
// r_animate_bounce <= w_reg_in_data[`ANMT_BIT_BOUNT];
// r_animate_count_div <= w_reg_in_data[`ANMT_BR_COUNT_DIV];
// r_animate_x_step <= w_reg_in_data[`ANMT_BR_X_STEP];
// r_animate_y_step <= w_reg_in_data[`ANMT_BR_Y_STEP];
// end
default: begin
$display ("Unknown address: 0x%h", w_reg_address);
end
endcase
if (w_reg_address > MAX_ADDR) begin
//Tell the host they wrote to an invalid address
r_reg_invalid_addr <= 1;
end
//Tell the AXI Slave Control we're done with the data
r_reg_in_ack <= 1;
end
else if (w_reg_out_req) begin
//To master
//$display("User is reading from address 0x%0h", w_reg_address);
case (w_reg_address)
REG_CONTROL: begin
r_reg_out_data <= 32'h0;
r_reg_out_data[`CTL_BIT_ENABLE] <= r_enable;
r_reg_out_data[`CTL_BIT_RGBA_FMT] <= r_rgba_format;
end
REG_STATUS: begin
r_reg_out_data <= 32'h0;
end
REG_WIDTH: begin
r_reg_out_data <= 32'h0;
r_reg_out_data[WIDTH_SIZE - 1:0]<= r_width;
end
REG_HEIGHT: begin
r_reg_out_data <= 32'h0;
r_reg_out_data[HEIGHT_SIZE - 1:0] <= r_height;
end
REG_INTERVAL: begin
r_reg_out_data[INTERVAL_SIZE - 1:0] <= r_interval;
end
REG_MODE_SEL: begin
r_reg_out_data <= r_mode;
end
REG_XY_REF0: begin
r_reg_out_data <= 32'h00;
r_reg_out_data[`BM_REF_X] <= r_ref_x0;
r_reg_out_data[`BM_REF_Y] <= r_ref_y0;
end
REG_XY_REF1: begin
r_reg_out_data <= 32'h00;
r_reg_out_data[`BM_REF_X] <= r_ref_x1;
r_reg_out_data[`BM_REF_Y] <= r_ref_y1;
end
REG_FG_COLOR_REF: begin
r_reg_out_data <= r_ref_fg_color;
end
REG_BG_COLOR_REF: begin
r_reg_out_data <= r_ref_bg_color;
end
REG_ALPHA: begin
r_reg_out_data <= {24'h0, r_alpha};
end
// REG_ANIMATE: begin
// r_reg_out_data <= 32'h0;
// r_reg_out_data[`ANMT_BIT_ENABLE] <= r_animate_en;
// r_reg_out_data[`ANMT_BIT_X_DIR] <= r_animate_x_dir;
// r_reg_out_data[`ANMT_BIT_Y_DIR] <= r_animate_y_dir;
// r_reg_out_data[`ANMT_BIT_BOUNT] <= r_animate_bounce;
// r_reg_out_data[`ANMT_BR_COUNT_DIV] <= r_animate_count_div;
// r_reg_out_data[`ANMT_BR_X_STEP] <= r_animate_x_step;
// r_reg_out_data[`ANMT_BR_Y_STEP] <= r_animate_y_step;
// end
REG_VERSION: begin
r_reg_out_data <= w_version;
end
default: begin
r_reg_out_data <= 32'h00;
//r_reg_invalid_addr <= 1;
end
endcase
//Tell the AXI Slave to send back this packet
if (w_reg_address > MAX_ADDR) begin
r_reg_invalid_addr <= 1;
end
r_reg_out_rdy <= 1;
end
Three notable things:
- The control register makes use of bit-level access; the bit positions are defined in the defines region above. There is a helper function in the Python interface that allows us to set/get a single bit at a time.
- The REG_XY_REF0 and REG_XY_REF1 registers set/get two values at a time, specifically r_ref_x[0-1] and r_ref_y[0-1]. There is a helper function that allows us to set a range of bits within a register (a sketch of both helpers follows this list).
- If we look at the register addresses, there is a gap between indices 11 and 20. These addresses can be ignored, but if we did need a small amount of memory we could use the 'default' case in the read or write section as a block of memory. Access is much slower than something like a streaming interface, but it is very helpful for holding memory values of relatively low-speed protocols, for example a UART or SPI data buffer.
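For illustration, here is a minimal Python sketch of what those two helpers might look like over a Pynq MMIO-style object with read(offset)/write(offset, value); the function names here are mine, not the actual driver API:

def write_register_bit(mmio, reg, bit, value):
    # Set or clear a single bit, e.g. CTL_BIT_ENABLE within REG_CONTROL
    data = mmio.read(reg)
    data = (data | (1 << bit)) if value else (data & ~(1 << bit))
    mmio.write(reg, data)

def write_register_range(mmio, reg, high, low, value):
    # Write 'value' into bits [high:low], e.g. packing X and Y into REG_XY_REF0
    mask = ((1 << (high - low + 1)) - 1) << low
    data = (mmio.read(reg) & ~mask) | ((value << low) & mask)
    mmio.write(reg, data)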
The rest of this core is explained much better in the Project F graphics post, but to summarize, the core consists of:
- A synchronous process used for generating a start strobe when there is a new frame
- A synchronous process used for generating X/Y video timing signals
- An asynchronous process used for generating the X/Y color values for each pixel
The more relevant portion is how the AXI Stream signals are controlled:
reg [3:0] r_cm_index;
assign o_axis_out_tlast = (x == (r_width - 1));
assign o_axis_out_tvalid = (state == DRAW);
//Generate timing signals for X and Y signals
always @ (posedge i_axi_clk) begin
r_frame_finished <= 0;
if (w_axi_rst) begin
state <= IDLE;
o_axis_out_tuser <= 0;
r_last <= 0;
x <= 0;
y <= 0;
r_temp_width <= 0;
r_cm_index <= 0;
end
else begin
case (state)
IDLE: begin
o_axis_out_tuser <= 0;
r_last <= 0;
x <= 0;
y <= 0;
if (r_start_stb) begin
o_axis_out_tuser <= 1;
state <= DRAW;
end
end
DRAW: begin
if (i_axis_out_tready) begin
o_axis_out_tuser <= 0;
if (o_axis_out_tlast) begin
state <= END_LINE;
y <= y + 1;
x <= 0;
end
else begin
x <= x + 1;
end
end
end
END_LINE: begin
if (y < r_height) begin
//Go to the next line
state <= DRAW;
end
else begin
//Finished
state <= IDLE;
r_frame_finished <= 1;
end
end
default: begin
state <= IDLE;
end
endcase
...
endmodule
- The 'r_start_stb' signal, which is controlled by the frame timing process above, initiates a new frame and moves the state from 'IDLE' to 'DRAW'.
- The tuser signal goes high upon entering 'DRAW' and goes low when the receiver acknowledges the data by asserting tready. tuser will not go high again until a new frame is started.
- The tvalid signal is always high in the 'DRAW' state.
- The tlast signal will go high on the last column of a row.
- The 'DRAW' state cannot proceed unless the tready signal is high.
- The 'END_LINE' state is entered when an entire line has been sent; the current y position is then checked to see if we should send another line or finish the frame (a software model of the receiver side follows this list).
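To make the handshake rules concrete, here is a small software model of the receiver side (plain Python; it assumes a list of (tuser, tlast, tdata) beats captured from the stream):

def reassemble_frame(beats):
    # Rows are delimited by 'tlast'; 'tuser' must mark only the first beat
    frame, row = [], []
    for tuser, tlast, tdata in beats:
        if not frame and not row:
            assert tuser, "first beat of a frame must assert tuser"
        else:
            assert not tuser, "tuser asserted mid-frame"
        row.append(tdata)
        if tlast:
            frame.append(row)  # tlast closes the current row
            row = []
    return frame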
I have only written two true unit tests for this core: 'color bar generator' and 'draw a square'.
Here is the color bar generator:
@cocotb.test(skip = False)
async def test_colorbars(dut):
"""
Description:
Draw color bars
Test ID: 1
Expected Results:
The frame generated by the core will match the frame generated locally
"""
dut._log.setLevel(logging.WARNING)
WIDTH = 16
HEIGHT = 4
ref_frame = [[0 for x in range(WIDTH)] for y in range(HEIGHT)]
for y in range (HEIGHT):
for x in range (WIDTH):
cb_index = (x // (WIDTH // 8))
color = 0x00000000
if cb_index == 0:
color = COLOR_BLACK
elif cb_index == 1:
color = COLOR_RED
elif cb_index == 2:
color = COLOR_ORANGE
elif cb_index == 3:
color = COLOR_YELLOW
elif cb_index == 4:
color = COLOR_GREEN
elif cb_index == 5:
color = COLOR_BLUE
elif cb_index == 6:
color = COLOR_PURPLE
elif cb_index == 7:
color = COLOR_WHITE
ref_frame[y][x] = color
ref_frame[y][x] |= 0xFF000000
setup_dut(dut)
driver = AXIGraphicsDriver(dut, dut.clk, dut.rst, CLK_PERIOD, name="aximl", debug=False)
axis_sink = AXISSink (dut, "axis_out", dut.clk, dut.rst)
dut.test_id <= 1
await reset_dut(dut)
await axis_sink.reset()
cocotb.fork(capture_frame(dut, HEIGHT, WIDTH, ref_frame, DEBUG))
await driver.set_width(WIDTH)
await driver.set_height(HEIGHT)
await driver.set_mode(5)
await driver.enable(True)
dut._log.info("Done")
await Timer(CLK_PERIOD * 200)
Because the output of this core is a video image, we can write a simple coroutine that runs in parallel with the core. Its job is to act as an AXI Stream video sink: it reads video frames from the core and compares each generated frame with a reference frame built locally.
async def capture_frame(dut, height, width, ref_frame, display = False):
axis_sink = AXISSink (dut, "axis_out", dut.clk, dut.rst)
for y in range (height):
await(axis_sink.receive())
rdata = axis_sink.read_data()
if display:
for y in range (height):
for x in range(width):
print (" %08X" % rdata[y][x], end='')
print ("")
# Compare the reference image with the received data
for y in range (height):
for x in range(width):
assert (ref_frame[y][x] == rdata[y][x])
The 'capture_frame' coroutine takes the height and width of the image, a reference frame as a 2D list of integers, and a display flag that, if set, prints the frame in hex format. As an example, here is what color bars look like for a 16x4 image:
(Note: the top byte indicates the opacity, where 'FF' is fully visible. This will become relevant when generating an image.)
Here is a box that is drawn from (1, 1) to (3, 2) where the background is black and the foreground is white.
Compare this with trying to decipher the same data in the waveform viewer; here are the color bars. You can do it, but when debugging a core it is much more challenging to step through 16 * 4 values every single time. The 'axis_out_tdata' signal carries the actual data.
This core has a lot of potential; I plan on playing with it more in the future.
Generating the IP Core
In the previous post, we created a core with incoming and outgoing AXI Streams. For this project the process is similar; although we could generate the core manually, in order to get to the novel portion of this post we can just run the script.
Within the 'axi_graphics' directory run the following to generate the IP core.
make xilinx_ip_no_gui
You should now have a new IP core you can pull into your Pynq project.
Incorporate the IP core in a Pynq Z2 Project
At a high level there are three parts to the project:
The control block contains all of the Zynq core support as well as all of the AXI interconnects and supporting cores. I would have liked to put the Zynq core inside the control block, but Pynq does not like this, so however you organize your project, keep the Zynq core in the main block diagram. The following is an image of the 'control block', the block where all the interconnects, resets, and interrupts reside.
The 'low-speed block' has all the buttons and LEDs.
I did go a little overboard when creating this Pynq image. Instead of attaching the video output directly to an HDMI subsystem, I opted to use the Video Mixer core (v_mix_0 in the image below), which composes one or more incoming images into an output image and supports scaling, alpha blending, and a lot more. This comes in handy when you want to change the video output.
To operate, the mixer always needs a reference image, so I opted for a test pattern generator (v_tpg_0) to continuously send images. The video mixer then uses this as a reference for the total generated image size.
I chose the video mixer core because generating video to be output over HDMI can be tricky: several cores must be configured to 'agree' with each other. The cores that must work together are as follows (a quick pixel-clock sanity check follows the list):
- The HDMI clock (clk_wiz_0) must generate the correct clock frequency
- HDMI Out (hdmi_out) must be configured for the appropriate frequency
- Video timing generator (timing_generator) must be configured to generate the appropriate timing signals for the appropriate frequency and image size
- Video Mixer (v_mix_0) must generate the appropriately sized image
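As a rough illustration of why these must agree: the pixel clock is fixed by the blanking-inclusive frame size and the refresh rate. Standard 640x480@60 timing uses 800x525 total pixels per frame:

H_TOTAL, V_TOTAL, FPS = 800, 525, 60  # 640x480 active plus blanking
pixel_clock_hz = H_TOTAL * V_TOTAL * FPS
print(pixel_clock_hz / 1e6)           # ~25.2 MHz, which clk_wiz_0 must generate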
If we had attached our AXI Graphics core directly to the HDMI subsystem, we could only choose one video output size. Because we are using the video mixer, we can change the size of the generated graphics image at any time and simply tell the video mixer how to treat it. Near the conclusion, I'll show the graphics core output at various sizes.
On the Pynq Board
Similar to what we did previously, we need to upload the 'bit' and 'hwh' files.
Use the web interface again, move into the 'axi-graphics-core/data' directory, and upload the bitstream and hwh file:
Projects/axi-graphics-core/data
Press the 'upload' button and find the 'system_wrapper.bit' and 'system.hwh' files.
The files can be found in the following locations:
<vivado project base>/<project name>.runs/impl_1/system_wrapper.bit
and
<vivado project base>/<project name>.srcs/sources_1/bd/system/hw_handoff/system.hwh
Remember to rename 'system.hwh' to 'system_wrapper.hwh'.
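The rename matters because when loading an overlay, Pynq looks for an '.hwh' file with the same base name as the bitstream in the same directory. For example, from a notebook (the path below assumes the upload location above and the default Pynq notebook root):

from pynq import Overlay

# Finds 'system_wrapper.hwh' automatically because the base names match
ol = Overlay("/home/xilinx/jupyter_notebooks/Projects/axi-graphics-core/data/system_wrapper.bit")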
The picture above is unreadable, so here is a link to the notebook on GitHub.
Here is a screenshot of 640x480 color bars.
You can even adjust the size of the image to match what was in the simulations:
It's there... a 16 x 4 color bar image.
Or go to town with a full-screen image of this:
(Every time I look at this I keep thinking there's a bug and that the white bar on the right didn't come through... aww man, another bug! Then I squint, see a slight color difference, and realize the white bar is there!)
Conclusion
This project showed how to implement video output using AXI Streams. We showed how simple it is to generate an AXI Stream-compatible image within Verilog. We used cocotb to simulate the core and verify the video frames it generated. After generating an IP core, we created a Pynq-compatible image that outputs video to an HDMI monitor and pulled together a notebook that configures everything and outputs video.
Appendix: Using VDMAs
As I was writing this I wanted to add pictures of the generated video, but I didn't want to use a photo of my screen and messy desk, so instead I used Video DMAs and an AXI Stream interconnect to redirect a video frame from the HDMI output to memory. A Video DMA (VDMA) is a very useful IP core that can do the following:
- Capture an AXI Stream and write it to memory as a video frame
- Read a video frame from memory and send it out over an AXI Stream
A single VDMA can also be configured to both capture and send video frames at different rates. For example, if you have a camera that outputs frames at 40 FPS and an HDMI output that reads frames at 60 FPS, a single VDMA can attach to both and manage the different frame rates without frame tearing.
VDMAs are very useful, but I found there are not a lot of resources online regarding their usage, so I created a modified version of this project that reads and writes video frames to/from the FPGA. Here is the change to the video submodule; the main difference is the addition of two VDMAs, two subset converters, and an AXI Stream interconnect.
Here is a close-up of the 'Memory to Stream' VDMA. The user writes frames from the host (a Jupyter notebook) into memory for the VDMA to read; I've labeled the interfaces that read the frame from memory and write it out through a stream.
Here is a close-up of the 'Stream to Memory' VDMA; I've labeled the portions of the VDMA that read data from the stream and write it to memory.
I also used an AXI Stream interconnect to channel the video either to the HDMI output or to the VDMA: selecting 'Master 0' routes the frame to the HDMI output, while selecting 'Master 1' routes it to the VDMA.
The documentation for this can be confusing, so I wrote a small driver that lets the user select the route using what I think is a more intuitive approach: the first number is the 'from' location and the second is the 'to' location. So routing from the only slave ('Slave 0', the video mixer) to the HDMI output is done by setting the route to 0, 0.
if ENABLE_VDMA_READ:
ai.set_route(0, 1) # From VMIX To VDMA
else:
ai.set_route(0, 0) # From VMIX To output screen
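For reference, here is a sketch of what a driver like this might do under the hood, assuming the interconnect is built on the AXI4-Stream Switch (register map per Xilinx PG085); this is my reconstruction, not the actual driver source:

class AxiStreamRouter:
    CTRL_REG    = 0x00        # bit 1 (0x2) commits a new routing table
    MI_MUX_BASE = 0x40        # one mux register per master port
    DISABLED    = 0x80000000  # bit 31 disables a master port

    def __init__(self, mmio, num_masters=2):
        self.mmio = mmio
        self.num_masters = num_masters

    def set_route(self, frm, to):
        # Point master 'to' at slave 'frm' and disable every other master
        for m in range(self.num_masters):
            value = frm if m == to else self.DISABLED
            self.mmio.write(self.MI_MUX_BASE + 4 * m, value)
        self.mmio.write(self.CTRL_REG, 0x2)  # commit the new routing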
To exercise this, I generated a solid red image in the script, wrote it to memory, and told the 'Memory to Stream' VDMA to send it. I also configured the Video Mixer to place the image from the VDMA output right after the color bars.
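Here is a minimal sketch of that step using Pynq's stock AxiVDMA driver; the instance name 'vdma_m2s', the 640x480 24-bit mode, and the RGB channel order are my assumptions and may differ from the attached project:

import numpy as np
from pynq.lib.video import VideoMode

vdma = ol.vdma_m2s                          # assumed instance name
vdma.writechannel.mode = VideoMode(640, 480, 24)
vdma.writechannel.start()
frame = vdma.writechannel.newframe()        # DMA-able frame buffer
frame[:] = np.array([255, 0, 0], np.uint8)  # solid red (assuming RGB order)
vdma.writechannel.writeframe(frame)         # VDMA streams the frame out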
Attached to this writeup is 'pynq_graphics_vdma.zip', which contains the project as well.
After building it and setting up a new Jupyter project, I wrote a small AXI Stream router and captured the above images. One thing surprised me: the output of the video mixer works perfectly with the HDMI output, but when reading the frames back through the VDMA I had to unscramble them with the AXI subset converter (EDIT: I also had to 'scramble' the incoming AXI signals to output the colors I expected).
If the image is not readable, the remapped data is as follows:
tdata[15:8],tdata[7:0],tdata[23:16]
My color channels were scrambled so that red was in the middle bits, blue was in the highest bits, and green was in the lowest bits, essentially a BRG pattern; these values needed to be unscrambled to be read appropriately from the VDMA.
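A small software model of that remap, with the byte positions taken from the subset-converter expression above:

def remap(tdata):
    # Output pixel = {tdata[15:8], tdata[7:0], tdata[23:16]}
    b7_0   = (tdata >> 0)  & 0xFF
    b15_8  = (tdata >> 8)  & 0xFF
    b23_16 = (tdata >> 16) & 0xFF
    return (b15_8 << 16) | (b7_0 << 8) | b23_16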
Here is the Pynq notebook link. Although it is not shown in the GitHub rendering, you can capture an image by pressing the capture button.
I hope this can ease the use of VDMAs.