Published December 10, 2021 © GPL3+

Transfer Data between HDL and Embedded C in FPGA using DMA

This project walks through the basic structure of transferring data between HDL in the programmable logic (PL) and embedded C running on a processor in an FPGA.

Things used in this project

Software apps and online services

AMD Vivado Design Suite

AMD Vitis Unified Software Platform

Story

Given the rise of hardware acceleration in FPGA designs for applications such as machine learning and artificial intelligence, I thought it would be a good time to peel back a few layers of the onion and talk about the basics of passing data back and forth between the HDL code running in the programmable logic (PL) of an FPGA and the corresponding software running on a physical or soft processor in the FPGA.

Hardware acceleration can be boiled down to the basic idea of implementing certain functions in hardware (aka the programmable logic of an FPGA) that were previously being run in software, either on a host PC or on a processor within the FPGA. Therefore, a good command of how to pass data back and forth between hardware and software is required to be an efficient designer.

In this particular case, I'm using a Zynq SoC (System on Chip) FPGA, which has a hard ARM-core processor built into the same chip alongside the programmable logic. This ARM core and its supporting hardware are referred to as the processing system, or PS.

While there are a few different ways to accomplish a data transfer between the PL and PS, including writing your own custom interface, I would argue that the most common mechanism is a direct memory access (DMA) transfer. This is because DMA lets the CPU in the PS simply initiate a data transfer to or from DDR and then get on with other tasks, rather than waiting for the transfer to complete. DMA also allows the CPU to initiate a transfer between an external device and the DDR.

In this project, I'm demonstrating the functionality of DMA in the most basic way possible by using the Xilinx AXI DMA IP block, which converts between a memory-mapped interface and a streaming interface using AXI stream (AXIS) buses. I'm writing a series of 32 bytes to memory in embedded C, then transferring it to the PL via the memory map to stream (MM2S) AXIS port, running each value through a register, then streaming the data back to memory via the stream to memory map (S2MM) port of the DMA IP block.
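As a rough, host-side illustration of the data involved (not code from the attached project), the sketch below fills a buffer with an incrementing byte pattern and checks a received buffer against it. The helper names fill_ramp and first_mismatch are hypothetical; on the board the two buffers would live in DDR rather than in ordinary arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf with 0x00, 0x01, 0x02, ... (wrapping after 0xFF). */
void fill_ramp(uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(i & 0xFFu);
    }
}

/* Return the index of the first byte that differs, or len if the
 * loopback returned exactly what was sent. */
size_t first_mismatch(const uint8_t *tx, const uint8_t *rx, size_t len)
{
    assert(tx != NULL && rx != NULL);
    for (size_t i = 0; i < len; i++) {
        if (tx[i] != rx[i]) {
            return i;
        }
    }
    return len;
}
```

For the 32-byte transfer used here, a return value of 32 from first_mismatch would mean every byte made the round trip unchanged.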

While this example is far too simple for heavy-duty hardware acceleration applications, that level of high speed data transfer can get very complex/overwhelming to learn when new to FPGAs. I felt like this project is a much needed introduction to the use of DMA and its behavior. And while I intended for this project to focus more on the data processing aspect, I found enough little "gotchas" in the DMA transaction implementation that I'll have to leave the data processing focus to another project.

High level flow diagram of this project.

There are two main layers to controlling the data transfer between the HDL in PL and C code in the PS using AXI DMA:

1. The AXI stream handshaking signals in the HDL code of the PL on the Memory Map to Stream (MM2S) and Stream to Memory Map (S2MM) channels. (The control registers of the DMA are written to using plain AXI, but those connections are all handled automatically by Vivado, so I'm only focusing on the AXI stream interface here.)

2. The sequence of register reads/writes to the DMA in the C code of the PS.

AXI Stream Handshaking in Verilog

The AXI stream interface is a straightforward set of handshaking signals used for data exchange in embedded designs. There are many optional signals in the AXI stream interface but the relevant and required ones for DMA MM2S and S2MM data exchanges are tdata, tvalid, tready, tlast, and tkeep. AXI stream refers to the entity sending data as the master and the entity receiving data as the slave in the interface.

tdata: the data bus
tvalid: asserted by the master interface when the data it has placed on the tdata bus is valid
tready: asserted by the slave when it is in a state ready to receive data on the tdata bus
tlast: asserted by the master while the last data word of the stream is on the tdata bus, telling the slave that no more data follows that word
tkeep: a byte qualifier set by the master, with one bit per byte of tdata indicating whether that byte is part of the stream or not
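To make tkeep and the word count concrete, here is a minimal sketch that works out how many bus words (beats) a transfer of a given byte length needs and what tkeep would look like on the final word. The function names are hypothetical, and the sketch assumes the valid bytes of a partial final word are packed into the low-order byte lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bus words needed to move len_bytes over a bus that is
 * bus_bytes wide (4 for the 32-bit tdata used in this project). */
uint32_t axis_beats(uint32_t len_bytes, uint32_t bus_bytes)
{
    assert(bus_bytes > 0u && bus_bytes < 32u);
    return (len_bytes + bus_bytes - 1u) / bus_bytes;
}

/* tkeep for the final word, assuming valid bytes occupy the
 * low-order byte lanes (an assumption of this sketch). */
uint32_t axis_last_tkeep(uint32_t len_bytes, uint32_t bus_bytes)
{
    assert(bus_bytes > 0u && bus_bytes < 32u);
    uint32_t rem = len_bytes % bus_bytes;
    uint32_t valid = (rem == 0u) ? bus_bytes : rem;
    return (1u << valid) - 1u;
}
```

With the 32-byte transfer and 32-bit bus used here, that is 8 words with tkeep of 0xF on every word, which is the only tkeep value the attached state machine accepts.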

Exactly how the AXI DMA IP implements this handshaking interface for data transfers out of memory (MM2S) and into memory (S2MM) is quite fickle especially on the S2MM side... and that's the nicest term that doesn't contain any curse words in my mind at the moment.

What you need to know about the S2MM transaction of the AXI DMA can however be mostly summed up in a single sentence that should be explicitly stated in its user guide but instead is kinda hard to extrapolate from it: the S2MM transaction must be set up and kicked off by writing to the appropriate control registers in the DMA in the appropriate order before you attempt to send any data to it, and the S2MM channel will stop the transaction once it sees the tlast signal.

Data transfers happen on the tdata bus in S2MM and MM2S transactions on each clock cycle where both tready and tvalid are asserted (true). So when you're on the master side of the AXI stream interface and in charge of tvalid, you have to be careful not to leave tvalid asserted for an extra clock cycle, with the same data word still on the tdata bus, while the slave's tready is also asserted. Otherwise the slave will clock in the same data word twice as two separate words. And because you have to specify how many bytes are in a transfer in the control registers, the byte count will now be off: the DMA channel (S2MM in this case) will think the exchange is over before it sees tlast, and it will hang.
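The handshake rule can be tried out on a host with a small C model. This is only a sketch: the axis_cycle_t type and axis_capture function are hypothetical names, and each array element stands for one clock cycle as seen by the slave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One clock cycle of an AXI stream link as seen by the slave. */
typedef struct {
    bool     tvalid;
    bool     tready;
    uint32_t tdata;
} axis_cycle_t;

/* Copy every accepted word into out[] and return how many were accepted.
 * A word is accepted on each cycle where tvalid and tready are both high. */
size_t axis_capture(const axis_cycle_t *trace, size_t n_cycles,
                    uint32_t *out, size_t out_cap)
{
    assert(trace != NULL && out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n_cycles; i++) {
        if (trace[i].tvalid && trace[i].tready && count < out_cap) {
            out[count++] = trace[i].tdata;
        }
    }
    return count;
}
```

Feeding it a trace where the master holds tvalid high for two cycles on the same word, with tready high on both, yields two captured copies of that word, which is exactly the miscount described above.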

I wrote a simple state machine in Verilog that implements a slave AXI stream interface to receive data from the MM2S channel of the DMA, passes each data word in the stream through a register, then implements a master AXI stream interface to send the data stream back to the S2MM channel. The register that the data from the tdata bus passes through is a placeholder for where any custom data processing would happen for hardware acceleration.

I took a screenshot from the ILAs in Vivado to show the timing diagram I implemented with the state machine. The top AXI stream is the MM2S side and the bottom is the S2MM side.

Here's the flow diagram of the Verilog state machine and the actual file is attached to the project at the end. It's important to note that the master/slave interfaces in the flow diagram are from the perspective of my Verilog state machine. This is something that I can get myself turned around on easily.
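For readers who want to step through the idea without a simulator, below is a simplified, cycle-by-cycle C model of a receive, register, send state machine. It borrows the state names from the Verilog file but is a sketch rather than a translation of it: it ignores tkeep, holds tvalid until the word is accepted, and the type and function names (fsm_t, fsm_step, fsm_loopback) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ST_INIT, ST_SET_SLAVE_TREADY, ST_CHECK_SLAVE_TVALID,
    ST_PROCESS_TDATA, ST_CHECK_TLAST
} state_t;

/* Registers of the model: holding register plus registered outputs. */
typedef struct {
    state_t  state;
    uint32_t tdata;      /* holding register (processing placeholder) */
    bool     tlast;
    bool     s_tready;   /* slave side output */
    bool     m_tvalid;   /* master side outputs */
    bool     m_tlast;
    uint32_t m_tdata;
} fsm_t;

/* Inputs sampled on a clock edge. */
typedef struct {
    bool     s_tvalid;
    bool     s_tlast;
    uint32_t s_tdata;
    bool     m_tready;
} fsm_in_t;

/* Advance the model by one clock edge. */
void fsm_step(fsm_t *f, const fsm_in_t *in)
{
    assert(f != NULL && in != NULL);
    fsm_t n = *f;
    n.m_tvalid = false;                 /* default: tvalid low */
    switch (f->state) {
    case ST_INIT:
        n.s_tready = false;
        n.m_tlast  = false;
        n.state    = ST_SET_SLAVE_TREADY;
        break;
    case ST_SET_SLAVE_TREADY:
        n.s_tready = true;
        n.state    = ST_CHECK_SLAVE_TVALID;
        break;
    case ST_CHECK_SLAVE_TVALID:
        if (in->s_tvalid) {             /* s_tready is high: word accepted */
            n.s_tready = false;
            n.tdata    = in->s_tdata;
            n.tlast    = in->s_tlast;
            n.state    = ST_PROCESS_TDATA;
        }
        break;
    case ST_PROCESS_TDATA:
        n.m_tdata  = f->tdata;
        n.m_tlast  = f->tlast;
        n.m_tvalid = true;
        n.state    = ST_CHECK_TLAST;
        break;
    case ST_CHECK_TLAST:
        if (in->m_tready) {             /* m_tvalid is high: word sent */
            n.state = f->m_tlast ? ST_INIT : ST_SET_SLAVE_TREADY;
        } else {
            n.m_tvalid = true;          /* hold the word until accepted */
        }
        break;
    }
    *f = n;
}

/* Push n_in words through the model with an always-ready sink and
 * return how many words came out within max_cycles. */
size_t fsm_loopback(const uint32_t *in, size_t n_in,
                    uint32_t *out, size_t out_cap, size_t max_cycles)
{
    fsm_t f = { .state = ST_INIT };
    size_t sent = 0, got = 0;
    for (size_t c = 0; c < max_cycles; c++) {
        fsm_in_t pins = {
            .s_tvalid = sent < n_in,
            .s_tlast  = (sent + 1 == n_in),
            .s_tdata  = (sent < n_in) ? in[sent] : 0u,
            .m_tready = true,
        };
        bool s_xfer = pins.s_tvalid && f.s_tready;
        if (f.m_tvalid && pins.m_tready && got < out_cap) {
            out[got++] = f.m_tdata;
        }
        fsm_step(&f, &pins);
        if (s_xfer) {
            sent++;
        }
    }
    return got;
}
```

In this model each word takes four clock cycles to pass through, since only one word is ever in flight; a real accelerator would normally pipeline the two interfaces.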

I simply created a block design in a new Vivado project targeting one of my Zynq-based FPGA development boards, and added an AXI DMA IP block after adding the Zynq PS IP and running block automation to apply the board presets for my development board. The Zynq PS also needs one of the high performance AXI ports enabled under PS-PL Configuration > HP Slave AXI Interface > S AXI HP0 interface for the DMA to be able to access the DDR.

Block automation and connection automation take care of all of the connections between the Zynq IP and DMA IP.

For the specific settings of the DMA IP, I unchecked the option for scatter gather so I'm using the DMA in direct register mode. I also checked the option to allow unaligned transfers, which I've found gives me a bit more wiggle room when writing a custom AXI stream interface to the DMA, and left everything else at the default settings.
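The transfer functions in dma_controller.c reject a buffer address that is not a multiple of the stream width unless the channel reports unaligned transfer support (HasDRE in the driver). A minimal sketch of that alignment test, with a hypothetical function name and the width given in bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of the stream width in bytes. */
bool dma_addr_is_aligned(uintptr_t addr, uint32_t width_bytes)
{
    /* width_bytes must be a non-zero power of two (4 for 32-bit tdata). */
    assert(width_bytes != 0u && (width_bytes & (width_bytes - 1u)) == 0u);
    return (addr & (uintptr_t)(width_bytes - 1u)) == 0u;
}
```

For a 32-bit stream, an address ending in 0x0, 0x4, 0x8 or 0xC passes, while one ending in 0x2 would only be usable with unaligned transfers enabled.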

DMA IP block settings

To add my Verilog state machine to the block design, I right-clicked in a blank area of the block design and selected the Add Module... option, which will show you all of the valid Verilog modules Vivado can find in your design source files to use in the block design.

It's important to note that my signal naming convention follows the standard of "s_axis" and "m_axis" for the slave and master interfaces respectively. This is important for the block design to be able to detect that it is indeed an AXI stream interface and allow me to connect it to the AXI stream ports on the DMA IP block.

DMA Register Read/Write Control Sequence

I've covered the basic operation of DMA in the past from the perspective of using it from the Linux user space, but this time I'm using it at a lower level, directly from HDL in the programmable logic and bare metal embedded C. So here's the more straightforward sequence when using DMA from a bare metal application:

1. Reset the DMA by writing a 1 to bit 2 of the MM2S (offset 0x00) and S2MM (offset 0x30) control registers.

2. Write the DDR address that the S2MM channel is to write data to into the S2MM DMA destination address register (offset 0x48).

3. Start the DMA S2MM channel by writing a 1 to bit 0 of the S2MM control register (offset 0x30).

4. Write the total number of bytes the S2MM channel is to write into memory to the S2MM buffer length register (offset 0x58). This kicks off the S2MM transfer such that the DMA is prepared to receive a data stream from a device in the FPGA logic; data only starts moving once that device feeds it data and asserts tvalid on the AXI stream bus.

5. Write the DDR address that the MM2S channel is to read data from into the MM2S DMA source address register (offset 0x18).

6. Start the DMA MM2S channel by writing a 1 to bit 0 of the MM2S control register (offset 0x00).

7. Write the total number of bytes to send out to the MM2S transfer length register (offset 0x28). This kicks off the MM2S transfer from the DMA to the receiving device in the FPGA logic.

Remember what I mentioned before about the S2MM channel having to be kicked off and running before the device in PL tries to send data to it? Well that's why I have the steps above in the sequence that I do. Steps 2 - 4 configure and kick off the S2MM channel while steps 5 - 7 configure and kick off the MM2S channel.

It's okay to have some other processes happen between steps 4 and 5, but steps 2 - 4 MUST occur before steps 5 - 7. Once step 4 has been executed, the S2MM AXI stream channel will assert its tready signal, at which point your HDL code can start sending it data.
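The seven steps can be sketched as plain register writes. This is an illustration under assumptions, not the attached driver code: the macro names, dma_start_loopback, and the logging writer are hypothetical, the register writer is passed in as a function pointer so the sequence can be checked on a host, and the sketch skips the wait for reset to complete that real code needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets and control bits quoted in the steps above. */
#define MM2S_DMACR    0x00u
#define MM2S_SA       0x18u
#define MM2S_LENGTH   0x28u
#define S2MM_DMACR    0x30u
#define S2MM_DA       0x48u
#define S2MM_LENGTH   0x58u
#define DMACR_RUNSTOP (1u << 0)
#define DMACR_RESET   (1u << 2)

/* On hardware this would write to (DMA base address + offset). */
typedef void (*reg_write_fn)(uint32_t offset, uint32_t value);

void dma_start_loopback(reg_write_fn wr, uint32_t src_addr,
                        uint32_t dst_addr, uint32_t len_bytes)
{
    assert(wr != NULL);
    /* Step 1: reset (real code should wait here until reset completes). */
    wr(MM2S_DMACR, DMACR_RESET);
    wr(S2MM_DMACR, DMACR_RESET);
    /* Steps 2-4: arm the S2MM channel first. */
    wr(S2MM_DA, dst_addr);
    wr(S2MM_DMACR, DMACR_RUNSTOP);
    wr(S2MM_LENGTH, len_bytes);
    /* Steps 5-7: then start the MM2S channel. */
    wr(MM2S_SA, src_addr);
    wr(MM2S_DMACR, DMACR_RUNSTOP);
    wr(MM2S_LENGTH, len_bytes);
}

/* Hypothetical host-side writer that records each access in order. */
#define LOG_MAX 16
typedef struct { uint32_t offset; uint32_t value; } reg_access_t;
reg_access_t reg_log[LOG_MAX];
size_t reg_log_len;

void log_write(uint32_t offset, uint32_t value)
{
    if (reg_log_len < LOG_MAX) {
        reg_log[reg_log_len].offset = offset;
        reg_log[reg_log_len].value  = value;
        reg_log_len++;
    }
}
```

Calling dma_start_loopback(log_write, src, dst, 32) leaves eight entries in reg_log, with the S2MM length write (offset 0x58) recorded before any MM2S register after reset is touched.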

I created separate source files for controlling the DMA from a bare metal application (attached below) so they can be easily imported and reused in any Vitis application project; the main file then implements this demo project. You can find the sequence of steps above implemented in dma_controller.c.

And this also explains something I noticed in the example DMA projects in SDK/Vitis when I first started using DMA. I always thought it was backwards that the example code appeared to attempt to pull data into the DDR (by performing the S2MM - XAXIDMA_DEVICE_TO_DMA transfer first) before anything had been written out from it with the MM2S - XAXIDMA_DMA_TO_DEVICE transfer. However, the S2MM channel has to be ready and waiting to receive data in order to work properly and not lock up.
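Related to this, the transfer functions in dma_controller.c read the channel status register before submitting anything, so a new transfer is only started when the channel is halted or idle. Here is a minimal sketch of that decision, assuming the AXI DMA status register layout of bit 0 = Halted and bit 1 = Idle; the function names are hypothetical, and the pointer stands in for the memory-mapped status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMASR_HALTED (1u << 0)  /* channel is stopped */
#define DMASR_IDLE   (1u << 1)  /* channel is running with nothing in flight */

/* Busy here means "not idle" (to my understanding, the same test the
 * driver's XAxiDma_Busy applies). */
bool dma_channel_busy(const volatile uint32_t *dmasr)
{
    assert(dmasr != NULL);
    return (*dmasr & DMASR_IDLE) == 0u;
}

/* A new transfer can be submitted if the channel is halted or idle. */
bool dma_channel_can_submit(const volatile uint32_t *dmasr)
{
    assert(dmasr != NULL);
    uint32_t sr = *dmasr;
    return (sr & DMASR_HALTED) != 0u || (sr & DMASR_IDLE) != 0u;
}
```

Note that a halted channel also reads as "busy" by the idle test alone, which is why the halted bit has to be checked first before refusing a transfer.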

Anyways, hopefully this brain dump of DMA transfers is helpful. DMA seems to be a tricky method to get started with in FPGA design, but it's super helpful once you figure it out.

Code

data_processor.v

module data_processor(
    input clk,
    input reset,
    input [31:0] s_axis_tdata,
    input [3:0] s_axis_tkeep,
    input s_axis_tlast,
    output reg s_axis_tready,
    input s_axis_tvalid,
    output reg [31:0] m_axis_tdata,
    output reg [3:0] m_axis_tkeep,
    output reg m_axis_tlast,
    input m_axis_tready,
    output reg m_axis_tvalid, 
    output reg [2:0] state_reg   // current FSM state; declared once, here in the port list
    );
    
    reg [31:0] tdata;
    reg tlast;
    
    parameter init               = 3'd0;
    parameter SetSlaveTready     = 3'd1;
    parameter CheckSlaveTvalid   = 3'd2;
    parameter ProcessTdata       = 3'd3;
    parameter CheckTlast         = 3'd4;
    
    always @ (posedge clk)
        begin
            // Default: tvalid is low unless a state below sets it this cycle
            m_axis_tvalid <= 1'b0;
            
            if (reset == 1'b0)
                begin
                    tlast <= 1'b0;
                    tdata[31:0] <= 32'd0;
                    s_axis_tready <= 1'b0;
                    m_axis_tdata[31:0] <= 32'd0;
                    m_axis_tkeep <= 4'h0;
                    m_axis_tlast <= 1'b0;
                    state_reg <= init;
                end
            else
                begin
                
                    case(state_reg) 
                        init : // 0 
                            begin
                                tlast <= 1'b0;
                                tdata[31:0] <= 32'd0;
                                s_axis_tready <= 1'b0;
                                m_axis_tdata[31:0] <= 32'd0;
                                m_axis_tkeep <= 4'h0;
                                m_axis_tlast <= 1'b0;
                                state_reg <= SetSlaveTready;
                            end 
                            
                        SetSlaveTready : // 1
                            begin
                                s_axis_tready <= 1'b1;
                                state_reg <= CheckSlaveTvalid;
                            end 
                            
                        CheckSlaveTvalid : // 2
                            begin
                                if (s_axis_tkeep == 4'hf && s_axis_tvalid == 1'b1)
                                    begin
                                        s_axis_tready <= 1'b0;
                                        tlast <= s_axis_tlast;
                                        tdata[31:0] <= s_axis_tdata[31:0];
                                        state_reg <= ProcessTdata;
                                    end
                                else
                                    begin 
                                        tdata[31:0] <= 32'd0;
                                        state_reg <= CheckSlaveTvalid;
                                    end 
                            end
                            
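                        ProcessTdata : // 3
                            begin 
                                // Present the registered word on the master interface.
                                // Any custom processing of tdata would go here.
                                m_axis_tkeep <= 4'hf;
                                m_axis_tlast <= tlast;
                                m_axis_tvalid <= 1'b1;
                                m_axis_tdata[31:0] <= tdata[31:0];
                                state_reg <= CheckTlast;
                            end
                            
                        CheckTlast : // 4
                            begin 
                                if (m_axis_tready == 1'b1)
                                    begin
                                        // tvalid and tready are both high this cycle, so the
                                        // word is accepted exactly once; the default assignment
                                        // above drops tvalid on the next cycle.
                                        if (m_axis_tlast == 1'b1)
                                            begin
                                                state_reg <= init;
                                            end
                                        else
                                            begin
                                                state_reg <= SetSlaveTready;
                                            end
                                    end
                                else 
                                    begin 
                                        // Slave not ready yet: hold tvalid (and the word) until it is
                                        m_axis_tvalid <= 1'b1;
                                        state_reg <= CheckTlast;
                                    end 
                            end 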
                            
                    endcase 
                end
        end
    
endmodule

dma_controller.c

#include "xaxidma.h"
#include "xil_printf.h"
#include "dma_controller.h"

// Index of the S2MM (Rx) ring used by the S2MM functions below; always 0 here
int RingIndex;

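// Configure and start an MM2S (memory to stream) transfer in direct register mode.
// Returns as soon as the transfer has been kicked off; it does not wait for completion.
u32 XAxiDma_MM2Stransfer(XAxiDma *InstancePtr, UINTPTR BuffAddr, u32 Length){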

	u32 WordBits;

	// Check scatter gather is not enabled
	if (XAxiDma_HasSg(InstancePtr)){
		xil_printf("Scatter gather is not supported\r\n");

		return XST_FAILURE;
	}

	if ((Length < 1) || (Length > InstancePtr->TxBdRing.MaxTransferLen)){
		xil_printf("Invalid transfer length.\r\n");
		return XST_FAILURE;
	}

	if (!InstancePtr->HasMm2S) {
		xil_printf("MM2S channel is not supported.\r\n");

		return XST_FAILURE;
	}

	// If the engine is doing transfer, cannot submit
	if (!(XAxiDma_ReadReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_SR_OFFSET) & XAXIDMA_HALTED_MASK)){
		if (XAxiDma_Busy(InstancePtr,XAXIDMA_DMA_TO_DEVICE)){
			xil_printf("MM2S engine is busy\r\n");
			return XST_FAILURE;
		}
	}

	if (!InstancePtr->MicroDmaMode){
		WordBits = (u32)((InstancePtr->TxBdRing.DataWidth) - 1);
	} else {
		WordBits = XAXIDMA_MICROMODE_MIN_BUF_ALIGN;
	}

	if ((BuffAddr & WordBits)){
		if (!InstancePtr->TxBdRing.HasDRE){
			xil_printf("Unaligned transfer without DRE %x\r\n",(unsigned int)BuffAddr);
			return XST_FAILURE;
		}
	}

	XAxiDma_WriteReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_SRCADDR_OFFSET, LOWER_32_BITS(BuffAddr));

	if (InstancePtr->AddrWidth > 32){
		XAxiDma_WriteReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_SRCADDR_MSB_OFFSET, UPPER_32_BITS(BuffAddr));
	}

	XAxiDma_WriteReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_CR_OFFSET, XAxiDma_ReadReg(InstancePtr->TxBdRing.ChanBase,XAXIDMA_CR_OFFSET)| XAXIDMA_CR_RUNSTOP_MASK);

	// Writing length in bytes to the buffer transfer length register starts the transfer
	XAxiDma_WriteReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_BUFFLEN_OFFSET, Length);

	return XST_SUCCESS;
}

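// Configure the MM2S channel (source address and run bit) without starting a transfer.
// Follow up with XAxiDma_MM2StransferRun to write the length and start it.
u32 XAxiDma_MM2StransferCnfg(XAxiDma *InstancePtr, UINTPTR BuffAddr){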

	u32 WordBits;

	// Check scatter gather is not enabled
	if (XAxiDma_HasSg(InstancePtr)){
		xil_printf("Scatter gather is not supported\r\n");

		return XST_FAILURE;
	}

	// If the engine is doing transfer, cannot submit
	if (!(XAxiDma_ReadReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_SR_OFFSET) & XAXIDMA_HALTED_MASK)){
		if (XAxiDma_Busy(InstancePtr,XAXIDMA_DMA_TO_DEVICE)){
			xil_printf("MM2S engine is busy\r\n");
			return XST_FAILURE;
		}
	}

	if (!InstancePtr->MicroDmaMode){
		WordBits = (u32)((InstancePtr->TxBdRing.DataWidth) - 1);
	} else {
		WordBits = XAXIDMA_MICROMODE_MIN_BUF_ALIGN;
	}

	if ((BuffAddr & WordBits)){
		if (!InstancePtr->TxBdRing.HasDRE){
			xil_printf("Unaligned transfer without DRE %x\r\n",(unsigned int)BuffAddr);
			return XST_FAILURE;
		}
	}

	XAxiDma_WriteReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_SRCADDR_OFFSET, LOWER_32_BITS(BuffAddr));

	if (InstancePtr->AddrWidth > 32){
		XAxiDma_WriteReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_SRCADDR_MSB_OFFSET, UPPER_32_BITS(BuffAddr));
	}

	XAxiDma_WriteReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_CR_OFFSET, XAxiDma_ReadReg(InstancePtr->TxBdRing.ChanBase,XAXIDMA_CR_OFFSET)| XAXIDMA_CR_RUNSTOP_MASK);

	return XST_SUCCESS;
}

// Start a previously configured MM2S transfer, then block until the channel is idle.
void XAxiDma_MM2StransferRun(XAxiDma *InstancePtr, u32 Length){

	// Writing length in bytes to the buffer transfer length register starts the transfer
	XAxiDma_WriteReg(InstancePtr->TxBdRing.ChanBase, XAXIDMA_BUFFLEN_OFFSET, Length);

	while(XAxiDma_Busy(InstancePtr,XAXIDMA_DMA_TO_DEVICE)){
		// wait
	}

}

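// Configure and start an S2MM (stream to memory) transfer in direct register mode.
// Call this before the MM2S transfer so the channel is ready when data arrives.
u32 XAxiDma_S2MMtransfer(XAxiDma *InstancePtr, UINTPTR BuffAddr, u32 Length){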

	u32 WordBits;
	RingIndex = 0;

	// Check scatter gather is not enabled
	if (XAxiDma_HasSg(InstancePtr)){
		xil_printf("Scatter gather is not supported\r\n");

		return XST_FAILURE;
	}

	if ((Length < 1) || (Length > InstancePtr->RxBdRing[RingIndex].MaxTransferLen)){
		xil_printf("Invalid transfer length.\r\n");
		return XST_FAILURE;
	}

	if (!InstancePtr->HasS2Mm){
		xil_printf("S2MM channel is not supported\r\n");

		return XST_FAILURE;
	}

	// If the engine is doing transfer, cannot submit
	if (!(XAxiDma_ReadReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_SR_OFFSET) & XAXIDMA_HALTED_MASK)){
		if (XAxiDma_Busy(InstancePtr,XAXIDMA_DEVICE_TO_DMA)){
			xil_printf("S2MM engine is busy\r\n");
			return XST_FAILURE;
		}
	}

	if (!InstancePtr->MicroDmaMode){
		WordBits = (u32)((InstancePtr->RxBdRing[RingIndex].DataWidth) - 1);
	} else {
		WordBits = XAXIDMA_MICROMODE_MIN_BUF_ALIGN;
	}

	if ((BuffAddr & WordBits)){
		if (!InstancePtr->RxBdRing[RingIndex].HasDRE){
			xil_printf("Unaligned transfer without DRE %x\r\n", (unsigned int)BuffAddr);
			return XST_FAILURE;
		}
	}

	XAxiDma_WriteReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_DESTADDR_OFFSET, LOWER_32_BITS(BuffAddr));

	if (InstancePtr->AddrWidth > 32){
		XAxiDma_WriteReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_DESTADDR_MSB_OFFSET, UPPER_32_BITS(BuffAddr));
	}

	XAxiDma_WriteReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_CR_OFFSET, XAxiDma_ReadReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_CR_OFFSET)| XAXIDMA_CR_RUNSTOP_MASK);

	// Writing length in bytes to the buffer transfer length register starts the transfer
	XAxiDma_WriteReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_BUFFLEN_OFFSET, Length);

	return XST_SUCCESS;
}

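// Configure the S2MM channel (destination address and run bit) without starting a transfer.
// Follow up with XAxiDma_S2MMtransferRun to write the length and start it.
u32 XAxiDma_S2MMtransferCnfg(XAxiDma *InstancePtr, UINTPTR BuffAddr){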

	u32 WordBits;
	RingIndex = 0;

	// Check scatter gather is not enabled
	if (XAxiDma_HasSg(InstancePtr)){
		xil_printf("Scatter gather is not supported\r\n");

		return XST_FAILURE;
	}

	// If the engine is doing transfer, cannot submit
	if (!(XAxiDma_ReadReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_SR_OFFSET) & XAXIDMA_HALTED_MASK)){
		if (XAxiDma_Busy(InstancePtr,XAXIDMA_DEVICE_TO_DMA)){
			xil_printf("S2MM engine is busy\r\n");
			return XST_FAILURE;
		}
	}

	if (!InstancePtr->MicroDmaMode){
		WordBits = (u32)((InstancePtr->RxBdRing[RingIndex].DataWidth) - 1);
	} else {
		WordBits = XAXIDMA_MICROMODE_MIN_BUF_ALIGN;
	}

	if ((BuffAddr & WordBits)){
		if (!InstancePtr->RxBdRing[RingIndex].HasDRE){
			xil_printf("Unaligned transfer without DRE %x\r\n", (unsigned int)BuffAddr);
			return XST_FAILURE;
		}
	}

	XAxiDma_WriteReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_DESTADDR_OFFSET, LOWER_32_BITS(BuffAddr));

	if (InstancePtr->AddrWidth > 32){
		XAxiDma_WriteReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_DESTADDR_MSB_OFFSET, UPPER_32_BITS(BuffAddr));
	}

	XAxiDma_WriteReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_CR_OFFSET, XAxiDma_ReadReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_CR_OFFSET)| XAXIDMA_CR_RUNSTOP_MASK);

	return XST_SUCCESS;
}

// Start a previously configured S2MM transfer, then block until the channel is idle.
void XAxiDma_S2MMtransferRun(XAxiDma *InstancePtr, u32 Length){

	// Writing length in bytes to the buffer transfer length register starts the transfer
	XAxiDma_WriteReg(InstancePtr->RxBdRing[RingIndex].ChanBase, XAXIDMA_BUFFLEN_OFFSET, Length);

	while(XAxiDma_Busy(InstancePtr,XAXIDMA_DEVICE_TO_DMA)){
		// wait
	}
}

dma_controller.h

#ifndef SRC_DMA_CONTROLLER_H_
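#define SRC_DMA_CONTROLLER_H_

#include "xaxidma.h"      // XAxiDma, u32 and UINTPTR used in the prototypes below
#include "xparameters.h"  // XPAR_* device ID and address definitions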

#define DMA_DEV_ID		    XPAR_AXIDMA_0_DEVICE_ID
#define DMA_BASEADDR        XPAR_AXIDMA_0_BASEADDR
#define DDR_BASE_ADDR		XPAR_PS7_DDR_0_S_AXI_BASEADDR
#define MEM_BASE_ADDR		(DDR_BASE_ADDR + 0x1000000)
#define TX_BUFFER_BASE		(MEM_BASE_ADDR + 0x00100000)
#define RX_BUFFER_BASE		(MEM_BASE_ADDR + 0x00300000)
#define RX_BUFFER_HIGH		(MEM_BASE_ADDR + 0x004FFFFF)

#define MAX_PKT_LEN		    0x20 //(32 bytes - 8 DMA R/W cycles)
//#define MAX_PKT_LEN		    0x08 //(8 bytes - 2 DMA R/W cycle)
#define MIN_PKT_LEN		    0x01 //(1 byte)
#define NUM_TRANSFERS	    1

u32 XAxiDma_MM2Stransfer(XAxiDma *InstancePtr, UINTPTR BuffAddr, u32 Length);
u32 XAxiDma_MM2StransferCnfg(XAxiDma *InstancePtr, UINTPTR BuffAddr);
void XAxiDma_MM2StransferRun(XAxiDma *InstancePtr, u32 Length);
u32 XAxiDma_S2MMtransfer(XAxiDma *InstancePtr, UINTPTR BuffAddr, u32 Length);
u32 XAxiDma_S2MMtransferCnfg(XAxiDma *InstancePtr, UINTPTR BuffAddr);
void XAxiDma_S2MMtransferRun(XAxiDma *InstancePtr, u32 Length);

#endif /* SRC_DMA_CONTROLLER_H_ */

main.c

#include <stdio.h>
#include "xaxidma.h"
#include "platform.h"
#include "xil_printf.h"
#include "xparameters.h"
#include "xil_cache.h"   // Xil_DCacheFlushRange, Xil_DCacheInvalidateRange
#include "dma_controller.h"

XAxiDma AxiDma; //DMA device instance definition

int main(){
    init_platform();

    print("Hello World\n\r");

    XAxiDma_Config *CfgPtr; //DMA configuration pointer

	int Status, Index;
	u8 *TxBufferPtr;
	u8 *RxBufferPtr;
	u8 Value;

	TxBufferPtr = (u8 *)TX_BUFFER_BASE;
	RxBufferPtr = (u8 *)RX_BUFFER_BASE;

	// Initialize memory to all zeros
	for(Index = 0; Index < MAX_PKT_LEN; Index ++){
		TxBufferPtr[Index] = 0x00;
		RxBufferPtr[Index] = 0x00;
	}

	// Initialize the XAxiDma device
	CfgPtr = XAxiDma_LookupConfig(DMA_DEV_ID);
	if (!CfgPtr) {
		xil_printf("No config found for %d\r\n", DMA_DEV_ID);
		return XST_FAILURE;
	}

	Status = XAxiDma_CfgInitialize(&AxiDma, CfgPtr);
	if (Status != XST_SUCCESS) {
		xil_printf("Initialization failed %d\r\n", Status);
		return XST_FAILURE;
	}

	if(XAxiDma_HasSg(&AxiDma)){
		xil_printf("Device configured as SG mode \r\n");
		return XST_FAILURE;
	}

	XAxiDma_IntrDisable(&AxiDma, XAXIDMA_IRQ_ALL_MASK, XAXIDMA_DEVICE_TO_DMA);
	XAxiDma_IntrDisable(&AxiDma, XAXIDMA_IRQ_ALL_MASK, XAXIDMA_DMA_TO_DEVICE);

	Value = 0x00;

	for(Index = 0; Index < MAX_PKT_LEN; Index ++){
		TxBufferPtr[Index] = Value;

		Value = (Value + 1) & 0xFF;
	}

	Xil_DCacheFlushRange((UINTPTR)TxBufferPtr, MAX_PKT_LEN);
	Xil_DCacheFlushRange((UINTPTR)RxBufferPtr, MAX_PKT_LEN);

	XAxiDma_Reset(&AxiDma);

	// Setup & kick off S2MM channel first	
	Status = XAxiDma_S2MMtransfer(&AxiDma,(UINTPTR) RxBufferPtr, MAX_PKT_LEN);

	if (Status != XST_SUCCESS){
		xil_printf("XAXIDMA_DEVICE_TO_DMA transfer failed...\r\n");
		return XST_FAILURE;
	}

	Status = XAxiDma_MM2Stransfer(&AxiDma,(UINTPTR) TxBufferPtr, MAX_PKT_LEN);

	if (Status != XST_SUCCESS){
		xil_printf("XAXIDMA_DMA_TO_DEVICE transfer failed...\r\n");
		return XST_FAILURE;
	}

	while(XAxiDma_Busy(&AxiDma,XAXIDMA_DEVICE_TO_DMA) || XAxiDma_Busy(&AxiDma,XAXIDMA_DMA_TO_DEVICE)){
		if (XAxiDma_Busy(&AxiDma,XAXIDMA_DEVICE_TO_DMA) == TRUE){
			xil_printf("S2MM channel is busy...\r\n");
		}

		if (XAxiDma_Busy(&AxiDma,XAXIDMA_DMA_TO_DEVICE)){
			xil_printf("MM2S channel is busy...\r\n");
		}
	}

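	// The DMA wrote to DDR behind the data cache, so invalidate the Rx range before reading it
	Xil_DCacheInvalidateRange((UINTPTR)RxBufferPtr, MAX_PKT_LEN);

	for(Index = 0; Index < MAX_PKT_LEN; Index++) {
		xil_printf("Received data packet %d: %x/%x\r\n", Index, (unsigned int)RxBufferPtr[Index], (unsigned int)TxBufferPtr[Index]);
	}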

	XAxiDma_Reset(&AxiDma);

    cleanup_platform();
    return 0;
}

Credits

Whitney Knitter

All thoughts/opinions are my own and do not reflect those of any company/entity I currently/previously associate with.
