Generating a sine wave using an FPGA has many possible solutions. And just like in any other situation where there are multiple ways of accomplishing the same outcome, some solutions suit certain situations better than others. I've covered how to generate a sine wave purely in HDL in the past, so I wanted to cover it again, but this time using a bare-metal application to control a DDS Compiler IP instantiated in the block design of the Vivado project. For a more in-depth explanation of the DDS Compiler IP in Vivado, check out my last project using it here.
For this project I'm using Vivado and Vitis version 2019.2 with the Zynqberry (yes, I haven't had the heart to update to 2020.1 now that I feel as though I've gained somewhat of a mastery of 2019.2). The bare-metal application writes phase increment/offset data to the DDR connected to the Zynq's ARM processor, and the AXI DMA (direct memory access) then reads that data out of the DDR and streams it to the DDS Compiler.
Starting off, create a new project in Vivado and select the Zynqberry as the target development board for the project:
Select the 'Create Block Design' option from the Project Navigator. When the block design window opens, click the '+' button to bring up the IP repository. Search for 'zynq' then double-click the 'ZYNQ7 Processing System' IP to add it to the block design:
A green banner will appear at the top of the block design window with the option to run block automation.
Running block automation applies the Zynqberry board presets to the Zynq IP block in the block design.
After block automation is complete, double-click on the Zynq IP block to open its configuration window and navigate to the 'PS-PL Configuration' tab. Under the HP Slave AXI Interface menu, enable the S AXI HP0 interface:
Close the Zynq's configuration window, then use the keyboard shortcut Ctrl+I to bring up the IP repository again.
Add an instance of the DDS Compiler to the block design. After it appears, double-click on it to open its configuration window.
Leave the Configuration tab set to all of the defaults, then change the Implementation tab to match the following:
Setting the phase offset and phase increment to streaming means the DDS Compiler's output cosine waveform is calculated directly from the phase input value on every cycle. As a result, the frequency and phase offset of the output waveform can be changed often and smoothly, which is ideal in situations where frequency or phase modulation is needed. Since, as in my last project, I want to output a chirp waveform (a form of frequency modulation), streaming is the necessary setting here.
Under the Detailed Implementation tab, change the AXI Stream interface to allow for packet framing and output of the tready signal. On the input side, TUSER also needs to be set to User Field with a width of 8. This matches the AXI Stream settings of the AXI DMA block and allows for proper handshaking between the DMA and DDS cores.
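To make the "calculated from the phase input every cycle" behavior concrete, here is a minimal, idealized C model of a DDS phase accumulator. It assumes a 28-bit phase width (the width implied by the phase increment constants used in the application later on) and applies the streamed PINC and POFF on every clock; the names dds_model, dds_step, and dds_run are hypothetical, and the model ignores the DDS Compiler's pipeline latency and its sine/cosine lookup table.

```c
#include <stdint.h>

/* Assumed phase width: 28 bits. */
#define DDS_PHASE_BITS 28u
#define DDS_PHASE_MASK ((1u << DDS_PHASE_BITS) - 1u)

/* Hypothetical software model of the DDS phase accumulator. */
typedef struct {
    uint32_t acc; /* accumulated phase, DDS_PHASE_BITS wide */
} dds_model;

/* One clock cycle with streamed PINC/POFF: the accumulator advances by the
 * phase increment (wrapping modulo 2^28), and the phase offset is added on
 * top before the result would address the sine/cosine lookup. */
uint32_t dds_step(dds_model *d, uint32_t pinc, uint32_t poff)
{
    d->acc = (d->acc + pinc) & DDS_PHASE_MASK;
    return (d->acc + poff) & DDS_PHASE_MASK;
}

/* Run n cycles with a constant PINC/POFF and return the last phase. */
uint32_t dds_run(dds_model *d, uint32_t pinc, uint32_t poff, int n)
{
    uint32_t phase = 0;
    for (int i = 0; i < n; i++) {
        phase = dds_step(d, pinc, poff);
    }
    return phase;
}
```

Because a new pinc can be passed on any call, the modeled frequency changes on the very next cycle, which is the property that makes the streaming configuration a good fit for a chirp. With pinc = 0x28F5C2 at a 100 MHz clock, 100 cycles (1 microsecond) bring the phase to just short of one full turn.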
Next, add an AXI Direct Memory Access IP block to the block design and open its configuration window. For the purposes of this demo, we only need to read from memory to output that data to the input of the DDS Compiler for the phase increment and offset.
The stream data width also needs to be changed to 64 bits to match the input data width of the DDS Compiler, and the option to allow unaligned transfers needs to be enabled.
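With both the phase increment and phase offset set to streaming, both fields travel on s_axis_phase_tdata in a single beat. The sketch below packs them into the 8 bytes of one 64-bit beat, assuming 28-bit fields that are each padded to a 32-bit boundary, with PINC in the lower half and POFF in the upper half (confirm the layout for your configuration in the DDS Compiler product guide, PG141). pack_phase_beat is a hypothetical helper, not part of any Xilinx driver.

```c
#include <stdint.h>

/* Assumed field width: 28-bit phase, each field padded to 32 bits. */
#define PHASE_FIELD_MASK 0x0FFFFFFFu

/* Hypothetical helper: write one s_axis_phase_tdata beat into buf[0..7] in
 * little-endian byte order, so that the byte at the lowest address lands on
 * tdata[7:0] when the DMA streams it out:
 *   bytes 0-3 -> tdata[31:0]  = PINC (bits [27:0] used)
 *   bytes 4-7 -> tdata[63:32] = POFF (bits [59:32] used)
 */
void pack_phase_beat(uint8_t buf[8], uint32_t pinc, uint32_t poff)
{
    pinc &= PHASE_FIELD_MASK;
    poff &= PHASE_FIELD_MASK;
    for (int i = 0; i < 4; i++) {
        buf[i]     = (uint8_t)(pinc >> (8 * i));
        buf[4 + i] = (uint8_t)(poff >> (8 * i));
    }
}
```

For a 1 MHz step with no phase offset, this produces the bytes C2 F5 28 00 00 00 00 00.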
The AXI DMA block can be a bit daunting the first time you use it. As a quick reference, just remember the following:
- The read channel (MM2S) reads data from a memory location (DDR or BRAM) and transmits it to a device as a memory-mapped-to-stream transfer.
- The write channel (S2MM) writes data to a memory location (DDR or BRAM) after receiving it from a device as a stream-to-memory-mapped transfer.
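To see how the transfer length and the 64-bit stream width interact on the read (MM2S) side, here is a simplified C model of how a transfer of a given number of bytes maps onto 8-byte stream beats: one beat per 8 bytes, a partial TKEEP on a short final beat, and TLAST on the last beat. It is an illustrative model that assumes an aligned start address, not the DMA's actual logic, and the names axis_beat and mm2s_beats are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STREAM_BYTES 8u /* 64-bit stream data width */

typedef struct {
    uint8_t tkeep; /* one bit per byte lane that carries valid data */
    int tlast;     /* 1 on the final beat of the transfer */
} axis_beat;

/* Fill beats[] for a transfer of len bytes starting at an aligned address.
 * Returns the number of beats, or 0 if len is 0 or max_beats is too small. */
size_t mm2s_beats(size_t len, axis_beat *beats, size_t max_beats)
{
    size_t n = (len + STREAM_BYTES - 1) / STREAM_BYTES;
    if (n == 0 || n > max_beats) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        size_t remaining = len - i * STREAM_BYTES;
        size_t valid = remaining < STREAM_BYTES ? remaining : STREAM_BYTES;
        beats[i].tkeep = (uint8_t)((1u << valid) - 1u);
        beats[i].tlast = (i == n - 1);
    }
    return n;
}
```

In this model, a 4-byte transfer yields a single beat with only the lower four byte lanes marked valid. The DDS Compiler's phase input has no TKEEP, so the upper four bytes (the phase offset field) are not guaranteed to be meaningful in that case; sending a full 8-byte beat with the offset bytes zeroed avoids the ambiguity.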
To monitor the output of the DDS Compiler to verify its input data is being read from memory correctly, add an ILA to the block design. Open its configuration window and change the monitor type to 'Native', increase the number of probes to 4, then increase the sample data depth.
In the second tab, set one of the probe data widths to 8 bits and another one to 32 bits. Leave the other two probes' data widths set to one bit.
After closing the ILA's configuration window, run all of the options for connection automation that will have appeared in the green banner at the top of the block design window.
Add an AXI Stream Data FIFO to the block design and configure its depth to 64:
Once complete, manually connect the M_AXIS_MM2S output from the DMA to the S_AXIS input of the AXI Stream FIFO and connect the S_AXIS_PHASE input of the DDS Compiler to the M_AXIS output of the AXI Stream FIFO.
For debugging purposes, right-click on the two connections you just made manually and select the 'Debug' option. A System ILA will appear along with the option to run Connection Automation. Run Connection Automation and be sure to check the option for the AXIS protocol checker.
Then manually connect the individual output pins of the DDS Compiler to the ILA probes (be sure to match the data widths appropriately).
Finally, add a Constant IP block to the design and configure it for a single bit width, logic level 1 output:
Manually connect this to the s_axis_phase_tvalid input of the DDS Compiler's phase input channel. I will explain this later, but yes, it is a cheat.
The final block design should look something like this:
Run validation on the block design by pressing F6 to make sure the design has no critical warnings or errors.
Save the block design then right-click on the block design file from the Sources window and select the option to create an HDL wrapper:
Select the option to let Vivado manage the wrapper and auto-update it.
Run synthesis then implementation from the Project Manager window. After implementation, generate a bitstream from the design. Once the bitstream is generated, the design is ready to be exported for use in Vitis to develop the bare-metal C application to control the DDS Compiler in the design. From the File menu, select 'Export' and 'Export Hardware...'
Be sure to check the box to include the bitstream in the exported design:
From the Tools menu, select the option to launch Vitis:
When prompted, select the desired location for the workspace for the Vitis project.
Unlike its predecessor, XSDK, Vitis does not automatically generate a hardware platform from the hardware exported from Vivado, so when Vitis opens, it will open into a completely blank workspace. From the Project section, select the option to Create Platform Project:
Specify a project name for the platform project and click Next.
Select the option to create the hardware platform from the hardware design exported from Vivado:
Specify the directory path to the hardware design exported from Vivado then select the OS type and target processor to run on.
This project is a bare-metal application so the OS is standalone and the target processor is the ARM core 0 in the Zynq.
Build the new platform project (you can also use the shortcut ctrl+B):
With the base hardware platform in place, create a new application project for the actual software of the application:
Name the application project and leave the system project set to Create New:
Select the hardware platform from the platform project created in the previous steps:
Ensure the Domain is targeting the ARM core 0 of the Zynq and choose your desired programming language for the application:
I like to use the Hello World template for new applications instead of the Empty Application template just because it auto-generates a few extra framework components for me:
After clicking Finish, let the application project generate then build the entire Vitis project.
The main function for the Hello World template lives in helloworld.c under the /src directory of the application project. Open this file and replace its contents with the following code:
// helloworld.c: replaces the contents of the Hello World template
#include <stdio.h>
#include <stdlib.h>
#include "xaxidma.h"
#include "xil_cache.h"
#include "platform.h"
#include "xil_printf.h"
#include "xparameters.h"

#define MEM_BASE_ADDR   (XPAR_PS7_DDR_0_S_AXI_BASEADDR + 0x1000000)
#define TX_BUFFER_BASE  (MEM_BASE_ADDR + 0x00100000)
#define MAX_PKT_LEN     0x20

XAxiDma AxiDma;

int main()
{
    init_platform();
    xil_printf("Hello DMA DDS.....\r\n");

    // Initialize DMA
    XAxiDma_Config *CfgPtr;
    int Status = 0;
    u8 *TxBufferPtr = (u8 *)TX_BUFFER_BASE;

    // Look up hardware configuration for device
    CfgPtr = XAxiDma_LookupConfig(XPAR_AXI_DMA_0_DEVICE_ID);
    if (!CfgPtr) {
        xil_printf("ERROR! No hardware configuration found for AXI DMA with device id %d.\r\n", XPAR_AXI_DMA_0_DEVICE_ID);
        return XST_FAILURE;
    }

    // Initialize driver
    Status = XAxiDma_CfgInitialize(&AxiDma, CfgPtr);
    if (Status != XST_SUCCESS) {
        xil_printf("ERROR! AXI DMA init failed: %d\r\n", Status);
        return XST_FAILURE;
    }

    // Test for Scatter Gather: this design uses simple (direct register) mode
    if (XAxiDma_HasSg(&AxiDma)) {
        xil_printf("ERROR! Device configured as SG mode.\r\n");
        return XST_FAILURE;
    }

    // Make sure that the MM2S (read from memory) channel is enabled
    if (!AxiDma.HasMm2S) {
        xil_printf("MM2S channel is not supported\r\n");
        return XST_FAILURE;
    }

    // If the MM2S channel is running (not halted), make sure it is not busy
    if (!(XAxiDma_ReadReg(AxiDma.TxBdRing.ChanBase, XAXIDMA_SR_OFFSET) & XAXIDMA_HALTED_MASK)) {
        if (XAxiDma_Busy(&AxiDma, XAXIDMA_DMA_TO_DEVICE)) {
            xil_printf("Engine is busy\r\n");
            return XST_FAILURE;
        }
    }

    // Disable interrupts; this application polls instead
    XAxiDma_IntrDisable(&AxiDma, XAXIDMA_IRQ_ALL_MASK, XAXIDMA_DMA_TO_DEVICE);
    if (AxiDma.HasS2Mm) {
        XAxiDma_IntrDisable(&AxiDma, XAXIDMA_IRQ_ALL_MASK, XAXIDMA_DEVICE_TO_DMA);
    }

    // Phase values assume a 100 MHz clock and 28-bit phase width:
    // PINC = f_out * 2^28 / f_clk, so 1 MHz -> 0x28F5C2 (truncated from 2684354.56)
    u32 phase_incr  = 0x28f5c2;   // 1 MHz step
    u32 phase_input = phase_incr; // start of the chirp: 1 MHz
    u32 num_steps   = 25;         // chirp from 1 MHz to 25 MHz
    u32 step        = 1;
    // One full 64-bit stream beat: PINC in bytes 0-3, POFF in bytes 4-7
    size_t transfer_size = 0x08;

    // s_axis_phase_tdata layout (28-bit fields, each padded to 32 bits):
    //   bits [59:32] = phase offset (POFF)
    //   bits [27:0]  = phase increment (PINC)

    while (1) {
        // Write PINC little-endian into the lower 32 bits of the beat
        TxBufferPtr[0] = phase_input & 0xff;
        TxBufferPtr[1] = (phase_input >> 8) & 0xff;
        TxBufferPtr[2] = (phase_input >> 16) & 0xff;
        TxBufferPtr[3] = (phase_input >> 24) & 0xff;
        // Zero the phase offset and the rest of the buffer
        for (int i = 4; i < MAX_PKT_LEN; i++) {
            TxBufferPtr[i] = 0x00;
        }

        // Flush the new data out of the data cache so the DMA reads it from DDR;
        // without this, the DMA keeps sending stale data from DDR
        Xil_DCacheFlushRange((UINTPTR)TxBufferPtr, MAX_PKT_LEN);

        Status = XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR)TxBufferPtr, transfer_size, XAXIDMA_DMA_TO_DEVICE);
        if (Status != XST_SUCCESS) {
            xil_printf("ERROR! DMA transfer failed: %d\r\n", Status);
            return XST_FAILURE;
        }
        while (XAxiDma_Busy(&AxiDma, XAXIDMA_DMA_TO_DEVICE)) {
            // Wait for this transfer to finish before the buffer is overwritten
        }

        // Step up by 1 MHz, wrapping back to 1 MHz after the 25 MHz step.
        // A step counter is used because 25 * 0x28F5C2 is slightly below
        // 0x4000000 (exactly 25 MHz), so comparing against that constant
        // would overshoot to a 26th step.
        if (step < num_steps) {
            step++;
            phase_input += phase_incr;
        } else {
            step = 1;
            phase_input = phase_incr;
        }
    }

    cleanup_platform();
    return 0;
}
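The phase increment constants in the code follow from the standard DDS relationship between the output frequency, the clock frequency, and the phase width $B_\theta$. Assuming the 28-bit phase width that these constants imply and the 100 MHz fabric clock:

```latex
\begin{aligned}
f_{out} &= \frac{\Delta\theta \, f_{clk}}{2^{B_\theta}}
  \quad\Rightarrow\quad
  \Delta\theta = \frac{f_{out} \, 2^{B_\theta}}{f_{clk}} \\
\Delta\theta_{1\,\mathrm{MHz}} &= \left\lfloor \frac{10^{6} \cdot 2^{28}}{10^{8}} \right\rfloor
  = \lfloor 2684354.56 \rfloor = 2684354 = \mathtt{0x28F5C2} \\
\Delta\theta_{25\,\mathrm{MHz}} &= \frac{25 \cdot 10^{6} \cdot 2^{28}}{10^{8}}
  = 2^{26} = 67108864 = \mathtt{0x4000000} \\
25 \cdot \mathtt{0x28F5C2} &= 67108850 = \mathtt{0x4000000} - 14
\end{aligned}
```

Because the 1 MHz increment is truncated, each step is really about 999,999.79 Hz, and twenty-five steps land 14 counts short of 0x4000000. A loop that only wraps once the accumulated increment reaches 0x4000000 will therefore take one extra (26th) step before starting over.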
Just like in my last project, where I controlled the DDS Compiler from HDL, I decided to do a simple chirp from 1 MHz to 25 MHz in 1 MHz steps. What is notably different from that project is the period of each 1 MHz step. From the C application, I don't have granular enough control over the timing to hold each step for exactly 1 microsecond so that the whole chirp spans exactly 26 microseconds (my fabric clock is 100 MHz, which is 10 nanoseconds per clock cycle).
This is the major differentiator between implementing control of the DDS Compiler in C code versus at the lower level in the programmable logic of the FPGA in HDL. One of my design rules that has held true for me 99.999% of the time is that if control of the timing is my number one priority, I implement the design in HDL.
Since my only concern for the period of each frequency step is that the full chirp be viewable in the ILA window, I used the DMA simple transfer function and loaded each new frequency value into the transmit buffer within the while loop (rather than loading everything into a larger buffer and doing one continuous transfer of that buffer) to slow it down enough.
Using the DMA simple transfer function is ultimately why I had to cheat and tie the s_axis_phase_tvalid input of the DDS Compiler's phase channel to a constant logic level 1. Since each transfer caused the tvalid output from the DMA to toggle, the tvalid on the input of the DDS toggled as well, which made my output waveform look choppy:
The AXI Stream Data FIFO holds the tdata value constant on the phase input of the DDS Compiler, while the Constant IP block holds the tvalid signal high.
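For comparison, the alternative of loading the whole chirp into one larger buffer and sending it in a single transfer could be sketched as below. build_chirp_buffer is a hypothetical helper that assumes the phase input layout with PINC in bytes 0-3 and a zero phase offset in bytes 4-7, the 28-bit/100 MHz scaling (0x28F5C2 per 1 MHz), and that the DMA delivers one 8-byte beat per clock, so repeating a step beats_per_step times holds it for roughly that many 10 ns cycles. In practice DDR and interconnect stalls make that timing approximate, and the whole buffer must fit within the DMA's maximum transfer length (set by its 'Width of Buffer Length Register' option).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed scaling: 28-bit phase width, 100 MHz clock -> 0x28F5C2 per 1 MHz. */
#define PINC_1MHZ  0x28F5C2u
#define BEAT_BYTES 8u

/* Hypothetical helper: fill buf with a chirp of num_steps 1 MHz steps
 * (1 MHz, 2 MHz, ...), each repeated beats_per_step times as consecutive
 * 64-bit beats. Bytes 0-3 of each beat carry PINC (little-endian) and bytes
 * 4-7 carry a zero phase offset. Returns the number of bytes written, or 0
 * if buf_len is too small. */
size_t build_chirp_buffer(uint8_t *buf, size_t buf_len,
                          unsigned num_steps, unsigned beats_per_step)
{
    size_t needed = (size_t)num_steps * beats_per_step * BEAT_BYTES;
    if (needed > buf_len) {
        return 0;
    }
    uint8_t *p = buf;
    for (unsigned step = 1; step <= num_steps; step++) {
        uint32_t pinc = step * PINC_1MHZ;
        for (unsigned r = 0; r < beats_per_step; r++) {
            for (unsigned i = 0; i < 4; i++) {
                p[i]     = (uint8_t)(pinc >> (8 * i)); /* PINC */
                p[4 + i] = 0;                          /* phase offset */
            }
            p += BEAT_BYTES;
        }
    }
    return needed;
}
```

With beats_per_step = 100 (about 1 microsecond per step at 100 MHz under the one-beat-per-clock assumption), the 25-step chirp needs 20,000 bytes. A 14-bit buffer length register limits a simple transfer to 16,383 bytes, so that register width would have to be increased for this approach, and the buffer would still need its cache flush before the transfer.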
To run the application on the Zynqberry and be able to view the data in the ILAs, the FPGA first needs to be programmed from Vitis and a debug run launched:
Plug the Zynqberry into your machine, right-click on the application in the Explorer window of Vitis, then select the option to 'Program FPGA':
All of the settings will be auto-populated so just click 'Program' in the window that pops up.
To launch a debug run, again right-click on the application in the Explorer window of Vitis then select the option to 'Debug As...' and 'Launch on Hardware (Single Application Debug)':
If you would like to view any print statements, connect to the Vitis serial terminal:
Select the appropriate port on your machine and make sure the baud rate is set to 115200.
Then click 'Resume' (or press F8) to run the C application.
To view the ILAs, return to Vivado and navigate to the hardware manager in the Flow Navigator window. Click the 'Open Hardware Manager' drop down and select 'Open Target'. For this project, the auto connect option will work just fine:
Once the ILA windows have opened, you can run an immediate trigger on each to watch the data loop and the chirp waveform repeat indefinitely thanks to the while(1) loop:
You'll see the AXI Stream transfers between the DMA and the FIFO, and the FIFO and the DDS Compiler. Notice what I mentioned about the FIFO keeping the tdata value constant on the phase input of the DDS compiler:
You'll also notice the tvalid output of the AXI Stream FIFO toggling along with the tvalid output from the DMA, which makes it clear why the Constant IP cheat was needed on the tvalid input of the DDS Compiler.
The other ILA shows the tdata line for the DDS Compiler output. Right-click on it, change the Radix to Signed Decimal, and change the Waveform Style to Analog to view it as you would on an oscilloscope:
This project ended up being a bit trickier than I initially thought, mainly due to the timing and the control of the DMA. Hopefully this is helpful and gives some insight into how to make design choices when working with an FPGA.