Xilinx's Zynq-7000 series is one of the most popular System on a Chip (SoC) options available on the FPGA market. It's one of my personal preferences to work with in the FPGA domain due to Xilinx's integrated development environment (IDE), Vivado (despite the fact it's a love/hate relationship at times). One of the main features of any of Xilinx's SoC chipsets (Zynq-7000 or UltraScale+) is the multiple hard processor cores built into the same chip as the programmable logic. The Zynq-7000 series devices specifically are equipped with dual-core ARM Cortex-A9 processors. The cores of the Zynq processor are able to share resources on the chip such as on-chip memory (OCM), DDR, UART, interrupts via the interrupt controller distributor (ICD), and global timers, to name a few.
While I have quickly adapted to Xilinx's new SDK tool/platform, Vitis, I've noticed there is not a ton of documentation available yet on how to run the multiple processors on their SoC FPGAs simultaneously. Luckily, after a couple of days of poking around, I found that it was a fairly straightforward process.
I've covered how to create a new project in Vitis for a baremetal application on Zynq running just on ARM core 0 in a past project here, so I'll jump straight into how to create the bare metal application for the second ARM core (ARM core 1) and how to create a single boot image for it. I'm using a Zynqberry for this project, but none of the steps in this project are necessarily specific to the Zynqberry and would apply to any Zynq-7000 series development board.
In Vitis, just like in its predecessor XSDK, the software design is based on a hardware platform. This is why a Platform Project has to be created in Vitis prior to the creation of an Application Project. The platform imports the hardware design into Vitis from the XSA file exported from Vivado, which then facilitates the creation of board support packages (BSPs), boot components such as first stage bootloaders (FSBLs), and ultimately a baremetal application or OS.
Each BSP can only support one baremetal application or OS. This hierarchical relationship of BSP to baremetal application/OS is referred to as a domain in Vitis. Thus, since a domain for ARM core 0 already exists with its Hello World baremetal application, a new domain needs to be created for the second ARM core (ARM1).
To create a new domain, open platform.spr and then, in the window that opens (not in the Explorer menu), right-click on the name of the platform project with the green symbol next to it. The only option that will appear is 'Add Domain':
Give the new domain a desired name, then select the standalone option for the OS since we're going to create a baremetal application. Finally, select the second ARM core (ps7_cortexa9_1) as the processor.
This provides the hooks for ARM core 1 to be selected as the target processor when creating a new Application Project.
To create a new application to run on the second ARM core, select File > New... > Application Project and give the new application your desired name, then click 'Next'.
Be sure to select the custom hardware platform exported from Vivado (the .XSA file):
Select the second domain that was created for the second ARM core. If you accidentally select the domain for ARM core 0, the 'Next' button will grey out and a warning message will pop up saying that an application already exists for that domain.
For now, I'm using the Hello World application template for the second ARM core (ARM1).
One of the key steps that is easily missed when running both cores of the Zynq is modifying the DDR memory addresses in the linker scripts to keep each application in its own part of memory.
By default, when the applications are created, each linker script assigns its application the same base address, with the size taken from the hardware platform's record of how much DDR is available on the board. If left unchanged, the applications for ARM0 and ARM1 will both try to operate from the same address space in DDR, which of course isn't good.
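As a quick sanity check on any pair of linker script settings, the overlap condition can be written down in a few lines of C. This is only a host-side sketch, not something that runs on the Zynq: the mem_region type and regions_overlap() are names I made up for illustration, and each region simply mirrors the ORIGIN and LENGTH pair of a ps7_ddr_0 entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: one linker MEMORY region (ORIGIN and LENGTH). */
typedef struct {
    uint32_t origin;
    uint32_t length;
} mem_region;

/* Returns 1 if the half-open ranges [origin, origin + length) overlap, else 0. */
int regions_overlap(mem_region a, mem_region b)
{
    assert(a.length > 0 && b.length > 0);

    /* Compute the end addresses in 64 bits so origin + length cannot wrap. */
    uint64_t a_end = (uint64_t)a.origin + a.length;
    uint64_t b_end = (uint64_t)b.origin + b.length;

    return a.origin < b_end && b.origin < a_end;
}
```

Passing the same region for both applications (the default situation described above) reports an overlap, while two regions where one ends exactly where the other begins do not.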
To modify the DDR addresses to allocate one part to ARM0 and the other to ARM1, open the linker script for each application (located in the Explorer > application > src > lscript.ld) and change the values for ps7_ddr_0 at the top.
To keep things simple, I decided to just allocate the lower half of the Zynqberry's DDR to ARM0 and the upper half to ARM1 (just as a note though, since these are baremetal applications doing almost nothing but printing a single line to UART, they absolutely do not need anywhere near this much DDR to operate).
To achieve this, I left ARM0's base address at its default value (0x100000) but cut the size in half to 0xFF80000. I set ARM1's base address to ARM0's base address plus 0xFF80000 (0x10080000), and its size was also set to 0xFF80000.
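The arithmetic behind the split is easy to get wrong by a digit, so here is a small host-side sketch of it in C. It assumes the default ps7_ddr_0 region starts at 0x100000 and has a length of 0x1FF00000 (what a board with 512 MB of DDR would report, which is my assumption and not something read from the hardware platform). The ddr_split type and split_ddr_in_half() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the two ORIGIN values and the shared LENGTH. */
typedef struct {
    uint32_t arm0_origin;
    uint32_t arm1_origin;
    uint32_t length;
} ddr_split;

/* Split the region [origin, origin + length) into two equal halves. */
ddr_split split_ddr_in_half(uint32_t origin, uint32_t length)
{
    assert(length % 2 == 0);

    ddr_split s;
    s.length = length / 2;              /* each core gets half the region */
    s.arm0_origin = origin;             /* ARM0 keeps the default base    */
    s.arm1_origin = origin + s.length;  /* ARM1 starts where ARM0 ends    */
    return s;
}
```

With the assumed defaults, split_ddr_in_half(0x100000, 0x1FF00000) gives a length of 0xFF80000 for each half and an ARM1 base address of 0x10080000, and the two halves together end at 0x20000000 (512 MB).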
While both of the ARM cores have the same processing power, there are still operations such as the boot process that require one of the cores to act as the master and the other as the slave. In the Zynq, ARM0 is the master so there are certain operations and peripherals only it will have access to. ARM0 will also be responsible for booting up the slave ARM1 at the appropriate point in time.
In order for the application on ARM1 to be aware of the fact that it is operating on the slave processor and be provided the appropriate hooks, the Asymmetric Multi Processing (AMP) compiler flag must be set in its BSP.
To do this, open the platform.spr window again, select Modify BSP settings... and navigate to ps7_cortexa9_1 > extra_compiler_flags, then add the following to the end of the arguments already there:
-DUSE_AMP=1
Now that all of the hardware and BSP settings are configured, the actual application code can be written. As I mentioned before, ARM0 is responsible for booting up ARM1. There are two main steps ARM0 must follow to successfully boot ARM1 (as described in section 6.1.10 of UG585):
- Write the base address of ARM core 1's application in the Zynq's DDR (PS7 DDR), which is 0x10080000 in this project, to address 0xFFFFFFF0.
- Execute the SEV instruction (an alert to all cores within a multiprocessor system) to wake up ARM core 1 and cause it to jump into its application.
It's also important to disable caching of the OCM when running multiple processors at once. Each core has its own L1 cache, so a value written by one core could otherwise sit in that core's cache where the other core wouldn't see it.
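The code listings further down disable caching by passing the attribute value 0x14de2 to Xil_SetTlbAttributes for the OCM address 0xFFFF0000. To show where the bits in that value come from, here is a host-side decoder sketch in C. It assumes the value follows the ARMv7-A short-descriptor section entry layout; the section_attr type and decode_section_attr() are hypothetical names for this illustration and are not part of the Xilinx BSP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the fields of a first-level section entry. */
typedef struct {
    unsigned b;       /* bit 2: bufferable                    */
    unsigned c;       /* bit 3: cacheable                     */
    unsigned xn;      /* bit 4: execute never                 */
    unsigned domain;  /* bits 8:5                             */
    unsigned ap;      /* bits 11:10: access permissions [1:0] */
    unsigned tex;     /* bits 14:12: type extension           */
    unsigned s;       /* bit 16: shareable                    */
} section_attr;

/* Pull the individual fields out of a section attribute word. */
section_attr decode_section_attr(uint32_t attr)
{
    /* Bits 1:0 equal to 0b10 mark the entry as a section. */
    assert((attr & 0x3u) == 0x2u);

    section_attr f;
    f.b      = (attr >> 2)  & 0x1u;
    f.c      = (attr >> 3)  & 0x1u;
    f.xn     = (attr >> 4)  & 0x1u;
    f.domain = (attr >> 5)  & 0xFu;
    f.ap     = (attr >> 10) & 0x3u;
    f.tex    = (attr >> 12) & 0x7u;
    f.s      = (attr >> 16) & 0x1u;
    return f;
}
```

Decoding 0x14de2 this way gives S=1, TEX=0b100, AP=0b11, Domain=0b1111, C=0 and B=0, which matches the comment above the Xil_SetTlbAttributes calls in the listings.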
For safety, executing the ARM data memory barrier instruction right after writing ARM1's base address to 0xFFFFFFF0 is a good practice as it won't allow the processor to move on until the write instruction is completed and the memory space appears as updated.
Finally, the SEV instruction can be executed which acts like a beacon alert across the system to tell all processors present to wake up and jump into their applications.
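The write, barrier and SEV sequence described above can be isolated into a single helper. The sketch below is my own wrapper, not a Xilinx BSP function: wake_core1() is a hypothetical name, and it takes the start address register as a pointer so the write can be tried out on a PC with an ordinary variable. The DMB and SEV instructions are only emitted when compiling for a 32-bit ARMv7 (or later) target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: publish ARM1's start address, then wake it up. */
void wake_core1(volatile uint32_t *start_addr_reg, uint32_t entry_point)
{
    assert(start_addr_reg != 0);

    /* Step 1: write the address ARM1 should jump to. */
    *start_addr_reg = entry_point;

#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    /* Data memory barrier so the write is ordered before the event. */
    __asm__ volatile("dmb" ::: "memory");
    /* Step 2: send the event that wakes the waiting core. */
    __asm__ volatile("sev");
#endif
}
```

On the Zynq, the call for this project would look like wake_core1((volatile uint32_t *)0xFFFFFFF0, 0x10080000), which is the same sequence the ARM0 listing performs with Xil_Out32, dmb() and sev().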
One more quick feature: I wanted the UART console to show the "Hello World" printouts from ARM0 and ARM1 in an alternating pattern. So instead of trying to hardcode the sleep times to accomplish that, I used a variable that I typecast into the shared memory space at 0xFFFF0000 as something each ARM could poll the value of to know when the other had finished its print statement.
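The turn-taking rule is simple enough to model on a PC before trying it on hardware. The sketch below is a single-threaded model under my own assumptions, not the code that runs on the Zynq: has_turn() and simulate_turns() are hypothetical names, an ordinary variable stands in for the shared value at 0xFFFF0000, and the two cores are polled one after the other in a loop instead of running in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ARM0 (core 0) may print while the flag is 0, ARM1 (core 1) while it is 1. */
int has_turn(int core, uint32_t flag)
{
    return flag == (uint32_t)core;
}

/*
 * Poll both cores round-robin and record which one "prints" as the
 * characters '0' and '1'. The out buffer must hold prints + 1 bytes.
 * Returns the number of prints recorded.
 */
size_t simulate_turns(char *out, size_t prints)
{
    assert(out != NULL);

    uint32_t flag = 0;  /* stands in for the shared value in OCM */
    size_t n = 0;

    while (n < prints) {
        for (int core = 0; core < 2 && n < prints; core++) {
            if (has_turn(core, flag)) {
                out[n++] = (char)('0' + core);  /* this core prints      */
                flag = (core == 0) ? 1u : 0u;   /* hand the turn over    */
            }
        }
    }
    out[n] = '\0';
    return n;
}
```

Because each core only prints when the flag holds its own value and then hands the turn to the other core, the recorded order always alternates starting with ARM0, regardless of how long either core takes between polls.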
ARM0 code:
#include <stdio.h>
#include <sleep.h>
#include "xil_io.h"
#include "xil_mmu.h"
#include "platform.h"
#include "xil_printf.h"
#include "xpseudo_asm.h"
#include "xil_exception.h"

#define sev() __asm__("sev")
#define ARM1_STARTADR 0xFFFFFFF0  // location ARM1 reads its start address from
#define ARM1_BASEADDR 0x10080000  // base address of ARM1's application in DDR
#define COMM_VAL (*(volatile unsigned long *)(0xFFFF0000))  // shared flag in OCM

int main()
{
    init_platform();

    COMM_VAL = 0;

    //Disable cache on OCM
    // S=b1 TEX=b100 AP=b11, Domain=b1111, C=b0, B=b0
    Xil_SetTlbAttributes(0xFFFF0000,0x14de2);

    print("ARM0: writing startaddress for ARM1\n\r");
    Xil_Out32(ARM1_STARTADR, ARM1_BASEADDR);
    dmb(); //waits until write has finished

    print("ARM0: sending the SEV to wake up ARM1\n\r");
    sev();

    while(1){
        print("Hello World - ARM0\n\r");
        sleep(1);
        COMM_VAL = 1;  // hand the turn to ARM1
        while(COMM_VAL == 1){
            // wait for ARM1 to finish its print and clear the flag
        }
    }

    // never reached because of the infinite loop above
    cleanup_platform();
    return 0;
}
ARM1 code:
#include <stdio.h>
#include <sleep.h>
#include "xil_io.h"
#include "xil_mmu.h"
#include "platform.h"
#include "xil_cache.h"
#include "xil_printf.h"
#include "xparameters.h"
#include "xpseudo_asm.h"
#include "xil_exception.h"

#define COMM_VAL (*(volatile unsigned long *)(0xFFFF0000))  // shared flag in OCM

extern u32 MMUTable;

int main()
{
    init_platform();

    print("CPU1: init_platform\n\r");

    //Disable cache on OCM
    // S=b1 TEX=b100 AP=b11, Domain=b1111, C=b0, B=b0
    Xil_SetTlbAttributes(0xFFFF0000,0x14de2);

    while(1){
        while(COMM_VAL == 0){
            // wait for ARM0 to finish its print and set the flag
        }
        print("Hello World - ARM1\n\r");
        sleep(1);
        COMM_VAL = 0;  // hand the turn back to ARM0
    }

    // never reached because of the infinite loop above
    cleanup_platform();
    return 0;
}
The function init_platform() needs to be left in both applications as it is responsible for initializing the UART console for each application.
Build the project system containing both of the applications for ARM0 and ARM1, then from the Xilinx menu option select 'Create Boot Image'.
By default, the boot image writer will have the paths for the bitstream, FSBL, and ELF file for ARM0. To add the ELF file for ARM1, click 'Add' and browse to the location of the ELF file for ARM1. Leave the type as datafile, then click 'OK'.
From the Create Boot Image window, click 'Create Image'. A warning will pop up that the previous version is about to be overwritten. This is ok.
Finally, I plugged the Zynqberry into my computer and selected 'Program Flash' from the Xilinx menu. Vitis filled in the locations of the FSBL and boot image for me.
As I was intending, the two cores polled the shared memory value to alternate their serial outputs:
Hopefully, this project will help provide you a starting point for taking full advantage of your Zynq designs!