Several times this year I have presented courses, in person or online, on how to implement the Arm DesignStart FPGA cores in Xilinx FPGAs. I have also showcased several projects here on Hackster in which the Arm Cortex-M1 and M3 cores are integrated on Digilent Arty A7 and Arty S7 boards.
I thought it would be very useful to show how these cores can be implemented with the Arty Z7. As the Arty Z7 already has dual hard Arm Cortex-A9 cores, this means we are adding another core into an already heterogeneous processing system. What this provides, however, is a system in which the higher performance processor, in this case the A9, can offload sensor interfacing or actuator / motor control to the M1 or M3. This is often much easier and faster than building a custom state machine.
It will come as no surprise that the development flow for the Arty Z7 is a little different to the previous development flows for the Arty A7 and S7, especially when it comes to creating the final system.
To complete this project you need to have the correct Xilinx and Arm design tools installed. They are all free to download; the projects and webinar here and here show how you can set up your development system to complete this project.
One very important step is to make sure you dig out the Arm DesignStart email so you can obtain the correct license for Keil.
Unlike with the previous projects we will not be using a reference design which was created by Arm and building upon it.
This time we are going to start from scratch! We do, however, need to download the reference Arm DesignStart FPGA project for the M1 for Arty S7 / A7, as that also contains the IP cores for the Cortex-M1 and M3 along with the necessary software repositories. You can download the reference design with the IP repository here.
As this builds upon techniques presented in the tutorial I do recommend you read through the lab book and the projects above to understand the base flow.
Differences with Zynq From FPGA
In previous projects with the Arm DesignStart IP cores we have used the flow below.
As the Cortex IP will be within the programmable logic we follow a similar development flow. However, there are differences you will see as the project progresses, including:
- Clocking - If fabric clocks are used, the processing system must be configured, otherwise there is no PL clock.
- Boot - The processing system configures the programmable logic, so the bit file containing the Arm Cortex-M1 is included within the boot files.
- Communication - Proper inter-processor communication must be implemented between the Arm Cortex-A9 and the Arm Cortex-M1 / M3 within the programmable logic.
To ease the development flow we are going to reuse the following scripts from the Arm example design, with modification.
- MMI File - Contains Block Ram physical placement information
- Make Bitstream - Performs the merge of the bit stream with the updated SW application
- Write Hex - Post compilation batch script which transfers the SW ELF image to the HW directory for the bit stream merge.
There are however a few tweaks needed to settings within these files.
The instruction memory of the Cortex-M1 processor is contained within BRAM. As such, to update the BRAM contents we need to update the MMI file. This MMI file defines which of the BRAMs in the design are used by the Cortex-M1 instruction memory, enabling them to be updated without a full re-implementation.
Within the provided MMI script we need to update the device to the one used on the Arty Z7-20
# Set MMI output file name
set mmi_file "z7.mmi"
set part "xc7z020clg400-1"
We also need to update the name of the block diagram
# Write the file header
puts $fp "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
puts $fp "<MemInfo Version=\"1\" Minor=\"15\">"
puts $fp " <Processor Endianness=\"ignored\" InstPath=\"dummy\">"
puts $fp " <AddressSpace Name=\"design_1_i.CORTEXM1_AXI_0.inst.u_x_itcm\" Begin=\"0\" End=\"[expr {$itcm_size_bytes-1}]\">"
puts $fp " <BusBlock>"
Within the boot image creation script we need to update the location and name of the bit stream to merge in the SW application
# Input files
set mmi_file "./z7.mmi"
set elf_file "./bram_s7.elf"
set source_bit_file "./hw.runs/impl_1/design_1_wrapper.bit"
#set reference_bit_file "./m1_for_arty_s7_reference.bit"
# Output files
set output_bit_file "Zynq_z7.bit"
set output_mcs_file "Zynq_z7.mcs"
Finally, within the ELF and hex copy script used by Arm Keil, the path must be updated so the files are copied into the correct directory.
@REM - Copy the files to the relevant directories of the hardware project
copy bram_s7.* ..\..\..\hw
copy qspi_s7.hex ..\..\..\hw
Creating the Base Vivado Project
This project is going to be developed in two parts. The first part is to implement the basic Arm Cortex-M1 core within the Zynq PL and establish the development flow. The second will implement an application similar to the Hackster webinar example.
As such the first thing we need to do is create a new project targeting the Arty Z7 board.
The project will be an RTL project.
As yet we do not have any RTL or constraints, so leave the next two dialogs empty and click on next.
On the penultimate tab select the Arty Z7-20 board as the target reference design. If you do not see the Arty Z7-20 board here, you have not installed the Digilent boards into Vivado; you can find instructions on how to do this in the links in the introduction above.
Once the Vivado project is opened, the next step is to create a block diagram to which we can add the Zynq PS configuration and the Arm Cortex-M1 IP core.
Before we can add in the Arm Cortex-M1 IP core we need to add in the Vivado directory from the Arm DesignStart Example download. This contains the IP and the necessary SW repositories which we will need later.
Once the block diagram is created the next step is to add in the Zynq PS system from the IP catalog.
Once the Zynq PS is included in the design run the block automation to configure the processing system for the Arty Z7-20
Running the block automation
Once the configuration has been completed the next step is to add in the Cortex-M1 IP core. We can do this from the IP catalog.
Once the M1 IP core has been added, the next step is to configure it. As we do not want too complicated a system, I am going to set the number of interrupts to one and set the debug option to no debug.
Using the no debug option will give us the smallest M1 instance possible in the Arty Z7 (just wait and see how small it really is)
With that completed, the next stage is to add in an AXI UART Lite and again run the connection automation. At this point we need to be careful to select the master in the connection automation and make sure it is the Cortex-M1 and not the Zynq PS.
Our Final design should look like the below
The constant blocks are
- Constant Low for interrupts
- Constant Low for NVIC
- Constant 0x03 for the CFGITCMEM
The CFGITCMEM determines if the instruction memory is used internally or on the AXI bus.
Once this is implemented the next step is to synthesise the design and then assign the UARTLite TX and RX pins. These will be connected to the IO0 and IO1 on the shield connector.
Once the implementation is completed, open the implementation, change the directory to the hardware directory and run the command below
source make_mmi_file.tcl
This will update the Memory Map Information file with the location of the BRAMs used to store the instruction tightly coupled memory.
When the design completes we can export the hardware to SDK.
We do not need to export the bitstream, as we will be merging the Arm Cortex-M1 SW design into the bitstream later as we did for the previous projects.
Initial SW Development for the Cortex-M1
The next step is to open SDK and map in the software repositories downloaded with the Arm DesignStart package.
Once SDK opens you will see the hardware project imported, this project will contain the entire design as seen from the processor view point.
To use the IP included within the PL, e.g. the AXI UART Lite, we need to create a BSP for the Cortex-M1. This will provide us with several APIs which can be used in the software development.
When the BSP is being generated please check the STDIN STDOUT are set to the AXI UART Lite.
The BSP will be generated and under the project explorer you will see the hardware platform and the generated BSP.
For the Cortex-M1 application this is where we stop using SDK. However, we will be using it for the Zynq application development and to download the bit file and applications for testing.
Before we open Keil, however, we need to copy the following files from the Arm reference design into the BSP include directory
- Xpseudo_asm_rvct.c
- Xpseudo_asm_rvct.h
You will find them in
<download>\vivado\Arm_sw_repository\CortexM\bsp\standalone_v6_7\src\arm\cortexm1\armcc
We are then ready to develop our application in Keil. For this application, and to ensure correct settings, I copied the SW project which came with the Arm DesignStart example.
In the Keil environment we can use the application as it stands however, we need to delete the SPI, I2C, GPIO and BRAM elements as these no longer exist in the hardware.
Within the main loop of the application add a line to output a "hello world" string.
However, before we can compile the program we have a few files which need a little housekeeping.
Initial Testing
Once the project has compiled we can run the make bitstream BAT file and a new bitstream will be generated with the just developed SW included.
Within SDK create a Zynq PS Hello World program and debug this on the hardware. This will configure the processor system including the very important clocks and download the new bitstream to the PL.
Once this is downloaded you will see the Cortex-M1 jump to life if all is correct and the Serial Port will start outputting the hello world message.
I captured this on my scope and confirmed the message by decoding the received data into ASCII.
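If you want to do the same decode by hand, the sketch below shows roughly what the scope's serial decoder is doing. It is a host-side illustration only, and it assumes the UART is running 8 data bits, no parity and one stop bit, with the capture reduced to one sample per bit period. The function name `uart_decode_8n1` is hypothetical and not part of any Xilinx or Arm library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode 8N1 UART frames from a line sampled once per bit period.
 * bits[i] is 0 or 1 and the line idles high. Each frame is a start bit (0),
 * eight data bits sent least-significant bit first, then a stop bit (1).
 * Writes at most out_cap-1 characters plus a terminating NUL and returns
 * the number of characters decoded. */
static size_t uart_decode_8n1(const uint8_t *bits, size_t nbits,
                              char *out, size_t out_cap)
{
    assert(bits != NULL && out != NULL && out_cap > 0);
    size_t i = 0, n = 0;
    while (i + 10 <= nbits && n + 1 < out_cap) {
        if (bits[i] != 0) {            /* line is idle: keep looking for a start bit */
            i++;
            continue;
        }
        uint8_t byte = 0;
        for (size_t b = 0; b < 8; b++) /* data bits arrive LSB first */
            byte |= (uint8_t)((bits[i + 1 + b] & 1u) << b);
        if (bits[i + 9] == 1)          /* keep only bytes with a valid stop bit */
            out[n++] = (char)byte;
        i += 10;                       /* move past start + 8 data + stop */
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the bit sequence for a captured frame gives back the ASCII text, which is all the scope decode is doing under the hood.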
We now know that we have a development flow which will enable us to include the Cortex DesignStart FPGA cores within our design.
Creating the application
It is therefore time to create the application. To keep this project in line with the recent Hackster webinar (a few attendees bought the Arty Z7 in place of the Arty S7), I will implement a similar solution so that it is possible to read across from one project to the other and make comparisons.
In this project we are going to add in two Pmod IP cores, one on each of the Pmod ports. We will be using the PmodHYGRO and the PmodNAV; we can obtain programmable logic drivers and SW drivers from the Digilent Vivado library available here.
Once you have this downloaded again you need to add it to the IP repositories within Vivado.
With the IP Repo updated, within the block diagram add in the PmodNAV and PmodHYGRO.
Then run the connection automation to connect them to the Cortex-M1 IP core.
Connect the PmodNAV to PmodA and PmodHYGRO to PmodB.
We now need to be able to communicate between the Arm A9 processors in the PS and the Cortex-M1 processor within the PL. To do this in a safe manner we can use a Mailbox.
If we desire to share resources on the same AXI bus we can use a Mutex.
If you are not familiar with the difference between a Mailbox and a Mutex, it is as follows:
- Mailbox - Allows bi-directional communication between multiple processors using a FIFO based approach to messaging.
- Mutex - Implements mutual exclusion locks, which allow processors to lock shared resources, preventing multiple accesses at the same time.
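To make the FIFO behaviour of a mailbox a little more concrete, here is a small host-side model of one direction of a mailbox. This is only a sketch of the concept, not the Xilinx Mailbox IP or its driver. The names `mbox_t`, `mbox_write` and `mbox_read`, and the depth of 16 words, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one direction of a mailbox: a FIFO of
 * 32-bit words. The depth of 16 words is an assumption for illustration. */
#define MBOX_DEPTH 16u

typedef struct {
    uint32_t words[MBOX_DEPTH];
    size_t head;   /* index of the next word to read */
    size_t count;  /* number of words currently queued */
} mbox_t;

/* Queue up to n words; returns how many were accepted before the FIFO filled. */
static size_t mbox_write(mbox_t *m, const uint32_t *src, size_t n)
{
    assert(m != NULL && src != NULL);
    size_t done = 0;
    while (done < n && m->count < MBOX_DEPTH) {
        m->words[(m->head + m->count) % MBOX_DEPTH] = src[done++];
        m->count++;
    }
    return done;
}

/* Dequeue up to n words in arrival order; returns how many were read. */
static size_t mbox_read(mbox_t *m, uint32_t *dst, size_t n)
{
    assert(m != NULL && dst != NULL);
    size_t done = 0;
    while (done < n && m->count > 0) {
        dst[done++] = m->words[m->head];
        m->head = (m->head + 1) % MBOX_DEPTH;
        m->count--;
    }
    return done;
}
```

The sender and receiver never touch the same word at the same time: one side only appends and the other only removes, in arrival order. A blocking call on top of this model would simply retry until all of the requested words had been transferred.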
In the block diagram add a Mailbox and ensure one AXI port is connected to the Zynq PS and the other to the Cortex-M1.
If implemented correctly this will result in a diagram like below where the red and blue highlights are the AXI masters from the PS (Red) and CortexM1 (Blue).
The final step before building the design is to check the address editor to ensure the Pmods are mapped into the correct address range.
Implement the design and, when the bitstream is completed, open the implementation view and again run the MMI generation script. We need to run this script each time the FPGA is regenerated as the BRAMs used for the instruction memory might change.
If you check the size of the implementation you might be interested to find out the implemented solution takes less than 8% of the LUT and 5% of the FF.
This is a really impressively small solution!
Now we need to create two applications, one for the Zynq and the other for the Cortex M1. My plan is to have the Pmod sensors read by the Cortex M1, which then sends the data to the PS at set time intervals, e.g. once a second.
Creating the Zynq Application
First we will create the Zynq PS application.
Export the hardware to SDK and allow the hardware definition to be updated and the BSPs to be regenerated to support the new hardware.
This application will read the mailbox and output the values over the terminal with an indication that the message has been received from the Cortex M1.
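#include <stdio.h>
#include <string.h>        /* memset */
#include "platform.h"
#include "xparameters.h"   /* XPAR_MBOX_0_DEVICE_ID */
#include "xil_printf.h"
#include "xmbox.h"
#define MSGSIZ 1024
/* Word-aligned receive buffer, as the mailbox is read as 32-bit words */
char RecvMsg[MSGSIZ] __attribute__ ((aligned(4)));
int main()
{
    XMbox Mbox;
    XMbox_Config *ConfigPtr;
    init_platform();
    /* Look up and initialise the mailbox shared with the Cortex-M1 */
    ConfigPtr = XMbox_LookupConfig(XPAR_MBOX_0_DEVICE_ID);
    XMbox_CfgInitialize(&Mbox, ConfigPtr, ConfigPtr->BaseAddress);
    print("Z7 Application\n\r");
    /* Block until the 24-byte start-up message arrives from the Cortex-M1 */
    XMbox_ReadBlocking(&Mbox, (u32*)(RecvMsg), 24);
    printf("Rcvd the message --> \r\n\r\n\t--[%s]--\r\n\r\n", RecvMsg);
    memset(RecvMsg, 0, MSGSIZ);
    /* Print each fixed-length 24-byte message as it arrives */
    while(1){
        XMbox_ReadBlocking(&Mbox, (u32*)(RecvMsg), 24);
        printf("Rcvd the message --> \r\n\r\n\t--[%s]--\r\n\r\n", RecvMsg);
    }
    /* Not reached, as the loop above never exits */
    cleanup_platform();
    return 0;
}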
Creating the Cortex M1 Applications
Creating the Cortex M1 application requires a few more steps. Once the hardware platform has been updated in SDK you should see the PmodNAV and PmodHYGRO on the Cortex M1 address map.
As the BSP is updated, please remember to copy across the Xpseudo_asm_rvct.c and Xpseudo_asm_rvct.h files again.
We then need to update the Keil project. The project is already mapped to pick up the BSP include directory in the workspace, but to be able to use the drivers we also need to add in their source code so they can be compiled; we have no pre-compiled libxil.a like we do when using SDK.
Use the manage project items box to add in the source code for the PmodNAV, PmodHYGRO and the Mailbox.
We can then start to write the code. As I am using a blocking method for mailbox communication, we need to send an expected number of bytes between the PS and the Cortex M1. The size of the message also needs to be a multiple of four.
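One way to guarantee both of those properties is to always format text into a fixed-size, zero-padded frame before handing it to the mailbox. The helper below is a minimal sketch of that idea which can be tried on a PC. `frame_pack` and `FRAME_BYTES` are hypothetical names introduced here, and the length of 24 simply matches the transfers used in this project.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Frame length agreed by both processors; 24 matches this project's transfers. */
#define FRAME_BYTES 24u

/* Format text into a fixed-size, zero-padded frame.
 * Returns 0 on success, or -1 if the text plus its terminating NUL
 * does not fit (the frame then holds a truncated, NUL-terminated copy). */
static int frame_pack(char frame[FRAME_BYTES], const char *fmt, ...)
{
    assert(FRAME_BYTES % 4u == 0u);   /* frame must be a whole number of 32-bit words */
    memset(frame, 0, FRAME_BYTES);    /* pad so no stale bytes are sent */

    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(frame, FRAME_BYTES, fmt, args);
    va_end(args);

    return (n >= 0 && (size_t)n < FRAME_BYTES) ? 0 : -1;
}
```

On the target the packed frame would then be passed to XMbox_WriteBlocking with a length of FRAME_BYTES, and the receiver reads the same number of bytes. Checking the return value catches strings which are one character too long for the buffer, which is easy to miss when counting by eye.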
The code developed to run on the Cortex M1 can be seen below
// Initialise the Pmod drivers and the mailbox
HYGRO_begin(&myDevice,XPAR_PMODHYGRO_0_AXI_LITE_IIC_BASEADDR,0x40,XPAR_PMODHYGRO_0_AXI_LITE_TMR_BASEADDR,1,TIMER_FREQ_HZ);
NAV_begin(&nav,XPAR_PMODNAV_0_AXI_LITE_GPIO_BASEADDR,XPAR_PMODNAV_0_AXI_LITE_SPI_BASEADDR);
ConfigPtr = XMbox_LookupConfig(XPAR_MBOX_0_DEVICE_ID);
XMbox_CfgInitialize(&Mbox, ConfigPtr, ConfigPtr->BaseAddress);
NAV_Init(&nav);
// Load the SysTick reload value, then enable the counter and its interrupt
STRELOAD = RELOAD_VALUE;
STCTRL = (1<<SBIT_ENABLE) | (1<<SBIT_TICKINT) | (1<<SBIT_CLKSOURCE);
// Fixed 24-byte message buffers, word aligned as the mailbox driver reads them as u32
char msg[24] __attribute__ ((aligned(4)));
char msg2[24] __attribute__ ((aligned(4)));
char msg3[24] __attribute__ ((aligned(4)));
char msg4[24] __attribute__ ((aligned(4)));
char msg5[24] __attribute__ ((aligned(4)));
// 23 characters plus the terminating NUL exactly fill the 24-byte buffer
snprintf(msg, sizeof(msg), "Arm Cortex M1 Starting!");
XMbox_WriteBlocking(&Mbox, (u32*)((u8*)msg), 24);
// Main loop. Read the sensors each time the sample flag is set
// (the flag is set outside this fragment, presumably in the SysTick handler)
while ( 1 )
{
    if (sample == TRUE){
        temp_degc = HYGRO_getTemperature(&myDevice);
        hum_perrh = HYGRO_getHumidity(&myDevice);
        sprintf (debugStr, "Temp is %f Humidity is %f\r\n\n", temp_degc, hum_perrh );
        snprintf (msg, sizeof(msg), "Temp is %3.3f ", temp_degc);
        // Send the humidity reading here, not the temperature
        snprintf (msg2, sizeof(msg2), "Humidity is %3.3f ", hum_perrh);
        print ( debugStr );
        NAV_GetData(&nav);
        sprintf (debugStr, "X is %f Y is %f Z is %f\r\n\n", nav.acclData.X, nav.acclData.Y, nav.acclData.Z );
        snprintf (msg3, sizeof(msg3), "Z is %3.3f ", nav.acclData.Z);
        snprintf (msg4, sizeof(msg4), "X is %3.3f ", nav.acclData.X);
        snprintf (msg5, sizeof(msg5), "y is %3.3f ", nav.acclData.Y);
        print ( debugStr );
        sample = FALSE;
        // Each transfer is exactly 24 bytes, matching the reads on the A9
        XMbox_WriteBlocking(&Mbox, (u32*)((u8*)msg), 24);
        XMbox_WriteBlocking(&Mbox, (u32*)((u8*)msg2), 24);
        XMbox_WriteBlocking(&Mbox, (u32*)((u8*)msg3), 24);
        XMbox_WriteBlocking(&Mbox, (u32*)((u8*)msg4), 24);
        XMbox_WriteBlocking(&Mbox, (u32*)((u8*)msg5), 24);
    }
}
}
Once this code compiles we are then able to regenerate the bitstream (using the BAT file) and download it to the Arty Z7-20 using the debug manager.
As the mailbox calls are blocking, at start up the Cortex M1 will stop when it gets to the first transfer and wait for us to start the PS cores.
Once we do this, every second we should see a message being output on the terminal screen.
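The once-a-second rate comes from the SysTick timer configured in the Cortex M1 code. The reload value and the interrupt handler are not shown in this write-up, so the sketch below is only an illustration of how they could be worked out. SysTick is a 24-bit down counter, which means a full second does not fit in a single reload at typical fabric clock rates, so a small software divider is needed as well. The 100 MHz clock used in the usage note is an assumption, and `systick_reload` and `sample_due` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* the SysTick reload register is 24 bits wide */

/* Compute the reload value for a tick rate of tick_hz from a clock of clk_hz.
 * The counter takes reload+1 clock cycles per tick, hence the -1.
 * Returns 0 on success, or -1 if the period cannot be represented. */
static int systick_reload(uint32_t clk_hz, uint32_t tick_hz, uint32_t *reload)
{
    assert(reload != NULL && tick_hz > 0u);
    uint32_t cycles = clk_hz / tick_hz;
    if (cycles < 2u || cycles - 1u > SYSTICK_MAX_RELOAD)
        return -1;
    *reload = cycles - 1u;
    return 0;
}

/* Software divider: call once per SysTick interrupt. Returns 1 once every
 * ticks_per_sample calls, so the handler knows when to raise the sample flag. */
static int sample_due(uint32_t *tick_count, uint32_t ticks_per_sample)
{
    assert(tick_count != NULL && ticks_per_sample > 0u);
    if (++(*tick_count) >= ticks_per_sample) {
        *tick_count = 0u;
        return 1;
    }
    return 0;
}
```

With an assumed 100 MHz clock, asking for a 1 Hz tick fails because 100,000,000 cycles is more than 24 bits can hold, whereas a 100 Hz tick gives a reload value of 999,999. Counting 100 of those ticks in the handler then gives the one second sample interval.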
I was also able to see on the scope that the Cortex M1 was transmitting the environmental data once a second.
We now have a running solution which is similar to what was developed in the Hackster webinar. However, this provides us with much more flexibility, as we can now, if we desire, use the A9 to connect to the IoT and log the received data to the cloud. Alternatively, we could write the results to a file system and save the data on non-volatile media such as the SD card.
Wrap Up
Such an approach also enables a very responsive system, as the high performance A9 cores are now freed from working on low level sensor interfacing and able to focus on more demanding applications, e.g. the synthesis, analysis and action taking depending upon the data results.
See previous projects here.
Additional Information on Xilinx FPGA / SoC Development can be found weekly on MicroZed Chronicles.