Image processing at the edge requires not only high performance but also compact size and power efficiency. The Ultra96 enables us to create an image processing solution which uses both the processing system and the programmable logic to their full potential. Thanks to the support for MIPI D-PHY and DisplayPort, we can create a very compact and efficient solution. Once we have this solution passing the image data through, we can expand it to support applications such as machine learning and image processing at the edge. In this project we are going to look at how we can create an image processing system which uses the PCam5 or R-PI Camera and its MIPI input to receive an image, and then output it over the DisplayPort output.
Hardware Architecture

We will use the 96Boards MIPI interface, which provides everything we need to communicate with the camera. However, we first need to understand the hardware connectivity. The PCam5 uses two MIPI lanes, while control of the PCam5 is over I2C, which is connected to the PCam5 over the following route:
- Ultra96 - Zynq Us+ PS IIC1 (Ultra96 Schematics - U1)
- Ultra96 -TCA9548 IIC Switch (Ultra96 Schematics - U7)
- Ultra96 - High Speed Connector Pins 32 and 34 (Ultra96 Schematics - J5)
- MIPI Board - High Speed Connector Pin 32 and 34 (MIPI Schematics - J7)
- MIPI Board - Header connector Connect I2C_SDA/SCL to CAM1_SDA/SCL with Jumpers (MIPI Schematics J13)
- MIPI Board - Level convert from 1v8 to 3v3 (MIPI Schematics - U18)
- MIPI Board - Connect to Camera Interface (MIPI Schematics - J5)
To help visualize this, I have drawn the diagram below. Of course, on the PCam5 this is then converted back to 1v8.
We also need to drive a GPIO signal to power up the camera; this takes the following path:
- Ultra96 - Zynq Us+ PS GPIO MIO 37
- Ultra96 - Low Speed Connector pin 24 (Ultra96 Schematics - J5)
- MIPI Board - Low Speed Connector pin 24 (MIPI Schematics - J7)
- MIPI Board - Connect APQ_GPIO12 pin 1 to 2 CAM_GPIO1 (MIPI Schematics - J15)
- MIPI Board - Level conversion from 1v8 to 3v3 (MIPI Schematics - U4)
- MIPI Board - Connect to Camera Interface Pin 10 (MIPI Schematics - J5)
Again, to help visualize this, I have drawn the diagram below.
To talk to the PCam5 over the I2C network, the first thing we need to do is set the I2C expander to the correct path.
To determine the address of the I2C expander and how to drive it, we can use its datasheet.
Now that we understand the hardware architecture, we can work on the architecture of the design used in the FPGA.
FPGA Architecture

The FPGA architecture will interface with the PCam5 and then transfer the image to a frame buffer in the PS DDR memory. This provides maximum flexibility: it enables us to use a non-live DisplayPort output from the PS memory, or to transfer the data back into the PL, add some more image processing, and then output the image using the live DisplayPort input. We will use the following IP blocks:
- MIPI CSI2 Sub System - Interfaces to the MIPI PCam5
- DeMosaic - Converts the RAW image into an RGB image
- VDMA - Transfers the image to and from the PS DDR
- VTC - Provides the Timing for the live video output
- AXI Stream to Video Out - Converts the AXIS stream to Parallel Video Output format for the Display Port Live output.
- Clock Wizard - Used to generate the timing for the Pixel Output (74.25 MHz) and the DPHY reference (200 MHz) clocks
- ILA - Used to debug the design and provide confidence in the bring up
The MIPI pinout on the Ultra96 is connected to the high speed connector. When we instantiate the MIPI CSI-2 RX Subsystem, we need to set the MIPI interface to use the correct pins, number of lanes and line rate.
As we should be using the Ultra96V2 as the target board when we create the project, the PS IO should be set up correctly. This includes the GPIO on MIO37 and the IIC1, as can be seen in the diagram below.
The DisplayPort Controller is able to support up to six non-live sources from the PS DDR memory. It is also capable of supporting live video and graphics from the PL; of course, we can also mix both the non-live and live video feeds. To be able to support a wide range of applications, the DisplayPort Controller supports 6, 8, 10 or 12 bits per component, while also supporting a range of color spaces including RGB, YCrCb 4:2:2, YCrCb 4:4:4 and YCrCb 4:2:0. What is really exciting is the additional capabilities provided by the DisplayPort Controller:
- Chroma sampling and sub sampling
- Color space conversion
- Alpha blending
- Audio mixer
Both the final blended video and audio are optionally available to the programmable logic if desired.
To ensure we can use the DisplayPort Controller from the PL, we need to enable the live input under the PS-PL configuration settings.
If you want to double check the correct allocation of the GT lanes for the DisplayPort, this is available under the I/O Configuration.
The live video formats supported by the DisplayPort Controller are defined in its product guide; however, you can see the supported formats in the extract below.
The completed Vivado project looks like the diagram below.
The software architecture of the design is as follows:
- Initialize all peripherals for access - MIPI CSI-2, IIC, GPIO, VDMA, Display Port, VTC
- Configure the VTC to provide a 720P output format
- Configure the correct path through the I2C Switch
- Enable the PCam5 using the GPIO MIO37
- Configure the PCAM5 over the I2C (720p)
- Receive Images and transfer them to and from the PL using the VDMA.
- Configure the DisplayPort Output for live video output
To do this, one of the first things we need to do once we have generated the BSP is configure the DisplayPort driver correctly. We do this by opening the MSS file and changing the driver which is used by the DisplayPort Controller. We need to do this because there are four possible configurations of the DisplayPort Controller, and depending upon which configuration is used, a different driver is needed.
- Memory to Data Path — For this case, use dppsu API.
- PL to Data Path — For this case, use dppsu API.
- PL to PL — For this case, use avbuf API.
- Memory to PL — For this case, use avbuf API.
We can select the desired API by re-customizing the BSP settings and selecting the necessary driver. Note that when we use the DPPSU, the BSP will still contain the AVBUF API, as it is required too. While the AVBUF defines the configuration and use of the audio/visual pipeline, the DPPSU defines the configuration of the DisplayPort transmitter. Therefore, when we want to transmit video external to the MPSoC, we need to use it as well as the AVBUF drivers.
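For reference, the driver assignment in the MSS file looks something like the fragment below. Treat this as illustrative: the exact driver version number depends on your tool release.

```
BEGIN DRIVER
 PARAMETER DRIVER_NAME = dppsu
 PARAMETER DRIVER_VER = 1.0
 PARAMETER HW_INSTANCE = psu_dp
END
```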
Once we have the BSP correctly configured, we can configure the DisplayPort transmitter.
One of the largest tasks is configuring the PCam5 over the I2C; this can be achieved using the settings below.
static u32 cfg_init[][2] =
{
//[7]=0 Software reset; [6]=1 Software power down; Default=0x02
{0x3008, 0x42},
//[1]=1 System input clock from PLL; Default read = 0x11
{0x3103, 0x03},
//[3:0]=0000 MD2P,MD2N,MCP,MCN input; Default=0x00
{0x3017, 0x00},
//[7:2]=000000 MD1P,MD1N, D3:0 input; Default=0x00
{0x3018, 0x00},
//[6:4]=001 PLL charge pump, [3:0]=1000 MIPI 8-bit mode
{0x3034, 0x18},
//PLL1 configuration
//[7:4]=0001 System clock divider /1, [3:0]=0001 Scale divider for MIPI /1
{0x3035, 0x11},
//[7:0]=56 PLL multiplier
{0x3036, 0x38},
//[4]=1 PLL root divider /2, [3:0]=1 PLL pre-divider /1
{0x3037, 0x11},
//[5:4]=00 PCLK root divider /1, [3:2]=00 SCLK2x root divider /1, [1:0]=01 SCLK root divider /2
{0x3108, 0x01},
//PLL2 configuration
//[5:4]=01 PRE_DIV_SP /1.5, [2]=1 R_DIV_SP /1, [1:0]=00 DIV12_SP /1
{0x303D, 0x10},
//[4:0]=11001 PLL2 multiplier DIV_CNT5B = 25
{0x303B, 0x19},
{0x3630, 0x2e},
{0x3631, 0x0e},
{0x3632, 0xe2},
{0x3633, 0x23},
{0x3621, 0xe0},
{0x3704, 0xa0},
{0x3703, 0x5a},
{0x3715, 0x78},
{0x3717, 0x01},
{0x370b, 0x60},
{0x3705, 0x1a},
{0x3905, 0x02},
{0x3906, 0x10},
{0x3901, 0x0a},
{0x3731, 0x02},
//VCM debug mode
{0x3600, 0x37},
{0x3601, 0x33},
//System control register changing not recommended
{0x302d, 0x60},
//??
{0x3620, 0x52},
{0x371b, 0x20},
//?? DVP
{0x471c, 0x50},
{0x3a13, 0x43},
{0x3a18, 0x00},
{0x3a19, 0xf8},
{0x3635, 0x13},
{0x3636, 0x06},
{0x3634, 0x44},
{0x3622, 0x01},
{0x3c01, 0x34},
{0x3c04, 0x28},
{0x3c05, 0x98},
{0x3c06, 0x00},
{0x3c07, 0x08},
{0x3c08, 0x00},
{0x3c09, 0x1c},
{0x3c0a, 0x9c},
{0x3c0b, 0x40},
//[7]=1 color bar enable, [3:2]=00 eight color bar
{0x503d, 0x00},
//[2]=1 ISP vflip, [1]=1 sensor vflip
{0x3820, 0x46},
//[7:5]=010 Two lane mode, [4]=0 MIPI HS TX no power down, [3]=0 MIPI LP RX no power down, [2]=1 MIPI enable, [1:0]=10 Debug mode; Default=0x58
{0x300e, 0x45},
//[5]=0 Clock free running, [4]=1 Send line short packet, [3]=0 Use lane1 as default, [2]=1 MIPI bus LP11 when no packet; Default=0x04
{0x4800, 0x14},
{0x302e, 0x08},
//[7:4]=0x3 YUV422, [3:0]=0x0 YUYV
//{0x4300, 0x30},
//[7:4]=0x6 RGB565, [3:0]=0x0 {b[4:0],g[5:3],g[2:0],r[4:0]}
{0x4300, 0x6f},
{0x501f, 0x01},
{0x4713, 0x03},
{0x4407, 0x04},
{0x440e, 0x00},
{0x460b, 0x35},
//[1]=0 DVP PCLK divider manual control by 0x3824[4:0]
{0x460c, 0x20},
//[4:0]=1 SCALE_DIV=INT(3824[4:0]/2)
{0x3824, 0x01},
//MIPI timing
// {0x4805, 0x10}, //LPX global timing select=auto
// {0x4818, 0x00}, //hs_prepare + hs_zero_min ns
// {0x4819, 0x96},
// {0x482A, 0x00}, //hs_prepare + hs_zero_min UI
//
// {0x4824, 0x00}, //lpx_p_min ns
// {0x4825, 0x32},
// {0x4830, 0x00}, //lpx_p_min UI
//
// {0x4826, 0x00}, //hs_prepare_min ns
// {0x4827, 0x32},
// {0x4831, 0x00}, //hs_prepare_min UI
//[7]=1 LENC correction enabled, [5]=1 RAW gamma enabled, [2]=1 Black pixel cancellation enabled, [1]=1 White pixel cancellation enabled, [0]=1 Color interpolation enabled
{0x5000, 0x07},
//[7]=0 Special digital effects, [5]=0 scaling, [2]=0 UV average disabled, [1]=1 Color matrix enabled, [0]=1 Auto white balance enabled
{0x5001, 0x03}
};
static u32 cfg_720p_60fps[][2] =
{//1280 x 720 binned, RAW10, MIPISCLK=280M, SCLK=56Mz, PCLK=56M
//PLL1 configuration
{0x3008, 0x42},
//[7:4]=0010 System clock divider /2, [3:0]=0001 Scale divider for MIPI /1
{0x3035, 0x21},
//[7:0]=70 PLL multiplier
{0x3036, 0x46},
//[4]=0 PLL root divider /1, [3:0]=5 PLL pre-divider /1.5
{0x3037, 0x05},
//[5:4]=01 PCLK root divider /2, [3:2]=00 SCLK2x root divider /1, [1:0]=01 SCLK root divider /2
{0x3108, 0x11},
//[6:4]=001 PLL charge pump, [3:0]=1010 MIPI 10-bit mode
{0x3034, 0x1A},
//[3:0]=0 X address start high byte
{0x3800, (0 >> 8) & 0x0F},
//[7:0]=0 X address start low byte
{0x3801, 0 & 0xFF},
//[2:0]=0 Y address start high byte
{0x3802, (8 >> 8) & 0x07},
//[7:0]=0 Y address start low byte
{0x3803, 8 & 0xFF},
//[3:0] X address end high byte
{0x3804, (2619 >> 8) & 0x0F},
//[7:0] X address end low byte
{0x3805, 2619 & 0xFF},
//[2:0] Y address end high byte
{0x3806, (1947 >> 8) & 0x07},
//[7:0] Y address end low byte
{0x3807, 1947 & 0xFF},
//[3:0]=0 timing hoffset high byte
{0x3810, (0 >> 8) & 0x0F},
//[7:0]=0 timing hoffset low byte
{0x3811, 0 & 0xFF},
//[2:0]=0 timing voffset high byte
{0x3812, (0 >> 8) & 0x07},
//[7:0]=0 timing voffset low byte
{0x3813, 0 & 0xFF},
//[3:0] Output horizontal width high byte
{0x3808, (1280 >> 8) & 0x0F},
//[7:0] Output horizontal width low byte
{0x3809, 1280 & 0xFF},
//[2:0] Output vertical height high byte
{0x380a, (720 >> 8) & 0x7F},
//[7:0] Output vertical height low byte
{0x380b, 720 & 0xFF},
//HTS line exposure time in # of pixels
{0x380c, (1896 >> 8) & 0x1F},
{0x380d, 1896 & 0xFF},
//VTS frame exposure time in # lines
{0x380e, (984 >> 8) & 0xFF},
{0x380f, 984 & 0xFF},
//[7:4]=0x3 horizontal odd subsample increment, [3:0]=0x1 horizontal even subsample increment
{0x3814, 0x31},
//[7:4]=0x3 vertical odd subsample increment, [3:0]=0x1 vertical even subsample increment
{0x3815, 0x31},
//[2]=0 ISP mirror, [1]=0 sensor mirror, [0]=1 horizontal binning
{0x3821, 0x01},
//MIPI global timing unit: period of PCLK in ns * 2 (depends on # of lanes)
{0x4837, 36}, // 1/56M*2
//Undocumented anti-green settings
{0x3618, 0x00}, // Removes vertical lines appearing under bright light
{0x3612, 0x59},
{0x3708, 0x64},
{0x3709, 0x52},
{0x370c, 0x03},
//[7:4]=0x0 Formatter RAW, [3:0]=0x0 BGBG/GRGR
{0x4300, 0x00},
//[2:0]=0x3 Format select ISP RAW (DPC)
{0x501f, 0x03},
{0x3008, 0x02},
};
Enabling the PCam5
// Look up the PS GPIO configuration and initialize the driver
GPIO_Config = XGpioPs_LookupConfig(cam_gpio);
Status = XGpioPs_CfgInitialize(&gp_cam, GPIO_Config, GPIO_Config->BaseAddr);
// MIO37 drives the camera enable; configure it as an output
XGpioPs_SetOutputEnablePin(&gp_cam, 37, 1);
XGpioPs_SetDirectionPin(&gp_cam, 37, 1);
// Hold the camera powered down, wait, then enable it
XGpioPs_WritePin(&gp_cam, 37, 0x0);
usleep(1000000);
XGpioPs_WritePin(&gp_cam, 37, 0x1);
Setting up the Mux Channel
// One-hot control byte for the TCA9548: 0x04 selects channel 2
SendBuffer[0] = 0x04;
Status = XIicPs_MasterSendPolled(&iic_cam, SendBuffer, 1, SW_IIC_ADDR);
if (Status != XST_SUCCESS) {
print("I2C write error\n\r");
return XST_FAILURE;
}
Configuring the Camera
int Initial_setting_1(u32 *cfg_init, int cfg_init_QTY)
{
	s32 Status;
	int i;
	u8 SendBuffer[3];
	// Each entry is a {16-bit register address, 8-bit value} pair;
	// pack it as address high byte, address low byte, data byte
	for (i = 0; i < (cfg_init_QTY * 2); i += 2) {
		SendBuffer[0] = (u8)(*(cfg_init + i) >> 8);
		SendBuffer[1] = (u8)(*(cfg_init + i));
		SendBuffer[2] = (u8)(*(cfg_init + i + 1));
		Status = XIicPs_MasterSendPolled(&iic_cam, SendBuffer, 3, IIC_CAM_ADDR);
		if (Status != XST_SUCCESS) {
			print("I2C write error\n\r");
			return XST_FAILURE;
		}
		usleep(1000);
	}
	return XST_SUCCESS;
}
Bring Up and Testing

One of the first things we want to test is that the PLL in the MIPI CSI-2 Subsystem locks correctly. To do this I used an ILA on the MIPI core outputs; this enabled me to see the PLL was locking. If this does not lock, we cannot correctly receive the MIPI signals from the camera.
The next stage is to check we can correctly interface with the camera over the I2C bus.
This first means we need to ensure we are enabling the power supplies on the camera; this is done via pin 11 of the MIPI interface.
When checking the I2C, the first thing we need to do is verify that the I2C switch is set correctly. We can see in the plot below that it is indeed set to the right channel.
The next step is to check that the camera is detected correctly; we can do this by reading a register on the camera itself. This camera detect function reads from and writes to the I2C bus to determine that the PCam5 is present.
As the software development progresses, we end up with the images below being output from the PCam5 to my DisplayPort monitor.
Now we have a simple FPGA implementation which enables us to expand the design and add new algorithms, developed using HLS and other techniques, to create high performance image processing systems. We also have significant resources free in the FPGA to implement this processing, thanks to the capability of the UltraScale+ ZU3EG device.
See previous projects here.
Additional information on Xilinx FPGA / SoC development can be found weekly on MicroZed Chronicles.