I started this project to build the smallest possible hardware configuration for a stereo vision system accelerated by a Zynq SoC.
I used the ZedBoard as a hardware prototype. A custom PCB was then designed based on the ZedBoard, with unnecessary components removed. The Zynq was also downgraded to the XC7Z010, the lowest grade available from online shops.
The stereo vision algorithm is based on the block-matching algorithm included in OpenCV. This algorithm was converted to custom RTL written in Verilog-HDL to get the most out of the Zynq through the parallel computing of the FPGA.
Selection of Image Sensor
The OmniVision OV5640 was chosen as the image sensor. It is available from several online shops, and there are useful design resources for it on the Internet. It also has autofocus control, which is not used in this design. The sensors were connected to the ZedBoard through a custom extension board to confirm their operation.
I designed the PL and PS portions so that input images were stored in DDR and transferred to a host PC over gigabit Ethernet (GbE) in real time. In this design, the PS runs standalone (bare-metal) and lwIP handles TCP/IP. Vivado 2017.3 and Xilinx SDK 2017.3 were used.
A sample host application was also built with Visual Studio Express. It receives image data over Ethernet and displays it on the PC screen.
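For illustration, the receiving side of such a host application might read each frame with a loop like the one below. This is a minimal POSIX-sockets sketch and not the author's Visual Studio code. It assumes, hypothetically, that the board streams raw 8-bit 640 x 480 frames back to back with no header. The names `recv_all`, `recv_frame` and `FRAME_BYTES` are made up for this example. Winsock's `recv` can also return fewer bytes than requested, so the same loop structure applies there.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical framing: raw 8-bit 640 x 480 frames, back to back, no header. */
#define FRAME_WIDTH  640
#define FRAME_HEIGHT 480
#define FRAME_BYTES  ((size_t)FRAME_WIDTH * FRAME_HEIGHT)

/* Read exactly len bytes from a connected stream socket. TCP may deliver
 * a frame in arbitrarily sized pieces, so keep calling recv() until the
 * buffer is full. Returns 0 on success, -1 on error or if the peer closes
 * the connection before len bytes have arrived. */
int recv_all(int sock, uint8_t *buf, size_t len)
{
    size_t got = 0;
    assert(buf != NULL);
    while (got < len) {
        ssize_t n = recv(sock, buf + got, len - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;             /* connection closed mid-frame */
        got += (size_t)n;
    }
    return 0;
}

/* Receive one complete frame into frame (FRAME_BYTES long). */
int recv_frame(int sock, uint8_t *frame)
{
    return recv_all(sock, frame, FRAME_BYTES);
}
```

Looping until the byte count is reached matters because a single 300 KB frame almost never arrives in one `recv` call.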
Stereo Vision Algorithm
Stereo vision algorithms were tested by running OpenCV on a Windows PC, using actual images captured by the prototype hardware as input data. After testing several algorithms, I chose the following parameters.
- Algorithm: block-matching
- Image size: 640 x 480
- Disparity search range: 128
- Window size: 21
- Speckle filter: omitted
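To make these parameters concrete, here is a naive C sketch of SAD (sum of absolute differences) block matching with a winner-take-all disparity choice. The window size and search range are passed as arguments. It is not OpenCV's `StereoBM` code. OpenCV updates window sums incrementally instead of recomputing them, and it adds subpixel refinement plus uniqueness and texture checks, all of which are omitted here. The function name and the `-1` marker for invalid pixels are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Naive SAD block matching on rectified 8-bit images (row-major, w x h).
 * For each left pixel (x, y), disparities 0..ndisp-1 are tried: a win x win
 * window in the left image is compared with the same window in the right
 * image shifted left by d, and the d with the smallest SAD wins.
 * Pixels whose window or search range leaves the image are set to -1. */
void block_match(const uint8_t *left, const uint8_t *right,
                 int w, int h, int ndisp, int win, int16_t *disp)
{
    assert(win % 2 == 1 && ndisp > 0);
    const int r = win / 2;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            disp[y * w + x] = -1;
            if (y < r || y >= h - r || x < r + ndisp - 1 || x >= w - r)
                continue;
            int best_d = 0;
            long best_sad = LONG_MAX;
            for (int d = 0; d < ndisp; d++) {
                long sad = 0;
                for (int dy = -r; dy <= r; dy++)
                    for (int dx = -r; dx <= r; dx++) {
                        int a = left[(y + dy) * w + (x + dx)];
                        int b = right[(y + dy) * w + (x + dx - d)];
                        sad += abs(a - b);
                    }
                if (sad < best_sad) {      /* strict: first minimum wins ties */
                    best_sad = sad;
                    best_d = d;
                }
            }
            disp[y * w + x] = (int16_t)best_d;
        }
    }
}
```

With the parameters listed above, the call would be `block_match(left, right, 640, 480, 128, 21, disp)`. Each candidate then costs 441 absolute differences, which is why an incremental formulation matters so much in hardware.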
Stereo calibration is also necessary: it transforms the left and right images onto a common image plane so that their rows are horizontally aligned. For stereo calibration, a 7x5 chessboard pattern and Bouguet's algorithm are used.
Below is an example of image processing.
Hardware resource usage and processing time were roughly estimated at this stage, and the design was expected to fit the target device.
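The processing-time side of such an estimate reduces to simple arithmetic. The sketch below makes two simplifying assumptions that are not taken from the original: every pixel needs one incremental SAD update per disparity candidate, and one hardware "lane" completes one update per clock. From that it computes how many lanes must run in parallel. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total SAD updates per second: one per pixel per disparity candidate. */
uint64_t sad_updates_per_second(uint32_t width, uint32_t height,
                                uint32_t ndisp, uint32_t fps)
{
    return (uint64_t)width * height * ndisp * fps;
}

/* Minimum number of parallel lanes, assuming each lane does one update
 * per clock cycle at clock_hz. */
uint32_t lanes_needed(uint32_t width, uint32_t height, uint32_t ndisp,
                      uint32_t fps, uint32_t clock_hz)
{
    uint64_t ops = sad_updates_per_second(width, height, ndisp, fps);
    assert(clock_hz > 0);
    return (uint32_t)((ops + clock_hz - 1) / clock_hz);   /* ceiling division */
}
```

For 640 x 480, 128 disparities and 10 fps, this gives 393,216,000 updates per second. That means at least 5 lanes at 80MHz, or 4 at 100MHz, before accounting for blanking, DDR bandwidth and the other processing stages.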
Hardware Configuration
With the major components chosen, I worked out the hardware configuration.
A pair of stereo images is fed to the Zynq, the image processing is performed, and the resulting depth maps are sent to the host PC over GbE.
Stereo rectification is performed in the PL in real time, but the necessary parameters are calculated on the host PC using OpenCV functions and hard-coded in the firmware.
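To illustrate what the PL has to compute per pixel, the sketch below performs inverse-mapped rectification with bilinear interpolation. It assumes lens distortion is negligible. Under that assumption, the mapping from a rectified pixel back to the raw image reduces to a single 3x3 homography: the original camera matrix times the transposed rectification rotation times the inverse of the new camera matrix. The matrix layout and function name are hypothetical. The actual parameters hard-coded in the firmware are not described here, and OpenCV's `initUndistortRectifyMap` also models distortion.

```c
#include <assert.h>
#include <stdint.h>

/* Inverse mapping: for each rectified output pixel (u, v), compute its
 * source location (x, y) in the raw sensor image as m * (u, v, 1)^T,
 * where m is a hypothetical row-major 3x3 homography precomputed on the
 * host, then sample the raw image bilinearly. */
void rectify(const uint8_t *src, uint8_t *dst, int w, int h, const double m[9])
{
    assert(w >= 2 && h >= 2);
    for (int v = 0; v < h; v++) {
        for (int u = 0; u < w; u++) {
            double X = m[0] * u + m[1] * v + m[2];
            double Y = m[3] * u + m[4] * v + m[5];
            double Z = m[6] * u + m[7] * v + m[8];
            uint8_t out = 0;                       /* black if off-sensor */
            if (Z > 0.0) {
                double x = X / Z, y = Y / Z;
                if (x >= 0.0 && y >= 0.0 && x < w - 1 && y < h - 1) {
                    int x0 = (int)x, y0 = (int)y;  /* floor, since x, y >= 0 */
                    double fx = x - x0, fy = y - y0;
                    const uint8_t *p = src + y0 * w + x0;
                    double top = p[0] * (1.0 - fx) + p[1] * fx;
                    double bot = p[w] * (1.0 - fx) + p[w + 1] * fx;
                    out = (uint8_t)(top * (1.0 - fy) + bot * fy + 0.5);
                }
            }
            dst[v * w + u] = out;
        }
    }
}
```

In hardware, the floating-point arithmetic would presumably be replaced by fixed point, in the same way as in the behavior model.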
Behavior Model in C Language
Before coding the RTL, I made a behavior model by rewriting the relevant OpenCV code in plain C. Specifically, the following conversions were made.
- SIMD operations were replaced with simple loops.
- Floating-point numbers were converted to fixed-point numbers by scaling them by a power of two (the equivalent of a left shift) before casting them to integers.
Conversion to fixed point generally results in some loss of precision. The number of fractional bits was chosen so that the differences in the final result were barely noticeable for a particular data set.
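The conversion can be sketched in C as follows. A coefficient is scaled by 2^n and rounded, the product is computed in integer arithmetic, and the result is shifted back. The helper names are hypothetical. This sketch rounds to nearest, whereas a plain cast truncates; either way, fewer fractional bits mean a larger error.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a floating-point coefficient to fixed point with frac_bits
 * fractional bits: multiply by 2^frac_bits (the floating-point analogue of
 * a left shift) and round to the nearest integer. */
int32_t to_fixed(double value, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 31);
    double scaled = value * (double)((int64_t)1 << frac_bits);
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Multiply an integer sample by a fixed-point coefficient and round the
 * product back to an integer. Right-shifting a negative int64_t is
 * implementation-defined in C (arithmetic on common compilers). */
int32_t mul_fixed(int32_t sample, int32_t coeff, int frac_bits)
{
    int64_t prod = (int64_t)sample * coeff;
    int64_t half = frac_bits > 0 ? (int64_t)1 << (frac_bits - 1) : 0;
    return (int32_t)((prod + half) >> frac_bits);
}
```

For example, 0.1 with 4 fractional bits becomes 2 (that is, 0.125), so scaling a sample of 100 by 0.1 yields 13. With 12 fractional bits, the same operation yields 10.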
The inputs and outputs of the main functions are saved to files, which serve as input test data and output reference, respectively, in the RTL simulation.
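One convenient file format, assumed here rather than taken from the original, is one hexadecimal word per line. Verilog's `$readmemh` can load this format directly into a testbench memory. Below is a minimal C sketch of writing and reading such files; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write n 16-bit samples as plain text, one hex word per line, in the
 * format accepted by Verilog's $readmemh. Returns 0 on success. */
int dump_hex(const char *path, const uint16_t *data, size_t n)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        fprintf(f, "%04x\n", (unsigned)data[i]);
    return fclose(f) == 0 ? 0 : -1;
}

/* Read back up to max words from a file written by dump_hex.
 * Returns the number of words actually read. */
size_t load_hex(const char *path, uint16_t *data, size_t max)
{
    size_t n = 0;
    unsigned v;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    while (n < max && fscanf(f, "%x", &v) == 1)
        data[n++] = (uint16_t)v;
    fclose(f);
    return n;
}
```

In a testbench, the same file might be loaded with something like `$readmemh("stage_in.hex", mem);`, where the file and memory names are illustrative.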
RTL Coding / Simulation
At this stage, the behavior model C code was converted to Verilog-HDL. This was done carefully to maximize the benefit of FPGA parallel processing while minimizing hardware resource consumption.
The top module of the RTL is shown below as a block diagram.
Most modules are connected to an internal bus through which the firmware can program their memory-mapped registers.
Image processing is performed by the following functions, in this order.
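From the firmware side, programming such registers usually comes down to volatile 32-bit accesses at fixed offsets. The register map below (`REG_CTRL`, `REG_STATUS`, and the source and destination addresses) is entirely hypothetical; the actual map is defined by the attached RTL. In a Xilinx standalone BSP, the same accesses are often written with `Xil_Out32` and `Xil_In32` from `xil_io.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map for one processing module, as word indices.
 * The offsets and bit meanings are illustrative only. */
enum {
    REG_CTRL     = 0x00 / 4,  /* bit 0: start */
    REG_STATUS   = 0x04 / 4,  /* bit 0: done  */
    REG_SRC_ADDR = 0x08 / 4,  /* DDR address of the input image  */
    REG_DST_ADDR = 0x0C / 4,  /* DDR address of the output image */
};

/* Registers are accessed through a volatile pointer so the compiler does
 * not cache or drop the reads and writes. On the target, base would point
 * at the module's address in the GP AXI window. */
void start_stage(volatile uint32_t *base, uint32_t src, uint32_t dst)
{
    assert(base != NULL);
    base[REG_SRC_ADDR] = src;
    base[REG_DST_ADDR] = dst;
    base[REG_CTRL] = 1u;          /* kick off the stage */
}

/* Poll the done bit (an interrupt could be used instead). */
int stage_done(volatile uint32_t *base)
{
    return (base[REG_STATUS] & 1u) != 0;
}
```

Because the functions only take a pointer, they can be exercised on a host PC against an ordinary array before running on the board.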
- Stereo rectification
- X-Sobel filter
- Block matching
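The X-Sobel stage can be sketched in C as follows. It is modeled on the X-Sobel prefilter of OpenCV's `StereoBM`, which clips the horizontal gradient to `[-preFilterCap, preFilterCap]` and offsets it to an unsigned value. OpenCV's default `preFilterCap` is 31. The border handling here is simplified, and whether the RTL clips in exactly the same way is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Horizontal Sobel prefilter: apply the 3x3 kernel
 *   [-1 0 1; -2 0 2; -1 0 1],
 * clip the response to [-cap, cap], then add cap so the result lies in
 * 0..2*cap. Border pixels are set to cap (zero gradient) for simplicity. */
void xsobel_prefilter(const uint8_t *src, uint8_t *dst, int w, int h, int cap)
{
    assert(cap > 0 && 2 * cap <= 255);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (y == 0 || y == h - 1 || x == 0 || x == w - 1) {
                dst[y * w + x] = (uint8_t)cap;
                continue;
            }
            const uint8_t *r0 = src + (y - 1) * w;
            const uint8_t *r1 = src + y * w;
            const uint8_t *r2 = src + (y + 1) * w;
            int v = (r0[x + 1] - r0[x - 1])
                  + 2 * (r1[x + 1] - r1[x - 1])
                  + (r2[x + 1] - r2[x - 1]);
            if (v < -cap) v = -cap;
            if (v > cap) v = cap;
            dst[y * w + x] = (uint8_t)(v + cap);
        }
    }
}
```

With a cap of 31, every filtered pixel fits in 6 bits (0 to 62). This presumably also keeps the SAD accumulators in the block-matching stage narrow.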
Each stage reads its input from DDR and writes its output back to DDR, except the stereo rectification stage, whose input comes directly from the image sensors. The final result is also written to DDR, where the firmware can read it.
In the RTL simulation, the output of each stage was compared with the output reference from the behavior model, and the RTL was modified until they matched exactly.
Below is the top-level block design as seen in Vivado.
The Zynq PS and the custom IP block are connected by two AXI interfaces. The control interface is connected to a GP master AXI port, while the DMA interface is connected to an HP slave AXI port. The interrupt signal is also connected to the PS IRQ input.
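Because the behavior model uses the same fixed-point arithmetic as the RTL, the comparison can demand bit-exact equality rather than a tolerance. A hypothetical checker for two dumped buffers might look like this. It reports the first mismatch, which is typically the most useful starting point for debugging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compare RTL simulation output with the behavior-model reference.
 * Returns the number of mismatching samples (0 means bit-exact) and
 * prints the location of the first mismatch, if any. */
size_t compare_outputs(const uint16_t *rtl, const uint16_t *ref, size_t n)
{
    size_t mismatches = 0;
    assert(rtl != NULL && ref != NULL);
    for (size_t i = 0; i < n; i++) {
        if (rtl[i] != ref[i]) {
            if (mismatches == 0)
                printf("first mismatch at %zu: rtl=0x%04x ref=0x%04x\n",
                       i, (unsigned)rtl[i], (unsigned)ref[i]);
            mismatches++;
        }
    }
    return mismatches;
}
```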
PCB Design
A custom PCB was designed based on the ZedBoard. The major modifications are as follows.
- Zynq is downgraded from XC7Z020 to XC7Z010.
- Dual image sensors are connected to the PL. They are mounted on daughter boards so that they can be swapped for other image sensors without changing the mainboard.
- The GbE PHY transceiver is replaced with a KSZ9031RNX because the datasheet for the device used on the ZedBoard is not available.
- Unnecessary components are removed.
EAGLE was chosen as the CAD tool and used as both the schematic editor and the PCB layout editor.
Eight layers were needed to route the signals between the FPGA and DDR. The minimum trace width and minimum drill diameter were 0.1mm and 0.25mm, respectively. These were necessary to route signals between vias placed at 0.8mm intervals, which is the ball pitch of both the Zynq and the DDR.
Eurocircuits was chosen as the PCB manufacturer. They can manufacture fine-pitch multilayer PCBs, and they also offer an assembly service, which this project needs because the design has BGA components. A nice point is that their design rule file is already included with EAGLE.
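The relation between these numbers can be checked with a little arithmetic. Two via centers 0.8mm apart must fit one pad radius, a clearance, the trace, another clearance, and the other pad radius. Assuming a 0.1mm clearance (the clearance is not stated above; the real rules live in the manufacturer's design rule file), the pad can be at most 0.5mm. That leaves a 0.125mm annular ring around a 0.25mm drill. In C, with all dimensions in micrometres and hypothetical function names:

```c
#include <assert.h>

/* Largest via pad diameter that still lets one trace pass between two
 * vias whose centres are `pitch` apart:
 *   pitch >= pad/2 + clearance + trace + clearance + pad/2          */
int max_via_pad_um(int pitch, int trace, int clearance)
{
    int pad = pitch - trace - 2 * clearance;
    assert(pad > 0);
    return pad;
}

/* Copper ring left around the drilled hole (integer division truncates
 * odd differences to whole micrometres). */
int annular_ring_um(int pad, int drill)
{
    assert(pad > drill);
    return (pad - drill) / 2;
}
```

`max_via_pad_um(800, 100, 100)` gives 500, and `annular_ring_um(500, 250)` gives 125. This shows how tight a 0.8mm-pitch breakout is with 0.1mm traces.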
The appearance of the new PCB is shown below.
The usual hardware tests were performed on the new PCB. There were several mistakes in the schematics, but I was able to fix them with pattern cuts and jumper wires. These errors are fixed in the attached board design files.
Depth maps were acquired successfully with this hardware. An example is shown below.
The PL clock was initially set to 100MHz, then reduced to 80MHz because of persistent timing errors. The frame rate is set to 10 fps, but occasional frame drops may occur.
[Performance Summary]
- Algorithm: block-matching
- Image size: 640 x 480
- Disparity search range: 128
- Window size: 21
- Frame rate: 10 fps
- PL clock frequency: 80MHz
- Power consumption: 3W (typical)
[Hardware Resource Utilization]
- LUT 15023/17600 85%
- FF 16125/35200 46%
- BRAM 34.5/60 58%
- DSP 33/80 41%
CPU utilization has not been measured. I expect plenty of computing resources remain on the PS side, because currently only one ARM core is used and the actual processing is done in the PL.
Conclusion
This project shows that a stereo vision system can be built on the lowest grade Zynq SoC thanks to the parallel computing of the FPGA. As for performance, the output accuracy still needs to be verified with various data sets.
My next step is to build practical applications such as visual odometry and SLAM based on this project. I want to see how much potential this tiny device has. I'm also working on porting this project to the Ultra96-V2 board.
RTL and other design files are attached. They target a custom hardware platform. To use this design on another Zynq platform, dual image sensors similar to the OV5640 must be connected to the PL side of the Zynq.