Throughout my 24+ years as an FPGA engineer, one application I have often developed is image processing. FPGAs are used across a range of image processing applications, including medical and scientific imaging, space imaging, automotive, and defense.
The higher-level algorithms running in these solutions may vary, but at the lowest level there is a common denominator: the need to interface with an image sensor or camera, process the received image stream, and format the video stream for output, either to a display or for further processing over a network.
We have looked at several image processing projects before. However, in this project we are going to look in detail at the different stages and elements involved when working with an image sensor.
We are going to start at the beginning: how an image sensor actually works.
How Does an Image Sensor Work?
Image sensors are amazing devices: they enable us not only to see what is in the visible spectrum, but also to see outside the human visible range, for example in the X-ray and infrared bands.
There are two main technologies on which image sensors are based:
- Charge-Coupled-Device (CCD)
- CMOS Image Sensors (CIS)
Both work by converting photons that are striking a semiconductor into a voltage.
Charge-Coupled Devices - Form the pixel array using potential wells. During the integration time (the time over which an image is captured), the charge generated by photons striking the pixel accumulates, filling the well like water filling a bucket. At the end of the integration time, the pixels are clocked out one at a time to convert the charge to a voltage.
This operates like a shift register: timing signals shift the stored charge through the pixel array. To speed up readout, several output channels may be implemented. CCDs are analog devices, and the timing and voltage levels of the control signals affect the charge transfer and the overall quality of the image. An external ADC is typically used to convert the pixel voltage into a digital representation for further processing.
CCDs are less common today but remain in use for high-end imaging applications such as astronomy and space imaging due to their superior performance.
CMOS Imaging Sensor - A CIS forms a pixel array using photodiodes to convert photons into voltage at each pixel. This analog voltage is converted into a digital output directly on the chip.
This conversion allows for faster readouts than CCDs, although CIS often has worse noise performance. Most cameras today use CIS because they are easier to operate and integrate digitally.
Imaging Outside the Visible Spectrum
If we want to image outside the visible spectrum, we must select the appropriate device. Both CMOS and CCD sensors can capture X-ray to near-infrared (NIR) wavelengths.
As wavelengths increase toward the infrared spectrum, photon energy decreases, requiring semiconductors with narrower bandgaps than silicon. Depending on the spectrum observed, typical devices include:
- Charge-Coupled Device (CCD): X-ray to visible, extending to near-infrared.
- CMOS Imaging Sensor (CIS): X-ray to visible, extending to near-infrared.
- Uncooled IR: Microbolometers, typically operating in the mid- to long-wave IR range.
- Cooled IR: HgCdTe or InSb-based solutions requiring cooling.
Line or 2D Scan
When we look at a still image or frame of video, it is always in two dimensions. However, how that 2D image is created depends upon the application.
For example, if the target object is moving (e.g., on a production line), a single row of pixels can be used, and movement generates the 2D image. This approach is popular in production line inspection and orbital satellite imaging, where the movement of the orbit provides the motion needed to generate the image.
The more common alternative approach uses a 2D sensor, requiring no movement to capture a 2D image.
One key performance metric for an image sensor is Quantum Efficiency (QE). QE measures the ratio of photons detected by a pixel to the photons incident on the device.
When an image sensor is fabricated, structures on the front of the device can reduce QE in front-illuminated designs, where photons strike the front of the sensor and must pass these structures before reaching the photosensitive area.
To achieve better QE, back-illuminated designs are used, reducing the impact of structures on photon detection. However, back illumination requires additional processing, which reduces yield and increases costs.
When working with 2D image sensors, we often need to decide which type of shutter we want on the image sensor. The two main types of shutter are:
- Rolling Shutters: Each line is read out after its integration time, and the captured image can be corrupted by movement.
- Global Shutters: The entire array is exposed at the same time and read out as one.
I am sure we have all seen videos online where a helicopter is flying but its rotor does not appear to be moving. This occurs when the rolling shutter readout is synchronized with the rotation of the blades, resulting in the appearance of a stationary rotor.
Our perception of the world through our eyes is, for most of us, in color vision. However, so far we have only discussed pixels, the accumulation of charge, and the conversion of that charge into voltage and then into a digital format.
Photons of all wavelengths mix on a pixel and are converted into a voltage that represents the image. If we process this information as-is in our image processing application, the result will be a greyscale image.
Greyscale images are used in many applications because they provide luminance information, which is vital for analyzing brightness, contrast, edges, shapes, contours, texture, perspective, and shadows—without requiring color data.
Greyscale operations are also computationally efficient because only a single channel of data needs to be processed. Moreover, it is easy to convert greyscale images into binary images using thresholding, which enables morphological operations.
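As a minimal illustration of this thresholding step, the sketch below converts an 8-bit greyscale buffer into a binary image; the row-major buffer layout and the threshold value are assumptions for illustration only.
#include <stdint.h>
#include <stddef.h>
/* Minimal thresholding sketch: every 8-bit greyscale pixel at or above the
   threshold becomes 1, everything else becomes 0. */
void threshold_image(const uint8_t *grey, uint8_t *binary,
                     size_t width, size_t height, uint8_t threshold)
{
    for (size_t i = 0; i < width * height; i++) {
        binary[i] = (grey[i] >= threshold) ? 1 : 0;
    }
}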
To obtain a color image, a specific optical filter is applied directly to the sensor. This filter is known as a Bayer mask, and it covers each pixel, allowing only one wavelength of light (red, green, or blue) to pass through.
Each pixel captures only red, green, or blue photons. These filters are typically arranged in a 2x2 grid comprising one red pixel, one blue pixel, and two green pixels. This arrangement emphasizes green because it lies in the middle of the visible spectrum, and human eyes are more sensitive to green light.
The Bayer mask requires post-processing to reconstruct a full-color image. In an FPGA, we can process the pixel stream to debayer the raw data, converting each pixel into an RGB value using the 2x2 grid.
This process involves interpolation between the neighboring pixels in the grid. While effective, it can lead to a small loss of image resolution because of the interpolations required to fill in missing color information.
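To make the interpolation concrete, the sketch below reconstructs the RGB value at a red site of an RGGB Bayer pattern by averaging the neighboring green and blue samples; the RGGB ordering and buffer layout are assumptions, and a full debayer would also handle the green and blue sites plus the image borders.
#include <stdint.h>
#include <stddef.h>
/* Minimal debayer sketch for a pixel sitting on a red site of an RGGB mask.
   Assumes a row-major 8-bit raw buffer; border handling and the green/blue
   pixel phases are omitted for brevity. */
static void debayer_red_site(const uint8_t *raw, ptrdiff_t stride,
                             ptrdiff_t x, ptrdiff_t y,
                             uint8_t *r, uint8_t *g, uint8_t *b)
{
    const uint8_t *p = raw + y * stride + x;
    *r = p[0];                                                  /* the red sample itself        */
    *g = (uint8_t)((p[-1] + p[1] + p[-stride] + p[stride]) / 4);        /* left/right/up/down greens */
    *b = (uint8_t)((p[-stride - 1] + p[-stride + 1] +
                    p[stride - 1]  + p[stride + 1]) / 4);               /* diagonal blue neighbors   */
}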
Color Space
If we decide to work with RGB images, we also need to consider the color space. Typically, we begin in the RGB color space. Assuming 8 bits per color channel (R, G, B), each pixel will require 24 bits.
Within an FPGA, this is generally not a problem, as arbitrary bus sizes can be easily implemented. However, storing this data in memory (e.g., DDR3 or DDR4) is not very efficient due to the 24-bit format.
To improve memory efficiency, we can use a more compact color space, such as YUV, which separates luminance (Y) and chrominance (U and V) channels. In the YUV color space, it is possible to share the U and V channels between two pixels, reducing the storage requirement to 16 bits per pixel, which is much more efficient.
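As a sketch of this packing, the code below converts two adjacent RGB pixels into a YUV 4:2:2 pair that shares one U and one V sample, using a common BT.601-style integer approximation; the exact coefficients and byte ordering vary between systems, so treat them as illustrative assumptions.
#include <stdint.h>
/* Two adjacent pixels share one U and one V sample, so a pixel pair
   needs 4 bytes (16 bits per pixel) instead of 6. */
typedef struct { uint8_t y0, u, y1, v; } yuv422_pair_t;
static uint8_t clamp_u8(int v) { return (v < 0) ? 0 : (v > 255) ? 255 : (uint8_t)v; }
static yuv422_pair_t rgb_pair_to_yuv422(uint8_t r0, uint8_t g0, uint8_t b0,
                                        uint8_t r1, uint8_t g1, uint8_t b1)
{
    yuv422_pair_t out;
    /* Luma for each pixel (BT.601 integer approximation). */
    out.y0 = clamp_u8((66 * r0 + 129 * g0 + 25 * b0 + 128) / 256 + 16);
    out.y1 = clamp_u8((66 * r1 + 129 * g1 + 25 * b1 + 128) / 256 + 16);
    /* Chroma computed from the average of the two pixels and shared. */
    int ra = (r0 + r1) / 2, ga = (g0 + g1) / 2, ba = (b0 + b1) / 2;
    out.u = clamp_u8((-38 * ra - 74 * ga + 112 * ba + 128) / 256 + 128);
    out.v = clamp_u8((112 * ra - 94 * ga - 18 * ba + 128) / 256 + 128);
    return out;
}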
Additionally, a narrower bus width simplifies routing within the FPGA, making implementation easier and potentially more cost-effective.
Now that we understand how an image sensor works, we can explore how sensors are interfaced with FPGAs.
There are two main approaches:
- Sensor Integrated in a Camera: The sensor is embedded in a camera that performs most of the interfacing, and outputs an image for further processing.
- Direct Sensor Interfacing: The FPGA directly interfaces with the sensor, requiring additional control and signal processing.
Regardless of the method, FPGA I/O is versatile and capable of interfacing with both cameras and sensors.
Let’s take a look at some commonly used interface standards:
HDMI (High-Definition Multimedia Interface) - HDMI is commonly used with cameras, especially compact action cameras. FPGAs can interface with HDMI directly using Transition-Minimized Differential Signaling (TMDS). This is supported by the AMD 7 Series FPGAs, UltraScale™ devices, and UltraScale+™ devices.
For higher-resolution images that may exceed the performance of the high-definition (HD) banks, Gigabit transceivers can be employed. HDMI transmits video data over three differential channels (for red, green, and blue) and an additional channel for the clock signal.
SDI (Serial Digital Interface) Video - SDI is a professional standard for transmitting uncompressed digital video, audio, and metadata over coaxial cables with BNC connectors or fiber optics.
- Supported Resolutions: Ranges from standard definition (SD-SDI) to ultra-high-definition (12G-SDI).
- Applications: Ideal for broadcast and live production due to its high-quality, low-latency performance and support for long cable runs.
- FPGA Support: When interfaced with AMD FPGAs or SoCs, SDI uses Gigabit transceivers.
SDI video signals are processed through dedicated hardware IP cores that provide:
- Support for various SDI standards.
- Features like video scaling, color space conversion, and multiplexing.
These cores enable robust and flexible integration into professional video workflows.
Camera Link - Camera Link uses several LVDS (Low-Voltage Differential Signaling) channels to transmit the data from a camera to a frame grabber (which in the case of the camera link standard, is our FPGA). Camera Link uses four LVDS pairs for the data and a fifth for the clock.
Parallel / Serial - Many cameras and sensors provide an output that is either parallel, or serialized in a manner that allows it to be de-serialized to recreate the pixel data along with the associated frame-valid and line-valid signals. This can be implemented using LVDS, SLVS (Scalable Low-Voltage Signaling), and similar standards. If the data is serialized, the I/O structures within the FPGA can be used to synchronize and decode the data stream correctly.
MIPI (Mobile Industry Processor Interface) - MIPI is one of the most widely used sensor interfaces. It is a high-bandwidth, point-to-point protocol designed for transferring image sensor or display data over multiple differential serial lanes.
Protocol Layers: MIPI operates across various OSI model layers, with the lowest being the DPHY layer.
- DPHY defines the number of lanes, clocking, and the transition between differential signaling (SLVS) and single-ended signaling (LVCMOS).
- This combination supports high-bandwidth data transfer for protocols such as CSI-2 (Camera Serial Interface) and DSI (Display Serial Interface).
- Low-speed communication allows for efficient transfer of control information at lower power levels.
Performance:
- Each MIPI DPHY link can support 1 to 4 high-speed serial lanes operating at up to 2.5 Gbps per lane or 10 Gbps across all four lanes.
- Data transfer occurs at double the data rate, synchronous to the clock lane.
FPGA Support:
- AMD UltraScale+ devices and Versal™ adaptive SoCs natively support MIPI DPHY.
- For AMD 7 Series FPGAs and UltraScale devices, MIPI DPHY can be implemented with an external resistor network or custom DPHY circuitry.
Leveraging Programmable Logic and IP Libraries
The core idea within programmable logic is to leverage as much existing IP (Intellectual Property) as possible from the Vivado™ Design Suite and Vitis™ platform IP libraries. These libraries provide a rich set of pre-designed components, enabling efficient implementation of complex functionality.
AXI Stream for Video Transfer
Most interfaces in video processing pipelines use the AXI Stream protocol to transfer the video stream between modules. At its core, AXI Stream operates with the following primary signals:
- TData: Carries the data payload from the master to the slave.
- TValid: Indicates that valid data is available on the TData bus.
- TReady: Signals from the slave that it is ready to accept the data.
Markers for Video Data
For video streams, additional markers are required to indicate the start of a frame and the end of a line so that a complete 2D image can be constructed and processed.
To achieve this, AXI Stream introduces:
- TUser Signal: Indicates the start of a new frame.
- TLast Signal: Marks the end of a line within the video stream.
These markers ensure proper synchronization and reconstruction of the video data, making the AXI Stream protocol well-suited for handling 2D images and video processing in FPGAs.
Having understood the concepts of AXI Stream and its use for communicating image processing streams, we can now explore ways to exploit the parallel nature of FPGAs. One effective method is to include multiple pixels in a single AXI Stream data stream.
By transferring multiple pixels per clock cycle, the throughput of the image processing pipeline can be significantly increased.
Pixel Parallelism in AXI Stream
Typically, AXI Stream can be configured to transfer 1, 2, or 4 pixels per clock cycle, depending on the application and system requirements. For instance, when 4 pixels are output per clock cycle, the overall data rate and processing efficiency are greatly improved, as demonstrated in the example below.
This parallelism not only enhances the performance of the FPGA-based image processing pipeline, but also ensures that high-resolution and high-frame-rate video streams are handled seamlessly.
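To make the idea concrete, the sketch below models an AXI Stream video beat in software, packing four 24-bit RGB pixels into one wide TData word and driving TUser at the start of a frame and TLast at the end of each line; the field widths and the software model itself are illustrative assumptions, not an IP core's actual interface.
#include <stdint.h>
#include <stdbool.h>
/* Software model of one AXI Stream video beat carrying 4 pixels per clock.
   The 96-bit tdata is split across two 64-bit words for illustration. */
typedef struct {
    uint64_t tdata_lo;   /* pixels 0-1 and the low 16 bits of pixel 2 */
    uint64_t tdata_hi;   /* remaining 8 bits of pixel 2 and pixel 3   */
    bool     tuser;      /* start-of-frame marker                      */
    bool     tlast;      /* end-of-line marker                         */
} axis_video_beat_t;
/* Pack four 24-bit RGB pixels into one beat and set the video markers. */
static axis_video_beat_t pack_beat(const uint32_t rgb[4],
                                   bool start_of_frame, bool end_of_line)
{
    axis_video_beat_t beat = {0};
    uint64_t p0 = rgb[0] & 0xFFFFFF, p1 = rgb[1] & 0xFFFFFF;
    uint64_t p2 = rgb[2] & 0xFFFFFF, p3 = rgb[3] & 0xFFFFFF;
    beat.tdata_lo = p0 | (p1 << 24) | (p2 << 48);  /* low 16 bits of pixel 2 fit here   */
    beat.tdata_hi = (p2 >> 16) | (p3 << 8);        /* upper 8 bits of pixel 2 + pixel 3 */
    beat.tuser    = start_of_frame;                /* asserted on the first beat of a frame */
    beat.tlast    = end_of_line;                   /* asserted on the last beat of each line */
    return beat;
}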
When implementing image processing pipelines in programmable logic, there are two primary architectures to consider:
1. Direct Architecture
In a direct architecture, the input is connected directly to the processing stage and the output, with minimal buffering and no frame buffering.
Advantages: This approach provides the lowest latency between the input and output, making it ideal for applications where latency is critical, such as in autonomous vehicles or real-time video analysis.
Limitations: Since there is no frame buffering, this architecture is less flexible for tasks requiring temporal data storage or synchronization.
2. Frame-Buffered Architecture
A frame-buffered architecture leverages memory to buffer one or more frames.
Advantages:
This approach is used when:
- The image needs to be made available to processors in an associated processing system.
- There is a need to modify the output timing of the video stream (e.g., for synchronization or compatibility with other components).
Use Cases: Frame-buffered architectures are common in applications where flexibility and timing adjustments outweigh latency concerns.
Regardless of the image processing architecture you choose—whether direct or frame-buffered—the IP cores used in the design require proper configuration via AXI Lite.
AXI Lite Configuration:
- Sets the image size (width and height).
- Enables the IP core functionality.
- Controls the core's processing algorithms.
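As an illustration of what this configuration looks like in software, the sketch below writes the image dimensions to an IP core and enables it over AXI Lite. The base address and register offsets are hypothetical placeholders; the real register map is defined in each core's product guide, and in practice the supplied drivers (such as the XVtc API used later in this project) wrap these accesses for you.
#include <stdint.h>
#include "xil_io.h"   /* Xil_Out32 low-level register access */
/* Hypothetical register map for illustration only. */
#define IP_BASEADDR   0x44A00000u  /* placeholder base address      */
#define REG_CONTROL   0x00u        /* bit 0: enable                 */
#define REG_WIDTH     0x10u        /* active pixels per line        */
#define REG_HEIGHT    0x18u        /* active lines per frame        */
static void configure_ip(uint32_t width, uint32_t height)
{
    /* Set the image size, then enable the core. */
    Xil_Out32(IP_BASEADDR + REG_WIDTH,  width);
    Xil_Out32(IP_BASEADDR + REG_HEIGHT, height);
    Xil_Out32(IP_BASEADDR + REG_CONTROL, 0x1);
}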
Configuration Process
For configuring these IP cores, the following options are typically used:
AMD Versal adaptive SoCs, AMD Zynq™ MPSoCs or Zynq processing cores:
- If the design involves an Arm® processor-based architecture, the processing cores can handle configuration tasks efficiently.
AMD MicroBlaze™ V Processor:
- For pure FPGA designs without an embedded processor, the AMD MicroBlaze V processor is the recommended choice. This soft processor core is well-suited for managing AXI Lite configurations and other control tasks.
For this application, we will create an example image processing pipeline implemented using the direct method.
This means there will be no frame buffering from input to output, ensuring minimum latency from the input frame to the output frame. To achieve this, we must minimize buffering throughout the pipeline.
Target Device
The target device for this design is an AMD Kintex™ 7 FPGA, specifically using the Digilent Genesys 2 development board, which features:
- HDMI Input and Output interfaces: Ideal for capturing images from a sports camera or test equipment and displaying them on a screen.
The design will utilize Vivado and can be divided into two key sections:
- Image processing pipeline
- Control and configuration using AMD MicroBlaze V
Pipeline Design
The pipeline will:
- Receive data over HDMI: The HDMI input is converted into a parallel video format with vertical and horizontal sync signals.
- Convert the video stream to AXI Stream: AXI Stream is the standard interface used by most image processing blocks.
- Output data via AXI Stream to video out: This generates parallel video under the control of a Video Timing Generator.
Control Using the AMD MicroBlaze V Processor
The pipeline and the associated Video Timing Generator will be controlled by the AMD MicroBlaze V processor, which is based on the RISC-V Instruction Set Architecture.
- Unlike previous examples that used VDMA (Video Direct Memory Access), this application will not use VDMA to ensure the lowest latency between input and output.
- The Digilent Genesys 2 board with 1GB of DDR3 is selected to support optional image buffering if needed by certain algorithms.
AMD Vivado Design Suite Components
The image processing pipeline will use the following IP cores:
- DVI2RGB: Digilent IP core for converting DVI to RGB format.
- Video In to AXI Stream: Vivado Design Suite IP block for converting RGB video to AXI Stream format.
- AXI Stream to Video Out: Vivado Design Suite IP block for converting AXI Stream back to RGB format.
- Video Timing Controller: Configured to detect incoming timing and generate output timing. This configuration will also support future VDMA applications if required.
- AXI Stream FIFO: Configured in packet mode to buffer a line before passing it through.
- AXIS Register Slices: Added within the pipeline to aid with timing closure.
AMD MicroBlaze V Processor Subsystem
The AMD MicroBlaze V processor controller subsystem is configured as a microcontroller. This configuration enables both AXI peripheral data and instruction interfaces, connected via an AXI Interconnect to:
- UartLite: Vivado Design Suite IP block for UART console communication.
- AXI GPIO: Monitors display and camera hot plug detect signals.
- MIG 7 Series: Vivado IP block for interfacing with the DDR3 memory on the Digilent Genesys 2 board.
- Processor Reset Block: Manages system resets.
- AXI Interrupt Controller: Handles processor interrupts.
- MicroBlaze V Processor Debug Module: Enables debugging using the Vitis platform.
Clocking Configuration
- The MIG (Memory Interface Generator) is provided with the board differential clock running at 200 MHz.
- The MIG generates a UI clock at 100 MHz, reflecting the 4:1 clocking scheme used.
- An additional UI clock at 200 MHz is generated as a reference clock for the DVI2RGB module.
For this design, the output is configured for 720p resolution, as provided by the sports camera. The AXI Stream clock will run at 150 MHz, approximately twice the 74.25 MHz pixel clock.
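For reference, the small calculation below derives the standard 720p60 pixel clock of 74.25 MHz from the total frame dimensions (1650 total pixels per line, 750 total lines per frame, 60 frames per second); doubling it gives 148.5 MHz, which is why a 150 MHz AXI Stream clock comfortably covers one pixel per clock.
#include <stdio.h>
/* Worked example: derive the 720p60 pixel clock from the standard timing. */
int main(void)
{
    const double h_total  = 1650.0;  /* active 1280 + horizontal blanking */
    const double v_total  = 750.0;   /* active 720 + vertical blanking    */
    const double frame_hz = 60.0;    /* frames per second                 */
    double pixel_clock_hz = h_total * v_total * frame_hz;   /* 74.25 MHz  */
    printf("Pixel clock: %.2f MHz\n", pixel_clock_hz / 1e6);
    printf("2x pixel clock: %.2f MHz\n", 2.0 * pixel_clock_hz / 1e6);
    return 0;
}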
The device utilization can be seen below:
The software running on the AMD MicroBlaze V processor is developed using the AMD Vitis platform and is shown below.
#include <stdio.h>
#include <string.h>     // memset
#include "platform.h"
#include "xil_printf.h"
#include "sleep.h"      // sleep()
#include "xvtc.h"
#include "xgpio.h"
#include "vga.h"
#include "xparameters.h"
XVtc VtcInst;
XVtc_Config *vtc_config ;
XGpio hpd_in;
XVtc_SourceSelect SourceSelect;
int main()
{
VideoMode video;
XVtc_Timing vtcTiming;
init_platform();
printf("Setting up VTC\n\r");
vtc_config = XVtc_LookupConfig(XPAR_XVTC_0_BASEADDR);
XVtc_CfgInitialize(&VtcInst, vtc_config, vtc_config->BaseAddress);
// Configure the GPIO and assert the hot plug detect (HPD) signals
XGpio_Initialize(&hpd_in, XPAR_XGPIO_0_BASEADDR);
XGpio_DiscreteWrite(&hpd_in, 1, 0x1);
sleep(20); // give the HDMI source time to respond before asserting the second HPD
XGpio_DiscreteWrite(&hpd_in, 2, 0x1);
video = VMODE_1280x720;
vtcTiming.HActiveVideo = video.width; /**< Horizontal Active Video Size */
vtcTiming.HFrontPorch = video.hps - video.width; /**< Horizontal Front Porch Size */
vtcTiming.HSyncWidth = video.hpe - video.hps; /**< Horizontal Sync Width */
vtcTiming.HBackPorch = video.hmax - video.hpe + 1; /**< Horizontal Back Porch Size */
vtcTiming.HSyncPolarity = video.hpol; /**< Horizontal Sync Polarity */
vtcTiming.VActiveVideo = video.height; /**< Vertical Active Video Size */
vtcTiming.V0FrontPorch = video.vps - video.height; /**< Vertical Front Porch Size */
vtcTiming.V0SyncWidth = video.vpe - video.vps; /**< Vertical Sync Width */
vtcTiming.V0BackPorch = video.vmax - video.vpe + 1; /**< Vertical Back Porch Size */
vtcTiming.V1FrontPorch = video.vps - video.height; /**< Vertical Front Porch Size */
vtcTiming.V1SyncWidth = video.vpe - video.vps; /**< Vertical Sync Width */
vtcTiming.V1BackPorch = video.vmax - video.vpe + 1; /**< Vertical Back Porch Size */
vtcTiming.VSyncPolarity = video.vpol; /**< Vertical Sync Polarity */
vtcTiming.Interlaced = 0;
memset((void *)&SourceSelect, 0, sizeof(SourceSelect));
SourceSelect.VBlankPolSrc = 1;
SourceSelect.VSyncPolSrc = 1;
SourceSelect.HBlankPolSrc = 1;
SourceSelect.HSyncPolSrc = 1;
SourceSelect.ActiveVideoPolSrc = 1;
SourceSelect.ActiveChromaPolSrc= 1;
SourceSelect.VChromaSrc = 1;
SourceSelect.VActiveSrc = 1;
SourceSelect.VBackPorchSrc = 1;
SourceSelect.VSyncSrc = 1;
SourceSelect.VFrontPorchSrc = 1;
SourceSelect.VTotalSrc = 1;
SourceSelect.HActiveSrc = 1;
SourceSelect.HBackPorchSrc = 1;
SourceSelect.HSyncSrc = 1;
SourceSelect.HFrontPorchSrc = 1;
SourceSelect.HTotalSrc = 1;
XVtc_RegUpdateEnable(&VtcInst);
XVtc_SetGeneratorTiming(&VtcInst, &vtcTiming);
XVtc_SetSource(&VtcInst, &SourceSelect);
XVtc_EnableGenerator(&VtcInst);
XVtc_Enable(&VtcInst);
XVtc_EnableDetector(&VtcInst);
XVtc_Enable(&VtcInst);
xil_printf("Video Mode = %i ", result);
xil_printf("\n\r");
printf("VTC Set Up\n\r");
cleanup_platform();
return 0;
}
Other Imaging Projects
Over the years, I have created several projects that look at different elements of image processing.
- Image processing on the Zybo: A foundational project showcasing basic image processing on the Zybo platform.
- Sobel Image processing on the Zybo: Demonstrating edge detection using the Sobel filter.
- AMD Artix™ 7 FPGA Test Pattern Generation: Generating and testing patterns for validation and calibration.
- AMD Artix™ 7 FPGA Image processing platform: A complete platform for developing and testing image processing pipelines.
- Auto White Balancing with the AMD Artix™ 7 FPGA Image processing platform: Implementing automatic white balancing techniques for enhanced image quality.
- Image processing with AMD PYNQ Framework on the PynqZU: Leveraging the PynqZU for flexible, Python-based image processing.
- Dual Camera Processing with AMD Spartan™ 7 FPGAs: Synchronizing and processing streams from two cameras.
- Image Processing with AMD PYNQ Framework on the Snickerdoodle: Exploring image processing on this compact FPGA platform.
- High Performance Imaging on the Genesys ZU: Pushing the limits of image processing capabilities on the Genesys ZU board.
- Vision Outside the Visible: Implementing imaging solutions beyond the visible spectrum, such as infrared and X-ray imaging.
Hopefully this document has provided a deeper understanding of image processing systems and their implementation using AMD FPGAs and SoCs.
These projects and examples serve as valuable guides to help you build and customize your own image processing systems, showcasing the power and flexibility of FPGAs in handling complex image processing tasks.
AMD, and the AMD Arrow logo, Artix, Kintex, MicroBlaze, Spartan, UltraScale, UltraScale+, Versal, Vitis, Vivado, Zynq and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.