Published June 23, 2021 © Apache-2.0

USB Webcam H264 Decode with the KV260 Vision AI Starter Kit

Learn how to use a 1080p H264 USB webcam to achieve high frame rate inference on the Xilinx Kria KV260 Vision AI starter kit.

Beginner, Protip, 1 hour

Things used in this project

Hardware components

AMD Kria KV260 Vision AI Starter Kit

Logitech HD Pro C920 Webcam

Story

Not all USB webcams are created equal. The popular Logitech BRIO USB webcam delivers full HD (1080p30) as raw YUYV or NV12 data. However, some less expensive webcams, such as the Logitech HD Pro C920, only reach 1080p30 when the stream is encoded. This project looks at using the KV260 Vision AI Starter Kit from Xilinx to decode and process 1080p30 H264 data using the Smart Camera app from the Kria app store.

The KV260 Smart Camera app is one of the applications available in the Kria app store. It comes with prebuilt examples that perform face or pedestrian detection (see Mario's great write-up - Introducing Xilinx Kria™ for Vision AI Applications for more information).

When running the application with a USB webcam as the video source, the highest frame rate supported by the USB camera for raw pixel formats (i.e. YUV or BGR/RGB) is automatically selected. However, this is not desirable in the case of the C920 webcam since the highest frame rate supported for 1080p raw data is 5 FPS.

This is demonstrated by running the application with the C920 webcam. The image below shows the terminal output with Logitech C920 (notice 5 FPS):

Face Detection Using Smart Camera App with Logitech C920 Webcam

Running the face detection application with the Logitech BRIO webcam shows 30 FPS at 1080p resolution for raw data. The image below shows the terminal output with Logitech BRIO (notice 30 FPS):

Face Detection Using Smart Camera App with Logitech BRIO Webcam

The specifications for the C920 certainly indicate that it is capable of 1080p30. Dumping the camera capabilities with the v4l2-ctl command provides valuable insight. The output below lists the available formats for 1080p data with the C920.

xilinx-k26-starterkit-2020_2:~$ v4l2-ctl -d /dev/video0 --list-formats-ext 
ioctl: VIDIOC_ENUM_FMT
   	Type: Video Capture
   
   	[0]: 'YUYV' (YUYV 4:2:2)
                ...
   		Size: Discrete 1920x1080
   			Interval: Discrete 0.200s (5.000 fps)
                ...
   	[1]: 'H264' (H.264, compressed)
                ...
   		Size: Discrete 1920x1080
   			Interval: Discrete 0.033s (30.000 fps)
   			Interval: Discrete 0.042s (24.000 fps)
   			Interval: Discrete 0.050s (20.000 fps)
   			Interval: Discrete 0.067s (15.000 fps)
   			Interval: Discrete 0.100s (10.000 fps)
   			Interval: Discrete 0.133s (7.500 fps)
   			Interval: Discrete 0.200s (5.000 fps)
   	[2]: 'MJPG' (Motion-JPEG, compressed)
                ...
   		Size: Discrete 1920x1080
   			Interval: Discrete 0.033s (30.000 fps)
   			Interval: Discrete 0.042s (24.000 fps)
   			Interval: Discrete 0.050s (20.000 fps)
   			Interval: Discrete 0.067s (15.000 fps)
   			Interval: Discrete 0.100s (10.000 fps)
   			Interval: Discrete 0.133s (7.500 fps)
   			Interval: Discrete 0.200s (5.000 fps)

Looking at the output above, we see that 30 FPS at 1080p is only possible with the compressed formats (H264 and MJPG). So, how do we actually achieve 30 FPS with the C920 webcam?
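
To make the same check quickly on another webcam, the v4l2-ctl listing can be filtered for the formats that advertise a given size and frame rate. The helper below is only a sketch: the function name formats_at is made up for this example, and it assumes the listing has the same layout as the C920 output shown above.

```shell
# formats_at SIZE FPS
# Reads `v4l2-ctl --list-formats-ext` output on stdin and prints the
# FourCC of every format that lists SIZE at FPS.
# (Hypothetical helper; assumes the output layout shown for the C920.)
formats_at() {
    awk -v size="$1" -v fps="$2" '
        # Format header, e.g.  [1]: H264 in single quotes: remember the FourCC
        /^[ \t]*\[[0-9]+\]:/ { split($0, q, "\047"); fmt = q[2]; insize = 0; next }
        # Frame size line: note whether it is the size we are looking for
        /Size: Discrete/ { insize = ($NF == size); next }
        # Frame interval line under the matching size
        /Interval: Discrete/ && insize && index($0, "(" fps " fps)") { print fmt }
    '
}
```

On the KV260 this could be run as `v4l2-ctl -d /dev/video0 --list-formats-ext | formats_at 1920x1080 30.000`. The frame rate has to be written exactly as v4l2-ctl prints it (three decimal places).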

The solution: forgo the smartcam binary and create our own custom GStreamer pipeline. The Smart Camera application is built on the IVAS framework from Xilinx as well as the GStreamer framework. The IVAS framework is essentially a set of GStreamer plugins created by Xilinx to facilitate smart vision and machine learning tasks. It is particularly useful for interfacing with hardware accelerators in the programmable logic fabric of the Xilinx device. If we chose not to use the IVAS framework, we would need to write our own OpenCL-based code to interface with the accelerator(s).

So, how do we use IVAS? It's as simple as creating a GStreamer pipeline, but before we launch the GStreamer pipeline we need to make sure the kv260-smartcam app is loaded using the xmutil utility. Once the kv260-smartcam app is loaded we can launch the GStreamer pipeline to process 1080p30 H264 data from the USB webcam using the following commands:

export MODEL=facedetect

gst-launch-1.0 v4l2src device=/dev/video0 \
! video/x-h264, width=1920, height=1080, framerate=30/1 \
! h264parse \
! omxh264dec internal-entropy-buffers=3 \
! queue leaky=2 name=vcu_buffer_0 \
! tee name=t \
t.src_0 \
! queue \
! ivas_xmultisrc kconfig="/opt/xilinx/share/ivas/smartcam/$MODEL/preprocess.json" \
! queue \
! ivas_xfilter kernels-config="/opt/xilinx/share/ivas/smartcam/$MODEL/aiinference.json" \
! queue \
! ima.sink_master ivas_xmetaaffixer name=ima ima.src_master \
! queue \
! fakesink \
t.src_1 \
! queue \
! ima.sink_slave_0 ima.src_slave_0 \
! queue \
! ivas_xfilter kernels-config="/opt/xilinx/share/ivas/smartcam/$MODEL/drawresult.json" \
! video/x-raw, width=1920, height=1080, format=NV12 \
! queue \
! fpsdisplaysink text-overlay=false video-sink="kmssink driver-name=xlnx plane-id=39 fullscreen-overlay=true" sync=false \
-v
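
The app-loading step that has to happen before the pipeline above is launched can be sketched as follows. This is an assumed sequence rather than the exact commands used for this write-up: it presumes the kv260-smartcam package is already installed on the board, and the function name load_smartcam is made up for this example.

```shell
# Sketch: make sure kv260-smartcam is the loaded accelerator app.
# Assumes the kv260-smartcam package is already installed on the KV260.
load_smartcam() {
    sudo xmutil listapps                 # show the available apps and which one is active
    sudo xmutil unloadapp                # unload whichever app is currently loaded
    sudo xmutil loadapp kv260-smartcam   # load the smart camera app
}
```

Call load_smartcam once after boot, then start the GStreamer pipeline.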

Note 1: the USB device is enumerated by the KV260 as /dev/video0 - modify accordingly if you have multiple video capture devices attached to your hardware.

Note 2: to run pedestrian detection instead, modify the MODEL environment variable shown above to be equal to refinedet.
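
For Note 1, one way to find the right device node when several capture devices are attached is to look the camera up by name in the output of v4l2-ctl --list-devices. The helper below is a sketch: the function name video_node_for is made up for this example, and it assumes the usual listing layout (a device name line followed by indented /dev/video* lines). A camera may expose more than one node, and this simply picks the first one listed.

```shell
# video_node_for NAME
# Reads `v4l2-ctl --list-devices` output on stdin and prints the first
# /dev/video* node listed under a device whose name contains NAME.
# (Hypothetical helper; assumes the usual v4l2-ctl listing layout.)
video_node_for() {
    awk -v name="$1" '
        # Unindented line: a device header; check whether it names our camera
        /^[^ \t]/ { match_dev = (index($0, name) > 0); next }
        # Indented node line under a matching header
        match_dev && /\/dev\/video/ { print $1; exit }
    '
}
```

For example, `v4l2-ctl --list-devices | video_node_for C920` prints a node that can be used in place of /dev/video0 in the pipeline above.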

The terminal output indicates that we are processing at 30 FPS using the C920 webcam.

I hope you enjoyed this quick tutorial on using the KV260 with an H264 encoded webcam feed.

Update 6/24/2021: Newer versions of the C920 webcam may no longer support H264 encoding per the following article - https://www.logitech.com/en-us/video-collaboration/resources/think-tank/articles/article-logitech-and-h264-encoding.html