In this project, I will showcase Deep Eye, a rapid prototyping platform for NVIDIA DeepStream based Video Analytics applications.
The project consists of 3 main components:
- a Hardware Platform that can be used with the Jetson Nano
- DeepLib, an easy to use Python library for creating DeepStream based video processing pipelines
- a Web IDE that allows the easy creation of DeepStream based applications
The Hardware Platform allows building video analytics projects with the Jetson Nano. It supports up to 2 MIPI CSI cameras, which are mounted on a rotating platform:
The rotating platform allows both horizontal and vertical movements of two cameras. The movement of the cameras can be controlled by a simple library.
DeepLib is a Python library based on the recently released DeepStream Python Bindings library. It allows the easy creation of DeepStream based video processing pipelines.
Complex DeepStream (and GStreamer) video processing pipelines can get quite complicated, both in terms of structure and of the source code behind them. DeepLib tries to solve these issues by grouping pipeline components into bigger and much easier to understand logical blocks:
The pipelines are built using a fluent API, allowing pipelines like the above to be built and run as simply as this:
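# initialize DeepLib
DeepLib.init()
# build an object detection pipeline
pipeline = DeepLib() \
.pipeline() \
.withFileSource("sample-720p.h264") \
.withNVInfer("dstest1_pgie_config.txt") \
.withNVOsd() \
.withEGLOutput() \
.build()
# run on the main thread
DeepLib.runOnMain(pipeline)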
A number of different input and output elements are supported. Processing elements like NVInfer and NVTracker allow implementing object detection, tracking and classification, and make it possible to create powerful applications quickly and easily.
The Web IDE allows building DeepStream / DeepLib pipelines using a graphical interface:
When done, the pipeline can be exported to a JSON configuration file, which can then be used directly by DeepLib to build the DeepStream pipeline.
The IDE also has a Video Preview component, which allows previewing network based video streams like WebRTC and TCP.
The Web IDE can be tried on this Github.io page :)
The source code for DeepLib and the other components, as well as the CAD files, can be found in the GitHub repository of the project.
Hardware Platform
Within NVIDIA's Jetson product family, the Jetson Nano is the most accessible one, with its $99 price tag.
Although the Jetson Nano is the least powerful of the Jetson modules, with its 128 CUDA cores it is still powerful enough to be used for Computer Vision, Artificial Intelligence and Intelligent Video Analytics applications.
Here are some of the relevant specs for the Jetson Nano Development Kit / Module:
- MIPI CSI Camera Inputs: up to 4 x cameras (1 camera with the original Jetson Nano Developer Kit, up to 2 cameras with the rev B01 Developer Kit, up to 4 cameras with the Jetson Nano Module + a custom carrier board)
- USB Video Inputs: up to 4 x cameras
- Video Decoding: 4K @ 30 fps, 4 x 1080p @ 30 fps
- Video Encoding: 4K @ 60 fps, 8 x 1080p @ 30 fps
- Video Outputs: HDMI and DisplayPort
- Power over Ethernet (PoE) support (with custom powering circuit)
One of the cool things about the Jetson Nano is that it can be used with the easily available Raspberry Pi cameras (V2 cams are supported out of the box, V1 cams need manually built kernel modules).
Having the camera(s) just hanging on their cables can be a little bit annoying, so I thought it would be fun to build a platform for the cameras:
The prototype was designed in FreeCAD (an open source parametric modeler), and it can be built from the following parts:
- a set of 3D printed parts
- 2 x 5 gr servo motors + servo driver
- 1 x 5V stepper motor + stepper driver
The final prototype consists of 3 parts:
- a Camera Mount for Raspberry Pi cameras
- a Rotating Platform
- and a Base Stand
The parts were 3D Printed:
and assembled:
The movement of the cameras can be controlled using a small Python library (located in the CameraMove folder of the linked GitHub repo).
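The exact API of the CameraMove library is best checked in the repo; as an illustration only, driving the two camera servos through a PCA9685 style servo driver could look roughly like the sketch below (the adafruit-circuitpython-servokit package and the channel assignment are assumptions, not the actual CameraMove code):
# Illustrative sketch only - not the actual CameraMove API
# Assumes a PCA9685 servo driver and the adafruit-circuitpython-servokit package
from adafruit_servokit import ServoKit

kit = ServoKit(channels=16)  # the PCA9685 board exposes 16 PWM channels

# hypothetical channel assignment: one servo per camera
kit.servo[0].angle = 90  # center the first camera
kit.servo[1].angle = 45  # tilt the second camera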
The NVIDIA DeepStream SDK is NVIDIA's toolkit for streaming analytics applications, like AI based video and image analysis:
It is built on top of GStreamer, the popular open-source streaming multimedia framework.
DeepStream adds a series of useful GStreamer plugins for different use cases:
For example, it allows implementing hardware accelerated inference and detection using TensorRT.
> Installing on Jetson Nano
Installing DeepStream on the Jetson Nano can be a little bit tricky. To install it we can follow one of the methods from the DeepStream Development Guide.
The method that worked for me was the one using the NVIDIA SDK Manager:
After we select the packages we want, the SDK Manager will automatically download the necessary components and re-flash the SD card image of the Jetson Nano.
If we did everything correctly, we should end up with an NVIDIA L4T install (the JetPack OS) with the DeepStream SDK 4.0 installed in /opt/nvidia/deepstream/deepstream-4.0.
The DeepStream SDK comes with an example application that can be launched in multiple configurations.
We can launch the example application as follows:
# boost clocks
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
$ cd /opt/nvidia/deepstream/deepstream-4.0
$ deepstream-app -c samples/configs/deepstream-app/source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt
This will run an object tracking application on 8 video streams in parallel. The result should look like this:
A couple of months ago, NVIDIA released the DeepStream Python Bindings, adding support for writing DeepStream applications in Python. The library is based on the GStreamer Python bindings library, and thus exposes almost the full feature set of DeepStream, combined with the ease of use of Python.
The library and example applications are available on GitHub, in the DeepStream Python Apps repository.
To install it we need to follow the HOWTO.md guide. The steps are something like these:
- download and build the GStreamer Python bindings
$ sudo apt-get install python-gi-dev
$ export GST_LIBS="-lgstreamer-1.0 -lgobject-2.0 -lglib-2.0"
$ export GST_CFLAGS="-pthread -I/usr/include/gstreamer-1.0 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include"
$ git clone https://github.com/GStreamer/gst-python.git
$ cd gst-python
$ git checkout 1a8f48a
$ ./autogen.sh PYTHON=python3
$ ./configure PYTHON=python3
$ make
$ sudo make install
- download the latest release from NVIDIA-AI-IOT/deepstream_python_apps/releases
- extract it over the DeepStream 4.0 installation
$ tar xf ~/Downloads/ds_pybind_0.5.tbz2 -C /opt/nvidia/deepstream/deepstream-4.0/sources
$ tar xf /opt/nvidia/deepstream/deepstream-4.0/sources/deepstream_python_v0.5/ds_pybind_0.5.tbz2 -C /opt/nvidia/deepstream/deepstream-4.0/sources
At this point we should have a set of example applications in /opt/nvidia/deepstream/deepstream-4.0/sources/python/apps/,
each with its own README:
- the 1st application demonstrates an object (car, people, etc.) detection pipeline
$ cd deepstream-test1
$ python3 deepstream_test_1.py sample_720p.h264
- the 2nd example adds a set of secondary detectors, so we get more information about each object
- the 3rd example app shows how to use URI sources
- the 4th example app adds Azure IoT and Kafka support on top of the previous examples
Thanks to a recent commit the apps are also available as Jupyter notebooks:
Complex DeepStream (and GStreamer) video processing pipelines can get quite complicated, both in terms of structure and of the source code behind them.
Certain tasks, like reading an H264 encoded video file, require the use of multiple pipeline elements linked together. Sometimes, a given set of parameters is required for these elements to be able to link together.
DeepLib is a Python library I created which tries to solve this problem. It is built on top of the recently released DeepStream Python Bindings library, and it allows the easy creation of DeepStream based video processing pipelines.
The library solves the above issues by grouping pipeline components into bigger and much easier to understand logical blocks.
For example, the following diagram shows how a pipeline that does object detection on a video input looks in plain DeepStream and in DeepLib:
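To give an idea of the "plain" side of that diagram, the same object detection pipeline written directly against GStreamer / DeepStream looks roughly like the sketch below (the element properties and caps are simplified assumptions, not the exact DeepStream sample code):
# Rough sketch of the same pipeline in plain GStreamer / DeepStream
# (properties and caps are simplified assumptions)
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
pipeline = Gst.parse_launch(
    "filesrc location=sample_720p.h264 ! h264parse ! nvv4l2decoder ! "
    "m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=dstest1_pgie_config.txt ! "
    "nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink")
pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()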
DeepLib has a fluent API, which allows building video processing pipelines with just a couple of lines of code. Building and running the above pipeline in DeepLib is as easy as this:
# initialize DeepLib
DeepLib.init()
# build pipeline
pipeline = DeepLib() \
.pipeline() \
.withFileSource("sample-720p.h264") \
.withNVInfer("dstest1_pgie_config.txt") \
.withNVOsd() \
.withEGLOutput() \
.build()
# run on main thread
DeepLib.runOnMain(pipeline)
A DeepLib pipeline is built up from multiple pipeline elements. Each DeepLib pipeline element implements a specific functionality, and is built up from one or more GStreamer / DeepStream elements.
There are 3 types of pipeline elements:
- input elements - representing different types of video inputs, like file, MIPI camera, USB camera, RTSP, etc.
- output elements - representing different types of video outputs, like EGL, RTSP, TCP, WebRTC
- processing elements - for processing tasks like object detection, object tracking, queuing, multiplexing, etc.
In terms of input elements, there are 3 implemented right now:
- File Input - load video from a file, using hardware accelerated decoding
- MIPI CSI Camera input - stream video from a MIPI CSI camera like the RaspyCam
- USB Camera Input - stream video from a USB camera
The File Input element is able to stream frames from a video file. The input file is first accessed using a filesrc element, then the media container is parsed with an element like h264parse. The next step is to use an nvv4l2decoder element to do the hardware accelerated decoding. Finally, the stream is passed through an nvstreammux, which implements functions like scaling and batching.
The MIPI Camera Input element can be used to stream video from a CSI camera, like the RaspiCam V2. An nvarguscamerasrc element is used to access the raw (NV12) video stream from the camera. A capsfilter is used to set parameters like the desired resolution and frame rate.
The USB Camera Input element can be used with USB cameras / webcams, like the Logitech C920. The v4l2src element is used to access the video stream of the camera, either as a raw stream or as an encoded stream. Then, an h264parse and an nvv4l2decoder element are optionally used to do the decoding.
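As a quick illustration of what these two camera input elements wrap, the underlying source chains can also be tested standalone with plain GStreamer, roughly like below (the device path, resolution and preview sinks are assumptions chosen just for a quick on-screen check):
# Standalone preview sketches of the two camera source chains
# (device path, resolution and preview sinks are assumptions)
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# MIPI CSI camera (e.g. RaspiCam V2): nvarguscamerasrc + a caps filter for resolution / frame rate
csi_preview = Gst.parse_launch(
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM),width=1280,height=720,framerate=30/1 ! "
    "nvoverlaysink")

# USB camera (e.g. Logitech C920): v4l2src, optionally followed by h264parse ! nvv4l2decoder
usb_preview = Gst.parse_launch(
    "v4l2src device=/dev/video0 ! videoconvert ! autovideosink")

csi_preview.set_state(Gst.State.PLAYING)  # or usb_preview
GLib.MainLoop().run()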
On the output side, we have the following elements:
- EGL Output - show the output on display directly connected to the device, using hardware accelerated rendering
- RTSP Output - stream the output over the network using the RTSP standard
- TCP Output - stream the output over the network over TCP
- WebRTC Output (work in progress) - stream video to a Web Browser using the WebRTC standard
To show the video output on a display, the EGL Output element can be used. Internally, we use the nveglglessink element to do the displaying. On ARM based devices like the Jetson Nano, an additional nvegltransform element is needed.
In the following example, two video pipelines are used to show the image of a CSI and a USB camera. To access the video streams of the two cameras, a MIPI Camera Input and a USB Camera Input element were used. Two EGL Outputs are used to show the images on a display connected to the Jetson Nano over HDMI.
For real world applications, a video output on a display directly connected to the end device may not be possible. In these cases we can stream the video output over the network, using a standard protocol.
The RTSP Output implements video streaming over the standard RTSP protocol. GStreamer supports RTSP by default, but integrating it into a pipeline is not trivial. To do it, we need to stream an H264 encoded and RTP packed stream over UDP, on a local address (127.0.0.1). The next step is to instantiate an RTSP server. When an RTSP client connects, a pipeline consisting of a UDP source is created on-the-fly. The data received by the newly created pipeline is then streamed to the RTSP client.
The RTSP Output element implements all this functionality, with support for one or more outputs. Using it is as easy as calling .withRtspOutput("/myPath").
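For reference, the on-the-fly server side described above can be sketched with GStreamer's RTSP server bindings roughly like this (the UDP port, mount path and caps are assumptions; the actual DeepLib implementation may differ):
# Rough sketch of the RTSP serving side (port, path and caps are assumptions)
import gi
gi.require_version("Gst", "1.0")
gi.require_version("GstRtspServer", "1.0")
from gi.repository import Gst, GstRtspServer, GLib

Gst.init(None)

server = GstRtspServer.RTSPServer()
factory = GstRtspServer.RTSPMediaFactory()
# when a client connects, this pipeline is created on-the-fly and picks up
# the H264 RTP stream sent by the main pipeline over UDP (127.0.0.1:5400)
factory.set_launch(
    '( udpsrc port=5400 address=127.0.0.1 '
    'caps="application/x-rtp,media=video,encoding-name=H264,payload=96" '
    '! rtph264depay ! rtph264pay name=pay0 pt=96 )')
factory.set_shared(True)
server.get_mount_points().add_factory("/myPath", factory)
server.attach(None)  # serves rtsp://<jetson-ip>:8554/myPath
GLib.MainLoop().run()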
The TCP Output can be used to stream video over TCP, encoded (H264) and packed in some media container. The stream can be viewed in video players (ex. VLC) and some browsers (ex. Firefox).
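A minimal sketch of such a stream, assuming an H264 encode packed into a Matroska container and served with tcpserversink (the test source, encoder and port are assumptions), could look like this:
# Minimal TCP streaming sketch (source, encoder, container and port are assumptions)
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
tcp_out = Gst.parse_launch(
    "videotestsrc is-live=true ! nvvideoconvert ! "
    "nvv4l2h264enc ! h264parse ! matroskamux streamable=true ! "
    "tcpserversink host=0.0.0.0 port=8080")
tcp_out.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()  # can be opened with e.g. 'vlc tcp://<jetson-ip>:8080'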
The WebRTC Output will provide video streaming capabilities over the WebRTC protocol. This part is still work in progress. I have a working prototype, but the code is not yet on GitHub. Using it requires building some GStreamer bad / ugly plugins from source, and this may cause some compatibility issues.
> Processing Elements - NV Infer, NV Tracker and others
Now that we have covered the inputs and outputs, we can go on with the interesting part: video processing.
Right now the following elements are implemented:
- NV Infer - implements hardware accelerated inference
- NV Tracker - implements hardware accelerated object tracking
- NV OSD - add overlays over a video stream
- Multiplexer - implements automatic splitting of outputs connected to multiple elements
The NV Infer element can be used to implement primary and secondary inference, useful for tasks like object detection and classification. Internally, an nvinfer element is used with a user provided configuration file.
The NV Tracker element implements object tracking using an nvtracker element. Like NV Infer, this element is initialized with a standard configuration file.
NV OSD is an element that takes the output (video and metadata) from the above two elements, and displays a visualization over the original video stream.
The following screenshot shows an output of the DeepStream example 2 implemented using DeepLib:
pipeline = DeepLib() \
.pipeline() \
.withFileInput(path) \
.withNVInfer("dstest2_pgie_config.txt") \
.withNVTracker("dstest2_tracker_config.txt") \
.withNVInfer("dstest2_sgie1_config.txt") \
.withNVInfer("dstest2_sgie2_config.txt") \
.withNVInfer("dstest2_sgie3_config.txt") \
.withNVOsd() \
.withRtspOutput() \
.build()
The example uses a primary detector, an object tracker and three secondary detectors. The output is streamed over RTSP, and shown in a VLC player instance:
The Multiplexer is an element used internally to implement automatic multiplexing when an output pad is connected to multiple input pads.
> Pipeline Builder & JSON Pipeline Config
When using DeepLib, the video pipelines can be built in two ways:
- using the Pipeline Builder, a fluent API that allows building pipelines with just a couple of lines of code:
# build pipeline
pipeline = DeepLib() \
.pipeline() \
.withFileSource("sample-720p.h264") \
.withNVInfer("dstest1_pgie_config.txt") \
.withNVOsd() \
.withEGLOutput() \
.build()
- using the JSON Pipeline Config Loader - by calling DeepLib.pipelineFromJsonConfig(filePath), a pipeline can be built from a JSON file like the one below (it is also used by the Web IDE presented below)
"elements" : [
{
"id": "1",
"type": "file-input",
"properties" : {
"path": "/tmp/sample_720p.h264"
}
},
{
"id": "2",
"type": "nv-infer",
"properties" : {
"path": "dstest1_pgie_config.txt"
},
"links": {
"in" : "1/out"
}
},
...
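Loading and running such a config is then just a couple of lines (the pipeline.json file name below is just an example):
# build and run a pipeline from a JSON config file (the file name is just an example)
DeepLib.init()
pipeline = DeepLib.pipelineFromJsonConfig("pipeline.json")
DeepLib.runOnMain(pipeline)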
Web IDE
When experimenting with DeepStream and GStreamer, I always found it a little bit hard to get a good overview of more complex video pipelines.
So, I thought it would be fun to create a visual interface that allows easily creating and editing video pipelines.
Below is the DeepEye Web IDE, a simple editor that can be used to create and edit DeepLib based video pipelines:
The editor is built using HTML5, CSS, and a couple of libraries. The most notable of these is retejs / rete, a diagram editor / visual programming framework.
In the editor, the pipeline elements are represented as nodes. Each element can have one or more type sensitive inputs and outputs, as well as a list of properties.
The nodes can then be linked together, and the pipeline can be exported to a JSON configuration file understood by DeepLib:
The Web IDE also has a Video Preview part, which can be used to preview network video streams.
The following screenshot shows the preview of a USB camera streamed over TCP:
There are a lot of things to improve on DeepLib and the Web IDE. Some of the first ones I plan to implement are:
- more processing components
- adding support for more input & output types
- allow deploying pipelines directly from the Web IDE
One other product I want to try out is the NVIDIA Transfer Learning Toolkit, which allows adjusting existing Deep Neural Networks to our needs:
Another topic I want to cover is Stereo Vision implemented on the Jetson Nano. The newly released rev. B01 of the Jetson Nano Developer Kit allows connecting two MIPI CSI cameras. It would be interesting to try to implement hardware accelerated 3D reconstruction on the Jetson Nano.
Cheers!