In our previous projects, we looked at quantizing models for the MaaXBoard OSM93 and converting models to the Vela format.
Today we'll look at optimizing the camera pipeline to run ML applications even faster.
What is NNStreamer?
NNStreamer is open-source software built on top of GStreamer, a library that handles media pipelines. Not only can NNStreamer handle media (e.g. streaming video from a webcam), it can also parse tensors, which makes it possible to integrate AI/ML models into streaming data flows.
NNStreamer supports the most popular inference engines, open source or not, including TensorFlow Lite, TensorFlow, PyTorch, ONNX, and more. On this NXP release, the TensorFlow Lite and TVM engines are supported.
NXP has adapted NNStreamer to be accelerated on the VPUs, GPUs, and NPUs of its i.MX series boards.
Core components
NNStreamer consists of components called elements. If you're familiar with GStreamer, you'll be familiar with source (input) and sink (output) elements. NNStreamer extends GStreamer with stream filters that apply operations to tensors. The four most important are below:
- tensor_converter: Converts video frames to tensors.
- tensor_filter: Executes the AI model (e.g., TFLite, TVM, etc.).
- tensor_decoder: Post-processes outputs (e.g., bounding boxes, classifications).
- tensor_sink: Captures tensor data for further use.
You can find the rest in NNStreamer's documentation here.
Similar software
A similar GStreamer plugin, GstInference, is less modular than NNStreamer and lacks tensor_converter and tensor_decoder. If you used the i.MX 8M Plus Edge AI Kit, you might be familiar with it. On the i.MX93 and other NXP i.MX boards it has been replaced by NNStreamer.
NXP also provides NNShark, a profiling tool based on GstShark, for monitoring several pipeline metrics that are useful for assessing SoC hardware usage. NNShark can be used on the i.MX 8M Plus only.
Image Classification example
We'll start with a super simple image classification example so that you can get the hang of NNStreamer. We'll be using TensorFlow Lite (you can find basic instructions for using TVM in the i.MX Machine Learning User's Guide).
For our NNStreamer pipeline, we'll need the following elements:
- Input Source: Feed an image into the pipeline.
- Tensor Preprocessor: Resize, normalize, and format the input image.
- Tensor Filter: Run inference using the MobileNet model.
- Tensor Decoder: Decode the results into human-readable labels.
Download the MobileNet TFLite model and labels.txt from here:
wget https://raw.githubusercontent.com/nnsuite/testcases/master/DeepLearningModels/tensorflow-lite/Mobilenet_v1_1.0_224_quant/labels.txt
wget https://github.com/nnsuite/testcases/raw/refs/heads/master/DeepLearningModels/tensorflow-lite/Mobilenet_v1_1.0_224_quant/mobilenet_v1_1.0_224_quant.tflite
In the same directory as the model and labels, run the following pipeline:
gst-launch-1.0 v4l2src name=cam_src ! videoconvert ! videoscale ! \
video/x-raw,width=640,height=480,format=RGB ! \
tee name=t_raw \
t_raw. ! queue ! textoverlay name=overlay font-desc="Sans, 26" ! \
videoconvert ! autovideosink name=img_tensor \
t_raw. ! queue ! videoscale ! video/x-raw,width=224,height=224 ! \
tensor_converter ! \
tensor_filter framework=tensorflow-lite \
model=mobilenet_v1_1.0_224_quant.tflite ! \
tensor_decoder mode=image_labeling \
option1=labels.txt ! \
overlay.text_sink
Here's a graph explanation of this pipeline:
The tee divides one input into multiple outputs. One flow from the tee is converted into a tensor stream (tensor_converter). The tensor_filter predicts classifications using the TFLite MobileNet model. The result of tensor_decoder (the labels) is combined with the textoverlay on the video stream, and autovideosink displays it.
If you test with an object in labels.txt (like a banana or an orange), you should see output like this on your screen:
TEST YOUR KNOWLEDGE:
Given the previous example, and the following files for object detection, how would you construct an object detection pipeline using NNStreamer?
If you get stuck, you can find a solution here.
Accelerating the pipeline
NXP provides its own GStreamer plugins, such as imxvideoconvert and imxcompositor, which are accelerated on the MaaXBoard OSM93's pixel pipeline engine (PXP). Instead of a 3D GPU, the i.MX93 features a PXP, a dedicated 2D hardware block inside the processor that can scale, flip, rotate, and do color space conversion. It has limited features compared to a GPU, but it also uses much less power. Using these plugins in place of GStreamer's videoconvert and compositor plugins can provide a speedup.
It's also possible to accelerate the tensor_filter element (the ML model) on the NPU.
First, convert the MobileNet model to Vela format with this command:
vela mobilenet_v1_1.0_224_quant.tflite
The mobilenet_v1_1.0_224_quant_vela.tflite file can now be found in the newly created "output" directory. I moved my Vela model into the same directory as the original model for ease of access:
mv output/mobilenet_v1_1.0_224_quant_vela.tflite .
The edited pipeline below is accelerated on the NPU and PXP:
gst-launch-1.0 v4l2src name=cam_src ! \
imxvideoconvert_pxp ! video/x-raw,width=640,height=480,format=BGRx ! \
tee name=t_raw \
t_raw. ! queue ! textoverlay name=overlay font-desc="Sans, 26" ! \
imxvideoconvert_pxp ! fpsdisplaysink name=img_tensor sync=false text-overlay=false \
t_raw. ! queue ! imxvideoconvert_pxp ! video/x-raw,width=224,height=224 ! \
videoconvert ! video/x-raw,format=RGB ! \
tensor_converter ! \
tensor_filter framework=tensorflow-lite \
model=mobilenet_v1_1.0_224_quant_vela.tflite accelerator=true:npu \
custom=Delegate:External,ExtDelegateLib:libethosu_delegate.so ! \
tensor_decoder mode=image_labeling option1=labels.txt ! \
overlay.text_sink
The graph of this updated pipeline is below:
Note the changes. I've replaced videoconvert and videoscale with imxvideoconvert_pxp, which accelerates image manipulation on the PXP. I have to add one videoconvert back in, because imxvideoconvert_pxp has limited output formats.
To see what formats imxvideoconvert_pxp supports, run:
gst-inspect-1.0 imxvideoconvert_pxp
It supports output to RGB16, but TensorFlow requires 24-bit RGB as input to the model. I've stuck with BGRx for the rest of the pipeline, because imxvideoconvert_pxp supports it and it won't result in a loss of color resolution.
Choosing the correct color space conversions is important both for speed and accuracy.
I've also changed autovideosink to fpsdisplaysink and added text-overlay=false in order to print the frames per second (FPS) to the terminal after the pipeline terminates:
You can add fpsdisplaysink to the previous un-accelerated pipeline to compare speeds.
Wow! A 27x speedup!
Coding this example in Python
The gst-launch tool is convenient for testing pipelines, but for actual applications you'll need to use NNStreamer elements embedded in code. Thankfully, it's fairly easy to use NNStreamer elements within Python or C++ code. We've already named our pipeline elements (e.g. img_tensor), so it will be easy to reference them from the code.
Run the example on the MaaXBoard OSM93:
git clone git@github.com:zebular13/nnstreamer_image_classification.git
cd nnstreamer_image_classification
python3 nnstreamer_example_image_classification_tflite.py
You can also run it on the CPU using the --cpu flag:
python3 nnstreamer_example_image_classification_tflite.py --cpu
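The difference between the two modes mirrors the two gst-launch pipelines above: on the CPU the plain quantized .tflite model runs directly, while on the NPU the Vela-converted model is loaded through the Ethos-U external delegate. The repository's script handles this flag internally; the sketch below only illustrates what such a toggle could look like (the argparse handling and variable names are assumptions, not the repo's exact code):

import argparse

# Illustrative flag parsing; the repo's script may be structured differently.
parser = argparse.ArgumentParser()
parser.add_argument("--cpu", action="store_true",
                    help="run inference on the CPU instead of the NPU")
args = parser.parse_args()

if args.cpu:
    # Plain quantized TFLite model, executed on the CPU.
    filter_desc = ("tensor_filter framework=tensorflow-lite "
                   "model=mobilenet_v1_1.0_224_quant.tflite")
else:
    # Vela-compiled model, offloaded to the Ethos-U NPU via the external delegate.
    filter_desc = ("tensor_filter framework=tensorflow-lite "
                   "model=mobilenet_v1_1.0_224_quant_vela.tflite accelerator=true:npu "
                   "custom=Delegate:External,ExtDelegateLib:libethosu_delegate.so")

# filter_desc is then spliced into the pipeline string that is handed to
# Gst.parse_launch() (see the callback sketch further below).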
Explanation:
self.pipeline = Gst.parse_launch sets up the pipeline. Notice that it is missing the tensor_decoder element; in its place is a tensor_sink, whose callback later updates the classification result displayed in the textoverlay.
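Below is a minimal, self-contained sketch of that pattern: parse the pipeline, connect to the tensor_sink's new-data signal, and push the decoded label into the named textoverlay element. It follows the same structure as the repo's script, but the callback name and the label-decoding logic here are simplified assumptions rather than a copy of its code:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Load the labels that ship with the model (one label per line).
with open("labels.txt") as f:
    labels = [line.strip() for line in f]

# Same pipeline as the gst-launch version, except tensor_decoder is replaced
# by tensor_sink so the raw scores can be read from Python.
pipeline = Gst.parse_launch(
    "v4l2src ! videoconvert ! videoscale ! "
    "video/x-raw,width=640,height=480,format=RGB ! tee name=t_raw "
    "t_raw. ! queue ! textoverlay name=overlay ! "
    "videoconvert ! autovideosink name=img_tensor "
    "t_raw. ! queue ! videoscale ! video/x-raw,width=224,height=224 ! "
    "tensor_converter ! "
    "tensor_filter framework=tensorflow-lite model=mobilenet_v1_1.0_224_quant.tflite ! "
    "tensor_sink name=tensor_sink"
)

overlay = pipeline.get_by_name("overlay")

def on_new_data(sink, buffer):
    # Called for every inference result; pick the top class and display it.
    mem = buffer.peek_memory(0)
    ok, info = mem.map(Gst.MapFlags.READ)
    if ok:
        scores = list(info.data)          # uint8 scores from the quantized MobileNet
        top = scores.index(max(scores))   # index of the highest score
        overlay.set_property("text", labels[top])
        mem.unmap(info)

pipeline.get_by_name("tensor_sink").connect("new-data", on_new_data)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()

Running the same sketch on the NPU is then just a matter of swapping in the Vela model and delegate options from the accelerated pipeline above.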
My basic benchmarks on MaaXBoard OSM93 are below:
Run on NPU: 25.4 FPS
Run on CPU: 16.2 FPS