In our previous projects, we looked at quantizing models for the MaaXBoard OSM93 and converting models to the Vela format.
Today we'll look at optimizing the camera pipeline to run ML applications even faster.
What is NNStreamer?
NNStreamer is open-source software built on top of GStreamer, a library that handles media pipelines. Not only can NNStreamer handle media (e.g. streaming video from a webcam), it can also parse tensors, which makes it possible to integrate AI/ML models into streaming data flows.
NNStreamer supports the most popular inference engines, open source or not, including TensorFlow Lite, TensorFlow, PyTorch, ONNX, and more. On this NXP release, the TensorFlow Lite and TVM engines are supported.
NXP has adapted NNStreamer to be accelerated on the VPUs, GPUs, and NPUs of its i.MX series boards.
Core components
NNStreamer consists of components called elements. If you're familiar with GStreamer, you'll be familiar with source (input) and sink (output) elements. NNStreamer extends GStreamer with stream filters that apply operations to tensors. The four most important are below:
- tensor_converter: Converts video frames to tensors.
- tensor_filter: Executes the AI model (e.g., TFLite, TVM, etc.).
- tensor_decoder: Post-processes outputs (e.g., bounding boxes, classifications).
- tensor_sink: Captures tensor data for further use.
You can find the rest in NNStreamer's documentation here.
Similar software
A similar GStreamer plugin, GstInference, is less modular than NNStreamer and lacks tensor_converter and tensor_decoder. If you used the i.MX 8M Plus Edge AI Kit, you might be familiar with it. On the i.MX93 and other NXP i.MX boards it has been replaced by NNStreamer.
NXP also provides NNShark, a profiling tool based on GstShark, for monitoring several pipeline metrics that are useful for assessing SoC hardware usage. NNShark can be used on the i.MX 8M Plus only.
Image Classification example
We'll start with a super simple image classification example so that you can get the hang of NNStreamer. We'll be using TensorFlow Lite (you can find basic instructions for using TVM in the i.MX Machine Learning User's Guide).
For our NNStreamer pipeline, we'll need the following elements:
- Input Source: Feed an image into the pipeline.
- Tensor Preprocessor: Resize, normalize, and format the input image.
- Tensor Filter: Run inference using the MobileNet model.
- Tensor Decoder: Decode the results into human-readable labels.
Download the MobileNet TFLite model and labels.txt from here:
wget https://raw.githubusercontent.com/nnsuite/testcases/master/DeepLearningModels/tensorflow-lite/Mobilenet_v1_1.0_224_quant/labels.txt
wget https://github.com/nnsuite/testcases/raw/refs/heads/master/DeepLearningModels/tensorflow-lite/Mobilenet_v1_1.0_224_quant/mobilenet_v1_1.0_224_quant.tflite
In the same directory as the model and labels, run the following pipeline:
gst-launch-1.0 v4l2src name=cam_src ! videoconvert ! videoscale ! \
video/x-raw,width=640,height=480,format=RGB ! \
tee name=t_raw \
t_raw. ! queue ! textoverlay name=overlay font-desc="Sans, 26" ! \
videoconvert ! autovideosink name=img_tensor \
t_raw. ! queue ! videoscale ! video/x-raw,width=224,height=224 ! \
tensor_converter ! \
tensor_filter framework=tensorflow-lite \
model=mobilenet_v1_1.0_224_quant.tflite ! \
tensor_decoder mode=image_labeling \
option1=labels.txt ! \
overlay.text_sink
Here's a graph explanation of this pipeline:
The tee divides one input into multiple outputs. One flow from the tee is converted into a tensor stream (tensor_converter). The tensor_filter predicts classifications using the TFLite MobileNet model. The result of tensor_decoder (the labels) is combined with the textoverlay on the video stream, and autovideosink displays it.
If you test with an object in labels.txt (like a banana or an orange), you should see output like this on your screen:
TEST YOUR KNOWLEDGE:
Given the previous example, and the following files for object detection, how would you construct an object detection pipeline using NNStreamer?
If you get stuck, you can find a solution here.
Accelerating the pipeline
NXP provides its own GStreamer plugins, such as imxvideoconvert and imxcompositor, which are accelerated on the MaaXBoard OSM93's pixel pipeline engine (PXP). Instead of a 3D GPU, the i.MX93 features a PXP, a dedicated 2D hardware block inside the processor that can scale, flip, rotate, and do color space conversion. It has limited features compared to a GPU, but it also uses much less power. Using these plugins in place of GStreamer's videoconvert and compositor plugins can provide a speedup.
It's also possible to accelerate the tensor_filter element (the ML model) on the NPU.
First, convert the MobileNet model to Vela format with this command:
vela mobilenet_v1_1.0_224_quant.tflite
The mobilenet_v1_1.0_224_quant_vela.tflite file can now be found in the newly created "output" directory. I moved my Vela model into the same directory as the original model for ease of access:
mv output/mobilenet_v1_1.0_224_quant_vela.tflite .
The edited pipeline below is accelerated on the NPU and PXP:
gst-launch-1.0 v4l2src name=cam_src ! \
imxvideoconvert_pxp ! video/x-raw,width=640,height=480,format=BGRx ! \
tee name=t_raw \
t_raw. ! queue ! textoverlay name=overlay font-desc="Sans, 26" ! \
imxvideoconvert_pxp ! fpsdisplaysink name=img_tensor sync=false text-overlay=false \
t_raw. ! queue ! imxvideoconvert_pxp ! video/x-raw,width=224,height=224 ! \
videoconvert ! video/x-raw,format=RGB ! \
tensor_converter ! \
tensor_filter framework=tensorflow-lite \
model=mobilenet_v1_1.0_224_quant_vela.tflite accelerator=true:npu \
custom=Delegate:External,ExtDelegateLib:libethosu_delegate.so ! \
tensor_decoder mode=image_labeling option1=labels.txt ! \
overlay.text_sink
The graph of this updated pipeline is below:
Note the changes. I've replaced videoconvert and videoscale with imxvideoconvert_pxp, which accelerates image manipulation on the PXP. I have to add one videoconvert back in, because imxvideoconvert_pxp has limited output formats.
To see what formats imxvideoconvert_pxp supports, run:
gst-inspect-1.0 imxvideoconvert_pxp
It supports output to RGB16, but TensorFlow requires 24-bit RGB as input to the model. I've stuck with BGRx for the rest of the pipeline, because imxvideoconvert_pxp supports it and it won't result in a loss of color resolution.
Choosing the correct color space conversions is important both for speed and accuracy.
I've also changed autovideosink to fpsdisplaysink and added text-overlay=false in order to print the frames per second (FPS) to the terminal after the pipeline terminates:
You can add fpsdisplaysink to the previous un-accelerated pipeline to compare speeds.
Wow! A 27x speedup!
Coding this example in Python
The gst-launch tool is convenient for testing pipelines, but for actual applications you'll need to use NNStreamer elements embedded in code. Thankfully, it's fairly easy to use NNStreamer elements within Python or C++ code. We've already named our pipeline elements (e.g. img_tensor), so it will be easy to reference them from the code.
Run the example on the MaaXBoard OSM93:
git clone git@github.com:zebular13/nnstreamer_image_classification.git
cd nnstreamer_image_classification
python3 nnstreamer_example_image_classification_tflite.py
You can also run it on the CPU using the --cpu flag:
python3 nnstreamer_example_image_classification_tflite.py --cpu
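The difference between the two modes mirrors the two gst-launch pipelines above: on the CPU the plain quantized .tflite model runs directly, while on the NPU the Vela-converted model is loaded through the Ethos-U external delegate. The repository's script handles this flag internally; the sketch below only illustrates what such a toggle could look like (the argparse handling and variable names are assumptions, not the repo's exact code):

import argparse

# Illustrative flag parsing; the repo's script may be structured differently.
parser = argparse.ArgumentParser()
parser.add_argument("--cpu", action="store_true",
                    help="run inference on the CPU instead of the NPU")
args = parser.parse_args()

if args.cpu:
    # Plain quantized TFLite model, executed on the CPU.
    filter_desc = ("tensor_filter framework=tensorflow-lite "
                   "model=mobilenet_v1_1.0_224_quant.tflite")
else:
    # Vela-compiled model, offloaded to the Ethos-U NPU via the external delegate.
    filter_desc = ("tensor_filter framework=tensorflow-lite "
                   "model=mobilenet_v1_1.0_224_quant_vela.tflite accelerator=true:npu "
                   "custom=Delegate:External,ExtDelegateLib:libethosu_delegate.so")

# filter_desc is then spliced into the pipeline string that is handed to
# Gst.parse_launch() (see the callback sketch further below).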
Explanation:
self.pipeline = Gst.parse_launch sets up the pipeline. Notice that it is missing the tensor_decoder element; in its place is a tensor_sink, whose callback later updates the classification result displayed in the textoverlay.
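Below is a minimal, self-contained sketch of that pattern: parse the pipeline, connect to the tensor_sink's new-data signal, and push the decoded label into the named textoverlay element. It follows the same structure as the repo's script, but the callback name and the label-decoding logic here are simplified assumptions rather than a copy of its code:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Load the labels that ship with the model (one label per line).
with open("labels.txt") as f:
    labels = [line.strip() for line in f]

# Same pipeline as the gst-launch version, except tensor_decoder is replaced
# by tensor_sink so the raw scores can be read from Python.
pipeline = Gst.parse_launch(
    "v4l2src ! videoconvert ! videoscale ! "
    "video/x-raw,width=640,height=480,format=RGB ! tee name=t_raw "
    "t_raw. ! queue ! textoverlay name=overlay ! "
    "videoconvert ! autovideosink name=img_tensor "
    "t_raw. ! queue ! videoscale ! video/x-raw,width=224,height=224 ! "
    "tensor_converter ! "
    "tensor_filter framework=tensorflow-lite model=mobilenet_v1_1.0_224_quant.tflite ! "
    "tensor_sink name=tensor_sink"
)

overlay = pipeline.get_by_name("overlay")

def on_new_data(sink, buffer):
    # Called for every inference result; pick the top class and display it.
    mem = buffer.peek_memory(0)
    ok, info = mem.map(Gst.MapFlags.READ)
    if ok:
        scores = list(info.data)          # uint8 scores from the quantized MobileNet
        top = scores.index(max(scores))   # index of the highest score
        overlay.set_property("text", labels[top])
        mem.unmap(info)

pipeline.get_by_name("tensor_sink").connect("new-data", on_new_data)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()

Running the same sketch on the NPU is then just a matter of swapping in the Vela model and delegate options from the accelerated pipeline above.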
My basic benchmarks on MaaXBoard OSM93 are below:
Run on NPU: 25.4 FPS
Run on CPU: 16.2 FPS