The coronavirus disease (COVID-19) has seriously affected the world, and there is still no end in sight. According to scientific research, wearing a face mask in public places significantly reduces the risk of transmission. In today’s tutorial, we will learn how to build a face mask detection system using an Nvidia Jetson board. I also want to share my experience of optimizing a deep learning model with TensorRT to get faster inference times.
This tutorial consists of four phases:
- Prerequisites
- Data Gathering
- Training the model
- Inference
There are many steps involved in this tutorial, so there's a lot that is about to be thrown your way.
Required Hardware Parts
To complete this tutorial you will need:
- Nvidia Jetson Xavier NX Developer Kit (You could also use the Nvidia Jetson Nano, which is slightly cheaper and consumes less energy.)
- USB webcam or camera module for real-time applications. For the live camera demonstration, a camera such as the Raspberry Pi Camera Module is required; here we’ll be using the Arducam Complete High Quality Camera Bundle.
- M.2 NVMe internal SSD (optional)
- A micro SD card with at least 16GB (optional)
- USB keyboard and mouse
- Monitor and HDMI cable
- Host computer running Ubuntu 16.04 or 18.04 LTS, either natively or in a VM with VMware or VirtualBox installed. Here we’ll be using VirtualBox.
For this project, I’ve used the Arducam IMX477 HQ Camera Board (cost: $89.99), as it’s relatively cheap, easy to install, and provides good results.
In a nutshell, it’s a camera module built around the 12.3 MP IMX477 sensor for the NVIDIA Jetson Nano/Xavier NX.
Combined with a Jetson Nano or Xavier NX Developer Kit, this camera lets you implement AI vision applications such as text and object recognition. It also delivers better-quality video capture, so you can build more demanding projects!
Download the corresponding Arducam camera driver package for Jetson boards from this link.
The NVIDIA Jetson Xavier NX Developer Kit
As the brain of our face mask detector, we use a Jetson Xavier NX Developer Kit. NVIDIA Jetson devices are small, low-power AI accelerators that can run machine learning algorithms in real time. However, deploying complex deep learning models on devices with such limited memory is challenging. In this case, we need inference optimization tools, such as TensorRT, to run deep learning models on these edge platforms.
Nvidia JetPack comes with built-in support for TensorRT, cuDNN (the CUDA-accelerated deep learning library), OpenCV, and other developer tools.
Desktop PCs and server-level graphics cards (such as the Nvidia 3080 Ti) are expensive, bulky, and not suited to edge computing needs. The embedded AI development boards launched by NVIDIA, the Jetson series, are therefore a good fit for current industry needs, and embedded boards empowered by deep learning are likely to become mainstream in the industry.
OK, ready… Let’s get started.
Launching AI and DL applications from NVMe SSD (optional for Xavier boards)
The instructions below will guide you through booting the Nvidia Jetson Xavier NX without an SD card.
By default, Nvidia Jetson boards boot up and store all of their programs on a microSD memory card, which has a maximum theoretical read speed of 100 MB/s and just 40 MB/s of write speed. This is inadequate for most modern AI and DL applications. Also, continuous writing to a microSD card over time is not a good idea and will at some point cause data corruption, or worse, complete failure of the card and data loss.
Here are the results of my measurements for the microSD card; GNOME Disk Utility can benchmark your storage devices.
Installing the SSD into the Jetson Xavier NX board is pretty much a no-brainer. Make sure your Jetson is powered off first.
Follow the instructions provided by Jim from JetsonHacks just below.
Using an external NVMe SSD as your main storage drive could speed things up significantly and, with a few commands, you can do just that. There is a wonderful video by JetsonHacks on YouTube.
Also, the JetsonHacks website provides helpful tips and tricks for working with Jetson boards. So, I followed the Native Boot for Jetson Xaviers instructions posted by the JetsonHacks community and did the following:
In order to flash an operating system onto the Jetson Xavier, you need a computer running Ubuntu. If you do not have a computer running Ubuntu 18.04 or 16.04, you can use a virtual machine in VMware or VirtualBox. Go to this link to download Ubuntu 18.04.
The VM is not set to recognize the host’s USB devices automatically, so you need to add the Jetson to the VM’s USB settings so that the guest can recognize the device as well.
Then simply follow the instructions provided by JetsonHacks. If everything goes well, you will finish flashing the OS image onto the Jetson using the VM.
Benchmark results for the M.2 NVMe SSD are shown below; the difference speaks for itself.
Getting the Jetson board up to date
Before installing TensorFlow and the other dependencies, the Jetson board needs to be fully updated. Use the commands below to bring it up to date:
Let’s update our package list and upgrade the software on the system.
Open the terminal, and type:
sudo apt-get update
sudo apt-get upgrade
Type Y and press Enter to upgrade everything.
Once you are done, you can reboot the Jetson board with this command:
sudo shutdown -r now
Maximize the Jetson board performance
As you may already know, the OpenCV build that ships with JetPack has no CUDA support and runs only on the CPU, so you should see an improvement by running the CPU at maximum clocks with the jetson_clocks command:
sudo jetson_clocks
Also, you can change the power mode of the Jetson board with the nvpmodel command:
sudo nvpmodel -m 8
The default mode of the Jetson Xavier NX uses only 4 CPU cores, so if you want to unlock its maximum performance, switch the power mode as shown above.
Also, you can check the currently set mode by entering the following command.
sudo nvpmodel -q
The following output appears:
NV Fan Mode:quiet
NV Power Mode: MODE_20W_6CORE
8
A reboot is required for the mode to take effect.
Change GDM3 to LightDM (optional)
First, let’s free up some RAM so our Jetson can run heavily loaded AI and DL applications.
Nvidia Jetson boards use GDM3 (the GNOME display manager) as the default display manager. GDM3 is the standard Ubuntu interface, but it can be changed to LightDM with the commands below. LightDM is one of the most popular display managers: it is a lightweight graphical interface that aims to be light and fast, with low code complexity.
At some point you are going to want to know some information about the RAM on your Jetson — how much you have, how much is used, how much is free, etc. Thankfully, there is a simple terminal command to give you this information.
Open a terminal window and enter this command.
free -h
This will give you a quick glance at the RAM usage.
total used free shared buff/cache available
Mem: 7,6G 2,1G 4,0G 129M 1,4G 5,3G
Swap: 3,8G 0B 3,8G
Now open a terminal window, and run the below command to change the display manager from gdm3 to lightdm.
sudo dpkg-reconfigure gdm3
You will see a window pop up. Press Enter.
This will bring up a menu where you can select lightdm as default display manager.
The change will be applied at the next boot, so reboot for it to take effect. After the restart, run the command shown below:
cat /etc/X11/default-display-manager
Output
/usr/sbin/lightdm
Check RAM usage
free -h
total used free shared buff/cache available
Mem: 7,6G 1,2G 5,7G 126M 708M 6,1G
Swap: 3,8G 0B 3,8G
Notice that RAM usage was about 2.1G with GDM3, but drops to about 1.2G after switching to LightDM.
Give the Jetson board more swap (optional)
By default, the swap space is small, but we can increase it with the jetson_swap command.
sudo jetson_swap -d ~ -s 8 -a
The output will be as follows:
Creating Swapfile at: /home/jetson
Swapfile Size: 8G
Automount: Y
-rw-r--r-- 1 root root 8,0G Oct 30 19:57 /home/jetson/swapfile
-rw------- 1 root root 8,0G Oct 30 19:57 /home/jetson/swapfile
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=6f4dc316-683a-45a2-8b0f-306f8bcd3dba
Filename Type Size Used Priority
/dev/zram0 partition 993972 0 5
/dev/zram1 partition 993972 0 5
/dev/zram2 partition 993972 0 5
/dev/zram3 partition 993972 0 5
/home/jetson/swapfile file 8388604 0 -1
Modifying /etc/fstab to enable on boot
Added in /etc/fstab "/home/jetson/swapfile none swap sw 0 0"
Finally, you need to reboot the Jetson with the command below:
sudo reboot now
Once the Jetson is done rebooting, check the swap space again.
free -h
total used free shared buff/cache available
Mem: 7,6G 1,3G 5,6G 126M 715M 6,0G
Swap: 11G 0B 11G
Installing CUDA on the Nvidia Jetson board
We can run the detection system either on the CPU or with GPU acceleration. For GPU acceleration you need an Nvidia GPU with CUDA cores. Compute Unified Device Architecture (CUDA) is an Nvidia-developed platform for parallel computing on CUDA-enabled GPUs. So, make sure you have CUDA installed.
Nvidia Jetson boards install the L4T operating system through JetPack, which already contains a complete, stable version of CUDA, but sometimes nvcc still cannot be found. We can check with nvcc:
nvcc -V
If it cannot be found, you need to add CUDA to your environment variables manually:
cd /usr/local/src
then run the commands below:
echo "export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" >> ~/.bashrc
Source the bashrc file.
source ~/.bashrc
After that, run the following command to list the installed CUDA packages. If CUDA was installed correctly, the PATH environment variable will now be set up properly.
dpkg -l | grep cuda
Lastly, please test that CUDA is installed properly by running: nvcc --version. The output should say the version of CUDA installed on your Jetson.
nvcc --version
You should get output like the following if everything is successful.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
If it returns "Command 'nvcc' not found", you need to install CUDA properly.
Monitoring Jetson board performance in real time using jtop
Before starting, install the useful third-party tool jetson_stats (jtop), which can monitor the performance of the Jetson board in real time.
To put it simply, jtop is an enhanced version of top for Jetsons that shows real-time GPU and memory usage, developed by Raffaello Bonghi.
The installation method is as follows:
sudo -H pip install -U jetson-stats
Reboot your Jetson board.
sudo reboot
After restarting, you can execute jtop to see if it can run successfully!
jtop
We can check the version of OpenCV and whether it can use CUDA (compiled CUDA) through jetson_release:
jetson_release
You should see something like this:
- NVIDIA Jetson Xavier NX (Developer Kit Version)
* Jetpack 4.6 [L4T 32.6.1]
* NV Power Mode: MODE_10W_DESKTOP - Type: 5
* jetson_stats.service: active
- Libraries:
* CUDA: 10.2.300
* cuDNN: 8.2.1.32
* TensorRT: 8.0.1.6
* Visionworks: 1.6.0.501
* OpenCV: 4.1.1 compiled CUDA: NO
* VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
* Vulkan: 1.2.70
Notice that "compiled CUDA" says NO, so we need to spend a little time rebuilding OpenCV.
Build OpenCV from source with CUDA support
The OpenCV included in JetPack is not built with CUDA, so you need to reinstall OpenCV to use its CUDA functions from Python or C++.
This section describes the procedure for using GPU (CUDA) with OpenCV.
Install the required dependencies:
sudo apt-get install build-essential cmake git unzip pkg-config
sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev
sudo apt-get install libgtk2.0-dev libcanberra-gtk*
sudo apt-get install libxvidcore-dev libx264-dev libgtk-3-dev
sudo apt-get install libtbb2 libtbb-dev libdc1394-22-dev
sudo apt-get install libv4l-dev v4l-utils
sudo apt-get install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
sudo apt-get install libavresample-dev libvorbis-dev libxine2-dev
sudo apt-get install libfaac-dev libmp3lame-dev libtheora-dev
sudo apt-get install libopencore-amrnb-dev libopencore-amrwb-dev
sudo apt-get install libopenblas-dev libatlas-base-dev libblas-dev
sudo apt-get install liblapack-dev libeigen3-dev gfortran
sudo apt-get install libhdf5-dev protobuf-compiler
sudo apt-get install libprotobuf-dev libgoogle-glog-dev libgflags-dev
If you want Qt5 support enabled in OpenCV, install the Qt5 library with the command below.
sudo apt-get install qt5-default
When all third-party software is installed, OpenCV itself can be downloaded. Two packages are needed: the basic OpenCV source and the additional contrib modules.
Move to the home directory and download the OpenCV source package and decompress it:
cd ~
wget -O opencv.zip https://github.com/opencv/opencv/archive/4.5.4.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.5.4.zip
unzip opencv.zip && unzip opencv_contrib.zip
After unzipping opencv and opencv_contrib, rename the folders:
mv opencv-4.5.4 opencv
mv opencv_contrib-4.5.4 opencv_contrib
Clean up the zip files
rm opencv.zip
rm opencv_contrib.zip
Once the download is complete, create a temporary build directory, and switch to it:
cd ~/opencv
mkdir build && cd build
We are now ready to use cmake to configure our build. Note that CUDA_ARCH_BIN must match your board's compute capability: 7.2 for the Xavier NX, 5.3 for the Nano.
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr \
-D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
-D EIGEN_INCLUDE_PATH=/usr/include/eigen3 \
-D WITH_OPENCL=OFF \
-D WITH_CUDA=ON \
-D CUDA_ARCH_BIN=7.2 \
-D CUDA_ARCH_PTX="" \
-D WITH_CUDNN=ON \
-D WITH_CUBLAS=ON \
-D ENABLE_FAST_MATH=ON \
-D CUDA_FAST_MATH=ON \
-D OPENCV_DNN_CUDA=ON \
-D ENABLE_NEON=ON \
-D WITH_QT=OFF \
-D WITH_OPENMP=ON \
-D WITH_OPENGL=ON \
-D BUILD_TIFF=ON \
-D WITH_FFMPEG=ON \
-D WITH_GSTREAMER=ON \
-D WITH_TBB=ON \
-D BUILD_TBB=ON \
-D BUILD_TESTS=OFF \
-D WITH_EIGEN=ON \
-D WITH_V4L=ON \
-D WITH_LIBV4L=ON \
-D OPENCV_ENABLE_NONFREE=ON \
-D INSTALL_C_EXAMPLES=OFF \
-D INSTALL_PYTHON_EXAMPLES=OFF \
-D BUILD_opencv_python3=TRUE \
-D OPENCV_GENERATE_PKGCONFIG=ON \
-D BUILD_EXAMPLES=OFF ..
The output is as follows.
--
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jetson/opencv/build
With all compilation directives in place, you can start the build with the following command.
make -j4
This launches the build on all four cores (the -j4 option stands for four parallel jobs); you can adjust the -j option to match your hardware. The build and installation process can take a couple of hours, though, so be patient…
If the build is completed successfully, you will see the following files.
[100%] Built target opencv_python2
[100%] Built target opencv_python3
Then, install OpenCV with:
sudo rm -r /usr/include/opencv4/opencv2
sudo make install
sudo ldconfig
Finally, now that OpenCV is installed, let’s perform a bit of cleanup and remove the unnecessary files.
make clean
sudo rm -rf ~/opencv
sudo rm -rf ~/opencv_contrib
Through jetson_release, you can see that "compiled CUDA" has become YES:
jetson_release
The output is as follows.
- NVIDIA Jetson Xavier NX (Developer Kit Version)
* Jetpack 4.6 [L4T 32.6.1]
* NV Power Mode: MODE_10W_DESKTOP - Type: 5
* jetson_stats.service: active
- Libraries:
* CUDA: 10.2.300
* cuDNN: 8.2.1.32
* TensorRT: 8.0.1.6
* Visionworks: 1.6.0.501
* OpenCV: 4.5.4 compiled CUDA: YES
* VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
* Vulkan: 1.2.70
If the installation finished without errors, you can check the OpenCV version as follows.
python3 -c "import cv2; print(cv2.__version__)"
We should see this result:
4.5.4
Please be aware that installing OpenCV and Dlib with CUDA support does not automatically speed up your application. You most likely need to apply changes to your program code in order to utilize GPU instead of CPU.
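For instance, the CUDA routines in OpenCV live in the cv2.cuda module and have to be called explicitly. Here is a minimal sketch (the image path test.jpg is just a placeholder) that checks the CUDA build and resizes a frame on the GPU:
import cv2

# Confirm that this OpenCV build can see the Jetson's GPU
print("CUDA-enabled devices:", cv2.cuda.getCudaEnabledDeviceCount())

# Upload a frame to GPU memory, resize it there, and download the result
frame = cv2.imread("test.jpg")  # placeholder image path
gpu_frame = cv2.cuda_GpuMat()
gpu_frame.upload(frame)
gpu_resized = cv2.cuda.resize(gpu_frame, (224, 224))
result = gpu_resized.download()
print(result.shape)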
Configuring your TF Python environment (optional)
Use a virtualenv for your TensorFlow projects, especially if you have a lot of different packages or Python versions floating around. It is great for keeping development environments with different Python packages and libraries isolated, and it is practically mandatory (and always a best practice) in order to properly install TensorFlow, SciPy, and Keras.
We will configure our Python environment. For managing virtual environments we'll be using virtualenv, which can be installed like below:
sudo pip3 install virtualenv virtualenvwrapper
To get virtualenv to work we need to add the following lines to the ~/.bashrc file:
nano ~/.bashrc
and add the lines below:
# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh
To activate the changes the following command must be executed:
source ~/.bashrc
Now we can create a virtual environment using the mkvirtualenv command.
mkvirtualenv tf -p python3
OpenCV also needs to be available inside the virtualenv. If you installed OpenCV globally, you can create a symbolic link to cv2.so inside the virtualenv.
Create a symbolic link from the OpenCV install directory to your virtual environment
cd ~/.virtualenvs/xxxx/lib/python3.6/site-packages/
ln -s /usr/lib/python3.6/dist-packages/cv2/python-3.6/cv2.cpython-36m-aarch64-linux-gnu.so cv2.so
where xxxx is the virtual environment folder name (tf in our case).
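After creating the link, you can verify that OpenCV imports inside the environment (an extra sanity check, not part of the original steps):
workon tf
python -c "import cv2; print(cv2.__version__)"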
Install TensorFlow
TensorFlow is a library developed by Google that was released as open source in 2015. It makes it very simple to build and train a machine learning model.
Make sure you are inside the tf virtual environment by using the workon command:
workon tf
First, install the necessary system packages:
sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
Install and upgrade pip3.
sudo apt-get install python3-pip
Install the Python package dependencies.
pip3 install -U pip testresources setuptools numpy==1.16.1 future==0.17.1 mock==3.0.5 h5py==2.9.0 keras_preprocessing==1.0.5 keras_applications==1.0.8 gast==0.2.2 futures protobuf pybind11
Install NVIDIA's build of TensorFlow:
pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==2.3.1+nv20.12
This process may take a couple of minutes to finish.
Check the TensorFlow version:
python3 -c 'import tensorflow as tf; print(tf.__version__)'
2.3.1
If you see the result above, TensorFlow is ready for use. Alternatively, you may see the error below:
ImportError: /usr/lib/aarch64-linux-gnu/libgomp.so.1: cannot allocate memory in static TLS block
You can fix it by running the command below:
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1
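Once TensorFlow imports cleanly, you can also confirm that it sees the Jetson's GPU (an extra sanity check, not part of the original steps):
python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'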
Reboot your Jetson board.
Download the face mask dataset from Kaggle
First, you need to prepare a dataset, a collection of images to work with. Kaggle datasets are a great place to discover, explore, and analyze open data. To use Kaggle resources, you need to log in to the Kaggle website.
I will use this face-mask classification dataset on Kaggle, which has images categorized into two folders, with_mask and without_mask. The dataset is available for download at the following link. Unzip the downloaded file.
The more images you have of the same person from different angles, the more precisely the system learns to recognize them.
Training the model for the face mask detection system
It is well known that NVIDIA Jetson boards are embedded devices, which means they will likely be slower than any modern desktop or laptop computer you might encounter. As a result, they are not intended to be used as development systems for training machine learning models. It is recommended to use server-level graphics cards (such as the Nvidia 3080 Ti) or cloud-based services like Google's Teachable Machine if you wish to train deep models from scratch. Teachable Machine is a browser application that you can train with your webcam to recognize objects or expressions.
Here, I've tried training the model both on my Jetson Board and using Google's Teachable Machine service.
Method 1 - Training on the edge using the Nvidia Jetson board (not recommended)
At this stage, we will train on the face mask images that were downloaded earlier.
Install the packages below:
pip3 install matplotlib
pip3 install scipy
pip3 install scikit-learn==0.18
pip3 install imutils
Run the following command in the main directory of your project.
python3 train.py --dataset dataset --plot myplot.png --model mymodel.model
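The train.py script comes from the project repository linked at the end of this article and is not reproduced in full here. As a rough idea of what such a script does, below is a minimal sketch of a comparable Keras training script; the MobileNetV2 backbone, the 80/20 split, and the 20-epoch schedule are assumptions for illustration and may differ from the repository version:
import argparse
import matplotlib
matplotlib.use("Agg")  # render the plot to a file, no display needed
import matplotlib.pyplot as plt
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

ap = argparse.ArgumentParser()
ap.add_argument("--dataset", required=True, help="folder with with_mask/ and without_mask/ subfolders")
ap.add_argument("--plot", default="plot.png", help="path to the output training plot")
ap.add_argument("--model", default="mask.model", help="path to the output model")
args = ap.parse_args()

# Load the two class folders with an 80/20 train/validation split
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_directory(args.dataset, target_size=(224, 224),
                                        class_mode="categorical", subset="training")
val_gen = datagen.flow_from_directory(args.dataset, target_size=(224, 224),
                                      class_mode="categorical", subset="validation")

# MobileNetV2 backbone (frozen) with a small classification head on top
base = MobileNetV2(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))
head = AveragePooling2D(pool_size=(7, 7))(base.output)
head = Flatten()(head)
head = Dense(128, activation="relu")(head)
head = Dropout(0.5)(head)
head = Dense(2, activation="softmax")(head)
model = Model(inputs=base.input, outputs=head)
for layer in base.layers:
    layer.trainable = False

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(train_gen, validation_data=val_gen, epochs=20)
model.save(args.model)

# Save the accuracy/loss curves referenced later in this section
plt.plot(history.history["accuracy"], label="train_acc")
plt.plot(history.history["val_accuracy"], label="val_acc")
plt.plot(history.history["loss"], label="train_loss")
plt.plot(history.history["val_loss"], label="val_loss")
plt.xlabel("Epoch")
plt.legend()
plt.savefig(args.plot)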
You might get "System throttled due to Over-current" messages on the desktop. This is a warning mechanism that protects the Jetson board from undervoltage or overcurrent, and it is normal operational behavior of the board itself.
Please note that the Jetson Nano only has 4GB of RAM, which will limit you from training a complex model. I recommend using a desktop GPU for training, since the Nano is designed as an edge (inference) device.
You can also monitor kernel messages during training with this command.
dmesg --follow
If you see the message below, you have successfully trained your model! That wasn't too hard, was it?
[INFO] evaluating network...
precision recall f1-score support
with_mask 1.00 1.00 1.00 45
without_mask 1.00 1.00 1.00 43
avg / total 1.00 1.00 1.00 88
[INFO] saving mask detector model...
Let's plot the training/validation accuracy and loss. The figure below shows the training accuracy, as well as the training and validation loss, after training for 20 epochs.
From the loss plot, we can see that the model performs comparably on both the training and validation datasets. This matters because it indicates the model is neither undertrained nor overtrained. You should aim to minimize the loss and maximize the accuracy, and ideally the validation results should be close to the training results.
Method 2 - Training the model using Teachable Machine
Teachable Machine is a platform developed by Google. It's a visual, no-code environment where, without writing a single line of code, you can build a model capable of recognizing your own custom images, sounds, or even human poses. The process has three steps: Gather, Train, and Export.
First, go to Google’s Teachable Machine and click Get Started. Then, select Image Project.
- Then select the Standard image model option.
- Create 2 classes called with_mask and without_mask
- Upload the images of face mask dataset you want to detect.
- Once you have set up all of your classes and are happy with your datasets, it is time to train the model! Click the Train Model button.
- To export the model, click the "Export Model" button. A new window will pop up. Click the "Tensorflow" tab and select the "Keras" model conversion type.
Teachable Machine helps with every step of a computer vision problem, from data collection to deployment. I would definitely recommend it if you don't have a fast computer with a GPU.
Running inference with the TF model
Now let's test the model we just trained. Launch the tm.py script.
python3 tm.py
Here’s a demo showing that it runs in real time:
After a few seconds, the camera view should pop up in a window. If a mask is detected, you will see the label "with_mask", and "without_mask" if no mask is detected. Note that the model is not always perfectly accurate.
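The tm.py script is included in the project repository linked at the end of this article and is not reproduced in full here. A comparable inference loop looks roughly like the sketch below; the GStreamer pipeline targets a CSI camera such as the Arducam IMX477, and the model path and label order are assumptions that must match your trained model:
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# GStreamer pipeline for a CSI camera on Jetson; use cv2.VideoCapture(0) for a USB webcam instead
GST = ("nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1920, height=1080, framerate=30/1 ! "
       "nvvidconv ! video/x-raw, width=640, height=480, format=BGRx ! "
       "videoconvert ! video/x-raw, format=BGR ! appsink")

model = load_model("mymodel.model")     # the model trained earlier (path is an assumption)
labels = ["with_mask", "without_mask"]  # must match the class order used during training

cap = cv2.VideoCapture(GST, cv2.CAP_GSTREAMER)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Resize and normalize the frame to the model's 224x224 input
    img = cv2.resize(frame, (224, 224)).astype("float32") / 255.0
    preds = model.predict(np.expand_dims(img, axis=0))[0]
    label = labels[int(np.argmax(preds))]
    cv2.putText(frame, "%s: %.2f" % (label, preds.max()), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("Face mask detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()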
If the camera crashes, you can reinitialize it with the command below.
sudo systemctl restart nvargus-daemon
Congratulations! You have taught your Jetson board to recognize your face mask.
ONNX overview
ONNX is an open format for machine learning and deep learning models. It allows you to convert models from different frameworks, such as TensorFlow, PyTorch, MATLAB, Caffe, and Keras, to a single format.
There are basically two ways to convert TensorFlow models to TensorRT "engines" (aka "plan files"), both of which use intermediate formats:
TF -> UFF -> TRT
TF -> ONNX -> TRT
The workflow consists of the following steps:
- Convert the TensorFlow/Keras model to a .pb file. pb stands for protobuf; in TensorFlow, the protobuf file contains the graph definition as well as the weights of the model.
- Convert the .pb file to the ONNX format.
- Create a TensorRT engine.
- Run inference from the TensorRT engine.
First, you need to convert the TensorFlow model to the ONNX format. We will use the tf2onnx package. Before installing it, make sure onnx itself is installed; the commands below install onnx and its dependencies:
sudo apt-get install protobuf-compiler libprotoc-dev
pip3 install onnx==1.4.1
pip3 install onnxruntime
Upgrade numpy (optional):
python3 -m pip install -U numpy --no-cache-dir --no-binary numpy
Install tf2onnx:
pip3 install tf2onnx
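Note that tf2onnx expects a TensorFlow SavedModel directory (./tf in the command below). If your trained model was saved as a Keras file, you can first re-export it as a SavedModel; a minimal sketch, assuming the model file from the training step is named mymodel.model:
from tensorflow.keras.models import load_model

# Load the trained Keras model and re-export it as a SavedModel directory named ./tf
model = load_model("mymodel.model")
model.save("./tf", save_format="tf")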
We can then convert the model to ONNX by executing the following command:
python3 -m tf2onnx.convert --saved-model ./tf --output ./mymodel.onnx
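You can quickly sanity-check the exported file with the onnx package (an optional step, not in the original workflow):
import onnx

model = onnx.load("mymodel.onnx")
onnx.checker.check_model(model)  # raises an exception if the model is malformed
print("Inputs:", [i.name for i in model.graph.input])
print("Outputs:", [o.name for o in model.graph.output])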
Visualize the ONNX model
Netron is a viewer for neural network, deep learning, and machine learning models. It supports ONNX, TensorFlow Lite, Caffe, Keras, Darknet, PaddlePaddle, ncnn, MNN, Core ML, RKNN, MXNet, MindSpore Lite, TNN, Barracuda, Tengine, CNTK, TensorFlow.js, Caffe2, and UFF.
Now, let’s visualize our ONNX graph using Netron. Open web browser and go to https://netron.app.
Open your ONNX model. You will see the full network graph. Check that the input and output have the expected sizes. It's also possible to export PNG images of the model.
After the ONNX conversion, the next step is to convert the ONNX model into a TensorRT network, also called a TensorRT engine.
Convert ONNX to a TensorRT engine
Current releases of TensorRT support three kinds of parsers: Caffe, UFF, and ONNX. There are two main ways to convert an ONNX file to a TensorRT engine:
- Use trtexec
- Use TensorRT API
In this guide, we will focus on using trtexec. To convert the ONNX model above to a TensorRT engine with trtexec, run the following:
/usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/Desktop/facemask/mymodel.onnx --saveEngine=/home/jetson/Desktop/facemask/engine.trt --shapes=input0:1x3x224x224
This converts our ONNX model to a serialized TensorRT engine.
You also need to install pycuda.
pip3 install pycuda
Also note that the larger the model, the bigger the effect CUDA has. This has to do with the relatively time-consuming memory transfers between CPU and GPU: the more tensors the GPU can process at one time, the better the performance boost.
Since we will use the common.py helper provided with the TensorRT samples, copy it into the project directory first:
cp /usr/src/tensorrt/samples/python/common.py ./common.py
Running inference with the optimized model
After these tedious operations, you can finally run the program:
python3 tm_tensorrt.py
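The tm_tensorrt.py script is also in the project repository. The core of running inference with a TensorRT engine looks roughly like the sketch below; it relies on the copied common.py helper, assumes the engine.trt file produced above, and uses a single placeholder image test.jpg instead of the camera loop:
import cv2
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # initializes the CUDA context
import common           # helper copied from /usr/src/tensorrt/samples/python/

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built with trtexec
with open("engine.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
inputs, outputs, bindings, stream = common.allocate_buffers(engine)

labels = ["with_mask", "without_mask"]  # must match the training class order
frame = cv2.imread("test.jpg")          # placeholder test image
img = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
# HWC -> CHW to match the 1x3x224x224 shape given to trtexec; skip if your model expects NHWC
img = np.transpose(img, (2, 0, 1))

# Copy the preprocessed image into the host buffer and run inference
np.copyto(inputs[0].host, img.ravel())
preds = common.do_inference_v2(context, bindings=bindings, inputs=inputs,
                               outputs=outputs, stream=stream)[0]
print(labels[int(np.argmax(preds))], preds)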
The results are visualized below:
Converting to TensorRT improves performance significantly. With TensorRT, the Jetson Xavier NX performs well, achieving a frame rate higher than 50 FPS, which is fast enough for most real-time object detection applications.
After our test runs we have the following performance results:
- Standard TensorFlow graph — 4 FPS;
- TensorRT optimized graph — 60 FPS;
Finally, it can be concluded that the TensorRT-optimized model is dramatically faster than raw TensorFlow on the Jetson board.
Two more things you could try to speed up inference:
- Use a larger batch size, though I'm not sure how well this works on Jetson boards given their resource limitations.
- OpenCV's main image format is BGR, which is not supported by most hardware engines on the Jetson; converting it requires significant CPU usage (which is why it's better to run the CPUs at max clocks) and limits performance. For deep learning inference, it's recommended to use the MMAPI or the DeepStream SDK.
That’s it for today! You have a Face mask detection system installed and ready to use!
I hope you found this guide useful; thanks for reading. Do you have any questions or feedback? Leave a comment below. Stay tuned!
Here is the full source code on GitHub.
References:
- How to Build a Face Mask Detector with Raspberry Pi
- Visualize Neural Network Model using Netron
- How to change the default display manager in Debian Linux
- A Guide to using TensorRT on the Nvidia Jetson Nano
- Save 1GB of Memory! Use LXDE on your Jetson
- NVIDIA Jetson Nano application-multi-threaded parallel processing, take the project-"Are you wearing a mask?" as an example
- Native Boot for Jetson Xaviers
- How To Install PyCUDA On NVIDIA Jetson Xavier NX & Jetson Nano Devices
- Install TensorFlow 2.3.1 on Jetson Nano
- 4-screen display with NVIDIA Jetson's GStreamer
- Benchmarking TF-TRT on the Raspberry Pi and Jetson Nano
- Getting a Running Start with the NVIDIA Jetson Nano