Instead of collecting data locally and sending it to the cloud for processing and inference, as in typical machine learning workflows, the advent of powerful yet low-power single-board computers means every step can be performed on the device itself. As a result, applications such as industrial monitoring, health tracking, and automated agriculture can be made far more efficient and accurate.
The SK-TDA4VM and Jetson Nano
Released in 2019, NVIDIA's Jetson Nano development kit features a 128-core Maxwell GPU along with a quad-core Arm Cortex-A57 CPU clocked at 1.43GHz. The kit also includes 4GB of LPDDR4 memory, HDMI and DisplayPort connectors, gigabit Ethernet, and four USB 3.0 ports, as well as a 40-pin GPIO header and dual CSI camera connectors.
Conversely, the SK-TDA4VM kit from Texas Instruments contains a dual-core Arm Cortex-A72 CPU; DSP, deep learning, vision, and multimedia accelerators; 4GB of LPDDR4 memory; four USB ports; gigabit Ethernet; and HDMI and DisplayPort display outputs. For adding cameras, there are two CSI camera connectors along the edge and a 40-pin Samtec connector on the underside of the board. Unlike the Jetson Nano, the SK-TDA4VM also includes both an M.2 E-key slot for a WiFi/Bluetooth card and an M.2 M-key slot for an SSD or other PCIe x4 device. Take a look at the previous getting started guide for more information about how to set up the kit and run a simple demo.
The starting point for this project is a simple example written in Python 3.6 under JetPack 4.5 that takes 20 resized images from the COCO17 dataset and passes them to an SSD MobileNet V1 TensorFlow Lite model taken from the TensorFlow website. Once the input data has been set, the Jetson Nano runs the TFLite interpreter on the CPU and times how long it takes for the result to be produced. After running for a total of 20 iterations with a maximum power draw of 5W, the average time was 223 milliseconds per inference. It should be noted that, due to the limited power draw and the lack of GPU acceleration for TFLite models on the Jetson Nano, this number is higher than an optimized run would achieve.
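The timing loop itself is only a few lines. Here is a minimal sketch of how such a benchmark can be written, assuming the TensorFlow wheel NVIDIA provides for JetPack supplies the TFLite interpreter; the model filename and the coco17_resized_uint8.npy file are placeholders rather than the original script's paths:

import time
import numpy as np
import tensorflow as tf  # TensorFlow build for Jetson (NVIDIA publishes wheels for JetPack)

# Placeholder paths; substitute your own model and preprocessed images
interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# 20 COCO17 images resized to the model's input resolution, saved as a uint8 array
images = np.load("coco17_resized_uint8.npy")  # hypothetical file prepared beforehand

times = []
for img in images:
    interpreter.set_tensor(input_details["index"], np.expand_dims(img, axis=0))
    start = time.time()
    interpreter.invoke()                  # CPU-only inference for TFLite on the Nano
    times.append(time.time() - start)

print(f"Average inference time: {np.mean(times) * 1000:.1f} ms")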
The architecture of the TDA4VM requires that existing pretrained machine learning models, such as .tflite files, first be compiled before they can run on the hardware and take advantage of its accelerators. As seen in my getting started guide, TI provides a model zoo from which pretrained models can be downloaded in the correct format. Each entry includes not only the .tflite file, but also a param.yaml file with information about the model and various other artifacts.
In order to import a custom TensorFlow Lite model, one first has to set up the compilation environment. The toolset is validated for Ubuntu 18.04 running on either x86 or aarch64 architecture; I was able to install the Linux environment under WSL 2 on Windows 10 for a simpler setup process. From here, I ran the following commands to clone the repository and enter the folder:
$ git clone https://github.com/TexasInstruments/edgeai-tidl-tools.git
$ cd edgeai-tidl-tools
Before executing the script below, I edited the requirements_pc.txt file to change the line onnx to onnx==1.4.1, since there is an issue installing the latest version.
$ source ./setup.sh --skip_cpp_deps
Select J721E as the target device if prompted. Run
$ ./scripts/run_python_examples.sh
to ensure compilation succeeds. Additionally, look at the ./model-artifacts and ./models directories to view the resulting artifacts.
In order to compile a model, first navigate to the examples/osrt_python directory and open the model_configs.py file. To add a new entry, simply append a dictionary with your model's parameters, changing the model_path to reflect where your model is stored. For example, this is the entry for my SSD MobileNet V1 TensorFlow Lite model:
'od-tfl-ssd_mobilenet_v1_1' : {
    'model_path' : os.path.join(models_base_path, 'ssd_mobilenet_v1_1_metadata_1.tflite'),
    'mean' : [127.5, 127.5, 127.5],
    'scale' : [1/127.5, 1/127.5, 1/127.5],
    'num_images' : numImages,
    'num_classes' : 91,
    'model_type' : 'od',
    'session_name' : 'tflitert',
    'od_type' : 'HasDetectionPostProcLayer'
}
There are many other entries listed that can be examined as well if your model is of a different type. Edit line 231 in tfl/tflrt_delegate.py to replace the existing array of entries with your newly added one (see the sketch after the compile command), then run
$ cd tfl
$ python3 tflrt_delegate.py -c
to compile the model without running inference. It should be noted that TIDL provides multiple deployment options which cover TFLite, ONNX, and TVM/Neo-AI runtimes. See the README file in their repository here for more information.
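For reference, the line 231 edit simply swaps the list of model keys for the one added to model_configs.py. A minimal sketch is shown below; the variable name matches my checkout of edgeai-tidl-tools and may differ in newer revisions of the repository:

# examples/osrt_python/tfl/tflrt_delegate.py (around line 231)
models = [
    'od-tfl-ssd_mobilenet_v1_1',   # the key of the entry added to model_configs.py
]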
Integrating the model
Now that the model has been compiled, the artifacts from the corresponding folders/files within model-artifacts and models can be copied to the TDA4VM kit over SFTP. Just like the Jetson Nano program, the Python code written for the SK-TDA4VM creates several randomized images and passes them as inputs to the tflite model while timing how long inference takes. For other projects, the Python demo application found under /opt/edge_ai_apps/apps_python that comes with the default SK-TDA4VM OS image is a great starting point.
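On the TDA4VM side, the timing loop looks much like the Jetson Nano version, except the interpreter is created with the TIDL delegate pointed at the compiled artifacts so inference is offloaded to the accelerators. The following is a sketch based on my reading of the edgeai-tidl-tools examples; the delegate library name and the artifacts_folder option come from that repository, the artifact path is a placeholder, and additional delegate options may be required depending on the SDK version:

import time
import numpy as np
import tflite_runtime.interpreter as tflite

# Delegate name and option key taken from the edgeai-tidl-tools examples;
# the artifacts path below is a placeholder for your compiled model artifacts
tidl_delegate = tflite.load_delegate(
    "libtidl_tfl_delegate.so",
    {"artifacts_folder": "/opt/model-artifacts/od-tfl-ssd_mobilenet_v1_1"},
)

interpreter = tflite.Interpreter(
    model_path="ssd_mobilenet_v1_1_metadata_1.tflite",
    experimental_delegates=[tidl_delegate],
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Randomized 320x320 UINT8 test images, matching the benchmark described above
images = np.random.randint(0, 255, (20, 320, 320, 3), dtype=np.uint8)

times = []
for img in images:
    interpreter.set_tensor(input_details["index"], np.expand_dims(img, axis=0))
    start = time.time()
    interpreter.invoke()                  # offloaded to the TDA4VM's accelerators via TIDL
    times.append(time.time() - start)

print(f"Average inference time: {np.mean(times) * 1000:.1f} ms")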
On average, the TDA4VM was able to perform an inference on the same 320x320 UINT8 images in a mere 9 milliseconds, a roughly 24x speedup over the Jetson Nano thanks to its onboard accelerator hardware. For more detailed inferencing data, including results and performance metrics, you can copy the following into the TDA4VM starter kit's local installation of the repo:
./model-artifacts
./models
./dockers/J721E/PSDKRA/setup.sh
then run the script with:
$ cd examples/osrt_python/tfl
$ python3 tflrt_delegate.py
Accuracy can be benchmarked by following the directions in the edgeai-benchmark repository. For more information regarding the TIDL tools and SDK, be sure to check out the repository and the documentation for the SK-TDA4VM kit.
Going further
Rather than grabbing a pre-trained TensorFlow model, converting it to TensorFlow Lite, and then using the TIDL utilities to generate artifacts, Edge Impulse makes the process extremely simple, as projects can deploy models with the click of a button. This repository contains instructions for creating a new project, downloading the training data, building a custom learning block, and running the Docker container to output trained tflite and onnx models. The edge-impulse-linux-runner command will download an optimized model to the device and begin classifying, with output available in a web browser.