Instead of collecting data locally and sending it to the cloud for processing and inference, as in typical machine learning workflows, the advent of powerful yet low-power single-board computers means every step can be performed on the device itself. As a result, applications such as industrial monitoring, health tracking, and automated agriculture can be made far more efficient and accurate.
The SK-TDA4VM and Jetson Nano
Released in 2019, NVIDIA's Jetson Nano development kit features a 128-core Maxwell GPU along with a quad-core Arm Cortex-A57 CPU clocked at 1.43GHz. The kit also includes 4GB of LPDDR4 memory, HDMI and DisplayPort connectors, gigabit Ethernet, and four USB 3.0 ports, as well as a 40-pin GPIO header and dual CSI camera connectors.
Conversely, the SK-TDA4VM kit from Texas Instruments contains a dual-core Arm Cortex-A72 CPU; DSP, deep learning, vision, and multimedia accelerators; 4GB of LPDDR4 memory; four USB ports; gigabit Ethernet; and HDMI and DisplayPort display outputs. For adding cameras, there are two CSI camera connectors along the edge and a 40-pin Samtec connector on the underside of the board. Unlike the Jetson Nano, the SK-TDA4VM also includes both an M.2 E-key slot for a WiFi/Bluetooth card and an M.2 M-key slot for an SSD or other PCIe x4 device. Take a look at the previous getting started guide for more information about how to set up the kit and run a simple demo.
The starting point for this project is a simple example written in Python 3.6 under JetPack 4.5 that takes 20 resized images from the COCO17 dataset and passes them to an SSD MobileNet V1 TensorFlow Lite model taken from the TensorFlow website. Once the input data has been set, the Jetson Nano runs the TFLite interpreter on the CPU and times how long it takes for the result to be produced. After running for a total of 20 iterations with a maximum power draw of 5W, the average time was 223 milliseconds per inference. It should be noted that, due to the limited power draw and the lack of GPU acceleration for TFLite models on the Jetson Nano, this number is higher than an optimized run would achieve.
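The timing loop itself is only a few lines. Here is a minimal sketch of how such a benchmark can be written, assuming the TensorFlow wheel NVIDIA provides for JetPack supplies the TFLite interpreter; the model filename and the coco17_resized_uint8.npy file are placeholders rather than the original script's paths:

import time
import numpy as np
import tensorflow as tf  # TensorFlow build for Jetson (NVIDIA publishes wheels for JetPack)

# Placeholder paths; substitute your own model and preprocessed images
interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# 20 COCO17 images resized to the model's input resolution, saved as a uint8 array
images = np.load("coco17_resized_uint8.npy")  # hypothetical file prepared beforehand

times = []
for img in images:
    interpreter.set_tensor(input_details["index"], np.expand_dims(img, axis=0))
    start = time.time()
    interpreter.invoke()                  # CPU-only inference for TFLite on the Nano
    times.append(time.time() - start)

print(f"Average inference time: {np.mean(times) * 1000:.1f} ms")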
The architecture of the TDA4VM requires that existing pretrained machine learning models, such as .tflite files, first be compiled before they can run on the hardware and take advantage of its accelerators. As seen in my getting started guide, TI provides a model zoo from which pretrained models can be downloaded in the correct format. Each entry includes not only the .tflite file, but also a param.yaml file with information about the model and various other artifacts.
In order to import a custom TensorFlow Lite model, one first has to set up the compilation environment. The toolset is validated for Ubuntu 18.04 running on either x86 or aarch64 architecture; I was able to install the Linux environment under WSL 2 on Windows 10 for a simpler setup process. From here, I ran the following commands to clone the repository and enter the folder:
$ git clone https://github.com/TexasInstruments/edgeai-tidl-tools.git
$ cd edgeai-tidl-tools
Before executing the script below, I edited the requirements_pc.txt file to change the line onnx to onnx==1.4.1, since there is an issue installing the latest version.
$ source ./setup.sh --skip_cpp_deps
Select J721E as the target device if prompted. Run
$ ./scripts/run_python_examples.sh
to ensure compilation succeeds. Additionally, look at the ./model-artifacts and ./models directories to view the resulting artifacts.
In order to compile a model, first navigate to the examples/osrt_python directory and open the model_configs.py file. To add a new entry, simply append a dictionary with your model's parameters, changing the model_path to reflect where your model is stored. For example, this is the entry for my SSD MobileNet V1 TensorFlow Lite model:
'od-tfl-ssd_mobilenet_v1_1' : {
    'model_path' : os.path.join(models_base_path, 'ssd_mobilenet_v1_1_metadata_1.tflite'),
    'mean' : [127.5, 127.5, 127.5],
    'scale' : [1/127.5, 1/127.5, 1/127.5],
    'num_images' : numImages,
    'num_classes' : 91,
    'model_type' : 'od',
    'session_name' : 'tflitert',
    'od_type' : 'HasDetectionPostProcLayer'
}
There are many other entries listed that can be examined as well if your model is of a different type. Edit line 231 in tfl/tflrt_delegate.py to replace the existing array of entries with your newly added one (see the sketch after the compile command), then run
$ cd tfl
$ python3 tflrt_delegate.py -c
to compile the model without running inference. It should be noted that TIDL provides multiple deployment options which cover TFLite, ONNX, and TVM/Neo-AI runtimes. See the README file in their repository here for more information.
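For reference, the line 231 edit simply swaps the list of model keys for the one added to model_configs.py. A minimal sketch is shown below; the variable name matches my checkout of edgeai-tidl-tools and may differ in newer revisions of the repository:

# examples/osrt_python/tfl/tflrt_delegate.py (around line 231)
models = [
    'od-tfl-ssd_mobilenet_v1_1',   # the key of the entry added to model_configs.py
]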
Integrating the model
Now that the model has been compiled, the artifacts from the corresponding folders/files within model-artifacts and models can be copied to the TDA4VM kit over SFTP. Just like the Jetson Nano program, the Python code written for the SK-TDA4VM creates several randomized images and passes them as inputs to the tflite model while timing how long inference takes. For other projects, the Python demo application found under /opt/edge_ai_apps/apps_python that comes with the default SK-TDA4VM OS image is a great starting point.
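On the TDA4VM side, the timing loop looks much like the Jetson Nano version, except the interpreter is created with the TIDL delegate pointed at the compiled artifacts so inference is offloaded to the accelerators. The following is a sketch based on my reading of the edgeai-tidl-tools examples; the delegate library name and the artifacts_folder option come from that repository, the artifact path is a placeholder, and additional delegate options may be required depending on the SDK version:

import time
import numpy as np
import tflite_runtime.interpreter as tflite

# Delegate name and option key taken from the edgeai-tidl-tools examples;
# the artifacts path below is a placeholder for your compiled model artifacts
tidl_delegate = tflite.load_delegate(
    "libtidl_tfl_delegate.so",
    {"artifacts_folder": "/opt/model-artifacts/od-tfl-ssd_mobilenet_v1_1"},
)

interpreter = tflite.Interpreter(
    model_path="ssd_mobilenet_v1_1_metadata_1.tflite",
    experimental_delegates=[tidl_delegate],
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Randomized 320x320 UINT8 test images, matching the benchmark described above
images = np.random.randint(0, 255, (20, 320, 320, 3), dtype=np.uint8)

times = []
for img in images:
    interpreter.set_tensor(input_details["index"], np.expand_dims(img, axis=0))
    start = time.time()
    interpreter.invoke()                  # offloaded to the TDA4VM's accelerators via TIDL
    times.append(time.time() - start)

print(f"Average inference time: {np.mean(times) * 1000:.1f} ms")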
On average, the TDA4VM was able to perform an inference on the same 320x320 UINT8 images in a mere 9 milliseconds, a roughly 24x speedup over the Jetson Nano thanks to its onboard accelerator hardware. For more detailed inferencing data, including results and performance metrics, you can copy the following into the TDA4VM starter kit's local installation of the repo:
./model-artifacts
./models
./dockers/J721E/PSDKRA/setup.sh
then run the script with:
$ cd examples/osrt_python/tfl
$ python3 tflrt_delegate.py
Accuracy can be benchmarked by following the directions in the edgeai-benchmark repository. For more information regarding the TIDL tools and SDK, be sure to check out the repository and the documentation for the SK-TDA4VM kit.
Going further
Rather than grabbing a pre-trained TensorFlow model, converting it to TensorFlow Lite, and then using the TIDL utilities to generate artifacts, Edge Impulse makes the process extremely simple, as projects can deploy models with the click of a button. This repository contains instructions for creating a new project, downloading the training data, building a custom learning block, and running the Docker container to output trained tflite and onnx models. The edge-impulse-linux-runner command will download an optimized model to the device and begin classifying, with output available in a web browser.