This series covers optimizing machine learning models to run on the MaaXBoard OSM93. This board comes with a neural processing unit (NPU) that's capable of running low-power, fast machine learning. To squeeze as much performance as possible out of the NPU, some optimization is required.
In Part 1 of this series, we looked at quantization. Today we'll look at Vela conversion.
What is Vela?
Vela is a tool that is used to compile a TensorFlow Lite for Microcontrollers neural network model into an optimized version that can run on an embedded system containing an Arm Ethos-U NPU. The original Arm Vela compiler has been adapted by NXP for their boards that use the Arm Ethos-U NPU.
After compilation, the model is still in TensorFlow Lite format but can ONLY run on supported hardware.
What does Vela actually do?
Vela performs graph optimization on the model. This includes things like fusing operations, pruning, batching and reordering, and memory optimizations like sharing and reuse.
How does Vela compare to other optimization methods?
There are many methods for optimizing machine learning models to run on machine learning accelerators.
Some processors, like NXP's i.MX8M+, don't require any model compilation; they can run TensorFlow models directly on the NPU. This makes things easier by removing the compilation step, but it comes with the tradeoff of higher power consumption and lower throughput than a fully optimized model. Other methods of optimization, such as TVM, can get deeply technical.
Because Vela is so specific - only targeting the Arm Ethos-U NPU - it's possible to optimize the model with great results without much effort.
Limitations
NXP's Vela tool doesn't support all of the operators that TensorFlow Lite supports.
Supported operators:
Supported operators for the latest version of Vela can be found here, along with specific constraints for each operator.
Prerequisites
To convert a model using Vela, it must be quantized to UINT8 or INT8 format (check out the project on how to do that here).
Methods
There are three methods for conversion:
- Compile on board
- Use the eIQ Toolkit
- Use the command line tool
1. Compile on board
This is the easiest way. The Vela tool comes preloaded with the board's Linux image. Set up your board as detailed in the project "Getting Started With Machine Learning on MaaXBoard OSM93."
Move your quantized model to the MaaXBoard OSM93. Now just run one command, e.g.:
vela pose_detection_int_only_quant.tflite
Your terminal will print a summary of the converted network, as well as the estimated inference time. To learn more about how it gets these performance numbers, see Vela Performance Estimation Summary. You can also use the --verbose-performance option to print per-layer performance stats.
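For example, to get the per-layer breakdown for the same model, you could optionally run:
vela pose_detection_int_only_quant.tflite --verbose-performance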
With that single vela command above, you are done!
The Vela-compiled model will be in a folder named "output" and will be named pose_detection_int_only_quant_vela.tflite. There will also be a CSV file containing the same details about the converted model that were printed in the terminal summary.
Note: add a swap file to the board if you're converting models larger than a couple of GB, because conversion is memory intensive.
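If you need one, a minimal swap file setup on the board might look like the following (run as root; the 2 GB size is only an example, and dd can stand in if fallocate isn't available):
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile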
2. Use the eIQ Toolkit
Another easy way to convert to Vela on your host PC is by using the NXP eIQ Toolkit. This tool runs on both Windows and Linux. Installation is simple.
Once the tool is installed, open it.
- Select "Model Tool"
- Select "Open Model" and select your quantized model.
- Open the hamburger menu and select the "Convert" option.
- Under conversion options, select Tensorflow Lite Vela/i.MX93 (.tflite) (eiq-converter-armvela)
- Click "Convert".
Your converted TensorFlow Lite model will show up in the folder, along with a CSV file showing a summary of the converted model.
3. Use the command line tool
The command line tool offers the most options. The Vela converter is open source, so changes can even be made to the source code if that level of control is desired.
INSTALLATION
The Vela command line tool runs on Linux and Windows 10. Check the ethos-u vela repository on github for the latest install instructions.
Currently, Vela depends on the following versions of TensorFlow and Python:
- Vela 3.12.0 to current supports TensorFlow 2.16
- Vela 3.10.0 to current supports Python 3.10 (3.9)
Install the development version of Python 3.10 containing the Python/C API header files, e.g. apt install python3.10-dev
or yum install python310-devel
Additionally, install:
- pip3
- C99 capable compiler and associated toolchain. For Linux operating systems, a GNU toolchain is recommended (see the example after this list). For Microsoft Windows 10, the Microsoft Visual C++ 14.2 Build Tools are recommended. See https://wiki.python.org/moin/WindowsCompilers
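On a Debian/Ubuntu host, for example, the GNU toolchain can usually be installed with the command below (an assumption about your distribution; use your distro's equivalent otherwise):
sudo apt install build-essential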
Install Vela from PyPi using the following command:
pip3 install ethos-u-vela
Alternatively, clone the Git repository and run pip install:
git clone https://review.mlplatform.org/ml/ethos-u/ethos-u-vela.git
cd ethos-u-vela
pip3 install .
OR if you want to modify the source code, run:
pip3 install -e .[dev]
The -e flag installs the package in editable mode so you don't have to reinstall after every modification, and [dev] pulls in the development dependencies.
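To confirm the installation worked, you can check the installed version, and recent releases can also dump the list of supported operators to a Markdown file:
vela --version
vela --supported-ops-report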
CONVERSION
Similar to compiling on the board, with the command line Vela tool you can simply run this command:
vela pose_detection_full_quant.tflite
After conversion, network details, including total MACs (instead of the estimated inference time), are printed to the terminal:
The optimized version of the TensorFlow Lite model will be output to ./output/<model name>_vela.tflite (here, pose_detection_full_quant_vela.tflite), along with the CSV file.
GOING FURTHER
There are many different options that can be selected when compiling with Vela, such as various memory configurations, as well as trade-offs between performance and peak SRAM usage. The ethos-u-vela repo contains instructions for selecting additional options when compiling via the command line.
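As a rough sketch (the flag names come from the upstream Vela CLI, and the exact accelerator and memory configuration for the i.MX93 may differ in NXP's build), a command that targets the Ethos-U65 and trades some performance for lower peak SRAM usage might look like:
vela pose_detection_int_only_quant.tflite --accelerator-config ethos-u65-256 --optimise Size --output-dir ./output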
Let me know in the comments if you would like me to cover these in more detail in a future project.
CONVERTED MODELS
After conversion, if you open the model in eIQ Model Tool (or Netron) you'll notice that most of the layers have been condensed into a single layer named "ethos-u." These layers are denoted in dark gray, while the CPU operators are shown in black.
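If you don't have the eIQ Toolkit installed, Netron can also be installed and launched from the command line on a host PC with Python (just one convenient option; any Netron install works):
pip3 install netron
netron pose_detection_int_only_quant_vela.tflite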
In the pose detection model, the DepthToSpace operators are not converted because they aren't supported yet by the Vela compiler (DepthToSpace was first supported in TensorFlow 2.16.1, so it's likely it will be supported in the next version of the compiler).
If you converted both the full integer quantized and integer only quantized models, you'll see a difference in how many operators are placed on the CPU.
For the Full Integer Quantized model, the inputs must first run through a quantize operator, and the outputs must be dequantized. Quantize and Dequantize aren't supported by the Ethos-U NPU (remember, the NPU only supports INT8, UINT8, and UINT16 operations).
The Integer Only Quantized model is able to put more operations on the NPU, and will likely run faster.
So what are the final performance results after converting the model to Vela? We already got a sneak peek at what the performance could be after conversion.
First, you may notice a difference in model size between the quantized model we started with and the output Vela model. Here's the difference for the integer-only quantized pose detection model (you can verify the sizes on the board, as shown after the list):
- Original quantized pose_detection model size: 3.6MB
- Vela converted pose_detection model size: 1.8MB
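To double-check these numbers yourself, a quick ls on the board works (file names assumed to match the earlier on-board conversion step):
ls -lh pose_detection_int_only_quant.tflite output/pose_detection_int_only_quant_vela.tflite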
Let's see if Vela's performance estimate is close to what the model actually achieves by using the benchmark tool in "/usr/bin/tensorflow-lite-[VERSION]/examples." Don't forget to include the external delegate path in the command:
/usr/bin/tensorflow-lite-2.10.0/examples/benchmark_model --graph=pose_detection_int_only_quant_vela.tflite --external_delegate_path=/usr/lib/libethosu_delegate.so
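For comparison, the quantized (CPU) baseline below can be measured by running the same benchmark on the original model without the external delegate (an assumption about how that number was gathered):
/usr/bin/tensorflow-lite-2.10.0/examples/benchmark_model --graph=pose_detection_int_only_quant.tflite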
For the pose detection model, here are the stats I benchmarked:
- Quantized model performance: 6.86ms
- Estimated Vela performance: 8.58ms
- Benchmarked Vela performance: 6.78ms
Vela's performance estimate turns out to be more pessimistic than the actual benchmarked performance.
It's also interesting to note that the Vela-converted model isn't much faster than the quantized model without conversion. This is likely due to the DepthToSpace operators falling back to the CPU.
The Landmark model had more significant performance gains when converted to Vela. This is likely because all operations on this model are able to run on the NPU.
I'd love to hear about your Vela conversion results. Thanks for reading!