In my previous projects, I illustrated how to get the Deep Learning Processor Unit (DPU) provided with Vitis-AI up and running on the ZUBoard.
We also explored two HSIO add-on modules that could be used with the ZUBoard:
- HSIO : Dual Camera Mezzanine
- HSIO : DisplayPort with eMMC
In this project, we will look at a new HSIO add-on module that provides exciting new expansion for the ZUBoard:
- HSIO : M2 expansion
I will attempt to implement an HSIO hat-trick (hockey jargon for 3 goals).
The M2 expansion used will be the Hailo-8 AI accelerator.
The following table illustrates the Peak TOPS available with the Vitis-AI DPU on the Zynq UltraScale+ devices.
This assumes that the entire programmable logic (PL) is available to implement the Vitis-AI DPU.
In reality, however, when we add a MIPI capture pipeline for our Dual Camera HSIO, we have to scale back the DPU to leave enough resources for this peripheral logic. The following table adds a lower (more realistic) range for the Peak TOPS that would typically be available in a real design.
To be more specific, in our ZUBoard Dual Camera design, we are able to implement the B128 DPU, which has a peak of 0.038 TOPS.
By adding a Hailo-8 AI accelerator module, we can theoretically boost the Peak TOPS of the ZUBoard beyond what is available in any of the Zynq UltraScale+ devices. In fact, we are now in the realm of what is available only with the next-generation Versal AI Edge devices.
The Peak TOPS available for our ZUBoard Dual Camera design will theoretically increase by a factor of 26.0/0.038 = 684x!!!
This will, of course, have to be benchmarked on the board.
But the motivation definitely is there :)
Introducing the HSIO M2 module
The M.2 High Speed I/O module provides support for M.2 expansion.
The following connectors are supported:
- B-Key
- E-Key
The following module sizes are supported:
- B-Key : 2230-2242-2260-2280
- E-Key : 2230
The following module types are supported:
- B-Key: SSDs or accelerators via PCIe (2 lanes)
- E-Key: WiFi Modules via SDIO/UART
The Hailo-8 M.2 Module is an AI accelerator module for AI applications, compatible with NGFF M.2 form factor M, B+M and A+E keys.
The AI module is based on the 26 tera-operations per second (TOPS) Hailo-8 AI processor with high power efficiency. The M.2 AI accelerator features a full PCIe Gen-3.0 2-lane interface (4-lane in M-key module), delivering unprecedented AI performance for edge devices.
The M.2 module can be plugged into an existing edge device with M.2 socket to execute in real-time and with low power deep neural network inferencing for a broad range of market segments.
Leveraging Hailo’s comprehensive Dataflow Compiler and its support for standard AI frameworks, customers can easily port their Neural Network models to the Hailo-8 and introduce high-performance AI products to the market quickly.
Reference : https://hailo.ai/products/ai-accelerators/hailo-8-m2-ai-acceleration-module/#hailo8-m2-overview
The Hailo-8 comes in three form factors, each with a convenient starter kit:
- M-Key : 4 PCIe lanes, Size 2280, Starter Kit HM218B1C2XAE
- B+M-Key : 2 PCIe lanes, Size 2280, Starter Kit HM218B1C2ZAE
- A+E-Key : 2 PCIe lanes, Size 2230, Starter Kit HM218B1C2YAE
The highest performing module is the M-Key with 4 PCIe lanes. However, our HSIO M.2 adapter does not support this module.
For this project, I chose to use the B+M-Key with 2 PCIe lanes. Having 2 lanes instead of 4 does not affect the Peak TOPS of the core AI engine. It does, however, limit the I/O bandwidth to the core AI engine, which will affect the FPS that can be achieved for larger networks.
The starter kit comes with several thermal options. In order to choose the right solution for my use case, I registered and referred to the abundant documentation on Hailo's developer zone:
https://developer.hailo.ai/developer-zone
I chose to use the natural thermal convection solution, with the heat sink provided in the starter kit, which is sufficient for 4W operation in 25C ambient air.
For this step, I referred to the experience of my colleague Gianluca Filippini (EBV), in order to get the Hailo-8 up and running with the following milestones:
- Milestone 1 - Hailo-8 detected on PCI express bus
- Milestone 2 - Hailo-8 detected by driver and runtime
- Milestone 3 - Hailo-8 working with TAPPAS
The starting point for our design is the ZUBoard Dual Camera design, which can be re-built on a correctly installed Linux machine with Vitis 2022.2 and Vitis-AI 3.0 as follows:
git clone https://github.com/Avnet/bdf bdf
git clone -b 2022.2 https://github.com/Avnet/hdl hdl
git clone -b 2022.2 https://github.com/Avnet/petalinux petalinux
cd petalinux
./scripts/make_zub1cg_sbc_dualcam.sh
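Note that the Avnet build scripts expect the Xilinx 2022.2 tools to be available in your environment; a typical setup looks like the following (the install paths are hypothetical, adjust them to match your machine):
# Hypothetical install locations - adjust to match your machine
source /tools/Xilinx/Vitis/2022.2/settings64.sh
source /opt/petalinux/2022.2/settings.sh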
Since the first milestone is to detect the Hailo-8 module via PCI express, we need to enable this functionality in our design.
The ZUBoard Dual Camera design has the HSIO DP-eMMC populated on the J2 connector, as shown in the following image:
The HSIO M.2 module, however, must be placed on the J2 connector, since it requires the lower transceiver lanes for the PCI express functionality. For this reason, we must move the HSIO DP-eMMC module to the J1 connector, as shown below:
This starts with our Vivado project, which can be edited as follows:
cd ../hdl/projects/zub1cg_sbc_dualcam_2022_2
vivado zub1cg_sbc_dualcam.xpr &
Open the block diagram, and double-click on the PS block.
Enable the following option to get access to the PCIe Configuration.
- Switch to Advanced Mode : Enabled
Select the PCIe Configuration and make the following changes:
- Device Port Type : Root Port
Next, select the I/O Configuration, open the High Speed peripheral section, and make the following changes to enable PCIe and move DP to the upper transceivers.
Make the following modifications for PCIe:
- PCIe : Enabled
- Rootport Mode Reset : MIO30
- Reset Polarity : Active Low
- Lane Selection : x2
Make the following modifications for DisplayPort:
- DPAUX : EMIO
- Lane Selection : Dual Higher
Select the Clock Configuration, and the Input Clock tab, and make the following modifications:
- GT Lane Reference Frequency (PCIe) : REFCLK0, 100MHz
- GT Lane Reference Frequency (DisplayPort) : REFCLK1, 135MHz
Click OK to save the modifications.
Connect the new DPAUX EMIO ports as External Ports in the block diagram, as shown below:
NOTE : the dp_aux_data_oe_n port requires a polarity inversion, which can be implemented with a util_vector_logic module.
Finally, the pin constraints for the DPAUX EMIO ports must be determined from the following hardware schematics:
Add the following constraints for the DPAUX EMIO ports in the design's XDC file.
#######################################################################
# DisplayPort HPD & AUX
#######################################################################
set_property IOSTANDARD LVCMOS12 [get_ports {dp_hot_plug_detect*}]
set_property IOSTANDARD LVCMOS12 [get_ports {dp_aux_data*}]
set_property PACKAGE_PIN K1 [get_ports dp_aux_data_out_0 ]; # HP_DP_15_P
set_property PACKAGE_PIN J1 [get_ports dp_hot_plug_detect_0 ]; # HP_DP_15_N
set_property PACKAGE_PIN D2 [get_ports dp_aux_data_oe_0 ]; # HP_DP_24_P
set_property PACKAGE_PIN C2 [get_ports dp_aux_data_in_0 ]; # HP_DP_24_N
Save all modifications, and Rebuild the bitstream.
When done, run the following command in the Vivado project's TCL Console to re-generate the XSA file:
write_hw_platform -file zub1cg_sbc_dualcam.xsa -include_bit -force
validate_hw_platform zub1cg_sbc_dualcam.xsa -verbose
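If you prefer to script this step rather than using the GUI, the same commands can be run in Vivado batch mode. The following is a minimal sketch, assuming the project location used earlier and the default implementation run name:
cd hdl/projects/zub1cg_sbc_dualcam_2022_2
cat > export_xsa.tcl << 'EOF'
# open the project (if needed, also open the implemented design first: open_run impl_1)
open_project zub1cg_sbc_dualcam.xpr
write_hw_platform -file zub1cg_sbc_dualcam.xsa -include_bit -force
validate_hw_platform zub1cg_sbc_dualcam.xsa -verbose
EOF
vivado -mode batch -source export_xsa.tcl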
The device tree definition in the petalinux project needs to be modified as follows:
project-spec/meta-avnet/recipes-bsp/device-tree/files/zub1cg-sbc/system-bsp.dtsi
...
/ {
...
gtr_refclk_pcie: gtr_refclk_pcie { /* PCIe - 100MHz */
compatible = "fixed-clock";
#clock-cells = <0>;
clock-frequency = <100000000>;
};
gtr_refclk_dp: gtr_refclk_dp { /* DP - 135MHz */
compatible = "fixed-clock";
#clock-cells = <0>;
clock-frequency = <135000000>;
};
};
...
&psgtr {
clocks = <&gtr_refclk_pcie>, <&gtr_refclk_dp>;
clock-names = "ref0","ref1";
};
/*
The cells contain the following arguments.
- description: The GTR lane
minimum: 0
maximum: 3
- description: The PHY type
enum:
- PHY_TYPE_DP
- PHY_TYPE_PCIE
- PHY_TYPE_SATA
- PHY_TYPE_SGMII
- PHY_TYPE_USB
- description: The PHY instance
minimum: 0
maximum: 1 # for DP, SATA or USB
maximum: 3 # for PCIE or SGMII
- description: The reference clock number
minimum: 0
maximum: 3
*/
&zynqmp_dpsub {
phy-names = "dp-phy0","dp-phy1";
//phys = <&psgtr 1 6 0 0>, <&psgtr 0 6 1 0>;
phys = <&psgtr 3 6 0 2>, <&psgtr 2 6 1 2>;
status = "okay";
xlnx,max-lanes = <2>;
};
&zynqmp_dpdma {
status = "okay";
};
&zynqmp_dp_snd_pcm0 {
status = "okay";
};
&zynqmp_dp_snd_pcm1 {
status = "okay";
};
&zynqmp_dp_snd_card0 {
status = "okay";
};
&zynqmp_dp_snd_codec0 {
status = "okay";
};
On the command line, rebuild the petalinux project, taking into account the new hardware XSA file:
cd petalinux/projects/zub1cg_sbc_dualcam_2022_2
petalinux-config --get-hw-description=../../../hdl/projects/zub1cg_sbc_dualcam_2022_2/zub1cg_sbc_dualcam.xsa --silentconfig
petalinux-build
This will re-generate the SD card image at the following location:
petalinux/projects/zub1cg_sbc_dualcam_2022_2/images/linux/rootfs.wic
Program this new image to a 16GB (or greater) micro-SD card, and boot the ZUBoard.
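One way to program the .wic image from a Linux host is with dd; this is a generic sketch, where /dev/sdX must be replaced with your actual SD card device (double-check the device node, since dd will overwrite it):
# Write the SD card image (replace /dev/sdX with your SD card device)
sudo dd if=petalinux/projects/zub1cg_sbc_dualcam_2022_2/images/linux/rootfs.wic of=/dev/sdX bs=4M conv=fsync status=progress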
Use the "lspci" command to detect the PCI express peripherals.
root@zub1cg-sbc-dualcam-2022-2:~# lspci
00:00.0 Bridge: Xilinx Corporation Device d011
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Use the "lspci -vv" variant of the command to list the details of the Hailo-8 AI Processor.
root@zub1cg-sbc-dualcam-2022-2:~# lspci -vv
00:00.0 Bridge: Xilinx Corporation Device d011
...
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 111
Region 0: Memory at 600000000 (64-bit, prefetchable) [size=16K]
Region 2: Memory at 600008000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at 600004000 (64-bit, prefetchable) [size=16K]
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [f8] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [110 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [200 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
We are doing great!
Next, we will start adding the yocto recipes for the Hailo-8 driver, runtime, and APIs.
Milestone 2 - Hailo-8 detected by driver and runtime
Now that we have a working petalinux project that can detect the Hailo-8 module on the PCI express bus, we can add its driver and run-time. This content is available as a yocto recipe on github, and can be obtained as follows:
cd project-spec
git clone -b honister https://github.com/hailo-ai/meta-hailo
cd ..
The gstreamer patches included with this repo do not work on our petalinux project, so we have to remove them from the following file:
project-spec/meta-hailo/meta-hailo-tappas/recipes-multimedia/gstreamer/gstreamer1.0-plugins-base_%.imx.bbappend
Remove the two patch files from the .bbappend file:
FILESEXTRAPATHS:prepend := "${THISDIR}/files:"
SRC_URI += " \
file://allocate-cached-ion-buffer.patch \
file://dont-pre-allocate-ion-buffers-gldownload-and-glupload.patch \
"
So that it reads as follows:
FILESEXTRAPATHS:prepend := "${THISDIR}/files:"
SRC_URI += " \
"
Now we need to make our petalinux project aware of these new yocto recipes. This can be done in our main "config" file:
project-spec/configs/config
...
#
# User Layers
#
CONFIG_USER_LAYER_0="${proot}/project-spec/meta-avnet"
CONFIG_USER_LAYER_1="${proot}/project-spec/meta-hailo/meta-hailo-accelerator"
CONFIG_USER_LAYER_2="${proot}/project-spec/meta-hailo/meta-hailo-libhailort"
CONFIG_USER_LAYER_3="${proot}/project-spec/meta-hailo/meta-hailo-tappas"
CONFIG_USER_LAYER_4=""
...
Next, we declare the existence of the following Hailo-8 packages in our "userrootfsconfig" file:
project-spec/meta-user/config/userrootfsconfig
...
CONFIG_hailortcli
CONFIG_libhailort
CONFIG_pyhailort
CONFIG_hailo-pci
CONFIG_hailo-firmware
Finally, we add the packages required for the driver and run-time to our project in our "rootfsconfig" file:
project-spec/configs/rootfsconfig
...
#
# User Packages
#
# CONFIG_gpio-demo is not set
# CONFIG_peekpoke is not set
CONFIG_hailortcli=y
CONFIG_libhailort=y
CONFIG_pyhailort=y
CONFIG_hailo-pci=y
CONFIG_hailo-firmware=y
...
On the command line, rebuild the petalinux project:
petalinux-build
This will re-generate the SD card image at the following location:
petalinux/projects/zub1cg_sbc_dualcam_2022_2/images/linux/rootfs.wic
Program this new image to a 16GB (or greater) micro-SD card, and boot the ZUBoard.
During boot, you will notice new messages specific to the Hailo-8 driver:
[ 7.607911] hailo: Init module. driver version 4.15.0
[ 7.608165] hailo 0000:01:00.0: Probing on: 1e60:2864...
[ 7.608174] hailo 0000:01:00.0: Probing: Allocate memory for device extension, 11592
[ 7.608209] pci 0000:00:00.0: enabling device (0000 -> 0002)
[ 7.608225] hailo 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.608235] hailo 0000:01:00.0: Probing: Device enabled
[ 7.608287] hailo 0000:01:00.0: Probing: mapped bar 0 - (____ptrval____) 16384
[ 7.608301] hailo 0000:01:00.0: Probing: mapped bar 2 - (____ptrval____) 4096
[ 7.608312] hailo 0000:01:00.0: Probing: mapped bar 4 - (____ptrval____) 16384
[ 7.608324] hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[ 7.608333] hailo 0000:01:00.0: Probing: Using userspace allocated vdma buffers
[ 7.608356] hailo 0000:01:00.0: Probing: Enabled 64 bit dma
[ 7.608365] hailo 0000:01:00.0: Disabling ASPM L0s
[ 7.608377] hailo 0000:01:00.0: Successfully disabled ASPM L0s
[ 7.824552] hailo 0000:01:00.0: Firmware was loaded successfully
[ 7.851488] hailo 0000:01:00.0: Probing: Added board 1e60-2864, /dev/hailo0
The presence of the Hailo-8 driver can be confirmed with the "lsmod" command.
root@zub1cg-sbc-m2-2022-2:~# lsmod
Module Size Used by
zocl 184320 0
hailo_pci 65536 0
uio_pdrv_genirq 16384 0
dmaproxy 16384 0
Also, the "lspci -vv" command will output additional content relating to the Hailo-8 driver:
root@zub1cg-sbc-dualcam-2022-2:~# lspci -vv
...
Kernel driver in use: hailo
Kernel modules: hailo_pci
We can verify the Hailo-8 run-time with the "hailortcli" command, as shown below:
root@zub1cg-sbc-m2-2022-2:~# hailortcli fw-control identify -s 01:00.0
Executing on device: 0000:01:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.15.0 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8
Serial Number: HLLWMB0214600101
Part Number: HM218B1C2LA
Product Name: HAILO-8 AI ACCELERATOR M.2 B+M KEY MODULE
We have successfully detected the Hailo-8 AI Accelerator M.2 B+M Key module!
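As a quicker sanity check, the run-time also provides a scan command, which simply lists the detected devices (the exact output varies with the HailoRT version):
hailortcli scan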
We can already run some benchmarks with the run-time.
First, download some pre-compiled models from the Hailo model zoo:
https://github.com/hailo-ai/hailo_model_zoo/blob/master/docs/PUBLIC_MODELS.rst
If your ZUBoard is connected to the internet, this can be done directly on the embedded platform with the "wget" utility:
wget https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/ModelZoo/Compiled/v2.9.0/resnet_v1_50.hef
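Optionally, the downloaded HEF can be inspected before running it with the parse-hef command (shown here as a quick check; the details reported vary by HailoRT version):
hailortcli parse-hef resnet_v1_50.hef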
Then run some benchmarks with the "hailortcli" utility:
root@zub1cg-sbc-dualcam-2022-2:~# hailortcli benchmark resnet_v1_50.hef
Starting Measurements...
Measuring FPS in hw_only mode
Network resnet_v1_50/resnet_v1_50: 100% | 19686 | FPS: 1312.33 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network resnet_v1_50/resnet_v1_50: 100% | 19686 | FPS: 1312.34 | ETA: 00:00:00
Measuring HW Latency
Network resnet_v1_50/resnet_v1_50: 100% | 4952 | HW Latency: 2.85 ms | ETA: 00:00:00
=======
Summary
=======
FPS (hw_only) = 1312.35
(streaming) = 1312.35
Latency (hw) = 2.85141 ms
Device 0000:01:00.0:
Power in streaming mode (average) = 3.8952 W
(max) = 3.92493 W
root@zub1cg-sbc-dualcam-2022-2:~#
Next, we will add TAPPAS, which is the application-level software, including the programming APIs and gstreamer plug-ins.
Milestone 3 - Hailo-8 working with TAPPAS
Now that we have proven that the Hailo-8 driver and run-time are working, we can add the application layer, TAPPAS, to our project.
We declare the existence of the following Hailo-8 packages in our "userrootfsconfig" file:
project-spec/meta-user/config/userrootfsconfig
...
CONFIG_hailortcli
CONFIG_libhailort
CONFIG_pyhailort
CONFIG_hailo-pci
CONFIG_hailo-firmware
CONFIG_libgsthailo
CONFIG_libgsthailotools
CONFIG_tappas-apps
CONFIG_hailo-post-processes
Finally, we add all of the required packages, including those for TAPPAS, to our project in our "rootfsconfig" file:
project-spec/configs/rootfsconfig
...
#
# User Packages
#
# CONFIG_gpio-demo is not set
# CONFIG_peekpoke is not set
CONFIG_hailortcli=y
CONFIG_libhailort=y
CONFIG_pyhailort=y
CONFIG_hailo-pci=y
CONFIG_hailo-firmware=y
CONFIG_libgsthailo=y
CONFIG_libgsthailotools=y
CONFIG_tappas-apps=y
CONFIG_hailo-post-processes=y
...
On the command line, rebuild the petalinux project:
petalinux-build
This will re-generate the SD card image at the following location:
petalinux/projects/zub1cg_sbc_dualcam_2022_2/images/linux/rootfs.wic
Program this new image to a 16GB (or greater) micro-SD card, and boot the ZUBoard.
You will notice the presence of an "apps" directory in the root user's home directory.
root@zub1cg-sbc-dualcam-2022-2:~# ls -R apps
apps:
detection license_plate_recognition multistream_detection
apps/detection:
detection.sh resources
apps/detection/resources:
configs yolov5m_yuv.hef
apps/detection/resources/configs:
yolov5.json
apps/license_plate_recognition:
license_plate_recognition.sh resources
apps/license_plate_recognition/resources:
configs liblpr_ocrsink.so liblpr_overlay.so lpr.raw lprnet_yuy2.hef tiny_yolov4_license_plates_yuy2.hef yolov5m_vehicles_no_ddr_yuy2.hef
apps/license_plate_recognition/resources/configs:
yolov4_license_plate.json yolov5_vehicle_detection.json
apps/multistream_detection:
multi_stream_detection.sh resources
apps/multistream_detection/resources:
configs detection0.mp4 detection1.mp4 detection2.mp4 detection3.mp4 detection4.mp4 detection5.mp4 yolov5s_personface_nv12_no_ddr.hef
apps/multistream_detection/resources/configs:
yolov5_personface.json
root@zub1cg-sbc-dualcam-2022-2:~#
We can run benchmarks on the models included in these directories as well:
root@zub1cg-sbc-dualcam-2022-2:~# hailortcli run2 set-net ./apps/detection/resources/yolov5m_yuv.hef
[HailoRT CLI] [warning] "hailortcli run2" is not optimized for single model usage. It is recommended to use "hailortcli run" command for a single model
[===================>] 100% 00:00:00
yolov5m_yuv: fps: 86.50
root@zub1cg-sbc-dualcam-2022-2:~# hailortcli benchmark ./apps/detection/resources/yolov5m_yuv.hef
Starting Measurements...
Measuring FPS in hw_only mode
Network yolov5m_yuv/yolov5m_yuv: 100% | 1548 | FPS: 103.19 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network yolov5m_yuv/yolov5m_yuv: 100% | 1425 | FPS: 94.93 | ETA: 00:00:00
Measuring HW Latency
Network yolov5m_yuv/yolov5m_yuv: 100% | 480 | HW Latency: 21.52 ms | ETA: 00:00:00
=======
Summary
=======
FPS (hw_only) = 103.193
(streaming) = 94.9366
Latency (hw) = 21.5191 ms
Device 0000:01:00.0:
Power in streaming mode (average) = 3.06434 W
(max) = 3.15601 W
The demo scripts were created for i.MX platforms, so they need to be modified for use with the ZUBoard Dual Camera design.
I have provided one such script in the attachment section of this project : zub1cg_dualcam_hailo8_detection.sh
The script accepts the following arguments:
- mode : primary (left sensor), secondary (right sensor), dual (both sensors)
- width : width of image generated by capture pipeline
- height : height of image generated by capture pipeline
- format : format (yuv, rgb) generated by capture pipeline
- sink : window, dp, fake
root@zub1cg-sbc-dualcam-2022-2:~/apps/detection# ./zub1cg_dualcam_hailo8_detection.sh --help
unknown arg --help
USAGE: zub1cg_dualcam_hailo8_detection.sh [OPTIONS]
-m|--mode mode must be 'dual', 'primary' or 'secondary'
-s|--sink sink must be 'dp', 'window' or 'fake'
-f|--format output format must be 'yuv' or 'rgb'
-w|--width output width
-h|--height output height
root@zub1cg-sbc-dualcam-2022-2:~/apps/detection#
Since the yolov5m_yuv.hef model in this example expects a 1280x720 YUV input, the script defaults to these values.
We can select to use the left, right, or both (side-by-side) sensors as input video.
We can also select to use any of the following outputs:
- window (output to desktop, lowest performance due to color format conversion)
- dp (output to native DP, better performance)
- fake (no output, highest performance)
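For example, to run detection on the right sensor with no display output, the attached script would be invoked as follows (combining the options above):
./zub1cg_dualcam_hailo8_detection.sh --mode secondary --sink fake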
First let's try outputting to the desktop:
root@zub1cg-sbc-dualcam-2022-2:~/apps/detection# export DISPLAY=:0.0
root@zub1cg-sbc-dualcam-2022-2:~/apps/detection# ./zub1cg_dualcam_hailo8_detection.sh --mode primary --sink window
WARNING: format not set: using default 'yuv' format
WARNING: output resolution not set: using default '1280x720' resolution
Run Camera with: mode=primary, sink=window, output resolution=1280x720, format=yuv
+ media-ctl -d /dev/media0 -V ''\''ap1302.0-003c'\'':2 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0000000.mipi_csi2_rx_subsystem'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0000000.mipi_csi2_rx_subsystem'\'':1 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0020000.v_proc_ss'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0020000.v_proc_ss'\'':1 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0040000.v_proc_ss'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0040000.v_proc_ss'\'':1 [fmt:UYVY8_1X16/1280x720 field:none]'
+ set +x
+ media-ctl -d /dev/media0 -l ''\''0-003c.ar0144.0'\'':0 -> '\''ap1302.0-003c'\'':0[1]'
+ media-ctl -d /dev/media0 -l ''\''0-003c.ar0144.1'\'':0 -> '\''ap1302.0-003c'\'':1[0]'
+ set +x
Detected AR0144 - disabling AWB
Detected AR0144 - setting brightness
+ gst-launch-1.0 v4l2src device=/dev/video0 io-mode=dmabuf '!' 'video/x-raw, width=1280, height=720, format=YUY2, framerate=60/1' '!' queue leaky=downstream max-size-buffers=5 max-size-bytes=0 max-size-time=0 '!' hailonet hef-path=/home/root/apps/detection/resources/yolov5m_yuv.hef '!' queue leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0 '!' hailofilter function-name=yolov5 config-path=/home/root/apps/detection/resources/configs/yolov5.json so-path=/usr/lib/hailo-post-processes/libyolo_post.so qos=false '!' queue leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0 '!' hailooverlay '!' queue leaky=downstream max-size-buffers=5 max-size-bytes=0 max-size-time=0 '!' videoconvert '!' fpsdisplaysink 'video-sink='\''autovideosink'\''' text-overlay=false sync=false -v
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 2, dropped: 0, current: 3.89, average: 3.89
The "media-ctl" and "v4l2-ctl" commands configure the following MIPI capture pipeline.
Press CTRL-C to stop the gstreamer pipeline.
As expected, this is the example with the lowest performance, since significant computation is being done on the CPU for color format conversion (i.e. videoconvert).
Next, let's try outputting directly to the DP monitor (DRM driver).
root@zub1cg-sbc-dualcam-2022-2:~/apps/detection# ./zub1cg_dualcam_hailo8_detection.sh --mode primary --sink dp
WARNING: format not set: using default 'yuv' format
WARNING: output resolution not set: using default '1280x720' resolution
Run Camera with: mode=primary, sink=dp, output resolution=1280x720, format=yuv
+ media-ctl -d /dev/media0 -V ''\''ap1302.0-003c'\'':2 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0000000.mipi_csi2_rx_subsystem'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0000000.mipi_csi2_rx_subsystem'\'':1 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0020000.v_proc_ss'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0020000.v_proc_ss'\'':1 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0040000.v_proc_ss'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0040000.v_proc_ss'\'':1 [fmt:UYVY8_1X16/1280x720 field:none]'
+ set +x
+ media-ctl -d /dev/media0 -l ''\''0-003c.ar0144.0'\'':0 -> '\''ap1302.0-003c'\'':0[1]'
+ media-ctl -d /dev/media0 -l ''\''0-003c.ar0144.1'\'':0 -> '\''ap1302.0-003c'\'':1[0]'
+ set +x
Detected AR0144 - disabling AWB
Detected AR0144 - setting brightness
trying to open device 'i915'...done
setting mode 1280x720-60.00Hz on connectors 43, crtc 41
testing 1280x720@YUYV overlay plane 39
+ gst-launch-1.0 v4l2src device=/dev/video0 io-mode=dmabuf '!' 'video/x-raw, width=1280, height=720, format=YUY2, framerate=60/1' '!' queue leaky=downstream max-size-buffers=5 max-size-bytes=0 max-size-time=0 '!' hailonet hef-path=/home/root/apps/detection/resources/yolov5m_yuv.hef '!' queue leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0 '!' hailofilter function-name=yolov5 config-path=/home/root/apps/detection/resources/configs/yolov5.json so-path=/usr/lib/hailo-post-processes/libyolo_post.so qos=false '!' queue leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0 '!' hailooverlay '!' queue leaky=downstream max-size-buffers=5 max-size-bytes=0 max-size-time=0 '!' fpsdisplaysink 'video-sink='\''kmssink' plane-id=39 bus-id=fd4a0000.display 'render-rectangle="<0,0,1280,720>"'\''' fullscreen-overlay=true sync=false -v
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstTextOverlay:fps-display-text-overlay: text = rendered: 796, dropped: 0, current: 28.36, average: 25.36
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 796, dropped: 0, current: 28.36, average: 25.36
Notice that this time the performance is much better, since there are no longer any color format conversions being executed on the CPU.
If we remove the output, we will get the highest performance achievable:
root@zub1cg-sbc-dualcam-2022-2:~/apps/detection# ./zub1cg_dualcam_hailo8_detection.sh --mode primary --sink fake
WARNING: format not set: using default 'yuv' format
WARNING: output resolution not set: using default '1280x720' resolution
Run Camera with: mode=primary, sink=fake, output resolution=1280x720, format=yuv
+ media-ctl -d /dev/media0 -V ''\''ap1302.0-003c'\'':2 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0000000.mipi_csi2_rx_subsystem'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0000000.mipi_csi2_rx_subsystem'\'':1 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0020000.v_proc_ss'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0020000.v_proc_ss'\'':1 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0040000.v_proc_ss'\'':0 [fmt:UYVY8_1X16/1280x800 field:none]'
+ media-ctl -d /dev/media0 -V ''\''b0040000.v_proc_ss'\'':1 [fmt:UYVY8_1X16/1280x720 field:none]'
+ set +x
+ media-ctl -d /dev/media0 -l ''\''0-003c.ar0144.0'\'':0 -> '\''ap1302.0-003c'\'':0[1]'
+ media-ctl -d /dev/media0 -l ''\''0-003c.ar0144.1'\'':0 -> '\''ap1302.0-003c'\'':1[0]'
+ set +x
Detected AR0144 - disabling AWB
Detected AR0144 - setting brightness
+ gst-launch-1.0 v4l2src device=/dev/video0 io-mode=dmabuf '!' 'video/x-raw, width=1280, height=720, format=YUY2, framerate=60/1' '!' queue leaky=downstream max-size-buffers=5 max-size-bytes=0 max-size-time=0 '!' hailonet hef-path=/home/root/apps/detection/resources/yolov5m_yuv.hef '!' queue leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0 '!' hailofilter function-name=yolov5 config-path=/home/root/apps/detection/resources/configs/yolov5.json so-path=/usr/lib/hailo-post-processes/libyolo_post.so qos=false '!' queue leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0 '!' hailooverlay '!' queue leaky=downstream max-size-buffers=5 max-size-bytes=0 max-size-time=0 '!' fpsdisplaysink 'video-sink='\''fakevideosink'\''' text-overlay=false sync=false -v
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1136, dropped: 0, current: 32.37, average: 31.77
This completes our Hailo-8 integration for the ZUBoard Dual Camera design.
We already know that a real application consists of more than just the deep learning models. There is also the pre-processing and post-processing to consider.
By leveraging the ZU1 device's PL for the pre-processing, and the Hailo-8 module for the deep learning models, we were able to achieve real-time performance in a realistic camera to display application.
So It Begins
Now that we have a working Hailo-8 module, the time has come to run some benchmarks. The last time I felt this level of anticipation was before the Battle of Helm's Deep...
Benchmarking is only meaningful if we have a baseline to compare against.
In this project, the baseline is our ZUBoard Dual Camera design. For this design, the B128 DPU is the DPU that fits inside the remaining resources of the PL when we have the MIPI capture pipeline for our HSIO DualCam module.
In order to run these benchmarks, I will refer back to the SD image from my previous ZUBoard 2022.2 project:
http://avnet.me/vitis-ai-3.0-robot-control
The setup for this set of benchmarks is shown below:
For convenience, we will configure the image to boot with the dualcam-dpu design:
root@zub1cg-sbc-2022-2:~# xmutil listapps
Accelerator Accel_type #slots(PL+AIE) Active_slot
avnet-zub1cg-dualcam-dpu XRT_FLAT (0+0) -1
avnet-zub1cg-dualcam XRT_FLAT (0+0) -1
avnet-zub1cg-base XRT_FLAT (0+0) -1
avnet-zub1cg-benchmark XRT_FLAT (0+0) 0,
root@zub1cg-sbc-2022-2:~# cat /etc/dfx-mgrd/default_firmware
avnet-zub1cg-benchmark
root@zub1cg-sbc-2022-2:~# echo avnet-zub1cg-dualcam-dpu > /etc/dfx-mgrd/default_firmware
root@zub1cg-sbc-2022-2:~# cat /etc/dfx-mgrd/default_firmware
avnet-zub1cg-dualcam-dpu
We also want to configure the image to use the models pre-compiled for the B128 DPU:
root@zub1cg-sbc-2022-2:~# ls -la /usr/share/vitis_ai_library
total 44
drwxr-xr-x 6 root root 4096 Nov 20 16:41 .
drwxr-xr-x 327 root root 16384 Mar 9 2018 ..
lrwxrwxrwx 1 root root 14 Nov 20 16:41 models -> models.b512-lr
drwxr-xr-x 40 root root 4096 May 6 2023 models.b128-lr
drwxr-xr-x 144 root root 12288 Apr 20 2023 models.b512-lr
drwxr-xr-x 5 root root 4096 Mar 9 2018 samples
drwxr-xr-x 71 root root 4096 Mar 9 2018 test
root@zub1cg-sbc-2022-2:~# rm /usr/share/vitis_ai_library/models
root@zub1cg-sbc-2022-2:~# ln -sf models.b128-lr /usr/share/vitis_ai_library/models
root@zub1cg-sbc-2022-2:~# ls -la /usr/share/vitis_ai_library
total 44
drwxr-xr-x 6 root root 4096 Nov 20 16:41 .
drwxr-xr-x 327 root root 16384 Mar 9 2018 ..
lrwxrwxrwx 1 root root 14 Nov 20 16:41 models -> models.b128-lr
drwxr-xr-x 40 root root 4096 May 6 2023 models.b128-lr
drwxr-xr-x 144 root root 12288 Apr 20 2023 models.b512-lr
drwxr-xr-x 5 root root 4096 Mar 9 2018 samples
drwxr-xr-x 71 root root 4096 Mar 9 2018 test
Finally, we can reboot the design:
root@zub1cg-sbc-2022-2:~# reboot
After reboot, we can verify that the ZUBoard has booted with the dualcam-dpu as follows:
root@zub1cg-sbc-2022-2:~# xmutil listapps
Accelerator Accel_type #slots(PL+AIE) Active_slot
avnet-zub1cg-dualcam-dpu XRT_FLAT (0+0) 0,
avnet-zub1cg-dualcam XRT_FLAT (0+0) -1
avnet-zub1cg-base XRT_FLAT (0+0) -1
avnet-zub1cg-benchmark XRT_FLAT (0+0) -1
root@zub1cg-sbc-2022-2:~# ls -la /usr/share/vitis_ai_library/models
lrwxrwxrwx 1 root root 14 Nov 20 16:41 /usr/share/vitis_ai_library/models -> models.b128-lr
The major challenge with benchmarking the B128 DPU is that I was not able to compile most of the Vitis-AI model zoo for this architecture.
I was able to compile the following models, which have equivalents in the Hailo model zoo:
- resnet50
- mobilenet v1
root@zub1cg-sbc-2022-2:~/benchmarking_b128# source ./benchmark_power8.sh
===========================================================
resnet50_tf2 classification classification
===========================================================
/home/root/Vitis-AI/examples/vai_library/samples/classification
-----------------------------------------------------------
./test_performance_classification resnet50_tf2 ./test_performance_classification.list -s 40 -t 1
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:43:04.299680 984 benchmark.hpp:187] writing report to <STDOUT>
I1120 19:43:04.300307 984 benchmark.hpp:214] waiting for 0/40 seconds, 1 threads running
I1120 19:43:14.300457 984 benchmark.hpp:214] waiting for 10/40 seconds, 1 threads running
I1120 19:43:24.300678 984 benchmark.hpp:214] waiting for 20/40 seconds, 1 threads running
I1120 19:43:34.300899 984 benchmark.hpp:214] waiting for 30/40 seconds, 1 threads running
I1120 19:43:44.301200 984 benchmark.hpp:222] waiting for threads terminated
FPS=5.53132
E2E_MEAN=180676
DPU_MEAN=180023
-----------------------------------------------------------
IDLE
-----------------------------------------------------------
./test_performance_classification resnet50_tf2 ./test_performance_classification.list -s 40 -t 2
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:44:25.529937 990 benchmark.hpp:187] writing report to <STDOUT>
I1120 19:44:25.722285 990 benchmark.hpp:214] waiting for 0/40 seconds, 2 threads running
I1120 19:44:35.722501 990 benchmark.hpp:214] waiting for 10/40 seconds, 2 threads running
I1120 19:44:45.722723 990 benchmark.hpp:214] waiting for 20/40 seconds, 2 threads running
I1120 19:44:55.722946 990 benchmark.hpp:214] waiting for 30/40 seconds, 2 threads running
I1120 19:45:05.723246 990 benchmark.hpp:222] waiting for threads terminated
FPS=5.55263
-----------------------------------------------------------
IDLE
-----------------------------------------------------------
./test_performance_classification resnet50_tf2 ./test_performance_classification.list -s 40 -t 4
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:45:47.175784 994 benchmark.hpp:187] writing report to <STDOUT>
I1120 19:45:47.754333 994 benchmark.hpp:214] waiting for 0/40 seconds, 4 threads running
I1120 19:45:57.755023 994 benchmark.hpp:214] waiting for 10/40 seconds, 4 threads running
I1120 19:46:07.755244 994 benchmark.hpp:214] waiting for 20/40 seconds, 4 threads running
I1120 19:46:17.755471 994 benchmark.hpp:214] waiting for 30/40 seconds, 4 threads running
I1120 19:46:27.755781 994 benchmark.hpp:222] waiting for threads terminated
FPS=5.54796
-----------------------------------------------------------
IDLE
-----------------------------------------------------------
./test_performance_classification resnet50_tf2 ./test_performance_classification.list -s 40 -t 8
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:47:09.483896 1001 benchmark.hpp:187] writing report to <STDOUT>
I1120 19:47:10.621503 1001 benchmark.hpp:214] waiting for 0/40 seconds, 8 threads running
I1120 19:47:20.621737 1001 benchmark.hpp:214] waiting for 10/40 seconds, 8 threads running
I1120 19:47:30.621934 1001 benchmark.hpp:214] waiting for 20/40 seconds, 8 threads running
I1120 19:47:40.622140 1001 benchmark.hpp:214] waiting for 30/40 seconds, 8 threads running
I1120 19:47:50.622427 1001 benchmark.hpp:222] waiting for threads terminated
FPS=5.54612
-----------------------------------------------------------
IDLE
===========================================================
mobilenet_1_0_224_tf2 classification classification
===========================================================
/home/root/Vitis-AI/examples/vai_library/samples/classification
-----------------------------------------------------------
./test_performance_classification mobilenet_1_0_224_tf2 ./test_performance_classification.list -s 40 -t 1
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:48:32.787725 1011 benchmark.hpp:187] writing report to <STDOUT>
I1120 19:48:32.788704 1011 benchmark.hpp:214] waiting for 0/40 seconds, 1 threads running
I1120 19:48:42.788931 1011 benchmark.hpp:214] waiting for 10/40 seconds, 1 threads running
I1120 19:48:52.789491 1011 benchmark.hpp:214] waiting for 20/40 seconds, 1 threads running
I1120 19:49:02.789690 1011 benchmark.hpp:214] waiting for 30/40 seconds, 1 threads running
I1120 19:49:12.789969 1011 benchmark.hpp:222] waiting for threads terminated
FPS=20.4987
E2E_MEAN=48771.8
DPU_MEAN=48223.7
-----------------------------------------------------------
IDLE
-----------------------------------------------------------
./test_performance_classification mobilenet_1_0_224_tf2 ./test_performance_classification.list -s 40 -t 2
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:49:53.278578 1017 benchmark.hpp:187] writing report to <STDOUT>
I1120 19:49:53.319661 1017 benchmark.hpp:214] waiting for 0/40 seconds, 2 threads running
I1120 19:50:03.319864 1017 benchmark.hpp:214] waiting for 10/40 seconds, 2 threads running
I1120 19:50:13.320076 1017 benchmark.hpp:214] waiting for 20/40 seconds, 2 threads running
I1120 19:50:23.320278 1017 benchmark.hpp:214] waiting for 30/40 seconds, 2 threads running
I1120 19:50:33.320550 1017 benchmark.hpp:222] waiting for threads terminated
FPS=20.7599
-----------------------------------------------------------
IDLE
-----------------------------------------------------------
./test_performance_classification mobilenet_1_0_224_tf2 ./test_performance_classification.list -s 40 -t 4
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:51:13.815217 1021 benchmark.hpp:187] writing report to <STDOUT>
I1120 19:51:13.934170 1021 benchmark.hpp:214] waiting for 0/40 seconds, 4 threads running
I1120 19:51:23.934732 1021 benchmark.hpp:214] waiting for 10/40 seconds, 4 threads running
I1120 19:51:33.934931 1021 benchmark.hpp:214] waiting for 20/40 seconds, 4 threads running
I1120 19:51:43.935127 1021 benchmark.hpp:214] waiting for 30/40 seconds, 4 threads running
I1120 19:51:53.935407 1021 benchmark.hpp:222] waiting for threads terminated
FPS=20.7532
-----------------------------------------------------------
IDLE
-----------------------------------------------------------
./test_performance_classification mobilenet_1_0_224_tf2 ./test_performance_classification.list -s 40 -t 8
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:52:34.569006 1028 benchmark.hpp:187] writing report to <STDOUT>
I1120 19:52:34.851023 1028 benchmark.hpp:214] waiting for 0/40 seconds, 8 threads running
I1120 19:52:44.851577 1028 benchmark.hpp:214] waiting for 10/40 seconds, 8 threads running
I1120 19:52:54.851781 1028 benchmark.hpp:214] waiting for 20/40 seconds, 8 threads running
I1120 19:53:04.851984 1028 benchmark.hpp:214] waiting for 30/40 seconds, 8 threads running
I1120 19:53:14.852298 1028 benchmark.hpp:222] waiting for threads terminated
FPS=20.7238
-----------------------------------------------------------
IDLE
root@zub1cg-sbc-2022-2:~/Vitis-AI/examples/vai_library/samples/classification#
The power measurements were performed manually with a power meter directly at the electrical outlet.
The setup for this set of benchmarks is shown below:
Although we have already been running some benchmarks, we have yet to compare the performance with the Vitis-AI B128 DPU to validate the theoretical 684x improvement.
For this purpose, let's download more models for inference on Hailo-8:
wget https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/ModelZoo/Compiled/v2.9.0/mobilenet_v1.hef
wget https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/ModelZoo/Compiled/v2.9.0/mobilenet_v2_1.0.hef
wget https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/ModelZoo/Compiled/v2.9.0/mobilenet_v3.hef
wget https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/ModelZoo/Compiled/v2.9.0/mobilenet_v3_large_minimalistic.hef
wget https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/ModelZoo/Compiled/v2.9.0/ssd_mobilenet_v1.hef
wget https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/ModelZoo/Compiled/v2.9.0/ssd_mobilenet_v2.hef
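To save a bit of typing, the downloaded models can also be benchmarked in a single loop; this is a small shell sketch wrapping the same hailortcli benchmark command used below:
# Benchmark every downloaded HEF in sequence
for hef in *.hef; do
    echo "===== $hef ====="
    hailortcli benchmark "$hef"
done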
Now let's run the benchmarks for the following models:
- resnet50
- mobilenet v1
- mobilenet v2
- SSD mobilenet v1
- SSD mobilenet v2
root@zub1cg-sbc-dualcam-2022-2:~# hailortcli benchmark resnet_v1_50.hef
Starting Measurements...
Measuring FPS in hw_only mode
Network resnet_v1_50/resnet_v1_50: 100% | 19684 | FPS: 1312.20 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network resnet_v1_50/resnet_v1_50: 100% | 19686 | FPS: 1312.33 | ETA: 00:00:00
Measuring HW Latency
Network resnet_v1_50/resnet_v1_50: 100% | 4934 | HW Latency: 2.85 ms | ETA: 00:00:00
=======
Summary
=======
FPS (hw_only) = 1312.21
(streaming) = 1312.35
Latency (hw) = 2.85164 ms
Device 0000:01:00.0:
Power in streaming mode (average) = 3.84106 W
(max) = 3.87137 W
root@zub1cg-sbc-dualcam-2022-2:~# hailortcli benchmark mobilenet_v1.hef
Starting Measurements...
Measuring FPS in hw_only mode
Network mobilenet_v1/mobilenet_v1: 100% | 52360 | FPS: 3490.48 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network mobilenet_v1/mobilenet_v1: 100% | 52360 | FPS: 3490.45 | ETA: 00:00:00
Measuring HW Latency
Network mobilenet_v1/mobilenet_v1: 100% | 9706 | HW Latency: 1.30 ms | ETA: 00:00:00
=======
Summary
=======
FPS (hw_only) = 3490.53
(streaming) = 3490.49
Latency (hw) = 1.29576 ms
Device 0000:01:00.0:
Power in streaming mode (average) = 3.32183 W
(max) = 3.34346 W
root@zub1cg-sbc-dualcam-2022-2:~# hailortcli benchmark mobilenet_v2_1.0.hef
Starting Measurements...
Measuring FPS in hw_only mode
Network mobilenet_v2_1_0/mobilenet_v2_1_0: 100% | 36669 | FPS: 2444.44 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network mobilenet_v2_1_0/mobilenet_v2_1_0: 100% | 36668 | FPS: 2444.38 | ETA: 00:00:00
Measuring HW Latency
Network mobilenet_v2_1_0/mobilenet_v2_1_0: 100% | 7114 | HW Latency: 1.86 ms | ETA: 00:00:00
=======
Summary
=======
FPS (hw_only) = 2444.48
(streaming) = 2444.41
Latency (hw) = 1.85706 ms
Device 0000:01:00.0:
Power in streaming mode (average) = 2.27321 W
(max) = 2.27998 W
root@zub1cg-sbc-dualcam-2022-2:~# hailortcli benchmark ssd_mobilenet_v1.hef
Starting Measurements...
Measuring FPS in hw_only mode
Network ssd_mobilenet_v1/ssd_mobilenet_v1: 100% | 4816 | FPS: 321.04 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network ssd_mobilenet_v1/ssd_mobilenet_v1: 100% | 5382 | FPS: 358.77 | ETA: 00:00:00
Measuring HW Latency
[HailoRT] [warning] HW Latency measurement is not supported on NMS networks
Network ssd_mobilenet_v1/ssd_mobilenet_v1: 100% | 2525 | HW Latency: NaN | ETA: 00:00:00
=======
Summary
=======
FPS (hw_only) = 316.58
(streaming) = 358.774
Device 0000:01:00.0:
Power in streaming mode (average) = 1.49031 W
(max) = 1.79797 W
root@zub1cg-sbc-dualcam-2022-2:~# hailortcli benchmark ssd_mobilenet_v2.hef
Starting Measurements...
Measuring FPS in hw_only mode
Network ssd_mobilenet_v2/ssd_mobilenet_v2: 100% | 1753 | FPS: 116.86 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network ssd_mobilenet_v2/ssd_mobilenet_v2: 100% | 1752 | FPS: 116.79 | ETA: 00:00:00
Measuring HW Latency
[HailoRT] [warning] HW Latency measurement is not supported on NMS networks
Network ssd_mobilenet_v2/ssd_mobilenet_v2: 100% | 1752 | HW Latency: NaN | ETA: 00:00:00
=======
Summary
=======
FPS (hw_only) = 116.861
(streaming) = 116.79
Device 0000:01:00.0:
Power in streaming mode (average) = 1.06451 W
(max) = 1.0673 W
root@zub1cg-sbc-dualcam-2022-2:~#
The "benchmark" command only measures the latency for the model. In order to get the "overall latency", we need to use the "run" command with the "--measure-latency" and "--measure-overall-latency" options.
root@zub1cg-sbc-2022-2:~/hailo_benchmarks# hailortcli run --measure-latency --measure-overall-latency resnet_v1_50.hef
Running streaming inference (resnet_v1_50.hef):
Transform data: true
Type: auto
Quantized: true
Network resnet_v1_50/resnet_v1_50: 100% | 1654 | HW Latency: 2.85 ms | ETA: 00:00:00
> Inference result:
Network group: resnet_v1_50
Frames count: 1654
HW Latency: 2.85 ms
Overall Latency: 3.00 ms
root@zub1cg-sbc-2022-2:~/hailo_benchmarks# hailortcli run --measure-latency --measure-overall-latency mobilenet_v1.hef
Running streaming inference (mobilenet_v1.hef):
Transform data: true
Type: auto
Quantized: true
Network mobilenet_v1/mobilenet_v1: 100% | 3262 | HW Latency: 1.30 ms | ETA: 00:00:00
> Inference result:
Network group: mobilenet_v1
Frames count: 3262
HW Latency: 1.30 ms
Overall Latency: 1.51 ms
root@zub1cg-sbc-2022-2:~/hailo_benchmarks# hailortcli run --measure-latency --measure-overall-latency mobilenet_v2_1.0.hef
Running streaming inference (mobilenet_v2_1.0.hef):
Transform data: true
Type: auto
Quantized: true
Network mobilenet_v2_1_0/mobilenet_v2_1_0: 100% | 2400 | HW Latency: 1.86 ms | ETA: 00:00:00
> Inference result:
Network group: mobilenet_v2_1_0
Frames count: 2400
HW Latency: 1.86 ms
Overall Latency: 2.07 ms
root@zub1cg-sbc-2022-2:~/hailo_benchmarks# hailortcli run --measure-latency --measure-overall-latency ssd_mobilenet_v1.hef
Running streaming inference (ssd_mobilenet_v1.hef):
Transform data: true
Type: auto
Quantized: true
[HailoRT] [warning] HW Latency measurement is not supported on NMS networks
Network ssd_mobilenet_v1/ssd_mobilenet_v1: 100% | 843 | HW Latency: NaN | ETA: 00:00:00
> Inference result:
Network group: ssd_mobilenet_v1
Frames count: 843
Overall Latency: 5.91 ms
root@zub1cg-sbc-2022-2:~/hailo_benchmarks# hailortcli run --measure-latency --measure-overall-latency ssd_mobilenet_v2.hef
Running streaming inference (ssd_mobilenet_v2.hef):
Transform data: true
Type: auto
Quantized: true
[HailoRT] [warning] HW Latency measurement is not supported on NMS networks
Network ssd_mobilenet_v2/ssd_mobilenet_v2: 100% | 584 | HW Latency: NaN | ETA: 00:00:00
> Inference result:
Network group: ssd_mobilenet_v2
Frames count: 584
Overall Latency: 8.53 ms
root@zub1cg-sbc-2022-2:~/hailo_benchmarks#
The Final Verdict
For the ZUBoard Dual Camera design, the benchmarking provides comparative results between the B128 DPU and the Hailo-8 acceleration module:
- performance (FPS)
- latency (msec)
- power (W)
- performance/W (FPS/W)
The performance (FPS) results were truly impressive, with the Hailo-8 delivering an average of 200x more FPS. Although this is only 30% of the 684x increase we were expecting theoretically, this is still a very big improvement in performance.
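(As a sanity check using the numbers measured above: resnet_v1_50 goes from 5.55 FPS on the B128 DPU to roughly 1312 FPS on the Hailo-8, about 236x, while mobilenet_v1 goes from 20.76 FPS to roughly 3490 FPS, about 168x; the average of these two ratios is approximately 200x.)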
The latency (msec) results were equally impressive and unexpected, with the Hailo-8 providing 46x lower latency.
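(Again from the measurements above: resnet_v1_50 drops from roughly 180 ms on the B128 DPU to 3.00 ms overall latency on the Hailo-8, about 60x, and mobilenet_v1 from roughly 48 ms to 1.51 ms, about 32x.)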
When idle, the Hailo-8 based design consumes 0.5W more power than the B128 version. When benchmarking a model, however, the Hailo-8 module consumed 20 times more power.
The performance/W (FPS/W) results compare the relative efficiency of each solution, with the Hailo-8 being 20-24x more efficient.
I hope this project inspires you to create innovative applications on ZUBoard.
Don't forget to check out the following projects that describe how to add support for ROS2 in your petalinux 2022.2 projects:
If this project sparks other ideas or questions that you want to share with the community, let me know in the comments below.
Acknowledgements
I want to thank my co-author Gianluca Filippini (EBV) for his pioneering work with the Hailo-8 AI Accelerator, and for bringing this marvel to my attention.
I also want to thank Tom Curran for the M.2 support on ZUBoard via the M.2 HSIO module.
Version History
- 2023/11/21 - Initial Version
- 2024/01/16 - Update Hailo-8 latency results for overall latency