I wanted to learn about the KR260 and how to work with it by preparing and deploying a tree counting model onto its FPGA. The intent was to leverage parallel compute on the edge to combat deforestation. Imagine: autonomous drones can already fly emissions-monitoring and other monitoring missions, but they are limited to covering specifically assigned swathes of forested area.
They have to land, top up, offload data (if it isn't already live streamed) and resume operations, while a separate runtime, perhaps in the cloud, analyses the collected data. If we swapped the onboard compute for the KR260, we could perhaps run that analysis at the same or lower energy cost as cloud processing, right there on the (flying) edge, and also cut the cost of live streaming data in flight by transmitting preprocessed findings instead of pictures. That could extend flight times and open the door to more advanced drone-based forestry monitoring.
Ultimately, with this project I aimed to:
- explore the KR260 Robotics Starter Kit and
- try to run a tree counting model on the FPGA.
Going into this project I knew very little about FPGAs (Field-Programmable Gate Arrays). Sure, we had a lab years ago at UniLj playing with block diagrams in Vivado and compiling fun LED matrix demos on a Zynq unit. I also played with the Red Pitaya (link), a really cool product built around an onboard FPGA and packaged as an electronics multitool, with deployable apps like oscilloscopes and network analyzers in one box.
Compared to those experiences, the KR260 dev kit (link to AMD KR260 landing page) focuses on robotics and AI applications, comes with supporting software stacks, and is built around the K26 SoM (System-on-Module), which pairs Arm Cortex-A53 processors with Zynq UltraScale+ programmable logic. The kit also packs plenty of useful peripherals and powerful features for industry-ready robotics solutions.
My intention was to learn how to work with the KR260 and then try to build and deploy a smart forestry application. While working with the platform, I found it well designed and documented, and I appreciate how the kit invites you to tinker with it and explore the various demos. It also comes with a steep learning curve and plenty of challenging moments, and I enjoyed learning along the way.
Getting Started with the KR260 🧭🎉
To watch the unboxing of the KR260 kit, do check out Farnaz' project page (link).
Initial setup was successful; the instructions I followed are available here. It's the usual routine: flash an SD card (the docs say 64GB max, but I used 256GB) with a ready-to-go image (iot-limerick-kria-classic-desktop-2204-20240304-165.img.xz, the latest at the time of writing), then boot it up to get started.
I had to flash the image a couple of times, though: Balena Etcher and Raspberry Pi Imager both failed with validation errors on my host Ubuntu system, which I didn't want to dive into... Instead I just booted into a Windows 10 system and reflashed with Balena successfully.
Before booting, though, I decided to modify the flashed image by enabling SSH and including credentials so the board could join my wireless network via the Edimax nano WiFi 4 dongle I plugged in. After waiting a few minutes for the initial boot process I logged in to Ubuntu 22.04 LTS using the VS Code remote explorer, and ran: passwd
(default password = ubuntu) and sudo apt update && sudo apt upgrade
to change the default password and update the system. Then I tried to navigate the first major setup hurdle: updating the onboard firmware.
Following the linked guide by AMD-Xilinx here, I downloaded the latest available firmware image (K26-BootFW-01.02-06140626.bin) after agreeing to AMD's export license, and followed their instructions:
- used the VS Code remote explorer's SFTP (Secure File Transfer Protocol; I really appreciate the drag-and-drop solution) to move the firmware .bin file to the KR260
- flashed the new firmware:
sudo xmutil bootfw_update -i <path to boot.bin>
- verified the status:
sudo xmutil bootfw_status
- then rebooted:
sudo reboot
- finally, verified the update worked:
sudo xmutil bootfw_update -v
As soon as the OS was installed and the firmware updated, I got started with Kria-PYNQ.
First off. What's PYNQ (link)?
PYNQ™ (Python Productivity for Zynq) is an open-source project from AMD® that makes it easier to use Adaptive Computing platforms (i.e. FPGAs). You can leverage PYNQ to create high-performance apps with:
- parallel hardware execution
- high frame-rate video processing
- hardware accelerated algorithms
- real-time signal processing
- high bandwidth IO
- low latency control
tl;dr: It's a fantastic framework that lets you work in Jupyter in a clean, Pythonic (with C/C++ where needed), user-friendly and cognitively uncluttered environment, which is also conveniently familiar to many modern data scientists and ML devs alike.
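To give a feel for what that looks like in practice, here's a minimal sketch of the usual PYNQ flow from a Jupyter notebook. The overlay file name is just a placeholder for whatever bitstream your board image ships with:

```python
import numpy as np
from pynq import Overlay, allocate

# Program the FPGA fabric with a pre-built overlay (bitstream + metadata).
# "base.bit" is a placeholder name, not a file guaranteed to be on your board.
ol = Overlay("base.bit")

# See which IP blocks the overlay exposes to Python.
print(list(ol.ip_dict.keys()))

# Allocate a physically contiguous buffer that both the CPU and the
# programmable logic can access, e.g. as the source of a DMA transfer.
buf = allocate(shape=(1024,), dtype=np.uint32)
buf[:] = np.arange(1024, dtype=np.uint32)
```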
There's a fantastic workshop with a set of guided tutorial sessions by Xilinx-PYNQ (link) that teaches you how to work with PYNQ on the PYNQ-Z1 or PYNQ-Z2 boards, but I'm sure parts of it are useful for the KR260 kit too.
For the K26 SoM and the KV260/KR260 kits specifically, though, we can instead install Kria-PYNQ (link) or follow the Kria-RoboticsAI (link) tutorial, both comprehensive, ready-made repos featuring ready-to-use overlays.
Overlays
What's an overlay?
In the context of working with the KR260/KV260/<any FPGA?> and Kria-PYNQ, an overlay is a pre-designed hardware configuration that you load onto the FPGA to implement a specific function or set of functions.
tl;dr: Overlays reduce the complexity of FPGA development by providing high level abstractions just like traditional libraries abstract away low level hardware.
What can overlays do?
- overlays are a shortcut around deep know-how of hardware design
- enable quick prototyping and idea testing on FPGAs
- you can make custom overlays and modify existing ones, too
- implement signal/image processing, machine learning and similar workloads as low-level hardware functions, which can accelerate inferencing, video processing or DAQ apps by tapping the FPGA's flexible parallel processing capacity, something CPU/GPU deployments can't match
PYNQ makes managing overlays that interface to FPGA hardware simpler through a Python-Jupyter environment.
For example, here is the description for the KV260 Base Overlay:
This overlay includes support for the KV260's Raspberry Pi camera and PMOD interfaces. A Digilent Pcam 5C camera can be attached to the KV260 and controlled from Jupyter notebooks. Additionally, a variety of Grove and PMOD devices are supported on the PMOD interface - all controllable from a Xilinx Microblaze processor in programmable logic.
Thankfully, Kria-PYNQ (link) and PYNQ-Peripherals (link) pack many useful overlays.
However, for the most up-to-date experience, look to the Kria-RoboticsAI repository, which also includes a compatibility patch for Vitis AI 3.5 (link).
I feel a bit unlucky to have stumbled on it too late in my project timeline, but I thoroughly recommend following the included tutorial.
The PYNQ stack takes just shy of the advertised 25 minutes to install. In my experience it mostly fails on bad network configs, i.e. when a package server doesn't respond in time. Otherwise it relies on pip and seems fairly robust: I had to rerun the installer script once, and it installed without issue on the second go.
With everything working as intended, I still had to get started on my own project. In the traditional sense this means collecting lots of data, cleaning it (most of the work), then picking an architecture (or preparing your own), training the model, evaluating (repeating, tuning, etc.) and finally deploying it on the CPU/GPU and... FPGA?
On the KR260 I learned we can deploy models just fine on the available CPU without leveraging the FPGA fabric, and I think that would net us roughly the performance of a Raspberry Pi 3/4 (maybe?). Instead, I wanted the model to run on the FPGA, which sounds like a hard problem, but thankfully Vitis AI helps convert, optimise, deploy and evaluate models for inferencing on the Deep Learning Processing Unit (DPU), an accelerator instantiated in the FPGA fabric.
Vitis AI
Wait. What's Vitis AI?
Vitis™ AI software is a comprehensive AI inference development solution (link). It's a framework designed to help you extend your traditional ML pipeline and deploy recompiled and further optimised models on a wide range of devices. Vitis AI also features the Model Zoo (link), a collection of optimised, retrainable models, and a range of tools to help you evaluate model performance.
tl;dr: Vitis AI helps you translate your models so they can run for inferencing on the DPU overlay loaded onto the FPGA.
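To make the tl;dr concrete: on the board itself, the DPU-PYNQ package wraps running a compiled .xmodel in a few Python calls. This is only a sketch of that flow, with placeholder file names, and not something I got a model of my own through:

```python
import numpy as np
from pynq_dpu import DpuOverlay

# Load the DPU overlay (bitstream) onto the FPGA and then the compiled model.
overlay = DpuOverlay("dpu.bit")
overlay.load_model("resnet18_cifar10.xmodel")

# Grab a VART runner and query the model's input/output tensor shapes.
dpu = overlay.runner
input_tensor = dpu.get_input_tensors()[0]
output_tensor = dpu.get_output_tensors()[0]

# Prepare one dummy input batch and a matching output buffer.
in_data = np.zeros(tuple(input_tensor.dims), dtype=np.float32)
out_data = np.zeros(tuple(output_tensor.dims), dtype=np.float32)

# Kick off inference on the DPU and wait for the job to finish.
job_id = dpu.execute_async([in_data], [out_data])
dpu.wait(job_id)
print(out_data.argmax())
```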
The official docs (link) explain a lot about running Vitis AI for various applications and different hardware. Principally, you run Vitis AI on your host machine, process and compile your models, then copy them over to the KR260. Here, though, the Kria-RoboticsAI repo (link) once again details KR260-specific instructions and provides much-needed clarity, in addition to a compatibility patch for Vitis AI 3.5 (the latest at the time of writing). This means we should be able to use the latest Vitis AI framework to work with our models.
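As a rough sketch of the host-side part: in the Vitis AI TensorFlow2 docker the flow boils down to post-training quantisation of the float model against a small calibration set, followed by compiling the result for the target DPU. The model and file names below are mine, not from the tutorial:

```python
import numpy as np
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Load the trained float model (hypothetical file name).
float_model = tf.keras.models.load_model("resnet18_cifar10_float.h5")

# A small, representative calibration set is enough for post-training quantisation;
# random data stands in here - in practice you'd use a slice of the training data.
calib_images = np.random.rand(100, 32, 32, 3).astype("float32")
calib_ds = tf.data.Dataset.from_tensor_slices(calib_images).batch(32)

# Quantise weights/activations to int8 so the DPU can execute the model.
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_ds)
quantized_model.save("quantized_model.h5")

# The quantised model is then compiled for the board's DPU with the
# vai_c_tensorflow2 command-line tool, producing the .xmodel copied to the KR260.
```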
Following the tutorial, I tried the recommended Vitis AI 3.5 tensorflow2 CPU docker container, compiled the cifar10 resnet18 xmodel file and tried deploying it on the KR260. And... this is as far as I got. Unfortunately the provided demo didn't work as expected due to a fingerprint issue, much like the one described in this issue posted in March 2024 (link). Digging into it, I soon realised I'd need to invest a lot more time to understand the underlying problem.
The tutorial mentions (link) that if you want to compile a custom model, you're advised to build the container for your host machine's GPU (CUDA required?) from scratch - which I tried, but it didn't succeed for me.
So I decided at this point to instead shift focus towards getting started with the tree counting solution.
KR260 demo: ROS2 Perception Node
Since drone-based deforestation monitoring needs, well, a drone, it made sense to also explore the advertised native ROS2 compatibility. So I tried the provided ROS2 Perception Stack demo (link), which spins up an image_raw resizing node on both the CPU and the FPGA for comparison (the FPGA version runs ~25% faster).
Meanwhile, there's also a provided Gazebo sim scene (run on the host machine) with a camera capture and moving objects. Running rqt_graph helps confirm the node graph and see what's running where.
There's other demo projects available on the page but I opted for this one since it didn't require any hardware modification.
Hypothetically, one could simulate the flying drone atop a couple of tree props or an actual RTAB map (if I managed to find one). Since the drone URDF and drivers would ideally include drivers for the onboard camera too (e.g. the Parrot drone SDK supports ROS2-bridged camera feeds), I would have fed the image feed to the tree counting model running on the KR260's FPGA and published the number of trees counted per image to be sent to a database. In a real deployment we would then bypass the simulation and feed the real camera feed through the model.
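As a sketch of what that glue code could look like - node, topic and message choices are hypothetical, and the DPU inference call is stubbed out:

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import Int32


class TreeCounter(Node):
    """Subscribes to the camera feed and publishes a per-frame tree count."""

    def __init__(self):
        super().__init__("tree_counter")
        self.sub = self.create_subscription(Image, "/image_raw", self.on_image, 10)
        self.pub = self.create_publisher(Int32, "/tree_count", 10)

    def on_image(self, msg: Image) -> None:
        # Here the frame would be preprocessed and pushed through the
        # DPU-deployed detection model; a fixed value stands in for that.
        count = Int32()
        count.data = 0  # placeholder for len(detections)
        self.pub.publish(count)


def main():
    rclpy.init()
    rclpy.spin(TreeCounter())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```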
Yolo.
Counting trees at last. Almost. Nope, trouble here, too!
The plan? Start by looking for a deep learning model that's easy to run on traditional hardware. Then compile the model for inferencing on the KR260's FPGA.
One approach I started with was to look for papers and articles discussing deep-learning-based tree counting. But I had trouble finding resources or replication package materials in appendices, Zenodo, etc. So I shifted focus and started looking on GitHub for ready-made sample projects instead.
The project repository by GitHub user Loki-Silvres, 'Tree-Counting-using-YoloV8', stood out to me because it provides Python scripts for running transfer learning on YOLOv8, a recent enough version of the prolific and performant family of detection and segmentation models. Loki also provided a reference to the original dataset on Kaggle and his adaptation of it for this project.
Aerial Images of Palm Trees (link) features 349 images of a palm tree farm in Saudi Arabia. Reference to the dataset on Kaggle:
Ammar, A., Koubaa, A. and Benjdira, B., 2021. Deep-learning-based automated palm tree counting and geolocation in large farms from aerial geotagged images. Agronomy, 11(8), p.1458. Available on Kaggle online: https://www.kaggle.com/datasets/riotulab/aerial-images-of-palm-trees
Investigating Loki's work further, though, I realised the author hadn't attached a license to it, so I decided to look elsewhere for reference projects.
As for counting trees: judging by the dataset authors' paper on MDPI (link) and Loki's replication attempt, transfer learning an EfficientDet or YOLO (v8) model on the dataset seems to produce decent results. Investigating further, the dataset authors used a rather beefy host machine for training (and, I assume, for inferencing). I wonder how the FPGA-compiled and optimised xmodel versions of the models they tested would compare in inferencing performance, but I'm sure it would make for an interesting comparison.
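For reference, transfer learning with Ultralytics' YOLOv8 takes only a few lines. Here's a sketch assuming a dataset YAML pointing at the palm tree images; the file names are mine, not from Loki's repo:

```python
from ultralytics import YOLO

# Start from a small pretrained checkpoint and fine-tune on the aerial palm tree data.
model = YOLO("yolov8n.pt")
model.train(data="palm_trees.yaml", epochs=50, imgsz=640)

# Run detection on a held-out tile and count the boxes - i.e. the trees.
results = model("sample_tile.jpg")
print(len(results[0].boxes))
```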
Regardless, despite the reported results, the dataset features an artificially planted orchard or grove of palm trees laid out in a very organised manner. In contrast, typical canopy foliage in temperate mixed-wood forests is quite erratic, and I fear that makes tree counting much more difficult.
Perhaps segmenting specific specimens and counting those could help, since different species often feature distinctive canopy patterns. Or we could explore seasonal datasets - some trees do drop their leaves in the autumn and winter months, after all.
Conclusion
- Explored the KR260 Robotics Starter Kit and attempted to deploy a tree counting model on its FPGA to support drone-based deforestation monitoring.
- Gained practical knowledge of FPGAs and Vitis AI, and navigated challenges such as initial setup, firmware updates, and leveraging the Kria-PYNQ framework.
- Successfully set up the KR260, despite some challenges with flashing the SD card, and established SSH and Wi-Fi connectivity.
- Updated the onboard firmware using the appropriate tools and instructions.
- Installed and utilised Kria-PYNQ, appreciating its Pythonic interface for FPGA programming and its extensive support for high-performance applications.
- Experimented with Vitis AI to optimise and deploy models on the FPGA, though faced some issues with provided demos.
- Explored the ROS2 Perception Stack demo, validating its utility for image processing tasks on the FPGA.
- Investigated and attempted to deploy a deep learning model for tree counting, specifically exploring transfer learning with YOLOv8. However, faced challenges with dataset availability and model licensing.
Key Takeaways
- Demonstrated the potential of edge computing in drone applications, with possible reductions in energy costs and gains in efficiency.
- Newfound appreciation of the steep learning curve associated with FPGA development but found the experience rewarding and informative.
- Identified areas for future exploration, including more robust tree counting models and real-world drone integration.
Despite feeling a bit sad for not reaching my outlined goal with this project, I think it still highlights the incredible and diverse capabilities of the KR260 kit for AI and robotics applications. While challenges were encountered, the overall experience was educational and set the stage for further development in edge computing for environmental monitoring.