Learning Curve and Challenges!
I initially entered the Adaptive Computing Challenge with a project idea of creating situational awareness around your house: Home Assist Situational Awareness (HASA). Using an object detection model, a streaming camera, and the Kria KV260 Vision AI kit, the intent was to identify events of interest around the house (a car/van/truck shows up in your driveway, a person walks toward your house or is in your driveway, the school bus arrives, etc.) and then notify you via text message when that event occurs. I had already done something similar with a Jetson Nano (written in Python), so I figured it wouldn't be that hard to emulate with the Kria KV260. Was I in for a surprise!
I was one of the fortunate ones to receive a Kria KV260 Vision AI Starter Kit as part of the Adaptive Computing Challenge competition. It consisted of the board and the accessory pack, which included a USB cable, HDMI cable, ethernet cable, MicroSD card, camera module, and power adapter.
The first challenge was deciding which operating system to use. Soon after the contest kicked off, Xilinx announced that Ubuntu was now an option for the Kria KV260. Given that I had never worked with PetaLinux before, I figured a GUI was the safe bet. The downside was that the OS was so new that most of the Xilinx documentation referred to PetaLinux, and many of the examples were only compatible with PetaLinux (e.g., Smartcamera). I figured I could work with my own model and develop my own executable.
The second challenge was that the hardware didn't come with Wi-Fi! For better or worse, I run my house off of wireless, and the area where I would be doing the work is not close to a router or access point, so I had to figure out a way to go wireless. I borrowed a Realtek USB Wi-Fi adapter from another piece of hardware and looked for drivers that would work with the Kria. After a couple hours of web searching and a few dead ends, I finally had a working USB Wi-Fi adapter. Yes!
Once I had the OS loaded and Wi-Fi going, my next step was to test the NLP SmartVision example. Xilinx has a good Getting Started guide along with a comprehensive wiki on how to load the NLP SmartVision demo. Fortunately, this was the one demo that worked with the Ubuntu OS. Running the demo, I was able to validate that both the USB camera I had (a Logitech Brio) and the MIPI camera module were functioning as expected.
Time to start coding...but where to start?
Now that I had proven the hardware was working as intended, I was excited to start my coding adventure. However, I had NO IDEA where to start. I cloned the Vitis-AI Git repository and figured I would look through the folder structure for any samples or examples I could leverage. I was excited about the tfssd sample, but from my perspective the source code was fairly complex (lots of templates, and include files in multiple locations), and I didn't know if I was capable of extending its functionality for my application. With the KV260 and the Ubuntu distro both being so new, I found the documentation mostly tailored to other products and PetaLinux, which made progress slow and arduous. Even the model.yaml files for each model didn't have the link for KV260-compatible models, and you had to manually enter another file path to download a model from the Model Zoo. I know the documentation will eventually catch up; it can be challenging to be among the first users of new hardware and software.
After some investigation, I initially settled on the video analysis demo, which uses a pruned Caffe SSD model. That example was fairly self-contained in its own directory, which made leveraging it straightforward. Unfortunately, the Caffe model was intended for traffic detection and only has four classes (car, bike, person, and background), but I felt it was enough to be usable for my application. However, as I was starting to write up the project here, I gave one last look at the example code for the tfssd example, which uses the TensorFlow SSD MobileNet V2 COCO model. That model is much more comprehensive and has more of the labels I was looking for (bus for the school bus, trucks, better person detection, etc.).
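For context, the Vitis AI Library wraps the tfssd model behind a fairly small API. Here's a minimal sketch (not the actual sample code) of how a single image can be pushed through it, assuming the Vitis AI Library is installed and the KV260-compiled ssd_mobilenet_v2_coco_tf model has been downloaded from the Model Zoo; the model name string and the 0.5 confidence threshold are my own choices for illustration:

```cpp
// Minimal sketch of running an image through the TF SSD MobileNet V2 COCO
// model with the Vitis AI Library (illustrative, not the shipped sample).
#include <opencv2/opencv.hpp>
#include <vitis/ai/tfssd.hpp>
#include <iostream>

int main(int argc, char* argv[]) {
  if (argc < 2) { std::cerr << "usage: " << argv[0] << " image.jpg\n"; return 1; }
  cv::Mat image = cv::imread(argv[1]);
  if (image.empty()) { std::cerr << "could not read image\n"; return 1; }

  // Create the model wrapper; the name must match a model installed on the board.
  auto model = vitis::ai::TFSSD::create("ssd_mobilenet_v2_coco_tf");

  // Run DPU inference; detections come back as COCO label IDs plus
  // bounding boxes in coordinates relative to the frame (0..1).
  auto result = model->run(image);
  for (const auto& box : result.bboxes) {
    if (box.score < 0.5f) continue;  // confidence threshold (assumption)
    std::cout << "label " << box.label << " score " << box.score
              << " box (" << box.x << ", " << box.y << ", "
              << box.width << ", " << box.height << ")\n";
  }
  return 0;
}
```

The executable just needs to be compiled with OpenCV and linked against the Vitis AI Library, the same way the bundled samples build.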
In parallel, I also tested whether I could use an Edge Impulse model, since I am familiar with their tools. I was able to export an object detection model as a C++ library and compile an executable on the Kria, but the inference time was slow (~18 s), so I really think the model needs to go through the Vitis AI pipeline to be successful.
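For anyone curious, the exported Edge Impulse library is driven through a run_classifier() call. The sketch below is roughly what a test harness looks like; it is simplified (filling the features buffer from a camera frame is omitted), and the macros and struct fields come from the exported library itself:

```cpp
// Bare-bones sketch of invoking an Edge Impulse exported C++ library.
// Filling `features` from a real frame is omitted; buffer size and result
// fields come from the exported model's headers.
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"
#include <vector>
#include <cstring>
#include <cstdio>

// Raw input features for one frame, sized by the exported model's macro.
static std::vector<float> features(EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE, 0.0f);

// Callback the SDK uses to pull slices of the input signal.
static int get_signal_data(size_t offset, size_t length, float* out_ptr) {
  std::memcpy(out_ptr, features.data() + offset, length * sizeof(float));
  return 0;
}

int main() {
  signal_t signal;
  signal.total_length = features.size();
  signal.get_data = &get_signal_data;

  ei_impulse_result_t result = {};
  // Runs the model on the CPU; on the Kria this was where the ~18 s showed up.
  EI_IMPULSE_ERROR err = run_classifier(&signal, &result, false);
  if (err != EI_IMPULSE_OK) { printf("run_classifier failed: %d\n", err); return 1; }

  // Object detection results come back as labeled bounding boxes.
  for (size_t i = 0; i < result.bounding_boxes_count; i++) {
    const auto& bb = result.bounding_boxes[i];
    if (bb.value == 0) continue;
    printf("%s (%.2f) at x=%u y=%u w=%u h=%u\n",
           bb.label, bb.value, bb.x, bb.y, bb.width, bb.height);
  }
  return 0;
}
```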
With my last-ditch look at the tfssd example code, I had some success and was able to pull together a working demo using the Reolink E1 Zoom camera over RTSP and the TF SSD MobileNet V2 COCO model running on the Kria KV260. In addition, I used the bounding box location relative to the frame to determine whether a car/truck/person was close to the house; I didn't want to be notified of every car or truck that drove by, or every person walking on the sidewalk. If the threshold was tripped, I used the Twilio messaging service to send me a text message that a vehicle or person was in the driveway, or that a bus was detected. Once that text message is received, I can open the Reolink app to confirm the object near the house. Future functionality could include attaching a screenshot of the detection image to the text message. A rough sketch of the proximity check and the Twilio notification is shown below.
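The zone line, credentials, and phone numbers in this sketch are all placeholders (the 0.6 threshold is just an example value to tune for your camera view). Twilio's Messages endpoint is a plain form-encoded POST with HTTP basic auth, which libcurl handles:

```cpp
// Rough sketch of the driveway-zone check and the Twilio text notification.
// SID, token, phone numbers, and the zone line are placeholders.
#include <curl/curl.h>
#include <string>

// Treat a detection as "near the house" when the bottom edge of its relative
// bounding box (0..1 values from the detector) drops below a horizontal line
// across the frame. 0.6 is an example value - tune it to your camera view.
bool near_house(float rel_y, float rel_height, float zone_line = 0.6f) {
  return (rel_y + rel_height) > zone_line;
}

// Send an SMS through Twilio's Messages REST endpoint (form-encoded POST with
// HTTP basic auth). Phone numbers are URL-encoded, so '+' becomes %2B.
bool send_sms(const std::string& body) {
  const std::string sid   = "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";  // Account SID
  const std::string token = "your_auth_token";
  const std::string from  = "%2B15550001111";   // your Twilio number
  const std::string to    = "%2B15552223333";   // your cell number

  CURL* curl = curl_easy_init();
  if (!curl) return false;

  std::string url = "https://api.twilio.com/2010-04-01/Accounts/" + sid + "/Messages.json";
  char* esc = curl_easy_escape(curl, body.c_str(), 0);  // URL-encode the message text
  std::string fields = "To=" + to + "&From=" + from + "&Body=" + std::string(esc);
  curl_free(esc);

  curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
  curl_easy_setopt(curl, CURLOPT_USERPWD, (sid + ":" + token).c_str());
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, fields.c_str());
  CURLcode rc = curl_easy_perform(curl);
  curl_easy_cleanup(curl);
  return rc == CURLE_OK;
}

// Example use inside the detection loop (label IDs follow the COCO list used
// by the TF SSD model, e.g. 1 = person):
//   if (box.label == 1 && near_house(box.y, box.height))
//     send_sms("HASA: person detected in the driveway");
```

Linking against libcurl (-lcurl) is all that's needed on the Ubuntu image to make the HTTP call.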
Results
As you can see in the videos below, the TensorFlow SSD MobileNet model is much more accurate and capable than the Caffe traffic model, especially for the person-detection class in my application. That makes sense, given that the traffic model is intended for the viewpoint of a driving car.
I tried both RGB color input and black-and-white (for sun-bleached conditions), and the model performed well in both. When the objects of interest are detected, a text message is sent to alert the homeowner:
This project was a challenge for me and took a LOT of time. I don't have much experience with FPGAs, and this was my first foray into AMD-Xilinx hardware. It took me a lot of time and research to learn the Vitis AI Library structure and capabilities. It was really neat looking through the Vitis AI Model Zoo and testing out the examples that were compatible with those models.
Future capability would be augmenting or tailoring a model for my exact needs. I tried working with my own model but was unsuccessful in getting it to work with the Kria. I'm happy that I was able to shift from the SSD traffic model to the SSD MobileNet V2 model and get much better results. I hope you enjoyed this, and feel free to reach out with any questions. Thank you!