This project uses FPGA-accelerated machine learning for mobile robot terrain classification. We gather a new terrain dataset, TerraSet, which contains realistic images showing both terrain and background, as a mobile robot would see them in a real outdoor application. A ResNet18 classifier trained on TerraSet reaches 96% accuracy on the test set, indicating that the task was easy to learn, most likely due to the small size of the dataset. The model is trained, quantized, and compiled for deployment on the DPU loaded on the Kria KR260 board using PYNQ. The trained model's predictions are used to adjust the movement of a TurtleBot3 robot hosting a KR260 and a RealSense camera. This is achieved by creating a Robot Operating System (ROS2) node that publishes the predictions, and a custom navigation plugin that uses them to adjust the robot's path and velocity so it can safely reach given waypoints and avoid hazardous terrains.
Introduction

Autonomous mobile robots need to be able to discern between safe and hazardous terrains, and to adapt their speed and their routes based on the terrain they are encountering.
Terrain classification is important and useful to mobile robots from multiple points of view. It helps robots avoid damage by detecting dangerous terrains, ensuring stability and overall safety during navigation. Knowing the terrain type, the robot can also save energy by changing speeds and choosing alternative, easier-to-navigate paths. Moreover, when the robot understands the environment and acts appropriately, it gains autonomy. All of these benefits are essential in applications such as search and rescue robots, wildlife and environmental monitoring, and trail mapping.
The aim of our project is to use FPGA-accelerated machine learning to classify images of terrains and to use this classification to adjust the behavior and trajectory of a mobile robot. Our review of related work revealed that there is little work on FPGA-accelerated terrain classification, and most of it either relies on inertial measurement units or focuses on datasets of aerial/satellite/spectral images. Our idea targets mobile robots that traverse a variety of terrain types such as pavement, cobblestone, dirt ground, grass, and sand. We have not found any public dataset that covers most or all of the categories we target or that fulfills the conditions of our application, so we set out to build our own dataset. One of these conditions is that most of the terrain images need to contain diverse backgrounds (as opposed to aerial views of terrains), since that is usually the perspective of a mobile robot navigating its surroundings. This condition ensures that the classifier will be invariant to different backgrounds and therefore more accurate in real-world applications.
Figure 1 shows an overview diagram of the components of this project and how they interconnect. The dataset we gathered from scratch, TerraSet, is used to train our classifier of choice, ResNet18. Vitis AI (with TensorFlow) and its tutorials provide a complete flow and instructions for training, quantizing and compiling the model. The compiled model is run on the DPU loaded on the KR260 using PYNQ. The KR260 is connected to the TurtleBot robot and ROS2 is used to process the output of the terrain classifier, and to adapt the path and the velocity of the robot.
The main contributions of this project are the following:
- Building a new terrain dataset, called TerraSet, which is essential for implementing accurate terrain classification for mobile robots. The images reflect the perspective of a mobile robot, including various backgrounds, as opposed to containing only aerial views of the terrains.
- Adapting Vitis-AI/Vitis-AI-Tutorials scripts to load, process and augment TerraSet, and to train ResNet18 on this dataset. Both the code and the dataset are public, and the GitHub repository contains step-by-step instructions on how to access the dataset and run all the scripts.
- Achieving 96% accuracy with our ResNet18 classifier on the TerraSet test set, showing that the model learned to differentiate between terrains very well and to be invariant to different backgrounds.
- Developing the terra_sense package, which contains a terrain classification node and a Custom Terrain Costmap layer plugin for Nav2, allowing the robot to access the terrain classification and use it in the navigation stack.
- Demonstrating our idea using the TurtleBot3 robot hosting the KR260 Robotics Kit and a RealSense camera.
TerraSet

From our research, we noticed that there is no public dataset which covers all or most of the categories we want to target (pavement, cobblestone, dirt ground, grass, sand, etc.). Furthermore, most of the public datasets we found contain pictures of only the terrain (as a top view), while in a real application a robot sees both terrain and varying backgrounds, depending on the camera orientation.
Figure 2 shows an example of a picture containing only terrain versus one that also includes background. The reason is that a mobile robot can move its camera into various positions, and we want the classifier to be invariant to different backgrounds. It is easy to classify terrains when the input images show only the terrain type, without other distracting objects or areas, but it is harder to give an accurate classification when backgrounds vary so much along the robot's path. We want the robot to be able to identify terrains and take corresponding actions even if the camera orientation changes or if much of the image is taken up by a distracting background. A more complete dataset would contain both types of images.
Taking all of this into consideration, we started building a dataset from scratch. We only used images gathered by ourselves, most of them captured with the camera attached to the TurtleBot, and we avoided taking images from online sources due to privacy and copyright concerns.
The final dataset used in this project, TerraSet, contains 6 categories with 100 images each.
The TerraSet categories are the following:
1. Cobblestone/Brick
2. Dirt ground
3. Grass
4. Pavement
5. Sand
6. Stairs
Figures 3-5 show examples from each category, with different backgrounds, angles, heights, and camera positions.
Gathering the data ourselves and labeling all the images was time consuming, which led us to use a shorter list of categories (we initially aimed to include classes such as shallow/deep water and more) given the effort needed and the limited time. We are aware that, in the context of machine learning, TerraSet is a small dataset, but we think it is a good starting point for demonstrating our project. It is part of our future work to extend the dataset (more categories and more images per class) and to make it available to the community, with the hope of enabling more research on autonomous mobile robots for applications such as search and rescue, and environmental monitoring.
Training of ResNet18 on TerraSet

We chose ResNet18 as our image classifier because it is a relatively small model that is nevertheless widely adopted and powerful. Research shows it achieves good accuracy on common image datasets such as CIFAR10 and ImageNet.
We make use of the Vitis-AI-Tutorials repository, which provides the scripts for training ResNet18 and for quantizing and compiling the model for deployment on the DPU (the DPU being loaded on the Kria KR260 board using PYNQ). In the original tutorial, ResNet18 is pre-trained on the ImageNet dataset and then trained on CIFAR10.
We adapted the scripts to load the TerraSet dataset and use it to train ResNet18. We ran multiple experiments and made several changes. As one might expect, the weights pre-trained on ImageNet did not help our task because ImageNet is a very different dataset, so we trained ResNet18 on TerraSet from scratch. The images were cropped to a size of 224x224, as the 32x32 resolution used for the CIFAR10 example did not yield good results.
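For reference, below is a minimal sketch of how TerraSet can be loaded and resized to 224x224 with tf.keras. The directory layout, file names, batch size, and seed are assumptions made for illustration and may differ from the actual scripts in our repository.

import tensorflow as tf

# Assumed layout: terraset/<class_name>/<image>.jpg, one sub-folder per TerraSet class.
dataset = tf.keras.utils.image_dataset_from_directory(
    "terraset",
    labels="inferred",
    label_mode="int",
    image_size=(224, 224),  # resize to the ResNet18 input resolution we use
    batch_size=32,
    shuffle=True,
    seed=42,
)

# Scale pixels to [0, 1] and unbatch, producing the unbatched_dataset
# referenced by the augmentation snippet below.
unbatched_dataset = dataset.map(
    lambda x, y: (x / 255.0, y),
    num_parallel_calls=tf.data.AUTOTUNE,
).unbatch()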
TerraSet is a small dataset, so we decided to use augmentation to expand it and to make the classifier more robust to factors that could affect the input images, such as brightness or rotation. The following code snippet lists all the augmentation functions we defined and applied to the dataset (random_crop, random_flip, random_brightness, random_rotation, random_zoom, random_contrast, random_saturation, random_hue). The initial size of the dataset was 6 classes x 100 images = 600 images; after augmentation we obtained 5400 images (600 initial images + 600 x 8 augmentation methods).
# Apply each augmentation function to the unbatched dataset in parallel,
# producing one augmented copy of the dataset per function.
for preprocess_func in [random_crop, random_flip, random_brightness, random_rotation, random_zoom, random_contrast, random_saturation, random_hue]:
    augmented_datasets.append(unbatched_dataset.map(preprocess_func, num_parallel_calls=tf.data.AUTOTUNE))
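The augmentation helpers themselves are not shown above; as an illustration, two of them could be implemented with tf.image as in the following sketch. The parameter values are assumptions, not necessarily the ones used in our scripts.

import tensorflow as tf

def random_brightness(image, label):
    # Randomly shift brightness by up to +/-0.2 while keeping the label unchanged.
    image = tf.image.random_brightness(image, max_delta=0.2)
    return tf.clip_by_value(image, 0.0, 1.0), label

def random_flip(image, label):
    # Randomly mirror the image horizontally.
    return tf.image.random_flip_left_right(image), label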
Although the augmented dataset is still small in the context of machine learning, we consider it a starting point for a more complete future dataset, which we intend to make public for the research community. Moreover, this dataset allows us to demonstrate our idea and its potential, and to exemplify the entire flow of our implementation.
Out of the total 5400 images, we use 4320 for training and 1080 for validation and testing. ResNet18 achieved 98% accuracy on the training set and 96% on the test set. These high accuracies raise concerns about overfitting; however, it is a good sign that the model performs well on the unseen data, the test set. We believe that a larger dataset will reduce the risk of overfitting and improve the model's ability to generalize.
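As an illustration of this 80/20 split, and continuing from the augmentation snippet above, the combined dataset could be split roughly as follows (batch size and seed are assumptions):

# Combine the original images with all augmented copies, then shuffle once
# and split 80/20 into training and evaluation sets.
full_dataset = unbatched_dataset
for aug_ds in augmented_datasets:
    full_dataset = full_dataset.concatenate(aug_ds)

full_dataset = full_dataset.shuffle(5400, seed=42, reshuffle_each_iteration=False)
train_split = full_dataset.take(4320).batch(32)  # 4320 images for training
eval_split = full_dataset.skip(4320).batch(32)   # 1080 images for validation/testing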
The trained ResNet18 model is quantized to int8 and compiled for deployment on the DPU. Step-by-step instructions on how to access TerraSet and run all the scripts for training, quantizing, and compiling are provided in the ReadMe file in our project's repository.
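For orientation, the quantization step in the Vitis AI TensorFlow2 flow looks roughly like the sketch below; the file names and calibration subset are placeholders, and the exact arguments may differ from the tutorial scripts we adapted.

from tensorflow import keras
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Load the floating-point ResNet18 trained on TerraSet (file name is a placeholder).
float_model = keras.models.load_model("resnet18_terraset_float.h5")

# Post-training quantization to int8, calibrated on a small subset of TerraSet.
calib_ds = train_split.take(10)  # placeholder calibration subset
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_ds)
quantized_model.save("resnet18_terraset_quantized.h5")

The quantized model is then passed to the Vitis AI compiler (vai_c_tensorflow2), together with the DPU architecture file for the KR260, to produce the .xmodel that the DPU runs.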
Hardware Setup

In this project, we modified a TurtleBot3 Waffle Pi Robot to serve as the platform for our experiments. The original Raspberry Pi was replaced with the KR260 board, and the battery was modified to supply power to both the KR260 and the OpenCR board on the robot. Additionally, we replaced the TurtleBot's standard LiDAR with a RealSense Camera D455. This sensor substitution allowed us to use the new device both for image classification inference and for providing the information necessary for localization and mapping.
To operate the modified TurtleBot, we installed the official Ubuntu 22.04 image on the KR260 and followed the repository guide to install PYNQ DPU and ROS2 Humble.
Due to computational constraints, we employed a laptop running Ubuntu 22.04 and ROS2 Humble to execute compute-intensive algorithms such as SLAM and navigation. This approach was necessary to demonstrate the implementation of the terrain costmap because the time constraints precluded the level of optimization required to run all processes on the KR260.
Figure 6 shows the assembled robot consisting of the TurtleBot3 Waffle Pi Robot, the KR260 Robotics board and the RealSense Camera D455.
Navigation2, commonly referred to as Nav2, is the ROS2 navigation stack that provides the tools and capabilities required for autonomous navigation in robots. It is designed to be flexible, scalable, and easily customizable.
The main functionalities of Nav2 are the following:
- Path Planning: Calculates the best path from the robot's current position to a goal position while avoiding obstacles.
- Obstacle Avoidance: Ensures the robot can dynamically avoid obstacles in its path.
- Control: Executes the planned path by sending appropriate velocity commands.
Costmaps are a fundamental part of the Nav2 stack. They represent the environment around the robot in the form of a grid, where each cell has an associated cost. The cost represents the difficulty or risk of traversing that cell, with higher costs indicating more difficult or risky areas.
There are two primary types of costmaps:
- Global Costmap: Represents the entire environment and is used for long-term planning. It provides a broad view of the space and helps in creating a path from the current location to the goal.
- Local Costmap: Focuses on the area immediately surrounding the robot and is used for short-term planning and obstacle avoidance. It provides a detailed view of the nearby environment, allowing the robot to make real-time adjustments to its path and to the velocity commands sent to the motors.
In our application we integrated the Nav2 stack with RTAB-Map, a Simultaneous Localization and Mapping (SLAM) approach that facilitates seamless interfacing with the RealSense camera. RTAB-Map processes sensor data to generate a detailed map, which Nav2 uses for path planning and obstacle avoidance, ensuring efficient and reliable navigation in dynamic environments. We modified Nav2’s path planning and obstacle avoidance to change based on the predictions of a machine learning (ML) algorithm.
Integrating ROS2 with ML opens up new possibilities for developing intelligent robotic systems. Custom ROS2 applications can leverage ML algorithms to enhance perception, decision-making, and autonomous behavior. We will now explore the integration of the KR260 board with ROS2, and how it supports the computational demands of our ML-enhanced robotic application. To do this, we created the terra_sense package shown in Figure 7. Instructions on how to launch all the packages shown in the figure can be found in our repository.
The terra_sense package comprises a terrain classification node and a custom navigation (Nav2) terrain plugin. The terrain classification node, configured similarly to existing solutions, utilizes the PYNQ DPU overlay and the trained ML model to publish the predicted terrain ahead of the robot. The Custom Terrain Costmap plugin subscribes to these classification messages and adds a value to the traversal cost of the local map based on the detected terrain class. The cost is integrated into the robot's local costmap, which primarily covers the robot's field of view because it coincides with the RealSense camera's field of view. For Pavement and Cobblestone/Brick we added a cost of 0, because these terrains are usually easy to traverse, so the cost in these cases is based only on the obstacles in the map. Dirt ground, Grass, and Sand are harder for the TurtleBot to traverse, so we added a value of 5 to the local costmap. Finally, stairs are very dangerous, so we added a cost of 254 to the costmap, which corresponds to the cost of a lethal obstacle.
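To illustrate how the classification node can be structured, below is a minimal rclpy sketch that runs inference through the PYNQ DPU overlay and publishes the predicted class index. The topic name, file names, and preprocessing are assumptions made for illustration and are simplified compared to the actual node in terra_sense.

import numpy as np
import rclpy
from rclpy.node import Node
from std_msgs.msg import Int32
from pynq_dpu import DpuOverlay

class TerrainClassifierNode(Node):
    def __init__(self):
        super().__init__('terrain_classifier')
        # Topic name and message type are placeholders for this sketch.
        self.pub = self.create_publisher(Int32, '/terrain_class', 10)
        # Load the DPU bitstream and the compiled ResNet18 xmodel (file names assumed).
        self.overlay = DpuOverlay('dpu.bit')
        self.overlay.load_model('resnet18_terraset.xmodel')
        self.dpu = self.overlay.runner
        self.in_shape = tuple(self.dpu.get_input_tensors()[0].dims)    # e.g. (1, 224, 224, 3)
        self.out_shape = tuple(self.dpu.get_output_tensors()[0].dims)  # e.g. (1, 6)

    def classify(self, image):
        # image: a 224x224x3 frame from the RealSense camera; preprocessing is simplified here.
        inp = np.asarray(image, dtype=np.float32).reshape(self.in_shape) / 255.0
        out = np.empty(self.out_shape, dtype=np.float32)
        job_id = self.dpu.execute_async([inp], [out])
        self.dpu.wait(job_id)
        msg = Int32()
        msg.data = int(np.argmax(out))  # index into the TerraSet class list
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(TerrainClassifierNode())

On the costmap side, the plugin applies the mapping described above (0 for Pavement and Cobblestone/Brick, 5 for Dirt ground, Grass, and Sand, 254 for Stairs) when it updates the local costmap.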
TerraSense Demo - Real-time Terrain Classification

The video shown below presents the demo of our implementation.
The first part demonstrates how terrain classification works in real time. Starting from timestamp 0:08, the bottom left side shows a video of the TurtleBot moving around, and the top left corner shows what the robot sees through its camera. In this part of the demo, we were moving the robot by sending velocity commands, which can be seen in the bottom right box of the screen. While operating outdoors, the robot is battery powered, and we encountered issues with navigation because the power constraints prevented the robot from sustaining communication with all the nodes. However, we could still perform real-time terrain classification. The top right corner shows the outputs of the classifier, where the numbers correspond to the order of the TerraSet class labels listed in the TerraSet section. In this case, the classifier mostly detects 3, which stands for Grass, and 4, which stands for Pavement. The classifier seems to correctly detect the terrain most of the time, but it occasionally confuses pavement with sand; we believe this is because the dataset contains images of very smooth sand that resembles pavement in certain lighting.
TerraSense Demo - Terrain Aware Navigation

In this part of the demo we show the robot classifying the terrain as well as using that prediction to change the local costmap. This part had to be done indoors, with the robot plugged into AC power, to keep up with the power demands of the communication between all the involved ROS nodes. In the top left corner we can observe the currently detected terrain and the change of the costmap: prev cost (previous) corresponds to the cost obtained from the basic Nav2 costmap layers, namely obstacles, inflation, and voxel, and new cost represents the adjusted cost after taking the terrain into consideration. In the bottom left corner we find a view from the RealSense camera, and on the right we see the robot in the middle, surrounded by the map of the environment. The local costmap is represented as a rectangle around the robot, and the global map is represented by the purple and cyan shapes. In both maps, cyan represents low cost, red represents the highest cost, and purple represents obstacles. The video shows that we get the expected result from the terrain plugin, as the local map's cost is slightly higher than the global map's because it takes the terrain into account. Since our training dataset does not include carpet, which is the terrain the robot is moving on, the robot classified the carpet as Sand, Pavement, and Grass. This might be due to similarities in color with Pavement and Sand, and in texture in the case of Grass. Regardless, the Sand and Grass classes incur an added cost of 5, which is why the local costmap shows blue in place of cyan and red in place of purple. When the terrain is classified as Pavement, the local map colors change to more closely resemble the global costmap and the new cost is the same as the previous cost, which is the expected result.
Conclusion

This project successfully demonstrates the integration of FPGA-accelerated machine learning for mobile robot terrain classification and navigation. By creating a new terrain dataset, TerraSet, and training a ResNet18 classifier, we achieved a high accuracy of 96% on the test set, showcasing the model's effectiveness in classifying different terrains. The deployment of this model on the DPU loaded on the Kria KR260 board allowed real-time inference, which was crucial for the robot's adaptive behavior.
We modified a TurtleBot3 Waffle Pi Robot, replacing the Raspberry Pi with the KR260 board and integrating a RealSense Camera D455. This setup enabled the robot to utilize the terrain classification for both navigation and mapping tasks. The use of a laptop for compute-intensive algorithms like SLAM and navigation, alongside the KR260, highlighted the practical considerations and computational challenges in deploying such systems.
The development of the terra_sense package, which includes a terrain classification node and a custom Nav2 terrain plugin, facilitated the integration of terrain information into the robot's navigation stack. By assigning traversal costs to different terrains and incorporating them into the local costmap, the robot could make informed decisions about its path and speed, enhancing its ability to navigate complex environments autonomously.
Overall, the project's results underscore the potential of FPGA-accelerated machine learning in enhancing the capabilities of autonomous mobile robots. Future work includes expanding the TerraSet dataset and optimizing the system for complete onboard processing.