Utilizing AMD Field Programmable Gate Array (FPGA) hardware to deploy a quantized and modified neural network for rail fault detection, addressing the challenge of implementing automated rail defect detection on resource-constrained edge devices.
Background: As demand for railway transportation safety grows, traditional rail inspection still relies on workers regularly walking along the tracks to look for wear, deformation, and cracks. Although this can intuitively reveal some obvious faults, it is time-consuming, labor-intensive, and inefficient, and no longer meets the needs of modern railway systems. To address the automation and efficiency issues in railway fault detection, this project developed a railway inspection system based on an FPGA. This edge AI system captures track images with cameras, uses a Convolutional Neural Network (CNN) to detect track defects in real time, and automatically reports fault information.
Market Value Analysis of the Project
Utilizing FPGA for railway track inspection can address several critical issues, including:
- Accident Prevention: Defects in railway tracks, such as cracks, wear, and corrosion, are primary causes of train derailments. If these defects are not detected and repaired in time, they pose serious risks to the safety of passengers and goods. Automated inspection systems can detect these issues early, thus preventing accidents.
- Increased Efficiency: Traditional manual inspection methods are inefficient and prone to errors. In contrast, FPGA combined with deep learning and computer vision technologies can significantly improve detection efficiency and accuracy, reducing the need for human intervention.
- Real-time and Accuracy: FPGAs offer parallel processing capabilities, enabling real-time data processing and analysis. This is crucial for railway applications that require continuous monitoring and quick response.
Market Potential
- Widespread Market Demand: With the global expansion and upgrading of railway networks, there is an increasing demand for efficient and reliable railway track inspection systems. The frequent use of rail transport and the growing length of railway lines make track inspection an urgent issue to address.
- Technological Advantage: The application of FPGA combined with deep learning technology in railway track inspection not only enhances the speed and accuracy of detection but also reduces errors caused by environmental and human factors. This gives the project a strong competitive edge in technology.
- Economic Benefits: By improving inspection efficiency and reducing the occurrence of accidents, railway operating companies can lower maintenance costs and accident compensation expenses. Additionally, minimizing transport interruptions caused by accidents can improve operational efficiency and economic benefits.
Preliminary Exploration: Design and Deployment Attempts through Vitis AI:
In addition to the traditional RTL flow, AMD provides a powerful toolchain called Vitis AI. It supplies optimized IP, tools, libraries, pre-trained models, and resources such as example designs and tutorials that guide the user through development. It is designed for high efficiency and ease of use, unleashing the full potential of AI acceleration on AMD Adaptable SoCs and Alveo data-center accelerator cards, and it is much simpler than the traditional flow. As a first attempt, we trained a ResNet-18 network from AMD's Model Zoo, optimized it, and deployed it with Vitis AI. The following content outlines the general development process:
Set up the environment: on a host running Ubuntu, install Docker and pull the xilinx/kria developer: 2022.1 image. This Docker environment corresponds to the Vitis AI v3.0 release we are using.
Quantizing the Model: In this Docker environment, the tools provided by Vitis AI can be used to analyze, quantize, and export models. Simply import the trained model (.pth) file and complete the model inspection and quantization process with a few short commands.
Next, use the vai_c_xir tool to compile the model. At this point, the compiled (.xmodel) file is ready to run on the DPUCZDX8G architecture.
Flash the corresponding Vitis AI PetaLinux image onto the Kria KR260, load the matching DPU firmware, copy over the model and test data, and run it directly with the Vitis AI runtime API.
In the end, however, we did not choose this approach for rail defect recognition. We needed a faster and lighter network design, along with integration with the robot car's control and the ESP8266 module, so we decided on a brand-new network design and deployment flow.
Contributions:
To reduce the model size, deploy an appropriate rail defect detection model on the Kria KR260 development board, communicate with the upper computer, and control the robot car, this project adopts a five-step approach:
- Designing a neural network model specifically adapted to the resources of the Kria KR260 development board.
- Applying post-training, mixed-precision quantization to the model parameters (different quantization bit widths for modules with different precision requirements).
- Using AMD Vitis HLS to implement the network in hardware efficiently.
- Implementing communication with the upper computer through the WiFi communication module.
- Integrating the car communication protocol to achieve KR260's control of the robot car platform.
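The mixed-precision idea in step two can be illustrated with a small sketch. The layer names, bit-width choices, and symmetric scaling scheme below are illustrative assumptions, not the project's exact configuration:

```python
def quantize(values, bits):
    """Symmetric post-training quantization of a weight list to signed integers."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax  # one scale per tensor
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    """Map the integers back to approximate floating-point weights."""
    return [v * scale for v in q]

# Mixed precision: precision-critical layers keep more bits (hypothetical layers).
weights = {
    "conv1":  [0.42, -0.17, 0.08, -0.33],   # first layer: keep 8-bit
    "block2": [0.20, -0.60, 0.33, -0.05],   # inner block: 4-bit is enough
    "fc":     [0.90, -0.45, 0.10, 0.05],    # classifier head: 8-bit
}
bit_widths = {"conv1": 8, "block2": 4, "fc": 8}

quantized = {name: quantize(w, bit_widths[name]) for name, w in weights.items()}
```

The per-tensor scale is stored alongside the integers so the PL-side arithmetic can stay in fixed point and only rescale at layer boundaries.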
The neural network architecture used in this project is illustrated in the provided figure. The design is based on the ResNet network structure, retaining the basic residual block structure. The input image size is 128x128. The dataset is filtered and divided into two categories: images with defects and images without defects. Data augmentation techniques such as blurring, noise addition, random flipping, and rotation are applied.
- Model Parameters: 261 KB
- MAC Operations: 24.66M
- Top-1 Accuracy: 88.9%
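The MAC figure above can be sanity-checked with a small counting helper. The example call assumes a single-channel (grayscale) input and "same" padding for the first layer described later (128×128 input, 5×5 kernel, 32 output channels); both assumptions are ours, not stated by the report:

```python
def conv2d_macs(h, w, k, c_in, c_out, stride=1, padding="same"):
    """Multiply-accumulate count of one 2-D convolution layer."""
    if padding == "same":
        h_out, w_out = -(-h // stride), -(-w // stride)   # ceil division
    else:  # "valid"
        h_out, w_out = (h - k) // stride + 1, (w - k) // stride + 1
    # one k*k*c_in dot product per output pixel, per output channel
    return h_out * w_out * k * k * c_in * c_out

# Hypothetical first layer: 128x128x1 input, 5x5 kernel, 32 output channels
first_layer = conv2d_macs(128, 128, 5, 1, 32)   # 13,107,200 MACs
```

Summing such counts over every layer gives the network's total (24.66M here), which is the number that must fit the KR260's DSP budget and target frame rate.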
Parameter Extraction and Environment Setup:
After determining the best model, the parameters need to be extracted for inference on FPGA. There are two ways to achieve this:
- Using the Netron website to extract model weights and biases.
- Writing Python code to extract the model parameters, which is equally reliable.
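A sketch of the second route: walking a state-dict-like mapping of layer names to weight lists and emitting C arrays that the HLS source can `#include`. The dict shown is a stand-in; a real PyTorch `state_dict()` would first be flattened to Python lists (e.g. via `tensor.flatten().tolist()`):

```python
def to_c_array(name, values):
    """Emit one flat float array as a C initializer for the HLS source."""
    body = ", ".join("{:.6f}f".format(v) for v in values)
    return "static const float {}[{}] = {{{}}};".format(name, len(values), body)

def export_header(state_dict):
    """Turn {layer_name: [floats]} into the text of a weights header file."""
    lines = ["// Auto-generated weights header"]
    for layer, values in state_dict.items():
        c_name = layer.replace(".", "_")   # C identifiers cannot contain '.'
        lines.append(to_c_array(c_name, values))
    return "\n".join(lines)

# Stand-in for a flattened model.state_dict():
params = {"conv1.weight": [0.12, -0.5], "conv1.bias": [0.0]}
header_text = export_header(params)        # write this to weights.h
```

Writing the header from code keeps the HLS sources in sync whenever the model is retrained, which the Netron route cannot do automatically.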
Development Environment Configuration: Once parameter extraction is complete, Vitis HLS 2022.2 is used to implement the inference of the PyTorch neural network model in hardware. The following environment configurations are required:
- Vitis HLS: Choose Kria KR260 as the target, with the flow target set to Vivado IP Flow Target.
- Import source files and test bench files. Include the previously extracted weights and biases in the source.
- In Vivado 2022.2, create a project for the Kria KR260 development board and use PYNQ for communication between the Processing System (PS) and Programmable Logic (PL).
- The Processing System (PS) refers to the system typically comprised of embedded processors. It is responsible for executing the majority of software tasks, including operating systems, applications, and drivers. The PS can handle advanced computational tasks and provides a wide range of peripheral interfaces, such as serial communication, Ethernet, USB, and more. Within the PYNQ framework, the PS usually runs Python code, managing control and data processing.
- The Programmable Logic (PL) refers to the portion of the system typically constituted by FPGA. The PL is used to achieve hardware acceleration and can be dynamically configured as different hardware circuits as needed to enhance the execution efficiency of specific tasks. The PL is suitable for parallel processing, extensive data handling, and custom hardware functions. In the PYNQ framework, the PL can be programmed using hardware description languages (such as VHDL or Verilog) or high-level synthesis tools (such as Vivado HLS), allowing users to configure and utilize these hardware acceleration modules through Python interfaces.
- The Kria KR260 development board needs to be flashed with a PYNQ environment.
Code Explanation:
- The HLS code section:
- This is a standard convolution operation. The input image size is 128x128, the convolution kernel size is 5x5, and the output channel number is 32.
- Convolutional layers are the core building blocks of CNNs and where most of the computation happens.
- This is a standard max-pooling operation used to scale down the image size and extract features. Pooling layers (also known as subsampling) perform down-sampling, reducing the number of parameters in the input. Like convolutional layers, pooling operations slide a filter over the entire input. However, unlike convolutional filters, this filter has no weights. Instead, the kernel applies an aggregation function to the values within its receptive field, populating the output array. Max pooling selects the maximum value pixel to send to the output array as the filter moves over the input.
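The two operations above can be checked against a plain-Python reference model. Sizes here are small for illustration (single channel, "valid" padding), not the full 128×128 input with 32 output channels:

```python
def conv2d(img, kernel):
    """Valid-padding 2-D convolution (strictly, cross-correlation, as in CNNs)."""
    k = len(kernel)
    h, w = len(img) - k + 1, len(img[0]) - k + 1
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # dot product of the kernel with the k x k window at (i, j)
            out[i][j] = sum(img[i + a][j + b] * kernel[a][b]
                            for a in range(k) for b in range(k))
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling: keep the largest pixel in each window."""
    h, w = len(img) // size, len(img[0]) // size
    return [[max(img[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(w)] for i in range(h)]
```

During co-simulation, a reference model like this is what the test bench compares the HLS output against, element by element.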
- After writing all the HLS code, run C synthesis and C/RTL co-simulation.
- Through these two steps, we can check resource usage and timing errors, thereby generating a synthesis report. After passing the Co-simulation, export the RTL design by running the Export RTL command, which writes to the active solution's impl folder. The export.zip file is a zip archive of the IP and its contents, which can be directly added to the Vivado IP catalog.
- Upon completing the HLS section, we move to the previously created Vivado project to start the Block Design process. First, find the settings to import the HLS-generated IP package.
- Block Design is a method for graphical design and management of FPGA hardware systems. It allows developers to create and configure hardware designs more intuitively through a graphical interface. Block Design supports modular design, enabling developers to divide complex hardware functions into multiple independent modules, each of which can be individually designed, tested, and verified. These modules can be predefined IP cores or custom hardware logic.
- After applying the settings, choose Create Block Design from the list on the left. Import the PS module and IP module and connect them. Vivado can automatically generate connections between IP cores, reducing manual wiring errors and complexity, including bus interfaces, clock signals, and reset signals.
- Verify the connections are correct, then proceed with synthesis, implementation, and bitstream generation. Synthesis is the process of converting high-level HDL (such as VHDL or Verilog) code into a gate-level netlist, which describes the specific logic gates and their connections. During synthesis, Vivado parses the design code, performs logic optimization, removes redundant logic, and maps it to FPGA's basic logic units (such as lookup tables and flip-flops).
- The goal of the implementation process is to ensure that the design runs correctly on the FPGA and meets all timing and resource constraints. Generating the bitstream converts the implemented design into a binary file that can be downloaded to the FPGA. The bitstream contains all the necessary information to configure the internal logic and connections of the FPGA and is loaded into the FPGA upon power-up or reconfiguration. After generating the bitstream, developers can download it to the FPGA to make it operate according to the design logic and connections.
- After generating the bitstream, locate three important files in the Vivado project directory: the .tcl, .bit, and .hwh files. Tcl script files are used to automate and manage the FPGA design flow.
- (.tcl) files usually contain a series of commands used to set up projects, synthesize designs, implement designs, and generate bitstreams.
- (.bit) files are binary files generated by Vivado, containing all the information needed to configure the FPGA. This file can be directly downloaded to the FPGA chip, making the FPGA work according to the design's logic and connections.
- (.hwh) files are high-level metadata files describing the hardware design, typically associated with PYNQ and other embedded systems, containing detailed information about the hardware design, such as IP cores, bus connections, and register mappings.
PYNQ Development and Board Testing:
- Power on the Kria KR260 development board and open the Jupyter visualization interface.
- Upload the required project files and create a .ipynb file to write PYNQ code.
- Use the PYNQ Overlay class to load the contents of the .bit file and access the PL side.
- Read and write image data and recognition results through the IP's register addresses. To find the addresses, locate the generated driver header (.h) file in Vitis HLS, which lists the input and output offsets.
- Upon successful testing, the Kria KR260 can be used for rail defect detection.
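The board-side flow looks roughly like the sketch below. The pynq calls are shown as comments because they only run on the KR260; the IP name, header name, and register offsets (0x10/0x18) are illustrative assumptions, since the real values come from the HLS-generated driver header as described above:

```python
# On the KR260 (inside Jupyter) the flow would be approximately:
#   from pynq import Overlay
#   ol = Overlay("rail_detect.bit")   # also reads the .hwh placed alongside it
#   ip = ol.rail_cnn_0                # HLS IP instance from the block design
#   for i, px in enumerate(flatten_image(img)):
#       ip.write(IN_ADDR + 4 * i, px)
#   result = ip.read(OUT_ADDR)        # 0 = no defect, 1 = defect (assumed)

IN_ADDR = 0x10    # hypothetical AXI-Lite offsets; take the real ones from the
OUT_ADDR = 0x18   # driver header that Vitis HLS generates for the IP

def flatten_image(img):
    """Row-major flatten of a 2-D grayscale image into the word stream
    written to the accelerator's input buffer."""
    return [pixel for row in img for pixel in row]

stream = flatten_image([[0, 1], [2, 3]])   # -> [0, 1, 2, 3]
```

Only `flatten_image` runs off-board; everything commented out depends on the bitstream being loaded.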
Dataset Augmentation:
- The first step is to perform basic image enhancement operations on the images, including horizontal and vertical mirroring.
- On this basis, apply a 45° rotation to every image produced by the mirroring step. At this point, each original image has been expanded to 8 images.
- Finally, apply four different treatments to each image from the previous geometric augmentation: blur, salt noise, brightening, and darkening.
- In total, each original image is expanded to 40 images (8 geometric variants, each kept untreated plus its 4 treated versions).
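The 1→40 expansion can be sanity-checked with a small bookkeeping sketch. Transforms are represented as label tuples; the actual pixel operations are done with an image library on the dataset itself:

```python
from itertools import product

MIRRORS = ["orig", "hflip", "vflip", "hvflip"]               # step 1: 1 -> 4
ROTATIONS = ["0deg", "45deg"]                                # step 2: 4 -> 8
TREATMENTS = ["none", "blur", "salt", "brighten", "darken"]  # step 3: 8 -> 40

def augmentation_plan():
    """Every (mirror, rotation, treatment) combination for one source image."""
    return list(product(MIRRORS, ROTATIONS, TREATMENTS))

plan = augmentation_plan()   # 4 * 2 * 5 = 40 variants per original image
```

Keeping the plan explicit like this also makes it easy to verify that no variant is generated twice when the augmentation script runs over the whole dataset.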
Development of ESP8266 and communication with upper computer:
- To write and develop programs using Arduino, first set the development board interface and configuration to "Generic ESP8266 Module" and load the relevant tool environment package. You can also manually download and import from Github: https://github.com/esp8266/Arduino
- During development, two libraries, ESP8266WiFi.h and WifiLocation.h, were used for ESP8266 networking and geographic location acquisition, respectively.
- When connecting the ESP8266 to the upper computer, the ESP8266 program follows the same design as the upper computer, which uses the socket protocol: after joining the WiFi network, it opens a socket server and waits for a connection. Basic data transmission and reception are implemented, as well as sending geographic location information after the "sendLocation" command is received:
- Geographic location information is obtained through the Google geolocation API, which requires a Google API key and clock information to resolve a specific location during calls. Because the ESP8266 chosen for this project is a DevKit board, it can be flashed and communicated with over micro-USB: connect the ESP8266 through a serial port and set the upload speed to 115200 for flashing.
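On the upper-computer side, the socket exchange with the ESP8266 can be sketched as below. `handle_command` mirrors the command handling described above; the reply format, the placeholder coordinates, and the helper names are illustrative assumptions:

```python
import socket
import threading

def handle_command(cmd, location="lat=31.23,lon=121.47"):
    """Dispatch one text command received over the ESP8266 link."""
    if cmd.strip() == "sendLocation":
        return location            # geographic info the ESP8266 fetched via the API
    return "ack:" + cmd.strip()    # echo acknowledgement for other commands

def serve_once(host="127.0.0.1", port=0):
    """Accept a single client, answer one command, then close.
    Returns the port actually bound (port=0 lets the OS pick one)."""
    srv = socket.socket()
    srv.bind((host, port))
    srv.listen(1)

    def worker():
        conn, _ = srv.accept()
        with conn:
            cmd = conn.recv(1024).decode()
            conn.sendall(handle_command(cmd).encode())
        srv.close()

    threading.Thread(target=worker, daemon=True).start()
    return srv.getsockname()[1]
```

In the real system the roles are reversed at times (the ESP8266 also opens a server), but the framing is the same: plain text commands over a TCP socket.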
- The upper computer in actual operation is shown below:
The functions and usage of the robot car:
- The robot car is equipped with multiple modules, but the main ones used in this project are the servo gimbal and the four-wheel drive motors. The car carries a custom driver board and an STM32 core board for basic control. The entire driver board is powered by a PR12V lithium battery pack and connected to the KR260 via micro-USB for serial communication.
- The servo gimbal consists of two servo motors that control its left-right rotation and its up-down pitch angle. It carries a high-definition USB camera for precise positioning and image acquisition. Through precise control of the servo gimbal, the camera can accurately capture all the relevant details of the railway tracks.
- The car uses four motors, one per wheel, to achieve four-wheel drive, enabling quick acceleration and deceleration both forward and backward. Because it is four-wheel drive, the steering system uses four-wheel differential steering, allowing the car to turn on the spot. This flexible maneuverability suits narrow environments, making it ideal for working conditions on railway tracks.
- The car integrates an instruction system that encapsulates a series of complex low-level operations, such as the clock and interrupt handling of each module. By receiving instructions through the serial port, the corresponding functions can be invoked directly, for example moving the servo gimbal to a given angle. Here are some basic instructions:
- Specifically, we designed a separate instruction set for controlling the servo gimbal. The entire car program is written and flashed using Keil μVision V4.10. The following is an example of the servo gimbal control protocol (example command: $4WD,PTA180,PTB90,PTC90#):
- All instructions are received through the serial port, and the Kria KR260 can achieve full control of the car through the PYNQ terminal.
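The gimbal command format shown above ($4WD,PTA180,PTB90,PTC90#) can be generated and checked with a small helper on the KR260 side before it is written to the serial port. The PTA/PTB/PTC fields follow the example; the 0–180° range check and the parsing rules beyond the example are our assumptions:

```python
def build_command(angles):
    """Build a '$4WD,PTA<a>,PTB<b>,PTC<c>#' frame from three servo angles."""
    a, b, c = angles
    for v in (a, b, c):
        if not 0 <= v <= 180:               # assumed servo range
            raise ValueError("servo angle out of range: %d" % v)
    return "$4WD,PTA{},PTB{},PTC{}#".format(a, b, c)

def parse_command(frame):
    """Inverse of build_command: recover the three angles from a frame."""
    if not (frame.startswith("$4WD,") and frame.endswith("#")):
        raise ValueError("malformed frame: " + frame)
    fields = frame[5:-1].split(",")          # ['PTA180', 'PTB90', 'PTC90']
    return tuple(int(f[3:]) for f in fields) # strip the 3-letter field name
```

With `pyserial` on the board, sending a frame reduces to something like `ser.write(build_command((180, 90, 90)).encode())`.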
Conclusion:
This project achieves real-time and accurate rail defect detection by deploying a neural network on the Kria KR260 FPGA platform and integrating the entire system into a robot car. Using WiFi communication to upload detected rail defect information to the upper computer enables remote monitoring and data analysis. It also reflects the significant improvements in FPGA development efficiency achieved with AMD tools. The project boasts advantages such as strong real-time capability, high platform performance, low power consumption, high integration, and strong adaptability, providing an efficient and reliable solution for rail safety inspection. Future work can focus on further optimizing the algorithm and system architecture to enhance detection efficiency and accuracy, expand the application scope, and contribute to the safety and efficiency of railway transportation.