In the realm of regional anesthesia, a brachial plexus block is a critical technique that involves injecting local anesthetics into or around the brachial plexus. This network of nerves supplies the arm and hand, enabling painless surgeries or procedures on the upper limb. While effective, this procedure requires high precision to avoid complications such as nerve injury or infection. Traditionally, highly trained anesthetists perform these blocks, often utilizing ultrasound guidance for better accuracy.
However, the complex anatomy of the brachial plexus makes nerve segmentation in ultrasound images challenging. This project aims to design a real-time application capable of accurately identifying and locating brachial plexus nerve trunks in ultrasound images using FPGA technology. By leveraging deep learning and hardware acceleration, we hope to democratize this procedure, making it accessible even to less experienced practitioners.
Problem Statement
The goal is to develop a hardware accelerator for real-time ultrasound image processing to accurately identify brachial plexus nerve trunks. This will help less experienced anesthetists perform nerve blocks more effectively, reducing the dependence on highly trained professionals and expanding access to this essential procedure.
Proposed Solution
We aim to build a solution that integrates an optimized hybrid deep neural network architecture tailored for accurate recognition of brachial plexus nerve trunks. Given the computational complexity of deep learning methods, achieving real-time segmentation on edge computing devices is challenging. Therefore, we are leveraging the AMD KRIA KR260 Robotics Starter Kit as a hardware accelerator. This integration will enhance the speed and efficiency of the segmentation process, ensuring timely and precise identification of the brachial plexus nerve trunk.
Key Features
1. Real-Time Processing: Utilizing the AMD KRIA KR260's FPGA capabilities to process ultrasound images in real time.
2. Deep Learning Integration: Implementing a hybrid deep neural network, Inc+ResUNet, optimized for nerve trunk recognition.
3. Portability and Power Efficiency: Designing a portable end device that consumes minimal power, ensuring it can be used in various clinical settings without the need for internet connectivity.
4. Enhanced Accuracy: Providing accurate and reliable segmentation to assist medical professionals in underserved surgical environments.
Implementation
1. Selecting a dataset
The dataset from the Kaggle 'Ultrasound Nerve Segmentation' competition was selected for training and testing the models. The dataset contains two partitions: a training set and a test set. The training partition contains two types of images, original ultrasound images and labeled masks annotated by experienced medical doctors, while the test set contains only original ultrasound images without labels, for which outputs are predicted. The figure below shows an original image and its labeled mask from the dataset.
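As a rough illustration, the snippet below sketches how the image/mask pairs from the training folder could be loaded and paired. The train/ directory name, the *_mask.tif naming convention, and the 128x128 resize are assumptions made for illustration, not the exact preprocessing used.

# Minimal sketch: load and pair ultrasound images with their annotated masks.
# Assumes every image "X_Y.tif" has a corresponding mask "X_Y_mask.tif".
import glob, os
import cv2
import numpy as np

IMG_H, IMG_W = 128, 128  # illustrative network input size

def load_pairs(train_dir="train"):
    images, masks = [], []
    for img_path in sorted(glob.glob(os.path.join(train_dir, "*.tif"))):
        if img_path.endswith("_mask.tif"):
            continue  # mask files are picked up via their paired image
        mask_path = img_path.replace(".tif", "_mask.tif")
        img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (IMG_W, IMG_H)) / 255.0   # normalize to [0, 1]
        mask = cv2.resize(mask, (IMG_W, IMG_H)) / 255.0
        images.append(img)
        masks.append(mask)
    # add a channel axis so shapes are (N, H, W, 1) for the CNN
    return np.expand_dims(np.array(images), -1), np.expand_dims(np.array(masks), -1)

X, Y = load_pairs()
print(X.shape, Y.shape)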
2. Data pre-processing
First, the training and testing images were put into two separate arrays and sorted in ascending order. The training dataset was further split into training and validation sets in order to track accuracy during model development. Data augmentation was applied to increase the number of images in the dataset and to introduce more diverse features while creating new instances that are still representative of the underlying patterns in the data. Initially, the dataset contains 5635 training ultrasound images and their labels. Because we want to validate the system with existing data, the training data is split into a training set (5071 images and 5071 labels) and a validation set (564 images and their labels). The augmentation technique used is shift-scale-rotate, which can improve generalization, reduce over-fitting, help handle class imbalance, and increase the effective dataset size.
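A minimal sketch of the split and the shift-scale-rotate augmentation is shown below; the use of scikit-learn and albumentations here, and the specific shift/scale/rotate limits, are assumptions rather than the exact training configuration.

# Sketch of the 5071/564 train-validation split and shift-scale-rotate augmentation.
# Library choices and augmentation limits are illustrative assumptions.
import albumentations as A
from sklearn.model_selection import train_test_split

# X, Y are the image/mask arrays loaded from the training partition
X_train, X_val, Y_train, Y_val = train_test_split(
    X, Y, test_size=564, shuffle=True, random_state=42)

augment = A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1,
                             rotate_limit=15, p=0.5)

def augment_pair(image, mask):
    # albumentations applies the same geometric transform to the image and its mask
    out = augment(image=image, mask=mask)
    return out["image"], out["mask"]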
3. Model development
The next step was to find a suitable method for the segmentation task. Neural networks perform well on this task and, according to the literature, the U-Net architecture performs well on biomedical image segmentation. Therefore, U-Net was selected as the starting point. We then noticed that the model could be improved further to reach a higher accuracy. Our team developed a custom CNN model, Inception+ResUNet, by replacing the convolution blocks with Inception V1 (GoogLeNet) blocks, where Inception modules are designed to capture multi-scale features by using different filter sizes and then concatenating their outputs. The other change is the use of the ResUNet architecture, an optimization of U-Net with residual connections that can improve learning and enable better gradient flow. The models were trained on Google Colab, and the final model is obtained as a floating-point trained model (.h5 file) after optimizing and evaluating with the Dice coefficient.
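For intuition, the Keras sketch below combines the two ideas behind Inc+ResUNet: an Inception-style block with parallel filter sizes plus a residual shortcut, and a Dice coefficient metric. The filter counts and kernel sizes are illustrative assumptions, not the exact architecture.

# Sketch of an Inception-style block with a residual shortcut, and a Dice metric.
# Filter counts and kernel sizes are illustrative; the real Inc+ResUNet may differ.
import tensorflow as tf
from tensorflow.keras import layers

def inception_res_block(x, filters):
    # parallel branches with different receptive fields (Inception V1 idea)
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    merged = layers.Concatenate()([b1, b3, b5])
    merged = layers.Conv2D(filters, 1, padding="same")(merged)  # fuse the branches
    # residual shortcut (ResUNet idea): project the input and add it back
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    out = layers.Add()([merged, shortcut])
    return layers.Activation("relu")(out)

def dice_coefficient(y_true, y_pred, smooth=1.0):
    # Dice = 2*|A ∩ B| / (|A| + |B|), used as the evaluation metric
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)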
4. Implementation of the hardware accelerator
The trained model has to be deployed on the selected hardware platform, the KR260 board. Given the high computational power required for deep learning inference, we used the Vitis AI toolkit to optimize our model for deployment on the KR260 DPU (Deep Learning Processor Unit). This involves two main steps, quantization and compilation: the floating-point model is quantized and then compiled specifically for the KR260 hardware. Here’s a detailed step-by-step guide on how we achieved this:
Step 1: Pull the Vitis AI Docker Image
First, we pulled the Vitis AI Docker image, which contains all the necessary tools and libraries for model quantization and compilation. This ensures a consistent and controlled environment for the process.
docker pull xilinx/vitis-ai-tensorflow2-cpu
Quantization is the process of converting a model's weights and activations from floating-point to a lower precision (usually 8-bit integers). This reduces the model's size and computational requirements, making it more suitable for deployment on hardware accelerators like the DPU.
Step 2: Prepare the Quantization Script
We created a quantize.py script that handles the quantization process. This script loads the trained model, applies quantization-aware training if necessary, and saves the quantized model.
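A minimal sketch of what quantize.py could look like with the Vitis AI TensorFlow 2 post-training quantization flow is shown below; the file names and the calibration-data handling are assumptions for illustration.

# quantize.py (sketch): post-training quantization with the Vitis AI TF2 quantizer.
# File names and calibration-data handling are assumptions for illustration.
import numpy as np
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# load the floating-point Keras model trained on Colab
float_model = tf.keras.models.load_model("inc_resunet_float.h5", compile=False)

# a few hundred representative ultrasound images used for calibration
calib_images = np.load("calib_images.npy")  # shape (N, H, W, 1), values in [0, 1]

quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_images)

# save the 8-bit quantized model for the Vitis AI compiler
quantized_model.save("quantized_model.h5")
print("Quantized model saved to quantized_model.h5")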
Step 3: Run the Quantization Script
./docker_run.sh xilinx/vitis-ai-tensorflow2-cpu:latest
python3 quantize.py
Compile the quantized model for the KR260 DPU: After quantizing the model, we compiled it using the Vitis AI compiler. This step converts the quantized model into an .xmodel file that can be executed by the KR260 DPU.
Step 4: Prepare the Compilation Script
We created a compile.sh script to handle the compilation process. This script specifies the target hardware (KR260), as sketched below.
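As a rough sketch, the script can simply wrap the vai_c_tensorflow2 compiler; the arch.json path depends on the DPU configuration loaded on the KR260 and is an assumption here, as is the exact argument handling.

#!/bin/bash
# compile.sh (sketch): wrap the Vitis AI compiler for the KR260 DPU.
# The arch.json path is an assumption and must match the DPU built into the KR260 platform.
TARGET=$1    # e.g. kr260
MODEL=$2     # quantized .h5 model
OUTDIR=$3    # output directory
NETNAME=$4   # name of the generated .xmodel

ARCH=/opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json  # assumed DPU arch file

vai_c_tensorflow2 \
    --model      ${MODEL} \
    --arch       ${ARCH} \
    --output_dir ${OUTDIR} \
    --net_name   ${NETNAME}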
Step 5: Run the Compilation Script
Inside the Vitis AI Docker container, we ran the compile.sh script, providing the necessary arguments.
./compile.sh kr260 ./quantized_model.h5 ./compiled inc+resunet_kr260
5. PYNQ DPU Installation
The Deep Learning Processor Unit (DPU) overlay is used with PYNQ to run the CNN model for AI inference applications. We used the https://github.com/amd/Kria-RoboticsAI repository to install PYNQ on the KR260 board.
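Once PYNQ and the DPU overlay are installed, inference with the compiled model can be driven from Python roughly as below. The file names and the pre/post-processing are assumptions; the DpuOverlay and runner calls follow the standard pynq_dpu interface.

# Sketch: running the compiled Inc+ResUNet .xmodel on the KR260 DPU via pynq_dpu.
# File names, input size, and pre/post-processing are illustrative assumptions.
import numpy as np
from pynq_dpu import DpuOverlay

overlay = DpuOverlay("dpu.bit")                  # load the DPU overlay
overlay.load_model("inc_resunet_kr260.xmodel")   # load the compiled model

dpu = overlay.runner
in_shape = tuple(dpu.get_input_tensors()[0].dims)    # e.g. (1, H, W, 1)
out_shape = tuple(dpu.get_output_tensors()[0].dims)

def segment(frame):
    # run one pre-processed ultrasound frame through the DPU and return a binary mask
    input_data = [np.empty(in_shape, dtype=np.float32, order="C")]
    output_data = [np.empty(out_shape, dtype=np.float32, order="C")]
    input_data[0][0] = frame.reshape(in_shape[1:])  # frame already resized and normalized
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)
    return (output_data[0][0] > 0.5).astype(np.uint8)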
6. System Validation
The system uses a client-server architecture for validation. The server simulates an ultrasound probe by continuously streaming video frames, while the client is the KR260 board, which processes the frames to segment the nerves and sends the results to the display.
The server.py script is run on a laptop, while the client.py script runs on the KR260 board. The laptop and the KR260 are connected via an Ethernet cable, with the laptop port forwarding to the Ethernet. In a real-world scenario, the laptop would receive the ultrasound video from an ultrasound probe, send it to the KR260 board for processing, and display the segmented images.
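A minimal sketch of this frame-streaming link is given below, assuming a plain TCP socket with length-prefixed JPEG frames; the real server.py and client.py may use a different protocol, the address, port, and video file are placeholders, and the sketch displays the mask on the board for brevity whereas the actual client sends the results on for display.

# Sketch of the validation link: server.py (laptop) streams frames over TCP and
# client.py (KR260) segments each frame with the DPU sketch above.
# The address, port, video file, and framing protocol are illustrative assumptions.
import socket
import struct
import cv2
import numpy as np

HOST, PORT = "192.168.1.10", 5000   # placeholder address of the KR260 on the Ethernet link

def run_server(video_path="ultrasound.avi"):
    # laptop side: read frames from a recorded ultrasound video and stream them
    cap = cv2.VideoCapture(video_path)
    with socket.create_connection((HOST, PORT)) as sock:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            _, buf = cv2.imencode(".jpg", frame)
            data = buf.tobytes()
            sock.sendall(struct.pack(">I", len(data)) + data)  # length-prefixed frame

def run_client():
    # KR260 side: accept frames, segment them, and show the resulting mask
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while True:
                header = conn.recv(4, socket.MSG_WAITALL)
                if len(header) < 4:
                    break
                size = struct.unpack(">I", header)[0]
                payload = conn.recv(size, socket.MSG_WAITALL)
                frame = cv2.imdecode(np.frombuffer(payload, np.uint8), cv2.IMREAD_GRAYSCALE)
                mask = segment(cv2.resize(frame, (128, 128)) / 255.0)
                cv2.imshow("segmentation", mask * 255)
                if cv2.waitKey(1) == 27:  # Esc to quit
                    break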
The figure below shows the system validation output captured with the hardware.