The biggest challenge in deploying Artificial Intelligence (AI) based solutions is often finding suitable hardware to run our model in an efficient and cost-effective way.
Virtual machines offered by big Cloud Providers (AWS, Google Cloud, Azure) usually don't come with hardware suitable for AI / ML acceleration (FPGA, GPU, ASIC, etc).
Special instance types (ex. F1 / Inf1 in AWS) are needed to be able to run more complex AI models. These instance types are usually more expensive, and smaller projects / teams may not be able to afford them. On top of that, with low-traffic applications, the hardware is probably not fully exploited.
To deploy AI workloads in the cloud more efficiently, shared servers can be used instead. A server equipped with an accelerator card like the Xilinx VCK5000 can fulfill AI inference tasks coming from multiple applications.
In this project, I propose an Artificial Intelligence as a Service (AIaaS) platform, with a server featuring the Xilinx VCK5000 accelerator card. The platform allows clients to run different types of AI workloads remotely, using an easy-to-use API.
In the following sections I will show how I built a proof-of-concept (PoC) of this AIaaS platform.
Concept & Overview
The main idea of an Artificial Intelligence as a Service (AIaaS) platform is to offer hardware-accelerated AI workloads over an easy-to-use REST API.
This allows applications to incorporate powerful AI features without the need to run them directly on costly, acceleration-enabled machines.
It also allows multiple applications to take better advantage of the compute capacity of a single hardware-acceleration-enabled machine.
In this project I will present a proof-of-concept (PoC) demonstrating these ideas.
The main features of this PoC are the following:
- an AIaaS API Server featuring the Xilinx VCK5000 Versal development card
- several hardware-accelerated AI workflows exposed over a REST API (Image Classification, Image Batch Classification, Face Detection, Lane Detection, Object Detection in Video + more to come)
- a standard OpenAPI Specification file describing the REST API, which can be used to generate REST Clients and documentation
- a set of Jupyter Notebooks demonstrating the use of each API
- an Administration Interface that allows inspecting the server status, task history and task details
- Build & Installation scripts
Here is a quick demo video showing how the project works:
In the following sections we will go into more detail about each component of the system. Finally, we will take a look at a wide variety of possible new features and improvements.
Architecture
The high-level architecture of the system looks like this:
The system is composed of the Client, Server and API components. The contract between them is the Artificial Intelligence as a Service (AIaaS) REST API, described in an OpenAPI specification file.
The Server part is implemented as two independent parts:
The Backend Server is a web application implementing the AIaaS REST API. The application is written in Java using the Spring Boot framework, and is responsible for general management tasks.
A set of Vitis-AI MicroApps is used to implement the different hardware-accelerated AI workloads. These are C++ and/or Python applications built on top of the Vitis AI framework / libraries, and they are accelerated by the DPUs running on the Xilinx VCK5000 card.
The Client applications can use the AIaaS REST API through either manually written or automatically generated REST clients.
In the following sections, I will go into more detail about each of the components.
Vitis-AI MicroApps
The Xilinx VCK5000 card can be used to accelerate AI workloads using DPUs implemented in the programmable logic.
To access this functionality, the Vitis-AI C++ or Python APIs can be used. In this project I opted for the C++ API, as it seems to be more feature-complete.
To keep things simple, I decided to implement the different hardware-accelerated AI workloads as small self-contained applications called "Vitis-AI MicroApps".
Each micro-app implements a specific AI workload as follows:
- Image Classification (vitis_ai_image_classify.cpp) - image classification with a custom model (ex. resnet50)
- Image Batch Classification (vitis_ai_image_classify_batch.cpp) - similar to the above, but can classify multiple images at the same time; this is also a sample app for batch processing
- Image Face Detection (vitis_ai_image_face_detect.cpp) - face detection on images
- Image Lane Detection (vitis_ai_image_lane_detect.cpp) - lane detection on images
- Video YOLO V3 Object Detection (vitis_ai_video_yolov3.cpp) - YOLO V3 object detection on videos; this is also a sample app for video processing
(with many more to follow in the future)
These micro-apps usually take an image or a video as input, and produce their output in JSON format. The apps are later called by the API server.
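Since every micro-app follows the same convention (command-line arguments in, JSON on stdout), a caller can drive all of them through one generic wrapper. Here is a minimal Python sketch of this pattern; the binary and model names below are placeholders:

```python
import json
import subprocess

def parse_micro_app_output(stdout):
    """Parse the JSON document a micro-app prints on stdout."""
    return json.loads(stdout)["results"]

def run_micro_app(binary, model, input_path):
    """Invoke a micro-app binary and return its parsed results list."""
    proc = subprocess.run([binary, model, input_path],
                          capture_output=True, text=True, check=True)
    return parse_micro_app_output(proc.stdout)

# Example (placeholder paths):
# results = run_micro_app("./vitis_ai_image_classify", "resnet50", "cat.jpg")
```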
Now let's see how each of the micro-apps works!
> Image Classification
The Image Classification micro-app (vitis_ai_image_classify) implements classification of a single image, using an arbitrary model like ResNet50.
The app is called as:
$ ./vitis_ai_image_classify <model> <image-file>
and produces a JSON output like:
{
"results": [
{ "idx": 20, "class": "water ouzel, dipper,", "score": 0.999 },
{ "idx": 42, "class": "agama,", "score": 0.083 }
]
}
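A caller will usually keep just the highest-scoring class. A small post-processing sketch, using the field names from the sample response above:

```python
def top_prediction(response):
    """Return the (class, score) pair with the highest score."""
    best = max(response["results"], key=lambda r: r["score"])
    return best["class"], best["score"]

# The sample output shown above:
sample = {
    "results": [
        {"idx": 20, "class": "water ouzel, dipper,", "score": 0.999},
        {"idx": 42, "class": "agama,", "score": 0.083},
    ]
}
print(top_prediction(sample))  # ('water ouzel, dipper,', 0.999)
```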
> Batch Classification
The Image Batch Classification micro-app (vitis_ai_image_classify_batch) is similar to the previous one, but instead of processing one image at a time, it processes a batch of images, which should improve performance.
The app is called like:
$ ./vitis_ai_image_classify_batch <model> <image-file-1> <image-file-2> ...
and the result looks like:
{
"results": [
{
"image": "image1.jpg",
"results": [
{ "idx": 20, "class": "water ouzel, dipper,", "score": 0.999 },
{ "idx": 42, "class": "agama,", "score": 0.083 }
]
},
{
"image": "image2.jpg", ...
}
]
}
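When consuming the batch response, it is convenient to index the per-image results by file name. A small sketch based on the field names in the sample above:

```python
def batch_by_image(response):
    """Index the per-image results of a batch response by image name."""
    return {entry["image"]: entry["results"] for entry in response["results"]}

# A batch response in the shape shown above:
batch = {
    "results": [
        {"image": "image1.jpg",
         "results": [{"idx": 20, "class": "water ouzel, dipper,", "score": 0.999}]},
        {"image": "image2.jpg",
         "results": [{"idx": 42, "class": "agama,", "score": 0.083}]},
    ]
}
print(batch_by_image(batch)["image2.jpg"][0]["class"])  # agama,
```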
This micro-app also acts as a template for batch-processing type micro-apps.
> Face Detection
The Image Face Detection (vitis_ai_image_face_detect) app can be used to detect one or more faces in an image. It uses a model like DenseBox (320x320) and is used like:
$ ./vitis_ai_face_detect densebox_320_320 sample_facedetect.jpg
{
"results": [
{ "x": 75, "y": 65, "width": 57, "height": 70, "score": 0.997199 },
{ "x": 204, "y": 45, "width": 55, "height": 73, "score": 0.994089 }
]
}
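Downstream code typically drops low-confidence detections and converts each box to corner coordinates for drawing. A sketch based on the response fields above:

```python
def filter_boxes(response, min_score=0.9):
    """Keep confident detections, returned as (x1, y1, x2, y2) corners."""
    return [
        (r["x"], r["y"], r["x"] + r["width"], r["y"] + r["height"])
        for r in response["results"] if r["score"] >= min_score
    ]

# The sample output shown above:
sample = {"results": [
    {"x": 75, "y": 65, "width": 57, "height": 70, "score": 0.997199},
    {"x": 204, "y": 45, "width": 55, "height": 73, "score": 0.994089},
]}
print(filter_boxes(sample))  # [(75, 65, 132, 135), (204, 45, 259, 118)]
```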
> Lane Detection
The Lane Detection (vitis_ai_image_lane_detect) app performs lane detection on a road image, and it is called like:
$ ./vitis_ai_lane_detect vpgnet_pruned_0_99 ../lanedetect/sample_lanedetect.jpg
{
"results": [
{ "type": 3, "points": [
{ "x": 164, "y": 377 },
...
> Object Detection in Videos with YOLO V3
The final micro-app I implemented for this PoC demonstrates object detection in a video. It uses the YOLO V3 model, and works as follows:
$ ./vitis_ai_yolo3_video <model> <video-file>
...results for each frame
API Server
The purpose of the API Server is to expose the hardware-accelerated AI workflows as an easy-to-use REST API.
The API Server is a web application built using Java and Spring Boot. It is responsible for the following functions:
- implementing the REST APIs defined in the OpenAPI specification file
- managing temporary files and other resources
- calling the Vitis-AI MicroApps
- collecting and parsing the output generated by Vitis-AI MicroApps
- packing and returning the results to the caller
- etc.
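The core request flow (receive the input, stage temporary files, invoke a micro-app, parse and return its JSON) can be sketched in a few lines. The real server is written in Java / Spring Boot; the Python sketch below only illustrates the flow, and the binary path and model name are placeholders:

```python
import json
import subprocess
import tempfile
from pathlib import Path

def handle_classify_request(image_bytes, model="resnet50", run_cmd=subprocess.run):
    """Sketch of the server-side flow: store the upload, call the micro-app,
    parse its stdout and return the result. Paths and names are placeholders."""
    with tempfile.TemporaryDirectory() as tmp:       # temporary file management
        image_path = Path(tmp) / "upload.jpg"
        image_path.write_bytes(image_bytes)          # persist the uploaded image
        proc = run_cmd(["./vitis_ai_image_classify", model, str(image_path)],
                       capture_output=True, text=True, check=True)
        return json.loads(proc.stdout)               # parsed JSON back to the caller
```

Injecting `run_cmd` keeps the sketch testable without the actual micro-app binary.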
The REST APIs exposed by the API Server are documented using industry-standard OpenAPI specification files.
These files describe the following REST APIs:
- Image Classification - /api/v1/images/classify
- Image Batch Classification - /api/v1/images/classify/batch
- Image Face Detection - /api/v1/images/face-detect
- Image Lane Detection - /api/v1/images/lane-detect
- Video Object Detection with YOLO V3 - /api/v1/videos/yolo-v3
(with more to follow in the future)
The OpenAPI specification can also be used to generate HTML documentation, as well as clients for different programming languages.
> Admin Interface
Alongside the AIaaS API, the back-end server also exposes an Administration Interface.
In this first version, the Admin Interface can be used to inspect the state of the service and the AI task history, along with task details:
The interface can be accessed under the /admin.html path, and it is implemented as an HTML / JavaScript app. Under the hood it uses a set of Admin APIs implemented in the back-end server.
Note: as this project is still a PoC, the AIaaS API and the Admin Interface are not protected by authentication or authorization in any way.
> Docker Images & Setup Script
The back-end part of the AIaaS project runs on an extended version of the Vitis-AI Docker image.
The official image was extended with things like a Java Runtime and custom setup scripts.
The Dockerfile, the setup scripts and instructions can be found in the Backend folder of the attached GitHub repository.
Clients and Examples
To interact with the Artificial Intelligence as a Service (AIaaS) REST API, some kind of REST client is needed: either manually written REST calls, or clients generated automatically from the OpenAPI specification files.
For this PoC I'm using manually written REST calls in Jupyter Notebooks. In these notebooks I prepare the input images (or video), make the calls, and then process and visualize the results.
There is one example notebook prepared for each API:
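Under the hood, each notebook boils down to a single multipart POST against one of the endpoints. Here is a minimal stdlib-only sketch of such a call; the base URL and the `file` form-field name are assumptions about the PoC's API:

```python
import json
import urllib.request
import uuid

def build_multipart(field, filename, data):
    """Encode one file as a multipart/form-data body (stdlib only)."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def classify_image(path, base="http://localhost:8080"):
    """POST an image to the classification endpoint, return the parsed JSON."""
    with open(path, "rb") as f:
        # "file" is an assumed form-field name
        body, ctype = build_multipart("file", path, f.read())
    req = urllib.request.Request(f"{base}/api/v1/images/classify", data=body,
                                 headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# result = classify_image("cat.jpg")
# print(result["results"][0]["class"])
```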
The Xilinx VCK5000 Versal Development Card is a PCI Express based card.
To use it we need either a Server or a Desktop PC with a free PCI Express 3.0 x16 slot.
> Hardware Installation
I opted for a Desktop PC with the following specs:
- AMD Ryzen 5 3400G CPU (with integrated graphics)
- Gigabyte X570 GAMING X motherboard
- G-Skill 16 GB DDR4 3200 MHz memory
- Seasonic Focus GX 550W ATX power supply
The VCK5000 is installed in a PCI Express 3.0(+) x16 slot. The two 8 + 6 pin PCI power connectors also need to be connected.
> OS and Packages
On the software side we need a Linux-based OS distribution like RHEL, CentOS or Ubuntu. I went with Ubuntu 20.04.3 LTS, with the Linux kernel downgraded to version 5.8.0-43-generic. (Note: at the time of writing, the latest 5.11.x kernel is not supported by the xocl and xclmgmt kernel drivers.)
Next we have to install a couple of software packages to get the VCK5000 working. We can follow the Vitis AI Setup Instructions for the VCK5000:
- by running the provided ./install.sh we install the Xilinx Runtime Library (XRT), the Xilinx Resource Manager (XRM) and the DPU V4E xclbin for the VCK5000
- then we need to download and install some DEB packages containing the firmware for the VCK5000, as well as some utilities for flashing and validation:
$ wget https://www.xilinx.com/bin/public/openDownload?filename=xilinx-vck5000-es1-gen3x16-platform-2-1_all.deb.tar.gz -O xilinx-vck5000-es1-gen3x16-platform-2-1_all.deb.tar.gz
$ tar -xzvf xilinx-vck5000-es1-gen3x16-platform-2-1_all.deb.tar.gz
$ sudo dpkg -i xilinx-sc-fw-vck5000_4.4.6-2.e1f5e26_all.deb
$ sudo dpkg -i xilinx-vck5000-es1-gen3x16-validate_2-3123623_all.deb
$ sudo dpkg -i xilinx-vck5000-es1-gen3x16-base_2-3123623_all.deb
At this point, running sudo lspci -vd 10ee: should show the VCK5000 card detected and running with the kernel drivers xclmgmt / xocl.
To flash the latest firmware we can run:
$ sudo /opt/xilinx/xrt/bin/xbmgmt flash --scan
$ sudo /opt/xilinx/xrt/bin/xbmgmt flash --update
After a cold restart, --scan should show our VCK5000 running version 4.4.6.
The validation utility can be used to check that the card is functioning correctly:
$ /opt/xilinx/xrt/bin/xbutil validate --device 0000:01:00.1
> Installing Vitis AI
At this point we should be ready to install Vitis AI.
I opted to run Vitis AI in a Docker container. So, the first step was to install Docker Engine by following the official installation guide.
To run Vitis AI in a container we need to clone the Vitis AI GitHub repository:
$ git clone --recurse-submodules https://github.com/Xilinx/Vitis-AI
$ cd Vitis-AI
Then we can use the provided ./docker_run.sh script to pull and run the latest Vitis AI container:
./docker_run.sh xilinx/vitis-ai-cpu:latest
The script also detects when a VCK5000 card is installed, and automatically attaches the PCI-E device to our container.
This should land us in a Docker container with Vitis AI.
To validate our Vitis-AI setup, we can run some demos and performance tests. There is a good number of demos that come with Vitis-AI; I chose to run the demos with the ResNet-50 model.
First we need to download and extract the VCK5000-optimized version of the ResNet-50 Vitis-AI model. The official documentation recommends downloading the models directly to the /usr/share/vitis_ai_library/models folder:
$ wget https://www.xilinx.com/bin/public/openDownload?filename=resnet50-vck5000-DPUCVDX8H-r1.4.1.tar.gz -O resnet50-vck5000-DPUCVDX8H-r1.4.1.tar.gz
$ tar -xzvf resnet50-vck5000-DPUCVDX8H-r1.4.1.tar.gz
$ sudo cp resnet50 /usr/share/vitis_ai_library/models -r
As I wanted to make these downloads persistent, I decided to link /usr/share/vitis_ai_library/models to an externally mounted folder:
$ cp -R /usr/share/vitis_ai_library/models .tmp-vck5000-models
$ sudo rm -rf /usr/share/vitis_ai_library/models
$ sudo ln -s /workspace/.tmp-vck5000-models /usr/share/vitis_ai_library/models
This will save the models in a .tmp-vck5000-models folder inside the Vitis-AI git repository on the Docker host.
Next, we need to download some sample images and videos to test with:
$ wget https://www.xilinx.com/bin/public/openDownload?filename=vitis_ai_library_r1.4.0_images.tar.gz -O vitis_ai_library_r1.4.0_images.tar.gz
$ wget https://www.xilinx.com/bin/public/openDownload?filename=vitis_ai_library_r1.4.0_videos.tar.gz -O vitis_ai_library_r1.4.0_video.tar.gz
$ tar -xzvf vitis_ai_library_r1.4.0_images.tar.gz -C demo/Vitis-AI-Library/
$ tar -xzvf vitis_ai_library_r1.4.0_video.tar.gz -C demo/Vitis-AI-Library/
(note: the sample images and videos are already saved on the shared folder)
Finally we need to compile and run the sample classification application:
$ cd /workspace/demo/Vitis-AI-Library/samples/classification
$ bash -x build.sh
To run the model on the sample image we can run:
$ source /workspace/setup/vck5000/setup.sh
$ ./test_jpeg_classification resnet50 sample_classification.jpg
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0104 03:37:21.292187 177 demo.hpp:1183] batch: 0 image: sample_classification.jpg
I0104 03:37:21.292309 177 process_result.hpp:24] r.index 109 brain coral, r.score 0.982666
I0104 03:37:21.293148 177 process_result.hpp:24] r.index 973 coral reef, r.score 0.00850172
I0104 03:37:21.293203 177 process_result.hpp:24] r.index 955 jackfruit, jak, jack, r.score 0.00662115
I0104 03:37:21.293256 177 process_result.hpp:24] r.index 397 puffer, pufferfish, blowfish, globefish, r.score 0.000543497
I0104 03:37:21.293325 177 process_result.hpp:24] r.index 390 eel, r.score 0.000329648
...
We can also check that the performance is as expected:
> ./test_performance_classification resnet50 test_performance_classification.list -t 8 -s 60
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0104 03:37:57.178503 242 benchmark.hpp:184] writing report to <STDOUT>
I0104 03:38:01.277192 242 benchmark.hpp:211] waiting for 0/60 seconds, 8 threads running
I0104 03:38:11.277294 242 benchmark.hpp:211] waiting for 10/60 seconds, 8 threads running
I0104 03:38:21.277393 242 benchmark.hpp:211] waiting for 20/60 seconds, 8 threads running
I0104 03:38:31.277499 242 benchmark.hpp:211] waiting for 30/60 seconds, 8 threads running
I0104 03:38:41.277577 242 benchmark.hpp:211] waiting for 40/60 seconds, 8 threads running
I0104 03:38:51.277688 242 benchmark.hpp:211] waiting for 50/60 seconds, 8 threads running
I0104 03:39:01.277866 242 benchmark.hpp:219] waiting for threads terminated
FPS=4540.63
We got about 4540 frames / second, which is pretty impressive performance.
> Building the AIaaS Docker Image
Now that we have Vitis-AI up and running, we can prepare the Docker image for the AIaaS on VCK5000 server.
To build the Docker image we need to navigate to the Backend folder and run:
$ docker build .
Then we can launch the new image as:
Vitis-AI $ ./docker_run.sh <image-id>
To be able to run the VCK5000 server, a few steps need to be done:
- run the ./setup.sh script to prepare the Vitis-AI environment
- copy the /VitisAI-MicroApps folder to /workspace/demo/Vitis-AI-Library/samples/, and compile the micro-apps using the ./build.sh script
- compile the API Server using the mvn package command, and copy the resulting .jar file to /workspace
After that we can launch the API Server as:
$ java -jar /workspace/vitis-aiaas-0.0.1-SNAPSHOT.jar
This will expose the REST API on port :8080, and we should be able to call it from the local machine or from a remote one.
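To quickly verify that the server is reachable (for example from a remote machine), a small stdlib-only probe against the admin page mentioned earlier can be used; the host and port below are the assumed defaults:

```python
import urllib.request

def server_reachable(base="http://localhost:8080"):
    """Return True if the AIaaS server answers on the given base URL."""
    try:
        with urllib.request.urlopen(f"{base}/admin.html", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # connection refused, DNS failure, timeout, ...
        return False
```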
As we saw, this project is still in the proof-of-concept (PoC) stage, with room for many new features and improvements.
Short-term plans include implementing more hardware-accelerated AI workloads for different domains like Automotive, Medical, Virtual Reality and others. It would also be useful to add more video processing endpoints.
Long term goals would be to:
- provide auto-generated clients for different programming languages (Python, Java, C++, etc.)
- support for custom models
- support for video streaming endpoints
- support for customizing models by pruning / re-training
- support for additional AI/ML frameworks such as TensorFlow, Caffe and others
- migrate the project to Vitis AI 2.0
- add authorization and tracking features
- and many others
Hope you enjoyed this project! :)