With advancements in imaging technology, a variety of image types such as VR and stereoscopic images are now widely available. Compressing them efficiently calls for more advanced methodologies. Autoencoder models have demonstrated compressed sizes and quality comparable to JPEG [1], a traditional and extensively used compression algorithm. I decided to experiment with autoencoder-based image compression in real time, targeting 30 fps, to understand whether the Xilinx DPU can be used for live video streaming.
An autoencoder model was developed and trained in TensorFlow. The encoder section of the model was deployed to the DPU and tested with the Kodak PhotoCD PCD0992 dataset. The autoencoder structure that was developed is shown below.
The autoencoder model was deployed on the Ultra96V2 following the Vitis AI 1.1 methodology, as described in Mario Bergeron's guide.
To determine the achievable fps, the B512, B1024, and B2304 DPU architectures were implemented, and the program was run and profiled. Profiling the model gives the following results along with the latency.
Currently the project compresses images at a maximum theoretical rate of 27.99 fps, as calculated from profiling the DPU at runtime. I loaded the board with a few images that I wanted to compress, ran the encoder section of the model on the hardware, then took the compressed NumPy array and decoded the image on my laptop.
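To illustrate that last step, here is a rough sketch of reloading the trained decoder on the laptop and reconstructing an image from a latent array copied over from the board. The file names (latent.npy, decoder_weights.h5) and the use of Pillow are placeholders, not the exact script used in the project.
import numpy as np
from tensorflow.keras.models import load_model
from PIL import Image
# Hypothetical file names: the latent tensor copied from the board and the
# decoder saved after training.
latent = np.load("latent.npy")            # expected shape (1, 62, 94, 32)
decoder = load_model("decoder_weights.h5")
# Reconstruct the image and scale back to 8-bit RGB.
recon = decoder.predict(latent)[0]
recon = np.clip(recon, 0.0, 1.0)
Image.fromarray((recon * 255).astype(np.uint8)).save("reconstructed.png")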
The project is a step towards implementing deep learning algorithms for compressing images of varied types, building towards a solution for transmitting 3D image data in compressed form. Next in line are stereoscopic images!
Howto : Model development
The autoencoder model was developed using simple Keras layers.
# Imports needed for the model definition (TensorFlow 1.x / tf.keras)
from tensorflow.keras.layers import Input, Conv2D, Conv2DTranspose
from tensorflow.keras.models import Model

# Encoder: three stride-2 convolutions reduce the 512x768x3 input to a 62x94x32 latent tensor
input_img = Input(shape=(512, 768, 3))
x = Conv2D(32, [3,3], strides=(1,1), activation="relu")(input_img)
x = Conv2D(32, [3,3], strides=(2,2), activation="relu")(x)
x = Conv2D(64, [3,3], strides=(1,1), activation="relu")(x)
x = Conv2D(64, [3,3], strides=(2,2), activation="relu")(x)
encoded = Conv2D(32, [3,3], strides=(2,2), activation="relu")(x)
encoder = Model(input_img, encoded)

# Decoder: transposed convolutions expand the latent tensor back to the 512x768x3 resolution
latentInputs = Input(shape=(62, 94, 32))
y = Conv2DTranspose(32, [3,3], strides=(2,2), activation="relu")(latentInputs)
y = Conv2DTranspose(64, [3,3], strides=(2,2), activation="relu")(y)
y = Conv2DTranspose(64, [3,3], strides=(1,1), activation="relu")(y)
y = Conv2DTranspose(32, [3,3], strides=(2,2), activation="relu")(y)
y = Conv2DTranspose(32, [3,3], strides=(1,1), activation="relu")(y)
decoded = Conv2DTranspose(3, [4,4], activation="relu")(y)
decoder = Model(latentInputs, decoded)

# The full autoencoder chains the two sub-models so they can be trained end to end
autoencoder = Model(input_img, decoder(encoder(input_img)))
#autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
encoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
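A quick sanity check that the encoder output matches the 62x94x32 latent input the decoder expects:
# Optional check before chaining the two sub-models
print(encoder.output_shape)   # expected (None, 62, 94, 32)
print(decoder.input_shape)    # expected (None, 62, 94, 32)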
We must train the model on a dataset where the input and the target output are the same image.
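For reference, a minimal sketch of how the training arrays could be prepared is shown below; the image directory, the use of Pillow, and the 90/10 split are assumptions, not the exact preprocessing used in the project.
import glob
import numpy as np
from PIL import Image
# Hypothetical image folder; every image is resized to the model's 768x512 input
# and scaled to [0, 1] so the autoencoder learns to reproduce its own input.
files = sorted(glob.glob("dataset/*.png"))
images = np.array([np.asarray(Image.open(f).convert("RGB").resize((768, 512))) / 255.0
                   for f in files], dtype=np.float32)
split = int(0.9 * len(images))
x_train, x_valid = images[:split], images[split:]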
#To checkpoint training
from tensorflow.keras.callbacks import ModelCheckpoint
filepath = "k_model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True)
callbacks_list = [checkpoint]

#Training section: input and target are the same images
H = autoencoder.fit(x_train, x_train,
                    epochs=25,
                    batch_size=16,
                    shuffle=True,
                    validation_data=(x_valid, x_valid),
                    callbacks=callbacks_list)
The trick to separating the encoder from the decoder in an autoencoder is to save the two sub-models individually after training.
encoder.save("encoder_weights.h5")
decoder.save("decoder_weights.h5")
The encoder_weights.h5 file needs to be processed through the Vitis AI compilation flow.
Howto : Model Deployment
Essentially, Vitis AI works in the following steps, going from generating the hardware from the DPU TRD folder to preparing the SD card. (For details I recommend Mario Bergeron's post.)
- Create the platform files. I used the platform design provided by Avnet.
- Customize the scripts to configure the DPU.
- Generate the final platform implementation by following the Vitis flow for generating the DPU TRD.
- Use the platform .hwh file to produce a .dcf file, which is needed later for model compilation.
- Freeze the TensorFlow model using keras_2_tf.py (a sketch of this step is shown after this list).
- Quantize the TensorFlow model from FP32 to INT8 using VAI_Q.
- Compile the TensorFlow model into DPU instruction code using VAI_C.
- Convert the .elf file into a shared library so that Python can load it at runtime.
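For context, the freeze step conceptually does something like the sketch below, assuming TensorFlow 1.x (as used by Vitis AI 1.1); keras_2_tf.py from the Xilinx tutorials handles the same job, and the output node name here is just whatever the trained encoder reports as its last op.
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import load_model
# Load the trained encoder and grab the TF1 session behind Keras.
K.set_learning_phase(0)
model = load_model("encoder_weights.h5")
sess = K.get_session()
# Output node name taken from the loaded model; print(model.outputs) to inspect it.
output_node = model.outputs[0].op.name
# Fold the variables into constants and write a frozen graph for VAI_Q.
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph_def, [output_node])
tf.io.write_graph(frozen, ".", "encoder_frozen.pb", as_text=False)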
You can refer to the GitHub code, which has the compilation scripts, to be run in the following order:
- start_cpu_docker.sh
- run_my.sh
- Finally, convert the .elf into a shared library; for this, refer to the Vitis AI User Guide under the edge flows (a runtime sketch follows this list).
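Once the shared library is on the board, the compiled encoder can be invoked through the DNNDK n2cube Python API. The sketch below is only an outline: the kernel name and the input/output node names are assumptions and must match what VAI_C reports for your compiled model.
import numpy as np
from dnndk import n2cube
# Assumed names: the kernel name printed by VAI_C and the first/last DPU node names.
KERNEL = "dpu_encoder"
INPUT_NODE = "conv2d_Conv2D"
OUTPUT_NODE = "conv2d_4_Conv2D"
n2cube.dpuOpen()
kernel = n2cube.dpuLoadKernel(KERNEL)
task = n2cube.dpuCreateTask(kernel, 0)
# img is a 512x768x3 float32 array scaled to [0, 1], matching the training preprocessing.
img = np.load("test_image.npy").astype(np.float32)
n2cube.dpuSetInputTensorInHWCFP32(task, INPUT_NODE, img, img.size)
n2cube.dpuRunTask(task)
size = n2cube.dpuGetOutputTensorSize(task, OUTPUT_NODE)
latent = n2cube.dpuGetOutputTensorInHWCFP32(task, OUTPUT_NODE, size)
np.save("latent.npy", np.array(latent).reshape(1, 62, 94, 32))
n2cube.dpuDestroyTask(task)
n2cube.dpuDestroyKernel(kernel)
n2cube.dpuClose()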