With advancements in imaging technology, a variety of image types such as VR and stereoscopic images are now widely available. Compressing them efficiently calls for more advanced methodologies. Autoencoder models have demonstrated compressed sizes and quality comparable to JPEG [1], a traditional and extensively used compression algorithm. I decided to experiment with autoencoder-based image compression in real time, targeting 30 fps, to understand whether the Xilinx DPU can be used for live video streaming.
An autoencoder model was developed and trained in TensorFlow. The encoder section of the model was deployed to the DPU and tested with the Kodak PhotoCD PCD0992 dataset. The autoencoder structure that was developed is shown below.
The autoencoder model was deployed on the Ultra96V2 following the Vitis AI 1.1 methodology, as described in Mario Bergeron's guide.
To determine the achievable fps, the B512, B1024, and B2304 DPU architectures were implemented, and the program was run and profiled. Profiling the model gives the following results along with the latency.
Currently the project compresses images at a maximum theoretical rate of 27.99 fps, as calculated from profiling the DPU at runtime. I loaded the board with a few images that I wanted to compress, ran the encoder section of the model on the hardware, then took the compressed NumPy array and decoded the image on my laptop.
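To illustrate that last step, here is a rough sketch of reloading the trained decoder on the laptop and reconstructing an image from a latent array copied over from the board. The file names (latent.npy, decoder_weights.h5) and the use of Pillow are placeholders, not the exact script used in the project.
import numpy as np
from tensorflow.keras.models import load_model
from PIL import Image
# Hypothetical file names: the latent tensor copied from the board and the
# decoder saved after training.
latent = np.load("latent.npy")            # expected shape (1, 62, 94, 32)
decoder = load_model("decoder_weights.h5")
# Reconstruct the image and scale back to 8-bit RGB.
recon = decoder.predict(latent)[0]
recon = np.clip(recon, 0.0, 1.0)
Image.fromarray((recon * 255).astype(np.uint8)).save("reconstructed.png")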
The project is a step towards implementing deep learning algorithms for compressing images of varied types, building towards a solution for transmitting 3D image data in compressed form. Next in line are stereoscopic images!
Howto : Model development
The autoencoder model was developed using simple Keras layers.
# Imports needed for the model definition (TensorFlow 1.x / tf.keras)
from tensorflow.keras.layers import Input, Conv2D, Conv2DTranspose
from tensorflow.keras.models import Model

# Encoder: three stride-2 convolutions reduce the 512x768x3 input to a 62x94x32 latent tensor
input_img = Input(shape=(512, 768, 3))
x = Conv2D(32, [3,3], strides=(1,1), activation="relu")(input_img)
x = Conv2D(32, [3,3], strides=(2,2), activation="relu")(x)
x = Conv2D(64, [3,3], strides=(1,1), activation="relu")(x)
x = Conv2D(64, [3,3], strides=(2,2), activation="relu")(x)
encoded = Conv2D(32, [3,3], strides=(2,2), activation="relu")(x)
encoder = Model(input_img, encoded)

# Decoder: transposed convolutions expand the latent tensor back to the 512x768x3 resolution
latentInputs = Input(shape=(62, 94, 32))
y = Conv2DTranspose(32, [3,3], strides=(2,2), activation="relu")(latentInputs)
y = Conv2DTranspose(64, [3,3], strides=(2,2), activation="relu")(y)
y = Conv2DTranspose(64, [3,3], strides=(1,1), activation="relu")(y)
y = Conv2DTranspose(32, [3,3], strides=(2,2), activation="relu")(y)
y = Conv2DTranspose(32, [3,3], strides=(1,1), activation="relu")(y)
decoded = Conv2DTranspose(3, [4,4], activation="relu")(y)
decoder = Model(latentInputs, decoded)

# The full autoencoder chains the two sub-models so they can be trained end to end
autoencoder = Model(input_img, decoder(encoder(input_img)))
#autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
encoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
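A quick sanity check that the encoder output matches the 62x94x32 latent input the decoder expects:
# Optional check before chaining the two sub-models
print(encoder.output_shape)   # expected (None, 62, 94, 32)
print(decoder.input_shape)    # expected (None, 62, 94, 32)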
We must train the model on a dataset where the input and the target output are the same image.
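For reference, a minimal sketch of how the training arrays could be prepared is shown below; the image directory, the use of Pillow, and the 90/10 split are assumptions, not the exact preprocessing used in the project.
import glob
import numpy as np
from PIL import Image
# Hypothetical image folder; every image is resized to the model's 768x512 input
# and scaled to [0, 1] so the autoencoder learns to reproduce its own input.
files = sorted(glob.glob("dataset/*.png"))
images = np.array([np.asarray(Image.open(f).convert("RGB").resize((768, 512))) / 255.0
                   for f in files], dtype=np.float32)
split = int(0.9 * len(images))
x_train, x_valid = images[:split], images[split:]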
#To checkpoint training
from tensorflow.keras.callbacks import ModelCheckpoint
filepath = "k_model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True)
callbacks_list = [checkpoint]

#Training section: input and target are the same images
H = autoencoder.fit(x_train, x_train,
                    epochs=25,
                    batch_size=16,
                    shuffle=True,
                    validation_data=(x_valid, x_valid),
                    callbacks=callbacks_list)
The trick to separating the encoder from the decoder in an autoencoder is to save the two sub-models individually after training.
encoder.save("encoder_weights.h5")
decoder.save("decoder_weights.h5")
The encoder_weights.h5 file needs to be processed through the Vitis AI compilation flow.
Howto : Model Deployment
Essentially, Vitis AI works in the following steps, going from generating the hardware from the DPU TRD folder to preparing the SD card. (For details I recommend Mario Bergeron's post.)
- Create the platform files. I used the platform design provided by Avnet.
- Customize the scripts to configure the DPU.
- Generate the final platform implementation by following the Vitis flow for generating the DPU TRD.
- Use the platform .hwh file to produce a .dcf file, which is needed later for model compilation.
- Freeze the TensorFlow model using keras_2_tf.py (a sketch of this step is shown after this list).
- Quantize the TensorFlow model from FP32 to INT8 using VAI_Q.
- Compile the TensorFlow model into DPU instruction code using VAI_C.
- Convert the .elf file into a shared library so that Python can load it at runtime.
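For context, the freeze step conceptually does something like the sketch below, assuming TensorFlow 1.x (as used by Vitis AI 1.1); keras_2_tf.py from the Xilinx tutorials handles the same job, and the output node name here is just whatever the trained encoder reports as its last op.
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import load_model
# Load the trained encoder and grab the TF1 session behind Keras.
K.set_learning_phase(0)
model = load_model("encoder_weights.h5")
sess = K.get_session()
# Output node name taken from the loaded model; print(model.outputs) to inspect it.
output_node = model.outputs[0].op.name
# Fold the variables into constants and write a frozen graph for VAI_Q.
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph_def, [output_node])
tf.io.write_graph(frozen, ".", "encoder_frozen.pb", as_text=False)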
You can refer to the GitHub code, which has the compilation scripts, to be run in the following order:
- start_cpu_docker.sh
- run_my.sh
- Finally, convert the .elf into a shared library; for this, refer to the Vitis AI User Guide under the edge flows (a runtime sketch follows this list).
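Once the shared library is on the board, the compiled encoder can be invoked through the DNNDK n2cube Python API. The sketch below is only an outline: the kernel name and the input/output node names are assumptions and must match what VAI_C reports for your compiled model.
import numpy as np
from dnndk import n2cube
# Assumed names: the kernel name printed by VAI_C and the first/last DPU node names.
KERNEL = "dpu_encoder"
INPUT_NODE = "conv2d_Conv2D"
OUTPUT_NODE = "conv2d_4_Conv2D"
n2cube.dpuOpen()
kernel = n2cube.dpuLoadKernel(KERNEL)
task = n2cube.dpuCreateTask(kernel, 0)
# img is a 512x768x3 float32 array scaled to [0, 1], matching the training preprocessing.
img = np.load("test_image.npy").astype(np.float32)
n2cube.dpuSetInputTensorInHWCFP32(task, INPUT_NODE, img, img.size)
n2cube.dpuRunTask(task)
size = n2cube.dpuGetOutputTensorSize(task, OUTPUT_NODE)
latent = n2cube.dpuGetOutputTensorInHWCFP32(task, OUTPUT_NODE, size)
np.save("latent.npy", np.array(latent).reshape(1, 62, 94, 32))
n2cube.dpuDestroyTask(task)
n2cube.dpuDestroyKernel(kernel)
n2cube.dpuClose()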