A V2A (Video-to-Audio) GenAI model that colorizes, upscales, and generates audio for a raw black-and-white mute video.
A modified and fine-tuned DeOldify model adds color to black-and-white videos. We used transfer learning to fine-tune the model on a custom dataset, improving its ability to colorize videos accurately. Data augmentation techniques such as rotation, flipping, and color jittering were applied to increase the diversity of the training data and make the model more robust; a sketch of such an augmentation pipeline is shown below.
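To illustrate, here is a minimal sketch of this kind of augmentation pipeline using torchvision (DeOldify itself is PyTorch-based). The parameter values are illustrative, not our exact training settings:

```python
# Illustrative augmentation pipeline for fine-tuning; the degrees,
# probabilities, and jitter strengths here are example values.
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomRotation(degrees=10),       # rotation
    T.RandomHorizontalFlip(p=0.5),      # flipping
    T.ColorJitter(brightness=0.2,       # color jittering applied to the
                  contrast=0.2,         # ground-truth color frames
                  saturation=0.2),
    T.ToTensor(),
])
```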
Here is the sample video:
After colorization with DeOldify, this is the output we got:
DeOldify architecture:
Using a fine-tuned Real-ESRGAN to upscale the video resolution, we got this output (a per-frame inference sketch follows the architecture diagram below):
Architecture:
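For reference, a single per-frame upscaling step with the Real-ESRGAN inference API (packages `realesrgan` and `basicsr`) might look like the sketch below; the weight path and file names are placeholders:

```python
# Sketch of per-frame 4x upscaling with Real-ESRGAN.
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# Standard RRDBNet backbone used by the x4plus Real-ESRGAN weights.
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4,
                         model_path='weights/RealESRGAN_x4plus.pth',  # placeholder path
                         model=model)

frame = cv2.imread('colorized_frame.png')          # one colorized frame
upscaled, _ = upsampler.enhance(frame, outscale=4)  # returns the upscaled image
cv2.imwrite('upscaled_frame.png', upscaled)
```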
The model was trained to generate audio synchronized with the lip movements, transcripts, and actions in the video. Data augmentation techniques such as time-stretching and pitch-shifting were used to improve the model's robustness to variations in speech and sound.
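A minimal sketch of these audio augmentations using librosa (the rate and semitone values are illustrative, not the exact training settings):

```python
# Example time-stretch and pitch-shift augmentations with librosa.
import librosa

y, sr = librosa.load('speech.wav', sr=None)  # keep the native sample rate

stretched = librosa.effects.time_stretch(y, rate=1.1)        # ~10% faster
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # up 2 semitones
```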
Integrating Models into a Pipeline: To achieve seamless video processing, we integrated all of our AI models into a cohesive pipeline. The pipeline passes the video through each model in sequence, ensuring structured and efficient handling of video input and output.
The pipeline features a modular design, with each step implemented as a separate module, allowing easy maintenance, updates, and scaling. Efficient data handling manages the large volume of video data through optimized frame extraction and reassembly, and parallel processing is used to speed up the pipeline, taking full advantage of the Radeon Pro W7900 GPU's capabilities.
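The sketch below outlines this sequential, modular structure, using OpenCV for frame extraction and reassembly; `colorize`, `upscale`, and `generate_audio` are hypothetical stand-ins for our model wrappers:

```python
# Minimal pipeline sketch: extract frames, run each stage in sequence,
# reassemble the video. Stage functions are hypothetical placeholders.
import cv2

def extract_frames(path):
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

def run_pipeline(in_path, out_path, fps=25):
    writer = None
    for frame in extract_frames(in_path):
        frame = colorize(frame)   # DeOldify stage (placeholder wrapper)
        frame = upscale(frame)    # Real-ESRGAN stage (placeholder wrapper)
        if writer is None:
            h, w = frame.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
        writer.write(frame)
    if writer is not None:
        writer.release()
    generate_audio(out_path)      # audio-generation stage (placeholder wrapper)
```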
In this project, we successfully developed a comprehensive generative AI solution that colorizes, upscales, and adds sound to videos using the Radeon Pro W7900 GPU. By leveraging state-of-the-art models like DeOldify, Real-ESRGAN, and SRVCGANs, we ensured high-quality video enhancement. Our use of HIPifly allowed us to port CUDA-based models to the ROCm platform, ensuring compatibility with AMD GPUs. The integration of these models into a seamless pipeline and their deployment through a Flask web server with a Gradio UI (sketched below) demonstrate the practical applicability and robustness of our approach. This project not only showcases the power of generative AI in multimedia processing but also highlights the importance of hardware-software synergy in achieving optimal performance. We hope that our work inspires further exploration and development in the field of AI-driven video enhancement.
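A minimal sketch of the Gradio front end, assuming a `run_pipeline` entry point like the one sketched earlier (the output filename is a placeholder):

```python
# Gradio UI sketch: upload a B&W mute video, return the enhanced result.
import gradio as gr

def enhance_video(video_path):
    out_path = 'enhanced.mp4'               # placeholder output path
    run_pipeline(video_path, out_path)      # colorize -> upscale -> audio
    return out_path

demo = gr.Interface(fn=enhance_video,
                    inputs=gr.Video(label='B&W mute video'),
                    outputs=gr.Video(label='Enhanced video'))
demo.launch()
```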
However, it should be noted that we received the hardware two months later than scheduled, which delayed the start of the project; in light of this, the company extended our deadline by two weeks (the new deadline for submission of the project is 15th August, 2024).