A V2A (Video-to-Audio) GenAI model that colorizes, upscales, and generates audio for a raw black-and-white mute video.
A modified and fine-tuned DeOldify model adds color to black-and-white videos. We used transfer learning to fine-tune the model on a custom dataset, improving its ability to colorize videos accurately. Data augmentation techniques such as rotation, flipping, and color jittering were applied to increase the diversity of the training data and make the model more robust; a sketch of such an augmentation pipeline is shown below.
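To illustrate, here is a minimal sketch of this kind of augmentation pipeline using torchvision (DeOldify itself is PyTorch-based). The parameter values are illustrative, not our exact training settings:

```python
# Illustrative augmentation pipeline for fine-tuning; the degrees,
# probabilities, and jitter strengths here are example values.
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomRotation(degrees=10),       # rotation
    T.RandomHorizontalFlip(p=0.5),      # flipping
    T.ColorJitter(brightness=0.2,       # color jittering applied to the
                  contrast=0.2,         # ground-truth color frames
                  saturation=0.2),
    T.ToTensor(),
])
```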
Here is the sample video:
After colorization with DeOldify, this is the output we got:
DeOldify architecture:
Using a fine-tuned Real-ESRGAN to upscale the video resolution, we got this output (a per-frame inference sketch follows the architecture diagram below):
Architecture:
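For reference, a single per-frame upscaling step with the Real-ESRGAN inference API (packages `realesrgan` and `basicsr`) might look like the sketch below; the weight path and file names are placeholders:

```python
# Sketch of per-frame 4x upscaling with Real-ESRGAN.
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# Standard RRDBNet backbone used by the x4plus Real-ESRGAN weights.
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4,
                         model_path='weights/RealESRGAN_x4plus.pth',  # placeholder path
                         model=model)

frame = cv2.imread('colorized_frame.png')          # one colorized frame
upscaled, _ = upsampler.enhance(frame, outscale=4)  # returns the upscaled image
cv2.imwrite('upscaled_frame.png', upscaled)
```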
The model was trained to generate audio synchronized with the lip movements, transcripts, and actions in the video. Data augmentation techniques such as time-stretching and pitch-shifting were used to improve the model's robustness to variations in speech and sound.
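A minimal sketch of these audio augmentations using librosa (the rate and semitone values are illustrative, not the exact training settings):

```python
# Example time-stretch and pitch-shift augmentations with librosa.
import librosa

y, sr = librosa.load('speech.wav', sr=None)  # keep the native sample rate

stretched = librosa.effects.time_stretch(y, rate=1.1)        # ~10% faster
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # up 2 semitones
```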
Integrating Models into a Pipeline: To achieve seamless video processing, we integrated all of our AI models into a cohesive pipeline. The pipeline passes the video through each model in sequence, ensuring structured and efficient handling of video input and output.
The pipeline features a modular design, with each step implemented as a separate module, allowing easy maintenance, updates, and scaling. Efficient data handling manages the large volume of video data through optimized frame extraction and reassembly, and parallel processing is used to speed up the pipeline, taking full advantage of the Radeon Pro W7900 GPU's capabilities.
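The sketch below outlines this sequential, modular structure, using OpenCV for frame extraction and reassembly; `colorize`, `upscale`, and `generate_audio` are hypothetical stand-ins for our model wrappers:

```python
# Minimal pipeline sketch: extract frames, run each stage in sequence,
# reassemble the video. Stage functions are hypothetical placeholders.
import cv2

def extract_frames(path):
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

def run_pipeline(in_path, out_path, fps=25):
    writer = None
    for frame in extract_frames(in_path):
        frame = colorize(frame)   # DeOldify stage (placeholder wrapper)
        frame = upscale(frame)    # Real-ESRGAN stage (placeholder wrapper)
        if writer is None:
            h, w = frame.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
        writer.write(frame)
    if writer is not None:
        writer.release()
    generate_audio(out_path)      # audio-generation stage (placeholder wrapper)
```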
In this project, we successfully developed a comprehensive generative AI solution that colorizes, upscales, and adds sound to videos using the Radeon Pro W7900 GPU. By leveraging state-of-the-art models like DeOldify, Real-ESRGAN, and SRVCGANs, we ensured high-quality video enhancement. Our use of HIPifly allowed us to port CUDA-based models to the ROCm platform, ensuring compatibility with AMD GPUs. The integration of these models into a seamless pipeline and their deployment through a Flask web server with a Gradio UI (sketched below) demonstrate the practical applicability and robustness of our approach. This project not only showcases the power of generative AI in multimedia processing but also highlights the importance of hardware-software synergy in achieving optimal performance. We hope that our work inspires further exploration and development in the field of AI-driven video enhancement.
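A minimal sketch of the Gradio front end, assuming a `run_pipeline` entry point like the one sketched earlier (the output filename is a placeholder):

```python
# Gradio UI sketch: upload a B&W mute video, return the enhanced result.
import gradio as gr

def enhance_video(video_path):
    out_path = 'enhanced.mp4'               # placeholder output path
    run_pipeline(video_path, out_path)      # colorize -> upscale -> audio
    return out_path

demo = gr.Interface(fn=enhance_video,
                    inputs=gr.Video(label='B&W mute video'),
                    outputs=gr.Video(label='Enhanced video'))
demo.launch()
```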
However, it should be noted that we received the hardware two months later than scheduled, which delayed the start of the project; in light of this, the company extended our deadline by two weeks (the new deadline for submission of the project is 15th August, 2024).