As a passionate photographer with a keen interest in the underwater realm, I've combined my love for photography with my expertise in software engineering and machine learning. Images captured underwater are often distorted and color-skewed. To enhance them, I've created an application, ClearWaters, that combines a diffusion model with a generative AI chatbot. The application runs on the NVIDIA Jetson AGX Orin as an edge device and tackles the unique challenges of underwater image processing. My goal is to bring the true beauty of underwater scenes to light, so that each photo vividly captures the enchanting world beneath the waves.
Running my diffusion model architecture on the NVIDIA Jetson AGX Orin offers significant benefits. The Jetson AGX Orin is a high-performance edge device. It provides the computational power needed for real-time image processing, which is crucial for underwater image enhancement. The diffusion model's complex operations can therefore run efficiently in field conditions, on the device itself rather than on a remote server. Pairing this hardware with my model architecture gives faster processing, better image quality, and real-world applicability. The result is a powerful tool for vividly capturing the beauty of underwater environments.
The hardware used in this project includes the Jetson AGX Orin, a keyboard and mouse, a monitor, and a Logitech USB camera. The Jetson AGX Orin serves as the main compute unit and performs all of the image processing and enhancement. The camera lets the user capture photos and video in real time. The keyboard, mouse, and monitor let the user interact with the ClearWaters front-end UI.
Hardware Bill of Materials:
- Nvidia Jetson AGX Orin
- USB Keyboard/Mouse
- Monitor
- Logitech HD USB Camera
The software architecture of ClearWaters includes a front-end Streamlit app that lets the user access the application's features. The user can take a photo, enhance an image, or enhance an underwater video. Each option has its own preprocessing module, which prepares the input before it is fed into the underwater image enhancement model.
In the image enhancement and photo capture modes, a generative AI chatbot accompanies the user. The chatbot can perform various image restoration tasks, such as enhancing images, adjusting lighting, and removing image artifacts. When the user's prompt asks for image enhancement, the AI uses a specialized underwater image enhancement model. For other generic image tasks, it uses the Instruct Image Restoration module (InstructIR).
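The write-up does not say how the chatbot chooses between the two backends. One simple possibility is a keyword router like the sketch below. The keyword list and the names `route_prompt`, `UNDERWATER_KEYWORDS`, `UNDERWATER` and `INSTRUCTIR` are hypothetical. A production chatbot might instead let the language model pick the tool.

```python
# Hypothetical keyword-based router; ClearWaters' actual routing logic is not documented.
UNDERWATER = "underwater_enhancement"  # specialized diffusion + upscaler pipeline
INSTRUCTIR = "instructir"              # generic instruction-driven restoration

# Assumed trigger words for the specialized model (illustrative only).
UNDERWATER_KEYWORDS = ("underwater", "enhance", "color correct", "colour correct")


def route_prompt(prompt: str) -> str:
    """Return which backend should handle a user's chat prompt."""
    text = prompt.lower()
    if any(keyword in text for keyword in UNDERWATER_KEYWORDS):
        return UNDERWATER
    # Everything else (e.g. "remove the blur", "make it brighter") goes to InstructIR,
    # which takes the natural-language instruction itself as input.
    return INSTRUCTIR


if __name__ == "__main__":
    for p in ["Enhance this underwater photo", "Please remove the blur", "Make it brighter"]:
        print(f"{p!r} -> {route_prompt(p)}")
```

Keyword matching is crude because it only looks at surface words. That is why letting the language model make the decision is a common alternative.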
The underwater image enhancement model is the centerpiece of this project. It is where an image is transformed into its colorful counterpart. The section below describes the model architecture in detail.
Software Bill of Materials:
- Streamlit
- Python 3.8
- JetPack 5.1.2
- PyTorch
- TorchVision
- Git
ClearWaters employs a two-stage underwater image enhancement method. The first stage uses a diffusion model to generate a color-corrected image from the original. This generated image removes the bluish water cast and reveals the true colors of the scene. The second stage uses a Generative Adversarial Network (GAN) upscaler to improve the image resolution and quality.
- Diffusion Model:
The diffusion model in my architecture is a denoising diffusion probabilistic model (DDPM). DDPMs are a class of generative models within generative artificial intelligence (generative AI). Other generative models, such as GANs and VAEs, share the same aim: to learn the complex probability distribution underlying a dataset so that they can generate new, realistic samples. What sets denoising diffusion models apart is their approach. They generate samples through an iterative denoising process, gradually transforming pure Gaussian noise into samples that resemble the target distribution.
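In a standard DDPM, the forward process adds Gaussian noise according to a variance schedule β_1, …, β_T. Any noisy step can be sampled in closed form as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the running product of (1 − β_s). The sketch below applies this to a flat list of pixel values. It uses the linear schedule from the original DDPM paper (β from 1e-4 to 0.02 over T = 1000 steps). ClearWaters' actual schedule and step count are not stated, so these values are assumptions.

```python
import math
import random


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> list:
    """Linearly spaced noise variances beta_1..beta_T (DDPM defaults, assumed here)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def alpha_bars(betas: list) -> list:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def q_sample(x0: list, t: int, a_bars: list, rng: random.Random) -> tuple:
    """Sample x_t ~ q(x_t | x_0) in one shot (t is a 0-based index); returns (x_t, noise)."""
    a = a_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return x_t, noise


if __name__ == "__main__":
    rng = random.Random(0)
    a_bars = alpha_bars(linear_beta_schedule())
    x0 = [0.5, -0.2, 0.9]  # pixel values already scaled to [-1, 1]
    for t in (0, 499, 999):
        x_t, _ = q_sample(x0, t, a_bars, rng)
        print(t, round(a_bars[t], 5), [round(v, 3) for v in x_t])
```

By the last step ᾱ_t is close to zero, so x_t is almost pure noise. The reverse (denoising) network learns to undo this, one step at a time.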
This architecture aims to improve the efficiency and performance of underwater image enhancement in two ways: by reducing the number of denoising iterations needed and by improving the quality of the generated images. For denoising, the model introduces a lightweight transformer. This transformer uses channel-wise attention to encode features efficiently and to reduce the number of parameters.
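Channel-wise (transposed) attention, as popularized by Restormer-style lightweight transformers, computes attention between feature channels rather than between pixels. The attention map is therefore C × C instead of (H·W) × (H·W). The pure-Python sketch below omits the learned query/key/value projections and multiple heads for brevity. It illustrates the technique under those simplifications and is not the ClearWaters implementation.

```python
import math


def softmax(row: list) -> list:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(features: list) -> tuple:
    """Self-attention across channels.

    features: C lists, each holding the N = H*W values of one channel.
    Queries, keys and values are the features themselves (no learned
    projections, a simplification for this sketch). Returns (output, attn).
    """
    c = len(features)
    n = len(features[0])
    scale = 1.0 / math.sqrt(n)
    # C x C similarity map: how strongly channel i attends to channel j.
    scores = [[scale * sum(a * b for a, b in zip(features[i], features[j])) for j in range(c)]
              for i in range(c)]
    attn = [softmax(row) for row in scores]
    # Each output channel is an attention-weighted mix of all input channels.
    output = [[sum(attn[i][j] * features[j][p] for j in range(c)) for p in range(n)]
              for i in range(c)]
    return output, attn


def attention_map_entries(h: int, w: int, c: int) -> dict:
    """Number of attention-map entries for pixel-wise vs channel-wise attention."""
    return {"spatial": (h * w) ** 2, "channel": c * c}


if __name__ == "__main__":
    feats = [[0.1, 0.4, -0.3, 0.8], [0.5, -0.1, 0.2, 0.0], [-0.6, 0.3, 0.9, -0.2]]  # C=3, N=4
    out, attn = channel_attention(feats)
    print(len(out), len(out[0]), [round(sum(r), 6) for r in attn])
    print(attention_map_entries(256, 256, 48))
```

Take an assumed 256 × 256 feature map with 48 channels. `attention_map_entries` gives 4,294,967,296 entries for pixel-wise attention and 2,304 for channel-wise attention. That illustrates the kind of saving that keeps the denoiser lightweight.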
- Transformer Upscaler:
The Upscaler is a hybrid attention transformer based on the Swin Transformer architecture. It increases the image resolution, letting the user generate a higher-resolution, more detailed image.
The Upscaler's hybrid attention transformer combines channel attention with window-based self-attention. This combination lets it use both global and local information more effectively. The architecture follows a Residual in Residual (RIR) structure with three main parts: shallow feature extraction, deep feature extraction, and image reconstruction.
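Window-based self-attention, introduced with the Swin Transformer, splits the feature map into non-overlapping M × M windows. Attention is computed only inside each window, which keeps the cost linear in image size. A minimal pure-Python sketch of the partition step follows. The window size is an illustrative assumption. A real model also shifts the windows between blocks, and it pads inputs whose sides are not multiples of M.

```python
def pad_to_multiple(size: int, window: int) -> int:
    """Smallest size >= `size` that is divisible by the window size."""
    return ((size + window - 1) // window) * window


def window_partition(grid: list, window: int) -> list:
    """Split an H x W grid (list of rows) into non-overlapping window x window tiles.

    Assumes H and W are already multiples of `window` (pad first otherwise).
    Tiles are returned in row-major order, each as a list of rows.
    """
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid dimensions must be multiples of the window size")
    tiles = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            tiles.append([row[left:left + window] for row in grid[top:top + window]])
    return tiles


if __name__ == "__main__":
    # 4 x 4 toy feature map whose values encode their (row, col) position.
    grid = [[10 * r + c for c in range(4)] for r in range(4)]
    for tile in window_partition(grid, 2):
        print(tile)
    print(pad_to_multiple(250, 8))
```

For the 4 × 4 toy grid, the first tile is `[[0, 1], [10, 11]]`. Self-attention would then run independently inside each tile.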
Training
The training process of this model involves several key steps:
- Data Collection: A diverse set of underwater images is collected. These images represent various underwater conditions to ensure the model learns to handle a wide range of scenarios.
- Preprocessing: The images are preprocessed to be compatible with the model architecture, including resizing, normalization, and possibly augmentation to enhance the dataset.
- Noise Addition: In the forward diffusion process, noise is progressively added to the clear underwater images. This process is fixed rather than learned. It is still crucial, because it produces the noisy inputs the model trains on.
- Model Training: The core of training is the transformer-based neural network learning the reverse process, that is, how to remove the noise effectively and enhance image quality. Through iterative training, the model gradually learns to reconstruct clear images from the noisy versions.
- Optimization: The model parameters are continuously optimized to improve performance, focusing on reducing artifacts, enhancing colors, and improving overall image clarity.
- Validation: The model is regularly validated against a set of unseen images to ensure it generalizes well to new data and real-world scenarios.
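The Noise Addition and Model Training steps above correspond to the standard DDPM training objective. You sample a timestep t, noise the clean image in closed form, and train the network to predict the added noise under a mean-squared error. The sketch below computes that loss for a single image in pure Python. `ddpm_loss`, `to_model_range`, and the `predict_noise` callable (a stand-in for the transformer denoiser) are hypothetical names. Conditioning on the raw underwater image is only indicated by a parameter, because the exact conditioning scheme is an assumption.

```python
import math
import random


def ddpm_loss(clean: list, degraded: list, predict_noise, a_bars: list, rng: random.Random) -> float:
    """One-sample DDPM training loss (simplified noise-prediction objective).

    clean:    reference (clear) image, flattened and scaled to [-1, 1]
    degraded: raw underwater image used as the condition (assumed scheme)
    predict_noise(x_t, t, degraded) -> predicted noise, same length as clean
    """
    t = rng.randrange(len(a_bars))  # uniformly sampled timestep
    a = a_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(clean, noise)]
    predicted = predict_noise(x_t, t, degraded)
    return sum((p - e) ** 2 for p, e in zip(predicted, noise)) / len(noise)


def to_model_range(pixels: list) -> list:
    """Normalize 8-bit pixel values in [0, 255] to [-1, 1], the usual DDPM input range."""
    return [v / 127.5 - 1.0 for v in pixels]


def untrained_network(x_t: list, t: int, cond: list) -> list:
    """Placeholder for the real denoiser: always predicts zero noise."""
    return [0.0] * len(x_t)


if __name__ == "__main__":
    T = 1000
    betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed linear schedule
    a_bars, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        a_bars.append(running)

    clean = to_model_range([200, 180, 90, 30])
    degraded = to_model_range([40, 120, 160, 150])
    print(ddpm_loss(clean, degraded, untrained_network, a_bars, random.Random(0)))
```

In a real training loop, this loss would be averaged over a batch and minimized with an optimizer. Validation would compare enhanced outputs against held-out reference images.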
This training process draws on the computational capabilities of the NVIDIA Jetson AGX Orin. The trained model then runs on the same device for real-time underwater image enhancement.
Datasets used to train this model:
- UIEB (Underwater Image Enhancement Benchmark Dataset)
- LSUI (Large Scale Underwater Image Dataset)
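Both datasets pair raw underwater images with reference images, so training data is typically organized as (raw, reference) pairs. The sketch below pairs files by name and makes a reproducible train/validation split for the Validation step. Several details are assumptions and do not reflect the datasets' official structure: the `raw/` and `reference/` folder layout, the 10% validation fraction, and all function names.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pair_by_name(raw_names: list, reference_names: list) -> list:
    """Match raw and reference images that share a file stem."""
    refs = {Path(name).stem: name for name in reference_names
            if Path(name).suffix.lower() in IMAGE_SUFFIXES}
    pairs = []
    for name in sorted(raw_names):
        stem = Path(name).stem
        if Path(name).suffix.lower() in IMAGE_SUFFIXES and stem in refs:
            pairs.append((name, refs[stem]))
    return pairs


def train_val_split(pairs: list, val_fraction: float = 0.1, seed: int = 42) -> tuple:
    """Shuffle deterministically and hold out a fraction of pairs for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]


def list_images(folder: str) -> list:
    """File names in a folder (hypothetical layout: dataset/raw and dataset/reference)."""
    return [p.name for p in Path(folder).iterdir() if p.is_file()]


if __name__ == "__main__":
    raw = ["1.png", "2.png", "3.jpg", "notes.txt"]
    ref = ["1.png", "3.jpg", "4.png"]
    pairs = pair_by_name(raw, ref)
    train, val = train_val_split(pairs, val_fraction=0.5)
    print(pairs, len(train), len(val))
```

Fixing the seed keeps the validation set identical across runs, so validation scores stay comparable between checkpoints.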
Pre-Deployment
This section walks through all the installation steps needed before running the application on the NVIDIA Jetson AGX Orin.
- Update the OS
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo reboot
- Install JetPack
$ sudo apt install nvidia-jetpack
- Install PyTorch
$ sudo apt-get -y update
$ sudo apt-get -y install python3-pip libopenblas-dev
$ export TORCH_INSTALL=https://developer.download.nvidia.cn/compute/redist/jp/v511/pytorch/torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl
$ python3 -m pip install --upgrade pip; python3 -m pip install 'numpy==1.24.4'; python3 -m pip install --no-cache $TORCH_INSTALL
- Install Torchvision
$ sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libopenblas-dev libavcodec-dev libavformat-dev libswscale-dev
$ git clone --branch v0.15.1 https://github.com/pytorch/vision torchvision
$ cd torchvision
$ export BUILD_VERSION=0.15.1
$ python3 setup.py install --user
$ cd ../
$ pip install 'pillow<7'
- Install project dependencies
$ pip3 install -r requirements.txt
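Before launching the app, a quick check can confirm that Python picks up the CUDA-enabled PyTorch wheel rather than a CPU-only build. The script below is a sketch. Save it under a name such as `check_env.py` (hypothetical) and run `python3 check_env.py`. On a correctly configured Orin, it should list the installed versions along with `cuda_available: True`.

```python
import importlib


def environment_report() -> dict:
    """Collect package versions and CUDA status without crashing on missing packages."""
    report = {}
    for name in ("torch", "torchvision", "numpy", "streamlit"):
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None  # package not installed for this interpreter
        except Exception as exc:  # e.g. a wheel built for a different JetPack/CUDA
            report[name] = f"import error: {exc}"

    report["cuda_available"] = False
    report["device"] = None
    try:
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
        if report["cuda_available"]:
            report["device"] = torch.cuda.get_device_name(0)
    except Exception:
        pass  # torch missing or broken; already reflected in report["torch"]
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:>15}: {value}")
```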
Deployment
Once training has produced the model checkpoint and all dependencies are installed, the next step is to deploy an end-to-end application that lets the user interact with the model. ClearWaters uses Streamlit, a Python framework, to build a generative AI application for underwater image enhancement.
To run ClearWaters, use the command below in the project directory:
$ streamlit run app.py
ClearWaters provides three main features: photo capturing, image enhancement, and video enhancement.
- Photo Capturing lets the user capture a photo with the camera and then enhance it using the underwater image enhancement model.
- Image Enhancement is similar to photo capturing, but the user uploads the photo they want to enhance instead of taking one with the camera.
- Video Enhancement lets the user upload their own video and enhance it. The app decomposes the video into a sequence of images and feeds them into the model for enhancement. Finally, it recombines the enhanced images into a video.
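The video path can be sketched with OpenCV's `VideoCapture` and `VideoWriter`. This version streams frames through the enhancer one at a time instead of saving them to disk as separate images. That keeps memory use bounded on the Jetson. Three things are assumptions here. `enhance_frame` is a hypothetical callable that wraps the enhancement model. The `mp4v` codec is an illustrative choice. OpenCV is assumed to be available, although it is not in the bill of materials above.

```python
from typing import Callable, Iterable, Iterator


def enhance_frames(frames: Iterable, enhance_frame: Callable) -> Iterator:
    """Apply the per-image enhancer to each frame, preserving order."""
    for frame in frames:
        yield enhance_frame(frame)


def read_frames(path: str):
    """Decompose a video into frames (BGR numpy arrays); returns (frame generator, fps)."""
    import cv2  # imported lazily so the pure-Python helper works without OpenCV

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise OSError(f"cannot open video: {path}")
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # some containers report 0; 30 is an assumed fallback

    def generate():
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                yield frame
        finally:
            cap.release()

    return generate(), fps


def write_frames(frames: Iterable, path: str, fps: float) -> int:
    """Recombine frames into a video and return the frame count.

    The output size comes from the first frame, because an upscaling
    stage changes the resolution relative to the input video.
    """
    import cv2

    writer, count = None, 0
    for frame in frames:
        if writer is None:
            height, width = frame.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*"mp4v")
            writer = cv2.VideoWriter(path, fourcc, fps, (width, height))
        writer.write(frame)
        count += 1
    if writer is not None:
        writer.release()
    return count


def enhance_video(src: str, dst: str, enhance_frame: Callable) -> int:
    """Decompose, enhance frame by frame, and recombine; returns frames written."""
    frames, fps = read_frames(src)
    return write_frames(enhance_frames(frames, enhance_frame), dst, fps)
```

`enhance_frame` must return an 8-bit BGR array, because `VideoWriter` expects the same format that `VideoCapture` produces.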
Underwater image enhancement method based on denoising diffusion probabilistic model
High Quality Image Restoration with Human Instruction