Sequential image colorization, the process of converting a series of grayscale images into colored ones, holds tremendous potential as a transformative tool across various domains.
One cool application is in manga, where a traditionally black-and-white art form can be brought to life. Our colorization model aims to empower readers by allowing them to automatically colorize their favorite manga, transforming their reading experience. Illustrators and digital artists can also benefit from our model, which can expedite their workflows, allowing them to focus more on creativity and less on manual coloring tasks. Our model, however, is not restricted to this particular application and can be used across a variety of scenarios.
We have created a novel model that combines the strengths of UNet and LSTM architectures and leverages the high computational capabilities of the AMD Radeon Pro W7900 GPU. By harnessing the power of advanced deep learning techniques and cutting-edge hardware, our solution is designed to deliver superior colorization results, vividly bringing grayscale images to life.
Demo:

The proposed model for colorizing comics is structured into three main components:
Initial Convolutional Layer:
- Input Channels: 3 (one L channel and two ab channels)
- Kernel Size: 9
- Padding: 'same'
- Purpose: This layer processes the initial input using a simple 2D convolution (sketched below).
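As a minimal sketch (assuming PyTorch, which the pretrained ResNet-18 backbone suggests), this layer amounts to a single convolution; the output channel count here is our illustrative choice, not a value stated above:

```python
import torch.nn as nn

# Minimal sketch of the initial layer: a single 2D convolution over the
# 3-channel input (L plus the two ab channels) with kernel size 9 and
# 'same' padding, so spatial dimensions are preserved.
# out_channels=64 is an assumption for illustration only.
initial_conv = nn.Conv2d(in_channels=3, out_channels=64,
                         kernel_size=9, padding='same')
```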
U-Net with Pretrained ResNet-18 Backbone:
- Backbone: Pretrained ResNet-18
- Upsampling Method: Pixel Shuffle instead of ConvTranspose
- Purpose: The U-Net architecture, with its skip connections and downsampling/upsampling paths, colorizes the input image; the resulting image serves as the output (see the sketch below).
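To illustrate the pixel-shuffle choice, here is a hedged sketch of what such an upsampling block might look like; the actual block in the repository may differ:

```python
import torch.nn as nn

class PixelShuffleUp(nn.Module):
    """Upsample 2x with pixel shuffle instead of ConvTranspose2d."""

    def __init__(self, in_ch, out_ch, r=2):
        super().__init__()
        # A conv expands channels by r^2, then PixelShuffle rearranges
        # those channels into an r-times larger spatial grid.
        self.conv = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.conv(x))
```

Pixel shuffle is a common substitute for transposed convolutions because it avoids the checkerboard artifacts they can introduce.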
Custom N-Dimensional LSTM:
- Input Shape: (N, C, H, W)
- Outputs: Two outputs, both shaped (N, C, H, W)
- Long-Term Memory: Captures long-term dependencies
- Short-Term Memory: Captures short-term dependencies
- Purpose: This custom LSTM module ensures that the model retains memory of previous pages, maintaining context and continuity in the comic book (a sketch of such a cell follows below).
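One common way to realize such a module over (N, C, H, W) feature maps is a convolutional LSTM cell; the sketch below is our illustration under that assumption, not the exact module from the repository:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell over (N, C, H, W) feature maps.

    Returns two tensors shaped (N, C, H, W): the hidden state
    (short-term memory) and the cell state (long-term memory).
    """

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # Input and hidden state are concatenated, then projected to the
        # four gates (input, forget, cell candidate, output) at once.
        self.gates = nn.Conv2d(2 * channels, 4 * channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state  # short-term (hidden) and long-term (cell) memory
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```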
By combining these three components, the model effectively processes and remembers past inputs to generate coherent and contextually aware comic pages.
Architecture Diagram:
Note: This is an extremely simplified version of the original architecture.
Training Process:

Datasets Used:
Training Datasets:
COCO Dataset:
- Subset Size: Approximately 21,000 images
- Purpose: The COCO dataset, known for its diversity and complexity, includes images of real-life objects and scenes. Training on this dataset helps the model tackle more intricate coloring tasks. The idea is that if the model can learn to color complex real-life objects and humans, it will be better equipped to handle the relatively simpler task of coloring comic book and manga characters and objects.
Colored Manga Dataset:
- Source: Kaggle
- Purpose: After the initial training on the COCO dataset, the model is fine-tuned on the Colored Manga dataset. Manga images are less complex but have unique stylistic elements. Fine-tuning on this dataset allows the model to adapt to the specific characteristics and styles of manga artwork. This step ensures the model learns to apply appropriate colors to manga characters and scenes, maintaining consistency with the artistic style.
Japanese Manga Dataset:
- Source: Kaggle
- Purpose: A subset of the Japanese Manga Dataset, specifically from the series "Black Clover," is used to train the model. The aim is to teach the model to colorize manga images based on its previous outputs.
Testing Dataset:
Japanese Manga Dataset:
- Source: Kaggle
- Purpose: The Japanese Manga Dataset is used to evaluate the performance of the model after training. By testing on this dataset, which contains various manga styles and themes, the model’s ability to generalize and accurately color different manga artworks can be assessed. This dataset serves as a benchmark to validate the model’s effectiveness and readiness for practical applications.
Pretraining:
- Objective: Teaching the model the fundamental skills of coloring, including understanding color distributions and relationships in images.
- Intuition: If the model grasps the basics of coloring from this initial phase, it can more rapidly and effectively learn to color complex images and adapt to different styles in later stages.
The model undergoes a crucial pretraining phase using a custom script (pretrain.py). This phase is designed to instill the basic principles of coloring into the model. During pretraining, the model learns to recognize and apply appropriate colors to various elements in the images. This foundational learning is essential because it equips the model with the basic skills needed for coloring, making it more efficient in subsequent training stages.
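The exact logic lives in pretrain.py; as a rough sketch of the idea, a single-image pretraining step might look like the following, where the stand-in model, the L1 loss, and the optimizer settings are all assumptions for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real network: it predicts the 2 ab color
# channels from the 1-channel L (lightness) input.
model = nn.Conv2d(1, 2, kernel_size=3, padding='same')
criterion = nn.L1Loss()  # assumed loss; see pretrain.py for the real one
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One pretraining step on a dummy batch: learn the L -> ab mapping.
L = torch.rand(5, 1, 256, 256)   # grayscale (lightness) input
ab = torch.rand(5, 2, 256, 256)  # ground-truth color channels
loss = criterion(model(L), ab)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```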
Training:
After pretraining, the model undergoes a training phase with specific parameters:
- Sequence Length: 10
- Batch Size: 5
- Total Images per Batch: 50 (5 sequences of 10 images each, drawn from the same dataset)
This structured training approach is designed to leverage the foundational skills acquired during pretraining and refine the model’s capabilities further. Each training batch includes multiple sequences, allowing the model to learn and generalize over a diverse set of images.
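To make the sequence/batch arithmetic concrete, here is a hypothetical sketch of how such batches could be assembled; the dataset class and image sizes are illustrative, not taken from the repository:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MangaSequenceDataset(Dataset):
    """Hypothetical dataset yielding consecutive pages as one sequence."""

    def __init__(self, pages, seq_len=10):
        self.pages = pages      # list of (C, H, W) page tensors, in order
        self.seq_len = seq_len

    def __len__(self):
        return len(self.pages) - self.seq_len + 1

    def __getitem__(self, idx):
        # Stack seq_len consecutive pages: (seq_len, C, H, W)
        return torch.stack(self.pages[idx:idx + self.seq_len])

pages = [torch.rand(3, 256, 256) for _ in range(100)]  # dummy pages
loader = DataLoader(MangaSequenceDataset(pages, seq_len=10),
                    batch_size=5, shuffle=True)
batch = next(iter(loader))  # (5, 10, 3, 256, 256): 50 images per batch
```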
Key Steps in the Training Process:
Training on COCO Dataset:
- Purpose: Addressing complex coloring tasks involving real-life objects and scenes.
- Outcome: The model learns to handle intricate details and diverse color schemes, preparing it for more specialized tasks.
Fine-Tuning on Colored Manga Dataset:
- Purpose: Adapting the model to color manga characters and objects.
- Outcome: The model fine-tunes its coloring skills to align with the artistic styles and conventions of manga artwork.
Computational Resources:
- Hardware: The training process relies on the AMD Radeon Pro W7900 GPU provided by AMD.
- Challenges: Due to the extensive computation involved, such as storing the activations captured by hooks registered during forward passes, high-performance GPUs are essential to handle the load and ensure efficient training (see the sketch below).
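The hooks mentioned above are PyTorch forward hooks; a minimal sketch of capturing intermediate ResNet-18 features this way (the layer choice is illustrative) shows why memory pressure grows, since every stored activation stays alive:

```python
import torch
import torchvision

# Capture intermediate ResNet-18 feature maps via forward hooks, as one
# might for U-Net skip connections.
backbone = torchvision.models.resnet18(weights='IMAGENET1K_V1')
features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output  # each stored activation adds memory pressure
    return hook

for name in ['layer1', 'layer2', 'layer3', 'layer4']:
    getattr(backbone, name).register_forward_hook(save_output(name))

_ = backbone(torch.rand(1, 3, 224, 224))  # one forward pass fills `features`
print({k: tuple(v.shape) for k, v in features.items()})
```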
Evaluation:
Finally, the trained model is evaluated using the Japanese Manga Dataset. This testing phase is critical for assessing the model’s performance and generalization capabilities. The evaluation helps identify any areas for improvement and ensures the model is ready for practical applications in coloring comics and other related tasks.
By following this comprehensive training and evaluation strategy, the model is well-prepared to tackle the task of coloring the given images while maintaining context and continuity across pages.
Setup:

All code and instructions can be found at our GitHub repository:
https://github.com/snehilchatterjee/ColorNet
Future Work:

We have successfully created the full model and trained it to achieve satisfactory results with our current training process. While the scope of our original idea has been achieved, we believe there is more potential to unlock through further training, particularly in capturing temporal and contextual information across longer sequences of images. However, due to time constraints, we were unable to fully explore this potential.