The on-device generative AI landscape has been shaped predominantly by LLMs, which excel at text generation. With advances in hardware such as NPUs, however, image generation models are poised to enter this arena. Text-to-image models, once accessible only through cloud-based platforms, are now within reach of on-device deployment. This project explores refining Stable Diffusion, a leading text-to-image model, for on-device applications. By combining Stable Diffusion's generative capabilities with a CNN for object detection and rectification, we aim to create a system that produces high-quality images while correcting common shortcomings in the generated output.
Stable Diffusion
Contributions to Stable Diffusion are growing rapidly, along with improvements in image quality, accessibility, and adaptability. Using the pipeline architecture provided by Hugging Face, with support for the AMD Ryzen AI architecture, the model can be tested on your device with a few lines of code:
import torch
from diffusers import StableDiffusionPipeline
# Load the Stable Diffusion model
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Generate an image
prompt = "a beautiful cat sitting on a couch"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("generated_image.png")
This requires AMD's Ryzen AI dependencies to be preinstalled, along with the PyTorch package and an environment set up to use the NPU. The pipeline architecture loads everything it needs into the running kernel, which is often a lengthy process, since the size of a Stable Diffusion model grows with improvements in performance. The model above loads multiple files totaling around 10 GB into the current kernel to perform the generation task.
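One common way to reduce this footprint (a minimal sketch, not part of the project setup, and assuming the runtime supports half precision) is to load the checkpoint in float16, which roughly halves the in-memory size of the weights:
import torch
from diffusers import StableDiffusionPipeline
# Loading in half precision cuts the memory taken by the weights roughly in half
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)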
CNN
The Convolutional Neural Network model is used to identify the components of the image: the model applies image recognition over a segment of the image to identify a component and separate it from the rest of the image. After identification, the pixel location is marked along with the size of the object in the picture to approximate its pixel length and breadth. A simple example of CNN detection:
import torch
import cv2
import numpy as np

# Load the pre-trained CNN model (a full model object saved with torch.save)
model = torch.load("object_detection_model.pth")
model.eval()

# Load the generated image (OpenCV reads images in BGR channel order)
image = cv2.imread("generated_image.png")

# Preprocess the image for the CNN: BGR -> RGB, scale to [0, 1],
# reorder to (C, H, W), and add a batch dimension
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
tensor = torch.from_numpy(rgb).float().permute(2, 0, 1) / 255.0
tensor = tensor.unsqueeze(0)

# Perform object detection
with torch.no_grad():
    detections = model(tensor)
# ... (post-processing to get bounding boxes: confidence filtering,
#      non-maximum suppression, depending on the model head)

# Rectification (simplified example)
for box in detections:
    x1, y1, x2, y2 = [int(v) for v in box]  # bounding box coordinates in pixels
    crop = image[y1:y2, x1:x2]              # crop the detected object
    # ... (apply rectification to the crop, then paste it back into the image)
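To make the measurement step described above concrete, here is a small sketch of deriving an object's pixel length and breadth from its bounding box (the helper name object_extent is illustrative, not from the project code):
def object_extent(box):
    # A bounding box (x1, y1, x2, y2) directly gives the object's extent in pixels
    x1, y1, x2, y2 = box
    length = x2 - x1   # horizontal extent in pixels
    breadth = y2 - y1  # vertical extent in pixels
    return length, breadth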