Created July 24, 2024

ControlNet Acceleration on RadeonPro W7900

Accelerating ControlNet Pipeline through ROCm, Pytorch etc. Making it comparable performance as A100.

Things used in this project

Hardware components

ThinkStation P620

RadeonPro W7900

Software apps and online services

ROCm

Story

ControlNet-Acceleration-on-RadeonProW7900

Accelerating ControlNet Pipeline through ROCm, Pytorch etc. Making it comparable performance as A100.

Contents

Task Brief
Environment Settings
Code explaination
References

Task Brief

ControlNet Introductions

intro based on Original Code Repo.

ControlNet 1.0 Adding Conditional Control to Text-to-Image Diffusion Models is a neural network structure to control diffusion models by adding extra conditions.

It copys the weights of neural network blocks into a "locked" copy and a "trainable" copy.

The "trainable" one learns your condition. The "locked" one preserves your model.

Thanks to this, training with small dataset of image pairs will not destroy the production-ready diffusion models.

The "zero convolution" is 1×1 convolution with both weight and bias initialized as zeros.

Before training, all zero convolutions output zeros, and ControlNet will not cause any distortion.

No layer is trained from scratch. You are still fine-tuning. Your original model is safe.

This allows training on small-scale or even personal devices.

This is also friendly to merge/replacement/offsetting of models/weights/blocks/layers.

Stable Diffusion + ControlNet

By repeating the above simple structure 14 times, we can control stable diffusion in this way:

In this way, the ControlNet can reuse the SD encoder as a deep, strong, robust, and powerful backbone to learn diverse controls. Many evidences (like this and this) validate that the SD encoder is an excellent backbone.

How to Accelerate this generation pipeline

Environment Settings

Hardware Preparation and OS Setup

Hardware elements

Regarding my choice of hardware, I consulted the product brief on AMD's official website for the RadeonPro W7900, which is documented at around 295W, though I measured 241W using rocm-smi. I also prioritized the stability of the power supply, opting for a 1000W unit.

I rented a server with an AMD 3945WX processor and 128GB of RAM, and installed the RadeonPro W7900 in the PCIe slot.

Here is system info:

❯ lscpu
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper PRO 3945WX 12-Cores
❯ free -h
total used free shared buff/cache available
Mem: 125Gi 35Gi 951Mi 9.0Mi 89Gi 89Gi
Swap: 14Gi 14Gi 0B

Power Settings(optional)

No need when running this repo but recommended when training!

To avoid overheat, I put the server in a data center with cooling systems which can control the environment temperature to about **19** degree celcius, to reducing the heat pressure, I used LACT to surpress the powerCap to **220W**, and finally got about 73 degree celcius for this card.

Note that this is only the apparent Pwr Consumption of the card, and the actual peak power consumption is more than this.

OS version and driver choices

The os versioni that shown in uname -a is:

❯ uname -a
6.2.0-26-generic #26~22.04.1-Ubuntu

The version of RadeonPro for Enterprise GPU Driver is, and can be found here:

amdgpu-install_6.0.60002-1_all.deb # my version#
# install commands from AMD offical website with ROCm6.1.3
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.1.3/ubuntu/jammy/amdgpu-install_6.1.60103-1_all.deb
sudo apt install ./amdgpu-install_6.1.60103-1_all.deb
sudo amdgpu-install -y --usecase=graphics,rocm
sudo usermod -a -G render,video $LOGNAME

Set Groups permissions

sudo usermod -a -G render,video $LOGNAME
sudo reboot

Post-install verification checks

Verify that the current user is added to the render and video groups.

# 1 check groups
groups
#output
<username> adm cdrom sudo dip video plugdev render lpadmin lxd sambashare

Check if amdgpu kernel driver is installed.

# 2 check gpu driver kernel module
dkms status
#output
amdgpu/x.x.x-xxxxxxx.xx.xx, x.x.x-xx-generic, x86_64: installed

Finally, check rocminfo:

# 3 check rocminfo (very important and can affect the following steps, especially the pytorch cuda support)
❯ rocminfo
# output
[...]
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-d063992628998a27               
  Marketing Name:          AMD Radeon PRO W7900               
  Vendor Name:             AMD
[...]

Software Preparations(Conda Environment)

There are two ways to run this repo, one is through docker, and the other is through conda environment. Here I will introduce the conda environment settings.

Clone this repo

git clone https://github.com/jedibobo/ControlNet-Acceleration-on-RadeonProW7900.git

Conda Environment

First create a new conda environment

conda env create -f environment.yaml
conda activate control

Second replace the nvdia cuda pytorch to ROCm enabled Pytorch. The command can be found in Pytorch offical website.

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

Third verify pytorch installation and GPU support:

❯ python
Python 3.9.19 (main, Mar 21 2024, 17:11:28)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'AMD Radeon PRO W7900'

FAQ:

1.If return "False" in torch.cuda.is_available(), please first check if "rocminfo" command returns valid information instead of "no permission" errors. If the latter comes up, please refer to this [link](https://github.com/ROCm/ROCm/issues/1211) and this [link](https://github.com/ROCm/ROCm/issues/1798) to resolve this issue.

Model Downloading

Download model control_sd_canny.pth from huggingface, and place it in models dir.

Run the code

python3 compute_score.py

Output image:

Command Line:

Speed Comparison

The speed comparison between A100 and RadeonPro W7900 is shown below table:

This shows potential of RadeonPro W7900 in accelerating the ControlNet pipeline, which is comparable to A100.

Software Preparations(Docker)(Optional method)

For docker setup, I recommend to use root and the official ROCm pytorch docker image. The commands are shown below:

#1. Install Docker following https://docs.docker.com/engine/install/ubuntu/

#2. Pull ROCm docker image
docker pull rocm/pytorch:rocm6.1.3_ubuntu22.04_py3.10_pytorch_release-2.1.2

#3. Run the docker image
sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri -v $PWD:/workspace/ --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 64G --name control-rocm613-pyt212 rocm/pytorch:rocm6.1.3_ubuntu22.04_py3.10_pytorch_release-2.1.2 bash

#4. pull this repo and install python requirements
git clone https://github.com/jedibobo/ControlNet-Acceleration-on-RadeonProW7900.git
pip install -r requirements.txt

Note this step should not change the pytorch version in the docker.

Code explaination

class `amdpervasivecontest with two methods: initialize and process.

initialize method: This method initializes some necessary objects and models. First, it creates a CannyDetector object for edge detection. Then, it loads a pre-trained model and places it on the GPU for computation. Finally, it creates a DDIMSampler object for sampling operations.

process method: This method processes an input image based on various parameters. Here's a step-by-step breakdown:

It resizes the input image to the specified resolution.
It applies the Canny edge detection to the resized image.
It converts the detected map to a PyTorch tensor and prepares it for the model.
It sets a random seed for reproducibility if no seed is provided.
It prepares the conditioning for the model based on the prompts and control map.
It adjusts the model's control scales based on the guess_mode and strength parameters.
It uses the DDIMSampler to generate samples based on the provided parameters.
It decodes the samples to get the final images and returns them.

Some failed attempts

1. ROCm Xformers Installation(both conda and docker)

2.ROCm DeepSpeed Installation(docker)

After these failed attempts, especially the ROCm official repo that is not updated for a long time, which hinders the development of ROCm community project because some of them are not compatible and quite hard to solove for a single participants of this contest.

References

ControlNet official repository

Credits

刘毅波

3 projects • 2 followers

Embed the widget on your own site

ControlNet Acceleration on RadeonPro W7900

ControlNet Acceleration on RadeonPro W7900

Things used in this project

Hardware components

Software apps and online services

Story

ControlNet-Acceleration-on-RadeonProW7900

Task Brief

Environment Settings

Some failed attempts

References

Code

ControlNet-Acceleration-on-RadeonProW7900

Credits

刘毅波

Comments