Not much has been published about building a GenOps platform (MLOps for generative AI models) or running GenAI workloads on Radeon Pro GPUs with the newest ROCm releases, and I believe this makes people hesitant to use ROCm and AMD workstation-class GPUs for AI workloads. I want to show that you can run production-class GenAI workflows on AMD GPUs with the latest ROCm releases, and to give the open source community detailed examples of how to leverage AMD AI hardware for model fine-tuning, Retrieval Augmented Generation (RAG), and serving models at scale using Kubeflow and other open source ML and AI tools. Right now there are very few guides or tutorials readily available online (e.g. blogs or YouTube) that cover this with ROCm and AMD GPUs. Even worse, much of the AI hardware ecosystem remains closed, with cloud providers like AWS, GCP, Azure, Lambda Cloud, Runpod.io, etc. only offering NVIDIA GPUs and CUDA. I hope to change that by providing detailed demonstrations, guides, and documentation on how to set up AMD GPUs in a Kubernetes environment and run GenAI workloads on them, to bring about wider adoption of AMD GPUs and ROCm so we can all work towards more open science initiatives.
In this project I will build an open source GenOps platform on Kubernetes, based on the open source Kubeflow project along with some other open source tooling, to run generative AI workloads entirely on AMD hardware, in this case the AMD Radeon Pro W7900 GPU, and to show that AMD AI hardware is industry leading and a perfect choice for running GenAI applications. I will provide a few working example applications and end-to-end workflows on this platform that leverage the Radeon Pro W7900. The existing Kubeflow distribution does not currently support AMD GPUs in its notebook or model serving images (only NVIDIA and CUDA are supported), so I have done the work to build and package ROCm-specific container images for all the examples you will see below. I hope this guide will be useful both to the AMD community and to the open source community as a whole, as the documentation and demos I have put together showcase how you can leverage AMD hardware to run generative AI workloads on your own systems.
--- Part 1 - Installation and Setup ---
All the steps you need to get this GenOps platform up and running are outlined below. For those who prefer a video walkthrough of the steps instead, I have included a YouTube video tutorial below:
[ VIDEO TUTORIAL COMING SOON]
Step 1: Installing Kubernetes
Since our GenOps platform will be running on Kubernetes, we first need to install it. There are many different ways to run Kubernetes, but the simplest by far is k3d. k3d is a lightweight wrapper that runs k3s (Rancher Lab's minimal Kubernetes distribution) in Docker. If you don't already have Docker installed you can install it using Snap:
sudo snap install docker
Once Docker is installed, make sure to create the docker group if it doesn't already exist and add yourself to it so you can run docker commands without sudo. You will also need to update the permissions on /var/run/docker.sock so that members of the docker group can use it without sudo or root:
sudo groupadd docker
sudo usermod -a -G docker $USER
newgrp docker
sudo chown root:docker /var/run/docker.sock
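To confirm the group changes took effect, you can run a quick sanity check without sudo (this pulls the small hello-world test image):
docker run --rm hello-world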
Now let's install the latest release of k3d:
wget -q -O - https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
We can now create our very first Kubernetes cluster. You can give your cluster any name, but I have chosen to call mine "aiserver":
k3d cluster create aiserver
k3d will automatically register the new cluster in your .kube config file. In order to access the cluster and issue commands you will need kubectl if you don't already have it installed. The easiest way to install it is through Snap:
sudo snap install kubectl --classic
To verify that kubectl can connect to your new cluster, run the following command, which should print the details of the nodes in your cluster. In this case you will only have one node (the PC you are running this on):
kubectl describe nodes
Notice that your cluster does not currently have any allocatable GPU resources. Don't worry, we will fix that in the next step.
Finally you can see what resources are running on your cluster with the following command (you should see a bunch of system related services, deployments and pods running in the kube-system namespace):
kubectl get all --all-namespaces
Step 3 - Install the ROCm k8s Device Plugin
In order for our Kubernetes cluster to detect our Radeon Pro GPU, you will need to install the ROCm k8s device plugin provided by AMD. Details for this plugin can be found in the ROCm/k8s-device-plugin repo on GitHub. To install the plugin as a DaemonSet in the kube-system namespace run the following:
kubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml
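You can quickly check that the device plugin pod is up by filtering the kube-system namespace (the pod name will have a random suffix):
kubectl get pods -n kube-system | grep amdgpu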
To verify that the cluster now has access to your GPU you can describe your node as before and check the details. Be sure to give it a minute or so for the amdgpu-device-plugin-daemonset to finish starting up:
kubectl describe nodes
You should now see amd.com/gpu listed in the node's Capacity and Allocatable sections with the number of GPUs you have in your system.
Capacity:
  amd.com/gpu:        1
  cpu:                32
  ephemeral-storage:  959786032Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             64936596Ki
  pods:               110
Allocatable:
  amd.com/gpu:        1
  cpu:                32
  ephemeral-storage:  933679851198
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             64936596Ki
  pods:               110
Note that if you have an integrated AMD GPU or multiple GPUs in your system, the amd.com/gpu count will show more than 1.
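To sanity-check that workloads can actually request the GPU, you can add amd.com/gpu to a container's resource limits just like any other extended resource. Below is a minimal sketch of a throwaway test pod; the pod name, image, and command are only examples (any ROCm-enabled image with rocm-smi in it will do):
apiVersion: v1
kind: Pod
metadata:
  name: amd-gpu-test                      # example name
spec:
  restartPolicy: Never
  containers:
    - name: rocm-smoke-test
      image: rocm/dev-ubuntu-22.04        # example ROCm base image, swap in your own
      command: ["/opt/rocm/bin/rocm-smi"] # prints GPU info and exits
      resources:
        limits:
          amd.com/gpu: 1                  # request one AMD GPU from the device plugin
Apply it with kubectl apply -f and check the pod's logs with kubectl logs amd-gpu-test; if the device plugin is working you should see your W7900 listed.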
Step 4 - Installing Kustomize
To install and set up the GenOps platform we will be using a fork of the kubeflow/manifests repository. The original Kubeflow manifests use Kustomize to generate the Kubernetes YAML files that deploy the resources onto the cluster. Deploying apps with Kustomize is an alternative to something you may be more familiar with, such as Helm. There are several ways to install Kustomize, but since we are on Ubuntu we will use Snap again:
sudo snap install kustomize
Step 5 - Deploying Kubeflow onto your Cluster
Before we can deploy Kubeflow to our cluster we first need to raise a couple of Linux kernel inotify limits to support running this many pods. If this is not done, some of the pods will fail to run with an error stating "too many open files." To fix this run:
echo sysctl "fs.inotify.max_user_instances=2280" | sudo tee -a /etc/sysctl.conf
echo sysctl "fs.inotify.max_user_watches=1255360" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
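You can confirm the new limits are active with:
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches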
Next, clone the custom Kubeflow manifests repo. Kubeflow is traditionally an MLOps platform meant to run many kinds of ML workloads. We will be installing Kubeflow from a fork of the kubeflow/manifests repo which I have created; it makes a number of modifications to turn Kubeflow from an MLOps platform into a GenOps platform, one better suited to running generative AI workloads on AMD GPUs and ROCm. This version includes new menu options specific to GenAI, such as the Model Hub and Weights & Biases, along with container images I built specifically for AMD GPUs. For example, the default notebook container comes preinstalled with ROCm for the AMD Radeon Pro W7900 GPU.
Clone the custom repo as follows:
git clone git@github.com:farshadghodsian/kubeflow-manifests.git
Once the repo has been cloned we will want to choose a new password for our default user before we deploy Kubeflow. To do this we first need to hash our password using a Python library called passlib.
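If passlib is not already installed on your system, you can install it (along with the bcrypt backend it typically uses) via pip:
pip3 install passlib bcrypt
Then run this one-liner to generate the hash: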
python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'
Running the above will prompt you to enter a password and will display the bcrypt hash of your password.
Password:
$2y$12$XIHDYKU4ddCWTNuAxXSYmO76exJVHZpQ29k6JdFEmCfXUGblnUwCS
Once you have this hash, edit the kubeflow-manifests/common/dex/base/dex-passwords.yaml file and replace the DEX_USER_PASSWORD value with the new hash.
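For reference, dex-passwords.yaml is just a small Kubernetes Secret; it should look roughly like the sketch below (the exact layout in the fork may differ slightly), and only the DEX_USER_PASSWORD value needs to change:
apiVersion: v1
kind: Secret
metadata:
  name: dex-passwords
type: Opaque
stringData:
  DEX_USER_PASSWORD: $2y$12$XIHDYKU4ddCWTNuAxXSYmO76exJVHZpQ29k6JdFEmCfXUGblnUwCS   # replace with your own bcrypt hash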
Now that that is done, cd into the kubeflow-manifests folder in your terminal and run the following to install Kubeflow:
cd kubeflow-manifests
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done
The above while loop uses Kustomize to build all of the manifest files and generate the YAML needed to deploy every component of Kubeflow to the Kubernetes cluster. The command will loop several times, since some pods have to be running before others can be deployed. Be patient and the command will eventually exit once all containers have been deployed. The install of Kubeflow can take 10 minutes or longer, so sit back or go grab something to eat or drink while you wait.
To check the status of all the newly deployed pods on your cluster open a new terminal and run the following:
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n sandbox
Don't be alarmed if you see some of the pods in a failed state with an "ImagePullBackOff" error. This happens because you are making too many download requests to the container registries that host the Kubeflow images and are getting rate limited. The issue should resolve itself if you wait 5-10 minutes, and eventually all pods should be in a Running state.
As you can see from the large number of pods that have been deployed to your cluster, Kubeflow is made up of many different components. While I won't go into too much detail on the inner workings of Kubeflow here, you can watch a YouTube video of me explaining the components in more detail here. What I will say is that the main entry point into your newly deployed GenOps platform is the istio-ingressgateway. The Istio gateway acts as a kind of load balancer into your cluster and routes connections from outside the cluster to the appropriate namespace and pod using what are called virtual services. We will ignore that for now and just run the command below to expose the istio-ingressgateway, which runs on port 8080, to our local computer on port 8081. You may already have another app running on port 8080 like I do, hence why I have chosen port 8081.
kubectl port-forward -n istio-system $(kubectl get pods -l app=istio-ingressgateway -o jsonpath='{.items[0].metadata.name}' -n istio-system) 8081:8080 &
You will need to keep the terminal running kubectl port-forward open in order to keep the connection to the istio-ingressgateway alive. Congrats! You should now be able to access the Kubeflow Central Dashboard via http://localhost:8081. You will first be forwarded to Dex, the authentication service used by Kubeflow, to log in. To log in, use the following:
Username: user@ai.server
Password: [password you setup earlier]
Step 6 - Simplify Your Kubernetes Cluster Management with K9s
While the above kubectl port-forward command comes in handy, it is far easier to manage the resources deployed on your cluster using K9s, a handy utility that makes it easy to view all the pods running on your cluster, see their logs, restart pods, and port-forward individual ports as needed. Trust me, you will thank me later! To install K9s, go to the K9s releases page and download and install the most recent k9s_linux_amd64.deb package. I have included the commands to do this from the command line if you prefer:
wget https://github.com/derailed/k9s/releases/download/v0.32.5/k9s_linux_amd64.deb
sudo dpkg -i k9s_linux_amd64.deb
In addition to the above, I also like to add these handy aliases to my .bashrc file to make it easier to run the kubectl command and to switch the default namespace:
echo -e "\n# Kubernetes Shortcuts\nalias k='kubectl'\nks() { set -u; kubectl config set-context --current --namespace=\"\$1\"; }" >> ~/.bashrc
bash # start a new shell so the new aliases take effect
With these two aliases in place you can now simply use "k" to reference "kubectl":
k get pods -A
And use "ks" to switch your default namespace
ks istio-system
Notice that after switching your default namespace, kubectl commands will only show pods from that specific namespace:
k get pods
NAME                                     READY   STATUS    RESTARTS         AGE
cluster-local-gateway-595b55bdb4-hq7dh   1/1     Running   10 (7h44m ago)   5d
istio-ingressgateway-5698f99697-jlhnn    1/1     Running   10 (7h44m ago)   5d
istiod-d889bdb44-smdq6                   1/1     Running   10 (7h44m ago)   5d
You can switch to another namespace such as the main kubeflow namespace:
ks kubeflow
This comes in really handy because, as you will see when running K9s, it defaults to whatever namespace is set as the default in your current context.
To run K9s simply type:
k9s
Now to view all pods from every namespace you can press 0. To switch back to your default namespace press 1. Press 0 again, find the istio-ingressgateway in the list of all pods, and press Shift+f to bring up the port-forward menu. From here set the container port to istio-proxy::8080 and the local port to 8081, then press OK (see screenshot below).
This has the same effect as the kubectl port-forward command we used earlier, but is much more user friendly. Remember to keep K9s open if you want to keep the port-forward connection active.
You can do other cool stuff with K9s, such as viewing the logs of a pod: simply navigate to the pod and press Enter to list the containers in that pod, then press Enter again on the container you would like to see the logs for. Press Esc a couple of times to get back out of the logs and to the main screen.
I know the above was a lot, but you will soon be glad you went through all the trouble of setting up your new GenOps platform, as you now have an end-to-end system to run all kinds of generative AI workloads. Below you will find a number of applications that you can now take advantage of, all in one cool web UI. To save you the trouble of all the reading, and in the interest of time, I will explain each one via a YouTube tutorial (sorry, there are too many tutorials to pack into one single Hackster project):
Running Jupyter Notebooks with AMD ROCm
Fixing GPU Device permissions
If you are having trouble accessing the GPU in your notebook, you can fix the permissions for the /dev/dri and /dev/kfd devices, as sometimes when the notebook starts up for the first time the permissions don't get set properly. To fix them, open a terminal session in your JupyterLab notebook and run:
sudo chown root:video /dev/kfd
sudo chown root:video /dev/dri/*
sudo chmod 666 /dev/kfd
sudo chmod 666 /dev/dri/*
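After fixing the permissions, you can quickly confirm the notebook can see the GPU; since the custom notebook image ships with ROCm, the rocm-smi utility should be available from the terminal:
ls -l /dev/kfd /dev/dri
rocm-smi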
Installing NVTop for better GPU monitoring
cd /home/jovyan/
git clone https://github.com/Syllo/nvtop.git
sudo apt -y install libdrm-dev libsystemd-dev cmake libncurses5-dev libncursesw5-dev
mkdir -p nvtop/build && cd nvtop/build
cmake .. -DNVIDIA_SUPPORT=ON -DAMDGPU_SUPPORT=ON -DINTEL_SUPPORT=ON
make
sudo make install # Install globally on the system
rm -rf /home/$LOGNAME/nvtop # remove the nvtop git repo after install
bash # start a new shell so the new `nvtop` command is picked up
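Once installed, you can launch the monitor from any notebook terminal to keep an eye on GPU utilization and VRAM usage while your workloads run:
nvtop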
Large Language Model Deployments via KServe and Ollama
YAML to create Ollama PVC volume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-volume
  namespace: sandbox
  labels:
    type: local
spec:
  storageClassName: local
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
    - ReadWriteMany
  hostPath:
    path: /usr/share/ollama/.ollama/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
  namespace: sandbox
spec:
  storageClassName: local
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
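Save the above to a file (the filename below is just an example) and apply it, then check that the claim binds in the sandbox namespace:
kubectl apply -f ollama-pvc.yaml
kubectl get pvc -n sandbox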
YAML to run Ollama Model Server
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ollama-server
  annotations:
    "sidecar.istio.io/inject": "false"
spec:
  predictor:
    containers:
      - name: kserve-container
        image: ollama/ollama:0.2.8-rocm
        ports:
          - name: user-port
            protocol: TCP
            containerPort: 11434
        env:
          - name: STORAGE_URI
            value: "pvc://ollama-pvc/"
          - name: OLLAMA_MODELS
            value: "/mnt/models"
          - name: OLLAMA_DEBUG
            value: "1"
          - name: HIP_VISIBLE_DEVICES
            value: "0"
        resources:
          limits:
            amd.com/gpu: 1
            memory: "48Gi"
            cpu: "16"
          requests:
            memory: "16Gi"
            cpu: "8"
Your own Personal ChatGPT with Open WebUI
Below is a demo of how you can run your own ChatGPT-style UI using the Ollama model server we deployed above and Open WebUI. I did not have time to create a full video tutorial for this one because I ran into issues with newer versions of Ollama and Open WebUI breaking the service-to-service communication. I plan to fix this and provide a full tutorial video at a later date. This video was taken with an earlier version of the GenOps platform.
YAML to deploy Open WebUI:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: open-webui
spec:
  predictor:
    containers:
      - name: kserve-container
        image: ghcr.io/open-webui/open-webui:v0.2.5
        ports:
          - name: h2c
            protocol: TCP
            containerPort: 8080
        env:
          - name: OLLAMA_BASE_URL
            value: "http://ollama-server.sandbox.svc.cluster.local"
Running VS Code server with Built-In Local Coding Assistant
[ VIDEO TUTORIAL TO BE ADDED AT A LATER DATE]
Model Fine-Tuning with TorchTune
[ VIDEO TUTORIAL TO BE ADDED AT A LATER DATE]
LivePortrait Facial Animation with ComfyUI on AMD GPUs
[ VIDEO TUTORIAL TO BE ADDED AT A LATER DATE]