Not much has been published about building a GenOps platform (MLOps for generative AI models) or running GenAI workloads on Radeon Pro GPUs with the newest ROCm releases, and I believe this makes people hesitant to use ROCm and AMD workstation-class GPUs for AI workloads. I want to show that you can run production-class GenAI workflows on AMD GPUs with the latest ROCm releases, and to give the open source community detailed examples of how to leverage AMD AI hardware for model fine-tuning, Retrieval Augmented Generation (RAG), and serving models at scale using Kubeflow and other open source ML and AI tools. Right now there are very few guides or tutorials readily available online (e.g. blogs or YouTube) that cover this with ROCm and AMD GPUs. Even worse, much of the AI hardware ecosystem remains closed, with cloud providers like AWS, GCP, Azure, Lambda Cloud, Runpod.io, etc. only offering NVIDIA GPUs and CUDA. I hope to change that by providing detailed demonstrations, guides, and documentation on how to set up AMD GPUs in a Kubernetes environment and run GenAI workloads on them, to bring about wider adoption of AMD GPUs and ROCm so we can all work towards more open science initiatives.
In this project I will build an open source GenOps platform on Kubernetes, based on the open source Kubeflow project along with some other open source tooling, to run generative AI workloads entirely on AMD hardware, in this case the AMD Radeon Pro W7900 GPU, and to show that AMD AI hardware is industry leading and a perfect choice for running GenAI applications. I will provide a few working example applications and end-to-end workflows on this platform that leverage the Radeon Pro W7900. The existing Kubeflow distribution does not currently support AMD GPUs in its notebook or model serving images (only NVIDIA and CUDA are supported), so I have done the work to build and package ROCm-specific container images for all the examples you will see below. I hope this guide will be useful both to the AMD community and to the open source community as a whole, as the documentation and demos I have put together showcase how you can leverage AMD hardware to run generative AI workloads on your own systems.
--- Part 1 - Installation and Setup ---
All the steps you need to get this GenOps platform up and running are outlined below. For those who prefer a video walkthrough of the steps instead, I have included a YouTube video tutorial below:
[ VIDEO TUTORIAL COMING SOON]
Step 1: Installing Kubernetes
Since our GenOps platform will be running on Kubernetes, we first need to install it. There are many different ways to run Kubernetes, but the simplest by far is k3d. k3d is a lightweight wrapper that runs k3s (Rancher Lab's minimal Kubernetes distribution) in Docker. If you don't already have Docker installed you can install it using Snap:
sudo snap install docker
Once Docker is installed, make sure to create the docker group if it doesn't already exist and add yourself to it so you can run docker commands without sudo. You will also need to update the permissions on /var/run/docker.sock so that members of the docker group can use it without sudo or root:
sudo groupadd docker
sudo usermod -a -G docker $USER
newgrp docker
sudo chown root:docker /var/run/docker.sock
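To confirm the group changes took effect, you can run a quick sanity check without sudo (this pulls the small hello-world test image):
docker run --rm hello-world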
Now let's install the latest release of k3d:
wget -q -O - https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
We can now create our very first Kubernetes cluster. You can give your cluster any name, but I have chosen to call mine "aiserver":
k3d cluster create aiserver
k3d will automatically register the new cluster in your .kube config file. In order to access the cluster and issue commands you will need kubectl if you don't already have it installed. The easiest way to install it is through Snap:
sudo snap install kubectl --classic
To verify that kubectl can connect to your new cluster, run the following command, which should print the details of the nodes in your cluster. In this case you will only have one node (the PC you are running this on):
kubectl describe nodes
Notice that your cluster does not currently have any allocatable GPU resources. Don't worry, we will fix that in the next step.
Finally you can see what resources are running on your cluster with the following command (you should see a bunch of system related services, deployments and pods running in the kube-system namespace):
kubectl get all --all-namespaces
Step 3 - Install the ROCm k8s Device Plugin
In order for our Kubernetes cluster to detect our Radeon Pro GPU, you will need to install the ROCm k8s device plugin provided by AMD. Details for this plugin can be found in the ROCm/k8s-device-plugin repo on GitHub. To install the plugin as a DaemonSet in the kube-system namespace run the following:
kubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml
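You can quickly check that the device plugin pod is up by filtering the kube-system namespace (the pod name will have a random suffix):
kubectl get pods -n kube-system | grep amdgpu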
To verify that the cluster now has access to your GPU you can describe your node as before and check the details. Be sure to give it a minute or so for the amdgpu-device-plugin-daemonset to finish starting up:
kubectl describe nodes
You should now see amd.com/gpu listed in the node's Capacity and Allocatable sections with the number of GPUs you have in your system.
Capacity:
  amd.com/gpu:        1
  cpu:                32
  ephemeral-storage:  959786032Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             64936596Ki
  pods:               110
Allocatable:
  amd.com/gpu:        1
  cpu:                32
  ephemeral-storage:  933679851198
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             64936596Ki
  pods:               110
Note that if you have an integrated AMD GPU or multiple GPUs in your system, the amd.com/gpu count will show more than 1.
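To sanity-check that workloads can actually request the GPU, you can add amd.com/gpu to a container's resource limits just like any other extended resource. Below is a minimal sketch of a throwaway test pod; the pod name, image, and command are only examples (any ROCm-enabled image with rocm-smi in it will do):
apiVersion: v1
kind: Pod
metadata:
  name: amd-gpu-test                      # example name
spec:
  restartPolicy: Never
  containers:
    - name: rocm-smoke-test
      image: rocm/dev-ubuntu-22.04        # example ROCm base image, swap in your own
      command: ["/opt/rocm/bin/rocm-smi"] # prints GPU info and exits
      resources:
        limits:
          amd.com/gpu: 1                  # request one AMD GPU from the device plugin
Apply it with kubectl apply -f and check the pod's logs with kubectl logs amd-gpu-test; if the device plugin is working you should see your W7900 listed.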
Step 4 - Installing Kustomize
To install and set up the GenOps platform we will be using a fork of the kubeflow/manifests repository. The original Kubeflow manifests use Kustomize to generate the Kubernetes YAML files that deploy the resources onto the cluster. Deploying apps with Kustomize is an alternative to something you may be more familiar with, such as Helm. There are several ways to install Kustomize, but since we are on Ubuntu we will use Snap again:
sudo snap install kustomize
Step 5 - Deploying Kubeflow onto your Cluster
Before we can deploy Kubeflow to our cluster we first need to raise a couple of Linux kernel inotify limits to support running this many pods. If this is not done, some of the pods will fail to run with an error stating "too many open files." To fix this run:
echo sysctl "fs.inotify.max_user_instances=2280" | sudo tee -a /etc/sysctl.conf
echo sysctl "fs.inotify.max_user_watches=1255360" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
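You can confirm the new limits are active with:
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches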
Next, clone the custom Kubeflow manifests repo. Kubeflow is traditionally an MLOps platform meant to run many kinds of ML workloads. We will be installing Kubeflow from a fork of the kubeflow/manifests repo which I have created; it makes a number of modifications to turn Kubeflow from an MLOps platform into a GenOps platform, one better suited to running generative AI workloads on AMD GPUs and ROCm. This version includes new menu options specific to GenAI, such as the Model Hub and Weights & Biases, along with container images I built specifically for AMD GPUs. For example, the default notebook container comes preinstalled with ROCm for the AMD Radeon Pro W7900 GPU.
Clone the custom repo as follows:
git clone git@github.com:farshadghodsian/kubeflow-manifests.git
Once the repo has been cloned we will want to choose a new password for our default user before we deploy Kubeflow. To do this we first need to hash our password using a Python library called passlib.
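If passlib is not already installed on your system, you can install it (along with the bcrypt backend it typically uses) via pip:
pip3 install passlib bcrypt
Then run this one-liner to generate the hash: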
python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'
Running the above will prompt you to enter a password and will display the bcrypt hash of your password.
Password:
$2y$12$XIHDYKU4ddCWTNuAxXSYmO76exJVHZpQ29k6JdFEmCfXUGblnUwCS
Once you have this hash, edit the kubeflow-manifests/common/dex/base/dex-passwords.yaml file and replace the DEX_USER_PASSWORD value with the new hash.
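For reference, dex-passwords.yaml is just a small Kubernetes Secret; it should look roughly like the sketch below (the exact layout in the fork may differ slightly), and only the DEX_USER_PASSWORD value needs to change:
apiVersion: v1
kind: Secret
metadata:
  name: dex-passwords
type: Opaque
stringData:
  DEX_USER_PASSWORD: $2y$12$XIHDYKU4ddCWTNuAxXSYmO76exJVHZpQ29k6JdFEmCfXUGblnUwCS   # replace with your own bcrypt hash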
Now that that is done, cd into the kubeflow-manifests folder in your terminal and run the following to install Kubeflow:
cd kubeflow-manifests
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done
The above while loop uses Kustomize to build all of the manifest files and generate the YAML needed to deploy every component of Kubeflow to the Kubernetes cluster. The command will loop several times, since some pods have to be running before others can be deployed. Be patient and the command will eventually exit once all containers have been deployed. The install of Kubeflow can take 10 minutes or longer, so sit back or go grab something to eat or drink while you wait.
To check the status of all the newly deployed pods on your cluster open a new terminal and run the following:
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n sandbox
Don't be alarmed if you see some of the pods in a failed state with an "ImagePullBackOff" error. This happens because you are making too many download requests to the container registries that host the Kubeflow images and are getting rate limited. The issue should resolve itself if you wait 5-10 minutes, and eventually all pods should be in a Running state.
As you can see from the large number of pods that have been deployed to your cluster, Kubeflow is made up of many different components. While I won't go into too much detail on the inner workings of Kubeflow here, you can watch a YouTube video of me explaining the components in more detail here. What I will say is that the main entry point into your newly deployed GenOps platform is the istio-ingressgateway. The Istio gateway acts as a kind of load balancer into your cluster and routes connections from outside the cluster to the appropriate namespace and pod using what are called virtual services. We will ignore that for now and just run the command below to expose the istio-ingressgateway, which runs on port 8080, to our local computer on port 8081. You may already have another app running on port 8080 like I do, hence why I have chosen port 8081.
kubectl port-forward -n istio-system $(kubectl get pods -l app=istio-ingressgateway -o jsonpath='{.items[0].metadata.name}' -n istio-system) 8081:8080 &
You will need to keep the terminal running kubectl port-forward open in order to keep the connection to the istio-ingressgateway alive. Congrats! You should now be able to access the Kubeflow Central Dashboard via http://localhost:8081. You will first be forwarded to Dex, the authentication service used by Kubeflow, to log in. To log in, use the following:
Username: user@ai.server
Password: [password you setup earlier]
Step 6 - Simplify Your Kubernetes Cluster Management with K9s
While the above kubectl port-forward command comes in handy, it is far easier to manage the resources deployed on your cluster using K9s, a handy utility that makes it easy to view all the pods running on your cluster, see their logs, restart pods, and port-forward individual ports as needed. Trust me, you will thank me later! To install K9s, go to the K9s releases page and download and install the most recent k9s_linux_amd64.deb package. I have included the commands to do this from the command line if you prefer:
wget https://github.com/derailed/k9s/releases/download/v0.32.5/k9s_linux_amd64.deb
sudo dpkg -i k9s_linux_amd64.deb
In addition to the above, I also like to add these handy aliases to my .bashrc file to make it easier to run the kubectl command and to switch the default namespace:
echo -e "\n# Kubernetes Shortcuts\nalias k='kubectl'\nks() { set -u; kubectl config set-context --current --namespace=\"\$1\"; }" >> ~/.bashrc
bash # start a new shell so the new aliases take effect
With these two aliases in place you can now simply use "k" to reference "kubectl":
k get pods -A
And use "ks" to switch your default namespace
ks istio-system
Notice that after switching your default namespace, kubectl commands will only show pods from that specific namespace:
k get pods
NAME                                     READY   STATUS    RESTARTS         AGE
cluster-local-gateway-595b55bdb4-hq7dh   1/1     Running   10 (7h44m ago)   5d
istio-ingressgateway-5698f99697-jlhnn    1/1     Running   10 (7h44m ago)   5d
istiod-d889bdb44-smdq6                   1/1     Running   10 (7h44m ago)   5d
You can switch to another namespace such as the main kubeflow namespace:
ks kubeflow
This comes in really handy because, as you will see when running K9s, it defaults to whatever namespace is set as the default in your current context.
To run K9s simply type:
k9s
Now to view all pods from every namespace you can press 0. To switch back to your default namespace press 1. Press 0 again, find the istio-ingressgateway in the list of all pods, and press Shift+f to bring up the port-forward menu. From here set the container port to istio-proxy::8080 and the local port to 8081, then press OK (see screenshot below).
This has the same effect as the kubectl port-forward command we used earlier, but is much more user friendly. Remember to keep K9s open if you want to keep the port-forward connection active.
You can do other cool stuff with K9s, such as viewing the logs of a pod: simply navigate to the pod and press Enter to list the containers in that pod, then press Enter again on the container you would like to see the logs for. Press Esc a couple of times to get back out of the logs and to the main screen.
I know the above was a lot, but you will soon be glad you went through all the trouble of setting up your new GenOps platform, as you now have an end-to-end system to run all kinds of generative AI workloads. Below you will find a number of applications that you can now take advantage of, all in one cool web UI. To save you the trouble of all the reading, and in the interest of time, I will explain each one via a YouTube tutorial (sorry, there are too many tutorials to pack into one single Hackster project):
Running Jupyter Notebooks with AMD ROCm
Fixing GPU Device permissions
If you are having trouble accessing the GPU in your notebook, you can fix the permissions for the /dev/dri and /dev/kfd devices, as sometimes when the notebook starts up for the first time the permissions don't get set properly. To fix them, open a terminal session in your JupyterLab notebook and run:
sudo chown root:video /dev/kfd
sudo chown root:video /dev/dri/*
sudo chmod 666 /dev/kfd
sudo chmod 666 /dev/dri/*
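After fixing the permissions, you can quickly confirm the notebook can see the GPU; since the custom notebook image ships with ROCm, the rocm-smi utility should be available from the terminal:
ls -l /dev/kfd /dev/dri
rocm-smi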
Installing NVTop for better GPU monitoring
cd /home/jovyan/
git clone https://github.com/Syllo/nvtop.git
sudo apt -y install libdrm-dev libsystemd-dev cmake libncurses5-dev libncursesw5-dev
mkdir -p nvtop/build && cd nvtop/build
cmake .. -DNVIDIA_SUPPORT=ON -DAMDGPU_SUPPORT=ON -DINTEL_SUPPORT=ON
make
sudo make install # Install globally on the system
rm -rf /home/$LOGNAME/nvtop # remove the nvtop git repo after install
bash # start a new shell so the new `nvtop` command is picked up
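Once installed, you can launch the monitor from any notebook terminal to keep an eye on GPU utilization and VRAM usage while your workloads run:
nvtop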
Large Language Model Deployments via KServe and Ollama
YAML to create Ollama PVC volume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-volume
  namespace: sandbox
  labels:
    type: local
spec:
  storageClassName: local
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
    - ReadWriteMany
  hostPath:
    path: /usr/share/ollama/.ollama/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
  namespace: sandbox
spec:
  storageClassName: local
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
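Save the above to a file (the filename below is just an example) and apply it, then check that the claim binds in the sandbox namespace:
kubectl apply -f ollama-pvc.yaml
kubectl get pvc -n sandbox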
YAML to run Ollama Model Server
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ollama-server
  annotations:
    "sidecar.istio.io/inject": "false"
spec:
  predictor:
    containers:
      - name: kserve-container
        image: ollama/ollama:0.2.8-rocm
        ports:
          - name: user-port
            protocol: TCP
            containerPort: 11434
        env:
          - name: STORAGE_URI
            value: "pvc://ollama-pvc/"
          - name: OLLAMA_MODELS
            value: "/mnt/models"
          - name: OLLAMA_DEBUG
            value: "1"
          - name: HIP_VISIBLE_DEVICES
            value: "0"
        resources:
          limits:
            amd.com/gpu: 1
            memory: "48Gi"
            cpu: "16"
          requests:
            memory: "16Gi"
            cpu: "8"
Your own Personal ChatGPT with Open WebUI
Below is a demo of how you can run your own ChatGPT-style UI using the Ollama model server we deployed above and Open WebUI. I did not have time to create a full video tutorial for this one because I ran into issues with newer versions of Ollama and Open WebUI breaking the service-to-service communication. I plan to fix this and provide a full tutorial video at a later date. This video was taken with an earlier version of the GenOps platform.
YAML to deploy Open WebUI:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: open-webui
spec:
  predictor:
    containers:
      - name: kserve-container
        image: ghcr.io/open-webui/open-webui:v0.2.5
        ports:
          - name: h2c
            protocol: TCP
            containerPort: 8080
        env:
          - name: OLLAMA_BASE_URL
            value: "http://ollama-server.sandbox.svc.cluster.local"
Running VS Code server with Built-In Local Coding Assistant
[ VIDEO TUTORIAL TO BE ADDED AT A LATER DATE]
Model Fine-Tuning with TorchTune
[ VIDEO TUTORIAL TO BE ADDED AT A LATER DATE]
LivePortrait Facial Animation with ComfyUI on AMD GPUs
[ VIDEO TUTORIAL TO BE ADDED AT A LATER DATE]