Project Overview:
The Minisforum UM790 Pro with the AMD Ryzen 9 7940HS is built around the AMD Ryzen™ 7040 series processors, which use a cutting-edge 4 nm process node to deliver outstanding performance and power efficiency. Select models offer the new Ryzen AI Engine, the first dedicated AI engine on a Windows x86 processor[1]. Ryzen AI technology is designed to deliver AI-driven capabilities you've never experienced before, in real time, right on your laptop or PC. Meetig-AI-T is an open-source meeting assistant that takes advantage of this technology built into the AMD Ryzen™ 7040 series to give your PC the ability to take notes for you during meetings by transcribing your words and those of your peers. You can do whatever you want with the transcript: export it, send it out for summarization with large language models like ChatGPT, or summarize it locally right inside Meetig-AI-T if you have Ollama installed with Mistral from Mistral AI. Meetig-AI-T also gives you the flexibility of applying effects to your webcam feed and sending it to Zoom or any meeting platform of your choice. Interesting, right? I know you want that for yourself😉, jump to the next part...lol.
Development Journey with RyzenAI_1.1: This is a short story about my development of Meetig-AI-T with RyzenAI. RyzenAI-enabled PCs can handle multiple inferences at once, freeing up resources for the CPU to handle other things. The Ryzen 9 series already has multiple cores and can run many threads, but RyzenAI adds extra superpowers by offloading inference to the built-in NPU of the AMD Ryzen 9 7940HS. To take advantage of this capability, the AI models have to be quantized to int8 to work well on the NPU, according to the RyzenAI documentation.

After mapping out the features and functionality of Meetig-AI-T, I decided to run three things on the NPU: transcription, summarization of transcripts, and background removal with yolov8m-seg or yolov8n-seg. I later discovered that transcription would not be possible, because the quantized version of Whisper produced a lot of implausible hallucinations. I told the AMD developers about it and they advised doing away with it for the time being, which means transcription is off the NPU and runs directly on the CPU with local Whisper.

For summarization with open-source LLMs, I quantized the model as expected and used the Hugging Face transformers module, which works well for text generation with very "short texts". I say short texts because whatever input I gave the model, it replied with the same text plus the desired answer. I tried everything I could to stop it echoing my inputs; nothing really worked. Working further with the model, I ran into more trouble: for my use case, pipeline model optimization is the only way to go, and RyzenAI is not supported for pipelining in Hugging Face transformers. Hugging Face has a pipeline implementation specifically for AMD hardware accelerators (RyzenAI and ROCm), but text-generation optimization with LLMs is not supported for RyzenAI... again? 😩 I had to switch to an alternative, Ollama, and picked Mistral as my model of choice since it is more robust than the model I used initially (Llama 2).

That left background removal with yolov8m-seg to handle on the NPU. The model available in the AMD model zoo is yolov8m, which works well but only does detection; it does not support segmentation like the v8m-seg version. To get around that, I quantized yolov8n-seg to int8 to use on the NPU. Inference was very slow and the sessions displayed no results; even after a lot of tweaks, I kept getting errors asking me to provide scales for the model for ONNX to use. Since background removal was not going to be ready, I decided to add effects to the video being sent to Zoom by the virtual camera, and chose anime effects with AnimeGAN. I found the models, quantized them to int8, and hit the same problems as with yolov8n-seg. Then I tried something new: I quantized them to float16 to see what would happen on the NPU, and voilà, every model worked🎉🎉🎉. That is how the anime effects came about in Meetig-AI-T; they run on the NPU, and while inference is a little slow, it is faster than float32 and it works. And yes, the YOLO effect is meant to remove the background; that will come later, as Meetig-AI-T is still a pre-release.
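For anyone curious, here is a minimal sketch of that float16 conversion step, assuming the common onnxconverter-common approach (the actual script in the Meetig-AI-T repo may differ, and "animegan.onnx" is just a placeholder file name):

import onnx
from onnxconverter_common import float16

# Load the float32 ONNX export (placeholder file name).
model_fp32 = onnx.load("animegan.onnx")

# Convert weights and activations to float16. keep_io_types leaves the
# model inputs/outputs as float32 so existing pre/post-processing code
# does not need to change.
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)

onnx.save(model_fp16, "animegan_fp16.onnx")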
Observations of RyzenAI_1.1:
- Float16 works on the RyzenAI NPU, but inference is not really fast. I think that is because I was applying it to a live camera feed and video; I really hope it becomes more robust with RyzenAI_1.2 and later.
- The RyzenAI quantization documentation needs to be more specific about quantization requirements and parameters; most were not clearly outlined.
- Int8 inferencing on the NPU takes a lot more effort than it should; many models simply don't work because the quantization parameters are not clearly specified. It does work well with LLMs for text generation, though.
- Pipeline optimization for LLMs doesn't currently work with RyzenAI.
Build It:
This guide covers the installation of drivers for RyzenAI version 1.1; RyzenAI 1.2 was released on July 29th, 2024, very close to the end of the contest.
I will guide you through the process of building Meetig-AI-T for yourself in this step. So let's dive into it.
Enable the IPU/NPU: To enable the NPU on the UM790 Pro, boot into the BIOS. To do that, hold the "Delete" key on your keyboard while powering up your PC. When the setup page comes up, click on "Advanced" and open "CPU Configuration" like this:
After opening the CPU Configuration you should see another page; go to the IPU Control section and set it to "Enabled".
After doing that, save your changes and boot into your PC normally.
Install The Drivers: After booting back into Windows, you need to install three drivers to use Meetig-AI-T: the IPU driver, VB-Cable, and a virtual camera.
To install the IPU driver, download the driver from here.
Open command prompt in admin mode and execute the bat file
.\amd_install_kipudrv.bat
If you did not enable the IPU in the BIOS, you will get errors.
Ensure that the AMD IPU Device driver (Version: 10.1109.8.128, Date: 2/13/2024) is correctly installed by opening Device Manager -> System Devices -> AMD IPU Device.
Moving on, after installing the NPU driver, you have to install a virtual microphone. The virtual microphone used here is VB-Cable, which can be downloaded from VB-Audio. After downloading, extract the folder and run the setup file. When the installation is done, you should see "CABLE Input (VB-Audio Virtual Cable)" among your list of sound output devices, like below.
With the virtual cable requirement met, let's install the virtual camera that handles the Zoom video effects from Meetig-AI-T.
You can use either the OBS Virtual Camera or install the Unity Capture camera; Meetig-AI-T will pick the first virtual camera it finds out of the two. Learn more about the camera-picking technique from the pyvirtualcam GitHub, or see the sketch below.
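As an illustration of that picking behavior, here is a tiny pyvirtualcam sketch (not code from Meetig-AI-T): by default the library grabs the first available backend on Windows (OBS, then Unity Capture), and you can force a specific one with the backend argument ("obs" or "unitycapture").

import numpy as np
import pyvirtualcam

# Omit backend= to let pyvirtualcam pick the first available virtual camera.
with pyvirtualcam.Camera(width=1280, height=720, fps=30) as cam:
    print(f"Using virtual camera: {cam.device}")
    frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # blank RGB test frame
    for _ in range(90):            # send roughly three seconds of frames
        cam.send(frame)
        cam.sleep_until_next_frame()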
In my case I'm using the Unity Capture camera, so let's install it. First, download it from GitHub with the Download ZIP button or by cloning the repository. To register the Unity Capture camera so it is available to Windows programs, run Install.bat inside the Install directory.
Make sure the files in the Install directory are placed where you want them to stay; you can't move them around afterwards, so it's best to create a separate folder for them. If you want to move or delete the files, run Uninstall.bat first. I put mine in the C:\ directory.
If you have problems registering or unregistering, right-click Install.bat and choose "Run as Administrator". The Install.bat script registers just a single capture device, usable for capturing a single Unity camera. If you want to capture multiple cameras simultaneously, you can instead run the InstallMultipleDevices.bat script, which prompts for the number of capture devices you wish to register.
Note: Register only one camera to avoid unidentifiable errors.
After installation you should see it among your list of available cameras. Let's look at it in Zoom.
With these steps done, you are ready to start building Meetig-AI-T. If you don't want to build it, no problem: just keep all the drivers installed and jump to the "Usage" part of this article; I will be releasing a prebuilt version of Meetig-AI-T later (keep an eye on the GitHub page). Let's keep moving.
Install Visual Studio 2019 Community Edition. You need VS 2019 Community Edition (free), which requires a Microsoft account. Go to the Visual Studio Older Downloads page, join the Dev Essentials program, and download the 2019 version. I chose the following workloads: "Python development" and "Desktop development with C++".
Install CMake: Go to the CMake download page and download the installer. I used the Windows x64 Installer (cmake-3.29.0-windows-x86_64.msi). During setup, choose "Add CMake to the system PATH for the current user".
Install Miniconda: Go to the Miniconda download page and download it. I used Miniconda3-latest-Windows-x86_64.msi.
Add miniconda to environment variables
I didn't install Python separately, as Miniconda already comes with Python (version 3.12).
Install RyzenAI software
Download the RyzenAI-SW-1.1 installation package (ryzen-ai-sw-1.1.zip) and extract it.
The Ryzen AI Software directory will be referenced many times later during inference, so make sure to store it in a fixed location such as under C:\, not in a temporary location, just like you did for the Unity Capture camera.
Start CMD as administrator.
cd \ryzen-ai-sw-1.1
.\install.bat
A conda environment will be created. To see your list of conda environments use
conda env list
You will see the new environment created by the .bat script.
Initialize conda:
conda init
Activate your virtual environment with:
conda activate your-env-name
Run your test script
cd quicktest
python quicktest.py
You should see something like this if everything is working correctly.
[W:onnxruntime:, session_state.cc:1171 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Test Passed
Once the test passes you can close the terminal. To continue, you have to create a conda environment to build Meetig-AI-T, or use the one that the RyzenAI installer created for you (I used that option).
Now clone the Meetig-AI-T repository to your desired directory and install the requirements.
Open a terminal, then
~ git clone https://github.com/zeeblaze/meetig-AI-T.git
~ cd meetig-AI-T
~ conda activate your-env-name
~ pip install -r requirements.txt
Now you can run Meetig-AI-T
python App.py
If all is successful, you should see your beautiful Meetig-AI-T UI (very basic though...lol..)
Usage: On successful launch of Meetig-AI-T you should see the UI like below, congratulations 🎉 🎉
In this usage example, I will be using Zoom as my meeting platform of choice; you can use any one you like. Let's get started.
Once Meetig-AI-T is launched, to transcribe the voices of your peers in the meeting, launch Zoom and select the VB-Audio Virtual Cable that we installed previously as your speaker device (CABLE Input).
All the audio coming from your peers in the meeting will be routed to Meetig-AI-T through that virtual audio cable.
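Under the hood, the trick is that anything Zoom plays into "CABLE Input" reappears on the matching "CABLE Output" recording device, which Python can capture like any microphone. A minimal sketch with the sounddevice library (an illustration of the mechanism, not Meetig-AI-T's exact code):

import sounddevice as sd

# Find the VB-Cable recording endpoint among the input devices.
cable_index = next(
    i for i, d in enumerate(sd.query_devices())
    if "CABLE Output" in d["name"] and d["max_input_channels"] > 0
)

def on_audio(indata, frames, time, status):
    # indata holds the peers' meeting audio; a transcriber would buffer it here.
    pass

with sd.InputStream(device=cable_index, channels=1, samplerate=16000,
                    callback=on_audio):
    sd.sleep(5000)  # capture for five seconds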
To start transcription in Meetig-AI-T, pick "Vmic (VB-Audio Cable)" from the drop-down menu.
If you pick the "Default Mic", your own voice will be the one transcribed, and if you pick both, everything gets transcribed: you and your meeting peers.
To start transcribing, go to the "Transcriber" action bar and select a transcriber; in my case I picked OpenAI Whisper for local transcription.
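For reference, local Whisper transcription boils down to something like this (illustrative only; the model size and audio handling inside Meetig-AI-T may differ, and "meeting_chunk.wav" is a placeholder):

import whisper

model = whisper.load_model("base")               # downloads the model on first use
result = model.transcribe("meeting_chunk.wav")   # placeholder audio file
print(result["text"])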
Then select a transcription method. There are two: one that lets you listen to the meeting while transcribing, and one that only transcribes without playing the audio. In my case, I just want to transcribe and not listen, so I picked the "Transcribe Only" method.
Once a method is picked, click "Transcribe Audio" in the audio actions frame and you should see your transcriptions in the transcript output section. After you stop the transcription you can choose to summarize it, but you have to pick an LLM to use. In my case, I have Mistral installed in Ollama, so I'm using that.
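Summarizing through a local Ollama server running Mistral is essentially one HTTP call to Ollama's documented /api/generate endpoint; the prompt wording below is just an example, not the exact prompt Meetig-AI-T uses:

import requests

transcript = "..."  # the text captured in the transcript output section

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": f"Summarize this meeting transcript:\n\n{transcript}",
        "stream": False,
    },
    timeout=300,
)
print(response.json()["response"])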
Then you can click "Summarize" to get the meeting summary in the summary output section. With that out of the way, you can also use Meetig-AI-T to stream a virtual effect to the meeting instead of your raw webcam feed. All you have to do is select your previously installed virtual camera as your meeting camera; in my case I'm using Unity Capture.
From the Meetig-AI-T UI you can select either a video from your files or your webcam. If you choose a video from your gallery, you don't necessarily have to select a camera for Meetig-AI-T, but I'm using the webcam in my case, so I have to select a camera first from the camera actions, or Meetig-AI-T will crash on me.
Then you can go ahead and select "Webcam" from the "Video" action bar. To stream your video to Zoom, check "Stream to Vcam"; to stop streaming to Zoom, uncheck it. To apply effects to your video, quantize your models with the script in the Git repo or download the pre-quantized ones and put them in your models folder, and you are good to go. Then select an effect from the effects drop-down.
Then you can check "Apply Effect" to apply the effect to the video. I picked one of the GAN effects here.
After applying the effect, if you are streaming to Zoom, the effect will also be applied to the Zoom camera.
Background removal is not currently implemented (I have to submit before the deadline...lol)
Note: Firstly, if you are using a RyzenAI-enabled PC and you apply any effect, Meetig-AI-T will use the NPU as the default for inferencing. If the NPU is not available (for PCs without RyzenAI), it will use the CPU; you can modify this in the code to use the CPU or even CUDA. Secondly, if you are using the CPU or CUDA, the "cuteGAN" effect may not work, because all the models are quantized to float16 to be able to run on the NPU. A demonstration video is linked below. Enjoy!!
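If you do want to change the device, the choice comes down to which ONNX Runtime execution providers are passed when the inference session is created. A rough sketch of that idea, assuming the Vitis AI provider and its vaip_config.json from the Ryzen AI software package (this is not the exact logic in App.py, and the model path is a placeholder):

import onnxruntime as ort

available = ort.get_available_providers()
if "VitisAIExecutionProvider" in available:          # RyzenAI NPU
    providers = ["VitisAIExecutionProvider"]
    provider_options = [{"config_file": "vaip_config.json"}]
elif "CUDAExecutionProvider" in available:           # NVIDIA GPU fallback
    providers = ["CUDAExecutionProvider"]
    provider_options = [{}]
else:                                                # plain CPU fallback
    providers = ["CPUExecutionProvider"]
    provider_options = [{}]

session = ort.InferenceSession(
    "models/animegan_fp16.onnx",   # placeholder model path
    providers=providers,
    provider_options=provider_options,
)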
Conclusions: Like I said, all the code will be available on GitHub; you can modify it however you like, including changing it to use the float32 models.