There are many benefits to running an LLM locally: privacy first and foremost, the ability to scale without heavy costs, and fine control over the API. The Jetson MagnaMirror attempts to deliver just that by taking an old concept and iterating on it a little to provide an awesome "Magic Mirror" experience.
What is MagicMirror²
MagicMirror² is an open source project that provides software to run a hidden assistant interface that lives within the confines of a mirror. The project uses an acrylic sheet which, when nothing is displayed behind it, appears on the surface like any other mirror. The key is that once light from the screen is enabled, the user interface shines through and the user is able to interact further.
Jetson Orin AGX
The Jetson Orin AGX is a powerful developer kit for running local LLMs, generative AI, and other compute-heavy workloads.
There are already a lot of great guides for configuring a Jetson Orin AGX, including https://www.hackster.io/shahizat/getting-started-with-ai-on-nvidia-jetson-agx-orin-dev-kit-5a55b5, which can be referenced for further background.
Update and install the associated NVIDIA JetPack package:
sudo apt update
sudo apt install nvidia-jetpack
Enable max performance mode and set max frequency for the clocks:
sudo nvpmodel -m 0
sudo jetson_clocks
NVMe SSD
The first thing to keep in mind when dealing with the Jetson Orin AGX is that in order to run machine learning models of moderate size you will need an NVMe SSD installed. You won't even be able to fully install Riva, one of the requirements, without it, as it takes up a significant amount of space.
It's outside the scope of this guide and more of a system setup issue, but for completeness' sake the first steps are:
- Get and install an SSD on the device
- After formatting and preparing your SSD, make sure to change Docker's data location to use the SSD for containers
In my case I used the following commands (where /mnt/storage is my NVMe mount):
sudo vim /etc/docker/daemon.json
Update "data-root" to point to a folder for your docker. In my case I updated it to: "/mnt/storage/docker":
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "data-root": "/mnt/storage/docker"
}
Make sure to restart Docker afterwards:
sudo systemctl restart docker
I believe I also had to change permissions to get things fully working (this was in my notes, but if everything works for you, feel free to skip it):
chown -R root:root /mnt/storage/docker
chmod 701 /mnt/storage/docker
Hardware
In addition to the Jetson Orin AGX, a microphone, a speaker, and the monitor, you will need a frame and an acrylic reflective sheet. I've included links to the ones we used for this project to make things easier for those following along.
Outside of the frame setup, which we cover in more detail further on, there isn't much to do aside from plugging in the associated cords for the screen and inputs.
ngc-cli
The ngc-cli tool is needed next. The UI provides a download link if you're logged in on the following page: https://org.ngc.nvidia.com/setup/installers/cli. Make sure to add the CLI to your path, create an API key, and log in.
Riva
With ngc available you can download and install Riva:
ngc config set
ngc registry resource download-version "nvidia/riva/riva_quickstart_arm64:2.14.0"
Once downloaded you can cd into that directory and test Riva by starting it up:
cd riva_quickstart_arm64_v2.14.0
bash riva_init.sh
bash riva_start.sh
You should see a response like the following (it will take a while the first time as it needs to download models):
Riva Speech already running. Skipping...
Riva server is ready...
Use this container terminal to run applications:
Testing TTS
You can test TTS by running the following command, as per their examples:
riva_tts_client --voice_name=English-US.Female-1 --text="Hello, this is a speech synthesizer." --audio_file=/opt/riva/wav/output.wav
You can then copy this out of the container to your Downloads folder like so:
docker cp riva-speech:/opt/riva/wav/output.wav ~/Downloads/output.wav
As I was using a remote session for testing with Riva, I then copied this to my local machine (run from my local home directory so it ended up in ~/Downloads):
scp user@192.168.1.100:~/Downloads/output.wav ./Downloads
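If you'd rather script TTS than use the CLI client, something like the following sketch should work from the host, assuming the nvidia-riva-client Python package is installed and Riva is listening on localhost:50051 (the default); verify the parameters against your client version:
import wave
import riva.client

# connect to the local Riva server (default gRPC port)
auth = riva.client.Auth(uri="localhost:50051")
tts = riva.client.SpeechSynthesisService(auth)

sample_rate = 44100
resp = tts.synthesize(
    text="Hello, this is a speech synthesizer.",
    voice_name="English-US.Female-1",
    sample_rate_hz=sample_rate,
)

# resp.audio is raw 16-bit mono PCM, so wrap it in a WAV container
with wave.open("output.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(sample_rate)
    out.writeframes(resp.audio)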
Testing ASR
It can be helpful to use an audio device of your own choosing when testing ASR.
First, follow this guide for getting your Jetson Orin AGX set up with Python audio: https://jetsonhacks.com/2023/08/07/speech-ai-on-nvidia-jetson-tutorial/
The following guide can be used to further test with that installed: https://github.com/dusty-nv/jetson-containers/tree/master/packages/audio/riva-client#list-audio-devices
For example:
./run.sh --workdir /opt/riva/python-clients $(./autotag riva-client:python) \
python3 scripts/list_audio_devices.py
In my case I see the output:
AUDIO DEVICES:
0: HD Pro Webcam C920: USB Audio (hw:0,0) (inputs=2 outputs=0 sampleRate=32000)
This indicates my webcam is set up and able to receive audio at a sample rate of 32000 Hz.
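If you want to sanity-check your devices outside the container, a minimal sketch using the sounddevice package (an assumption on my part; install it with pip) gives similar information:
import sounddevice as sd

# print every device that can act as an input, with its default sample rate
for idx, dev in enumerate(sd.query_devices()):
    if dev["max_input_channels"] > 0:
        print(f"{idx}: {dev['name']} "
              f"(inputs={dev['max_input_channels']}, "
              f"sampleRate={int(dev['default_samplerate'])})")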
I'll go ahead and use that with the example test they provided to transcribe audio from my microphone:
./run.sh --workdir /opt/riva/python-clients $(./autotag riva-client:python) \
python3 scripts/asr/transcribe_mic.py --input-device=0 --sample-rate-hz=32000
I can see the response as I speak in the terminal:
## i'm testing my microphone now and it's working this time
## when i had an issue previously what happened was my microphone would completely freeze and i'd be unable to even control c out of the console
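For reference, the transcribe_mic.py script boils down to roughly the following. This is a simplified sketch based on the nvidia-riva-client examples, so treat the exact parameters as assumptions to verify against your client version:
import riva.client
import riva.client.audio_io

# connect to the local Riva server
auth = riva.client.Auth(uri="localhost:50051")
asr = riva.client.ASRService(auth)

config = riva.client.StreamingRecognitionConfig(
    config=riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        language_code="en-US",
        sample_rate_hertz=32000,  # match your microphone's sample rate
        max_alternatives=1,
    ),
    interim_results=True,
)

# device 0 is my webcam microphone from the listing above
with riva.client.audio_io.MicrophoneStream(rate=32000, chunk=3200, device=0) as mic:
    responses = asr.streaming_response_generator(
        audio_chunks=mic, streaming_config=config
    )
    riva.client.print_streaming(responses=responses, show_intermediate=True)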
Side note: Microphone issues
I ran into quite a few issues finding a compatible microphone for inferencing. One Logitech camera I owned, which I had previously used with a Google Coral Dev Board Mini, worked for video inferencing on the device, but nothing I tried would get its audio input working.
Note that when I used an invalid microphone the ASR script would simply freeze and become unusable, while the working microphone just worked without any effort. One of my microphones did work for one line but then froze immediately after the first prompt, leading me to waste quite a bit of time debugging it thinking I was close. Luckily we had a spare old webcam that did work which I could use here. In the future I'll look for a more attractive USB microphone for the setup.
MagicMirror Setup
MagicMirror's setup is fairly straightforward on the Orin AGX. Install Node.js and then follow the associated guide: https://docs.magicmirror.builders/getting-started/installation.html#manual-installation
Here is the full list of commands I ran to do so on my Orin AGX:
sudo apt-get install curl
curl -sL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install nodejs -y
git clone https://github.com/MagicMirrorOrg/MagicMirror
cd MagicMirror/
npm run install-mm
cp config/config.js.sample config/config.js
npm run start
After that you'll see your linked screen taken over by the MagicMirror install. You can use the configuration file you copied earlier to adjust your settings and prepare your install for normal use. The MagicMirror documentation includes further details on configuration: https://docs.magicmirror.builders/configuration/introduction.html
Jetson MagnaMirror Module
The last element of this project, and the key component that ties everything together, is the module I've created to interface with the Llama model, using Riva for voice recognition.
First, fetch my repository onto your device:
https://github.com/Cosmic-Bee/MMM-JetsonMagnaMirror
You'll need to place most of the repository under your modules folder in MagicMirror. Please refer to their documentation on module installation in case of any issues, but in general, as long as the folder exists under modules it can be found once configured.
Then, in your mirror's configuration file, enable it and select the portion of the mirror where you want it to show up:
{
    module: "MMM-MagnaMirror",
    position: "top_right",
},
Pairing Bluetooth Audio
Initially I had hoped to use my monitor's connected speakers, but I found the audio to be very choppy when running the ASR logic (the basic demo reading out what it's working on, so nothing special on my part). To deal with this issue I had to switch gears and use a different audio approach. I checked my home for any USB speakers but found none, and then remembered the Orin AGX supports Bluetooth, so it should likely support audio too.
The Jetson Orin AGX ships with Bluetooth in a limited configuration where it can't be used for audio. With a quick modification to one configuration file, some updates, and installation of the audio plumbing, you'll be able to get it working like I did. In the end this was a happy issue to debug: I realize now that the monitor's built-in speakers would have been blocked by the frame anyway, so I would have needed to get the audio out some other way regardless.
sudo vim /lib/systemd/system/bluetooth.service.d/nv-bluetooth-service.conf
Adjust the line:
ExecStart=/usr/lib/bluetooth/bluetoothd -d --noplugin=audio,a2dp,avrcp
To be instead:
ExecStart=/usr/lib/bluetooth/bluetoothd -d
sudo apt-get install pulseaudio-module-bluetooth
sudo reboot
After this I was able to attach my Bluetooth audio device by searching for Bluetooth in the Jetson UI and configuring the now-available speaker. Once it was configured I used the sound settings to change the default output device.
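To confirm audio actually routes to the new default speaker, a quick tone test works. This sketch assumes numpy and sounddevice are installed:
import numpy as np
import sounddevice as sd

# two seconds of a quiet 440 Hz sine wave on the system default output
sample_rate = 44100
t = np.linspace(0, 2.0, int(sample_rate * 2.0), endpoint=False)
sd.play(0.2 * np.sin(2 * np.pi * 440.0 * t), samplerate=sample_rate)
sd.wait()  # block until playback finishes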
Running the text-generation-webui
After your initial startup you can specify which model you want the web UI to use. From inside the web UI you can also download additional models. Start it up now and download a model from within.
Starting the text-generation-webui:
./run.sh --workdir /opt/text-generation-webui $(./autotag text-generation-webui:1.7) \
python3 server.py --listen --verbose --api \
--model-dir=/data/models/text-generation-webui --model=TheBloke_Llama-2-7b-Chat-GPTQ \
--loader=llamacpp --n-gpu-layers=128 --n_ctx=4096 --n_batch=4096 \
--threads=$(($(nproc) - 2))
This provides an API that serves as the basis of our chat. Riva provides speech recognition, which is converted into a prompt; a pass is made over the prompt to determine whether the 'wake word' was said; and if so, the prompt (sans wake word) is sent to the text-generation-webui API to produce a response block in the chat. This is then shown on the mirror, and once there are enough messages they are scrolled away. Messages that scroll out of view are dropped, as I currently don't provide a means to scroll back.
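As a rough illustration of that pass, here is a minimal sketch. The endpoint and payload follow the legacy text-generation-webui blocking API that the --api flag exposed around this release, so verify them against your version; the wake words are the ones the module listens for:
import requests

# wake words checked against each transcript (lowercased)
WAKE_WORDS = ("other me", "mirror mirror on the wall")
API_URL = "http://localhost:5000/api/v1/generate"  # legacy blocking API

def handle_transcript(transcript: str):
    """Return the model's reply if a wake word was heard, else None."""
    lowered = transcript.lower()
    for wake in WAKE_WORDS:
        if wake in lowered:
            # everything after the wake word becomes the prompt
            prompt = lowered.split(wake, 1)[1].strip()
            payload = {"prompt": prompt, "max_new_tokens": 200}
            reply = requests.post(API_URL, json=payload, timeout=60).json()
            return reply["results"][0]["text"]
    return None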
With the API up, the next step is to run the MagicMirror install itself. You will need to wait a few moments for the text-generation-webui to finish loading so its API is available, but once it is, the MagicMirror install can be started with `npm run start` from its download location.
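One way to avoid guessing when the API is ready is to poll it before launching MagicMirror. A small sketch, again assuming the legacy API's port and model endpoint:
import time
import requests

def wait_for_api(url="http://localhost:5000/api/v1/model", timeout=300):
    # poll until the endpoint answers or we give up
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=5).ok:
                return True
        except requests.RequestException:
            pass
        time.sleep(5)
    return False

if wait_for_api():
    print("text-generation-webui API is up; safe to start MagicMirror")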
In addition to the MagicMirror install, the Python script that runs the Riva wake word processing and passes messages to the MagicMirror install will need to be run.
That can be run via the following (note: it looks for a USB microphone as device 0 and has the sample rate predefined as 32000):
python3 scripts/magna-mirror.py
This assumes you have downloaded the script to that location relative to your current directory (I placed mine in my home directory under a subdirectory called 'scripts', hence the path above).
With that running you can test ASR by saying the wake words "Other Me" or "Mirror Mirror On the Wall", both of which will send the prompt to the local text-generation-webui API for further processing. The rest should happen automatically, with responses appearing on the screen. Additional models can be used and further adjustments can be made to the module to improve it. It's of course not as fully fleshed out as llamaspeak, but it's a nice start for a module interfacing with the Jetson Orin AGX generative APIs.
Next Steps
There are several further steps that could be taken for this project. Primarily, additional work could be done to support a webcam as an image input for all sorts of fun local vision tasks. I could imagine trying on different outfits or getting advice about a weird mark (although it's probably best not to do that via a machine learning model but to visit a real doctor -- for now at least).
From a hardware perspective I would like to do something better for the microphone input. Either something fancy with a red button for triggering the voice commands (to avoid needing to run inference all the time), or perhaps just something hidden as part of the frame, but some addition there would be welcome.
I also need to monitor the heat situation with the frame. My wife added a set of buffers around the corners to offset the device so the airflow would not be constricted, but I'm not sure how this will hold up over long-term use. We do keep the Jetson Orin AGX outside of the enclosure, though, which keeps the frame light and avoids trapping the device in a potentially hot spot.