GR00T is Nvidia's latest robotics foundation model for humanoid robots. Naturally, there is vast interest among roboticists in adapting such a model to all kinds of robotic applications. Fine-tuning is the process of teaching the model new skills, for example, manipulating novel objects with new robotic hardware. However, fine-tuning foundation models has been computationally prohibitive for many of us who are GPU-poor - until now. Here, we are going to explore how we can use LoRA to realize Parameter-Efficient Fine-Tuning (PEFT) on a personal computer.
Background
Training a neural network typically requires several times more memory than inference. This is because during training, in addition to the model weights, we need to store the intermediate activations, gradients, and other optimizer states that help the optimizer adapt to the complex training dynamics.
Conventional fine-tuning is just as expensive as any other training process: it uses about the same amount of memory. If you do not have access to H100s, you are out of luck. However, PEFT techniques like LoRA allow us to fine-tune only a small portion of the model.
LoRA
Low-Rank Adaptation (LoRA) is, in my view, the most elegant realization of PEFT. Instead of training the model in its entirety, it overlays a very small set of weights, called LoRA adapters, on top of the model's original weights. The original weights are frozen; only the LoRA adapters are trainable, and they usually amount to 0.5% to 5% of the size of the original network. With far fewer trainable parameters and fewer optimizer states to store, fine-tuning becomes a lot more computationally friendly. Once trained, these small sets of weights are added back into the original weights.
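As a sketch of the idea (plain NumPy, with hypothetical layer sizes), a LoRA-augmented linear layer keeps W frozen and routes a parallel, trainable low-rank path through B and A. Following the original LoRA paper, B starts at zero, so the adapted model initially behaves exactly like the base model:

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, output_dim, r = 512, 256, 16  # hypothetical sizes

W = rng.normal(size=(input_dim, output_dim))  # frozen base weight
B = np.zeros((input_dim, r))                  # trainable adapter, init to zero
A = rng.normal(size=(r, output_dim)) * 0.01   # trainable adapter

def lora_forward(x):
    # base path (frozen) + low-rank adapter path (trainable)
    return x @ W + x @ B @ A

x = rng.normal(size=(4, input_dim))
# With B = 0, the adapter contributes nothing: output equals the base model.
assert np.allclose(lora_forward(x), x @ W)
```

After fine-tuning, the adapter can be folded back in as W_merged = W + B @ A, so inference incurs no extra latency.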
output = input * W
W : original weight matrix; [input_dim, output_dim]
ΔW : delta weight; same dimension as W
W_ft : fine-tuned weight matrix = W + ΔW
Since the fine-tuning only makes small changes to the original model weights, the delta weight, ΔW, should only hold a fraction of the information compared to the original weight matrix. Thus, instead of having ΔW as the same size as the original weight matrix, we can represent it as a product of two skinny matrices:
ΔW_lora = B x A ~= ΔW
B: [input_dim, r]
A: [r, output_dim]
ΔW_lora is an approximation of ΔW, and r is the rank we use to achieve it. Think of r almost like the bit-rate of your music files, except that here we are dealing with matrix reconstruction error instead of audio loss.
The maximum possible rank of ΔW is min(input_dim, output_dim). However, because ΔW does not contain rich information, we can use a very small r, e.g. 16 to 128, while still achieving good results. This is also the root of LoRA's name, Low-Rank Adaptation.
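To make the bit-rate analogy concrete, here is a small NumPy experiment (all sizes hypothetical): we build a ΔW that is genuinely low rank, then reconstruct it from a truncated SVD at rank r. When r matches the true rank, the reconstruction error is essentially zero; a smaller r trades accuracy for size:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, output_dim, true_rank = 256, 128, 8  # hypothetical sizes

# A delta weight that genuinely has low rank
dW = rng.normal(size=(input_dim, true_rank)) @ rng.normal(size=(true_rank, output_dim))

def rank_r_approx(M, r):
    # Best rank-r approximation via truncated SVD (Eckart-Young theorem)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

for r in (2, 4, 8):
    err = np.linalg.norm(dW - rank_r_approx(dW, r)) / np.linalg.norm(dW)
    print(f"r={r}: relative reconstruction error {err:.4f}")
# r=8 recovers dW almost exactly, because dW's true rank is 8.
```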
A ΔW_lora pair is attached to each transformer weight matrix in the model. Because we only train ΔW_lora, fine-tuning requires a fraction of the memory compared to conventional fine-tuning. This grants us the freedom to trade off fine-tuning performance against computational budget.
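The savings follow directly from the parameter counts. For a hypothetical 4096x4096 attention weight matrix and r = 16, the adapter holds under 1% of the parameters of the full matrix:

```python
# Trainable-parameter fraction for one square weight matrix (hypothetical sizes)
d, r = 4096, 16
full_params = d * d        # training dW directly
lora_params = 2 * d * r    # B: [d, r] plus A: [r, d]
print(f"full: {full_params:,}, lora: {lora_params:,}, "
      f"fraction: {lora_params / full_params:.2%}")  # fraction: 0.78%
```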
If you are curious about the implementation details, please check out the PR which has been upstreamed to Nvidia Isaac-GR00T's official repo.
Setting up The Environment
Let's start by setting up Pyenv and Miniconda.
brew install pyenv
Set up your shell according to the instructions here. For example, if you are using zsh:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init - zsh)"' >> ~/.zshrc
Reload your shell and test Pyenv's installation
pyenv install --list
Install Miniconda
pyenv install miniconda3-3.10-25.1.1-2
conda create -n gr00t python=3.10
conda activate gr00t
Getting the model
git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
pip install --upgrade setuptools
pip install -e .
pip install --no-build-isolation flash-attn==2.7.1.post4
Fine-Tuning the Model
To run the fine-tuning example:
python scripts/gr00t_finetune.py --dataset-path ./demo_data/robot_sim.PickNPlace \
--num-gpus 1 \
    --lora-rank 16 \
--batch-size 16
What's next?
Special thanks to Ed, Albert, and members of the team for building the foundation of this work in such a short span at the hackathon. Without it, we wouldn't have had the opportunity to explore PEFT on GR00T.
If you like this sort of work, feel free to connect on X and let me know if you would like a more in-depth dive into LoRA.