GR00T is Nvidia's latest robotics foundation model for humanoid robots. Naturally, there is vast interest among roboticists in adapting such a model to all kinds of robotic applications. Fine-tuning is the process of teaching the model new skills, for example, manipulating novel objects with new robotic hardware. However, fine-tuning foundation models has been computationally prohibitive for many of us who are GPU-poor - until now. Here, we are going to explore how we can use LoRA to realize Parameter-Efficient Fine-Tuning (PEFT) on a personal computer.
Background
Training a neural network typically requires several times more memory than inference. This is because during training, in addition to the model weights, we need to store the intermediate activations, gradients, and other optimizer states that help the optimizer adapt to the complex training dynamics.
Conventional fine-tuning is just as expensive as any other training process: it uses about the same amount of memory. If you do not have access to H100s, you are out of luck. However, PEFT techniques like LoRA allow us to fine-tune only a small portion of the model.
LoRA
Low-Rank Adaptation (LoRA) is, in my view, the most elegant realization of PEFT. Instead of training the model in its entirety, it overlays a very small set of weights, called LoRA adapters, on top of the model's original weights. The original weights are frozen; only the LoRA adapters are trainable, and they usually amount to 0.5% to 5% of the size of the original network. With far fewer trainable parameters and fewer optimizer states to store, fine-tuning becomes a lot more computationally friendly. Once trained, these small sets of weights are added back into the original weights.
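As a sketch of the idea (plain NumPy, with hypothetical layer sizes), a LoRA-augmented linear layer keeps W frozen and routes a parallel, trainable low-rank path through B and A. Following the original LoRA paper, B starts at zero, so the adapted model initially behaves exactly like the base model:

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, output_dim, r = 512, 256, 16  # hypothetical sizes

W = rng.normal(size=(input_dim, output_dim))  # frozen base weight
B = np.zeros((input_dim, r))                  # trainable adapter, init to zero
A = rng.normal(size=(r, output_dim)) * 0.01   # trainable adapter

def lora_forward(x):
    # base path (frozen) + low-rank adapter path (trainable)
    return x @ W + x @ B @ A

x = rng.normal(size=(4, input_dim))
# With B = 0, the adapter contributes nothing: output equals the base model.
assert np.allclose(lora_forward(x), x @ W)
```

After fine-tuning, the adapter can be folded back in as W_merged = W + B @ A, so inference incurs no extra latency.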
output = input * W
W : original weight matrix; [input_dim, output_dim]
ΔW : delta weight; same dimension as W
W_ft : fine-tuned weight matrix = W + ΔW
Since the fine-tuning only makes small changes to the original model weights, the delta weight, ΔW, should only hold a fraction of the information compared to the original weight matrix. Thus, instead of having ΔW as the same size as the original weight matrix, we can represent it as a product of two skinny matrices:
ΔW_lora = B x A ~= ΔW
B: [input_dim, r]
A: [r, output_dim]
ΔW_lora is an approximation of ΔW, and r is the rank we use to achieve it. Think of r almost like the bit-rate of your music files, except that here we are dealing with matrix reconstruction error instead of audio loss.
The maximum possible rank of ΔW is min(input_dim, output_dim). However, because ΔW does not contain rich information, we can use a very small r, e.g. 16 to 128, while still achieving good results. This is also the root of LoRA's name, Low-Rank Adaptation.
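To make the bit-rate analogy concrete, here is a small NumPy experiment (all sizes hypothetical): we build a ΔW that is genuinely low rank, then reconstruct it from a truncated SVD at rank r. When r matches the true rank, the reconstruction error is essentially zero; a smaller r trades accuracy for size:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, output_dim, true_rank = 256, 128, 8  # hypothetical sizes

# A delta weight that genuinely has low rank
dW = rng.normal(size=(input_dim, true_rank)) @ rng.normal(size=(true_rank, output_dim))

def rank_r_approx(M, r):
    # Best rank-r approximation via truncated SVD (Eckart-Young theorem)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

for r in (2, 4, 8):
    err = np.linalg.norm(dW - rank_r_approx(dW, r)) / np.linalg.norm(dW)
    print(f"r={r}: relative reconstruction error {err:.4f}")
# r=8 recovers dW almost exactly, because dW's true rank is 8.
```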
A ΔW_lora pair is attached to each transformer weight matrix in the model. Because we only train ΔW_lora, fine-tuning requires a fraction of the memory compared to conventional fine-tuning. This grants us the freedom to trade off fine-tuning performance against computational budget.
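The savings follow directly from the parameter counts. For a hypothetical 4096x4096 attention weight matrix and r = 16, the adapter holds under 1% of the parameters of the full matrix:

```python
# Trainable-parameter fraction for one square weight matrix (hypothetical sizes)
d, r = 4096, 16
full_params = d * d        # training dW directly
lora_params = 2 * d * r    # B: [d, r] plus A: [r, d]
print(f"full: {full_params:,}, lora: {lora_params:,}, "
      f"fraction: {lora_params / full_params:.2%}")  # fraction: 0.78%
```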
If you are curious about the implementation details, please check out the PR which has been upstreamed to Nvidia Isaac-GR00T's official repo.
Setting up The Environment
Let's start by setting up Pyenv and Miniconda.
brew install pyenv
Set up your shell according to the instructions here. For example, if you are using zsh:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init - zsh)"' >> ~/.zshrc
Reload your shell and test Pyenv's installation
pyenv install --list
Install Miniconda
pyenv install miniconda3-3.10-25.1.1-2
conda create -n gr00t python=3.10
conda activate gr00t
Getting the model
git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
pip install --upgrade setuptools
pip install -e .
pip install --no-build-isolation flash-attn==2.7.1.post4
Fine-Tuning the Model
To run the fine-tuning example:
python scripts/gr00t_finetune.py --dataset-path ./demo_data/robot_sim.PickNPlace \
--num-gpus 1 \
    --lora-rank 16 \
--batch-size 16
What's next?
Special thanks to Ed, Albert, and members of the team for building the foundation of this work in such a short span at the hackathon. Without it, we wouldn't have had the opportunity to explore PEFT on GR00T.
If you like this sort of work, feel free to connect on X and let me know if you would like a more in-depth dive into LoRA.