I have been attempting to track climbers in video footage for several years now.
My motivation is two-fold:
- explore algorithms on a subject that I am passionate about (sport climbing)
- analyze climbing technique between two climbers
During this exploration, I stumbled across an algorithm called "SAMURAI" that truly was a breakthrough for my specific use case.
I will not dwell on my specific use case in this project, but if this subject interests you, the following references may be of interest:
- [2021] The Long Road to the 2020 Tokyo Olympics (http://avnet.me/rock-climbing-ai-part1)
- [2021] Deep Learning meets the "odd human" dataset (http://avnet.me/rock-climbing-ai-part2)
- [2022] Detecting Climbers with YoloV7 (http://avnet.me/rock-climbing-ai-part3)
- [2023] The Mechanics of Climbing (http://avnet.me/rock-climbing-ai-part4)
- [2024] Modern SAMURAI tracks Flying Monkeys! (http://avnet.me/rock-climbing-ai-part5)
- [2025] Hackster Impact Spotlights - Applying AI to Climbing (http://avnet.me/rock-climbing-ai-part6)
This project will provide a Getting Started Guide for SAMURAI, specifically for AMD GPUs.
SAMURAI Overview
SAMURAI builds on top of the Segment Anything Model 2.1 (SAM 2.1), and improves its tracking capabilities.
The SAM and SAM2 models are revolutionary in the sense that they can segment "anything", even without prior application-specific training. These models have become very popular for auto-labelling images and video.
In addition to this, they can also "track" the objects identified in video content.
SAMURAI improves on this "tracking", as shown in their demonstration video:
https://yangchris11.github.io/samurai/website/videos/samurai_demo.mp4
There are other solutions that offer similar tracking improvements, with some claiming better tracking, even with identical objects (i.e. distractors). One of these solutions is Distractor-Aware Memory for SAM 2.1, or DAM4SAM.
https://jovanavidenovic.github.io/dam-4-sam/static/videos/kylie.mp4
DAM4SAM claims to have better tracking than SAMURAI, specifically in the case of distractors (i.e. identical objects):
https://jovanavidenovic.github.io/dam-4-sam/static/videos/monkey.mp4
I have evaluated both of these solutions to see which performed better for my specific use case: tracking the Climber of Interest (CoI). I invite you to perform similar comparisons for your specific use cases.
The following video illustrates why SAMURAI works better on my use case, compared to DAM4SAM.
The main takeaways of this experiment are the following:
- SAMURAI is 6X faster than DAM4SAM (with the AMD Radeon Pro W7900 GPU)
- DAM4SAM loses the climber (failing in its tracking task), whereas SAMURAI is successful
My setup includes an AMD Radeon Pro W7900 GPU, but these instructions are expected to work with any AMD GPU, using the AMD ROCm software stack.
For more information on how I set up my AMD GPU, including upgrading my power supply and adding a front chassis fan, please refer to the following project:
We should start by validating that the driver for our AMD GPU is installed, as follows:
rocm-smi -u
========================== ROCm System Management Interface ==========================
================================= % time GPU is busy =================================
GPU[0] : GPU use (%) : 1
======================================================================================
================================ End of ROCm SMI Log =================================
The latest version of ROCm, as of this writing, is version 6.2.4.
Installing SAMURAI for AMD GPUs
SAMURAI provides excellent installation instructions, but they only apply to NVIDIA GPUs.
SAMURAI can be installed from the original repository (https://github.com/yangchris11/samurai).
For the purpose of this project, I will refer instead to the following repo, which has a specific version of the SAMURAI repository linked as a git submodule:
git clone --recursive --branch samurai https://github.com/AlbertaBeef/pyClimbSegment
cd pyClimbSegment
The "samurai" sub-directory contains a specific version of the original repository, which can be installed as follows:
cd samurai
cd sam2
pip install -e .
pip install -e ".[notebooks]"
During installation, we can notice that, among others, the following packages are installed with NVIDIA CUDA support by default:
- torch>=2.3.1
- torchvision>=0.18.1
We therefore need to install the ROCm equivalents of these packages, as described here:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4
We can also validate the AMD GPU support in PyTorch, as follows in a Python session. Note that the ROCm build of PyTorch exposes the AMD GPU through the torch.cuda API (via HIP), which is why the calls below work unchanged:
python3
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.get_device_name(0)
'AMD Radeon Pro W7900'
>>> exit()
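We can further confirm that the ROCm (HIP) build of PyTorch is the one installed by checking torch.version.hip, which is set on ROCm builds and is None on CUDA builds:
>>> import torch
>>> torch.version.hip   # a ROCm/HIP version string on ROCm builds, None on CUDA builds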
We can also monitor the GPU usage with "nvtop":
nvtop
The following screenshot illustrates the output for nvtop. In my system, I have an AMD Radeon Pro W7900 GPU (48GB), as well as a smaller NVIDIA T400 GPU (4GB).
The demo script can be launched from the samurai directory, as follows:
albertabeef@albertabeef-HP-Z4-G4-Workstation:/media/albertabeef/Tycho/pyClimbSegment/samurai$ python3 ./scripts/demo.py --help
usage: demo.py [-h] --video_path VIDEO_PATH --txt_path TXT_PATH
[--model_path MODEL_PATH] [--video_output_path VIDEO_OUTPUT_PATH]
[--save_to_video SAVE_TO_VIDEO]
options:
-h, --help show this help message and exit
--video_path VIDEO_PATH
Input video path or directory of frames.
--txt_path TXT_PATH Path to ground truth text file.
--model_path MODEL_PATH
Path to the model checkpoint.
--video_output_path VIDEO_OUTPUT_PATH
Path to save the output video.
--save_to_video SAVE_TO_VIDEO
Save results to a video.
albertabeef@albertabeef-HP-Z4-G4-Workstation:/media/albertabeef/Tycho/pyClimbSegment/samurai$
The following optional argument specifies which model to use:
- model_path : model to use (default = sam2/checkpoints/sam2.1_hiera_base_plus.pt)
Other variations of the model are : tiny, small, base_plus (default), and large.
I have only experimented with the default base_plus model.
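For reference, the other variants follow the same checkpoint naming convention as the default; the following is a sketch of the expected paths (verify the exact filenames in your sam2/checkpoints directory):
# Expected checkpoint paths for the four SAM 2.1 variants, following the
# naming convention of the default above (verify against sam2/checkpoints).
SAM21_CHECKPOINTS = {
    "tiny":      "sam2/checkpoints/sam2.1_hiera_tiny.pt",
    "small":     "sam2/checkpoints/sam2.1_hiera_small.pt",
    "base_plus": "sam2/checkpoints/sam2.1_hiera_base_plus.pt",  # default
    "large":     "sam2/checkpoints/sam2.1_hiera_large.pt",
}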
The demo script has the following arguments specifying input:
- video_path : path to video file (mp4 only), or images directory (jpg only)
- txt_path : text file containing initial bounding box (x, y, w, h) of object to track
The demo script also has the following arguments for optional output:
- save_to_video : boolean indicating whether to save the output video file
- video_output_path : path to output video file
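As a minimal sketch of tying these arguments together, the following writes an initial bounding box file and launches the demo script with the documented arguments. The bounding box values and file names are hypothetical, and the comma-separated x,y,w,h first line follows the convention used by common tracking benchmarks, so verify against the SAMURAI README:
# Minimal sketch: write the initial (x, y, w, h) bounding box, then launch
# the demo script with its documented arguments. Values are hypothetical.
import subprocess

bbox = (480, 270, 120, 240)  # hypothetical x, y, w, h in pixels
with open("first_frame_bbox.txt", "w") as f:
    f.write(",".join(str(v) for v in bbox) + "\n")

subprocess.run([
    "python3", "./scripts/demo.py",
    "--video_path", "frames/",            # directory of jpg frames
    "--txt_path", "first_frame_bbox.txt",
    "--save_to_video", "True",
    "--video_output_path", "output.mp4",
], check=True)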
I have implemented my own scripts, based on the original demo.py, which will be described in the following sections.
- samurai_step01.py
- samurai_step02.py
The SAMURAI models use <2GB of VRAM (even for the large model), so they should work with any GPU.
The main loop, however, loads all the input images (from video or image files) into memory, which may require a significant amount of CPU memory.
The following screenshot illustrates how SAMURAI has allocated ~60% of my 64GB of CPU memory for a use case with 3000 images of size 1920x1080.
According to my calculations, this use case should only need ~30% of my 64GB of memory, since 3000 * (1920*1080*3) bytes ≈ 17.38 GiB. Therefore, we can conclude that SAMURAI requires roughly twice the memory of the input images for its execution.
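A quick sanity check of this raw-frame arithmetic:
# Back-of-the-envelope estimate for 3000 uint8 RGB frames at 1920x1080.
frames = 3000
bytes_per_frame = 1920 * 1080 * 3        # ~6.22 MB per frame
total_gib = frames * bytes_per_frame / 2**30
print(f"{total_gib:.2f} GiB")            # 17.38 GiB, ~27% of 64 GiB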
This will limit the number of input images you can provide to the SAMURAI algorithm.
To go beyond this limitation, it is possible to perform several runs, taking the bounding box of the last mask from the previous run as the initial bounding box for the next run, as sketched below.
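A minimal sketch of this chaining idea, assuming a hypothetical run_samurai(frames, bbox) wrapper around the demo pipeline that returns one binary mask per frame:
import numpy as np

def bbox_from_mask(mask):
    # Tight (x, y, w, h) bounding box around the non-zero pixels of a mask.
    ys, xs = np.nonzero(mask)
    x, y = int(xs.min()), int(ys.min())
    return (x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1)

def track_in_chunks(all_frames, init_bbox, chunk_size=1000):
    # Process the frames in chunks that fit in CPU memory, seeding each
    # chunk with the bounding box of the last mask of the previous chunk.
    results, bbox = [], init_bbox
    for start in range(0, len(all_frames), chunk_size):
        masks = run_samurai(all_frames[start:start + chunk_size], bbox)  # hypothetical wrapper
        results.extend(masks)
        bbox = bbox_from_mask(masks[-1])
    return results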
Specifying the Input
The demo script allows specifying the input as a video file or as a directory of images.
Using the default "demo.py" script, I was not successful in getting the video file input to work; it resulted in segmentation faults.
For this reason, I created my own pre-processing script to convert my input videos into output images. I also integrated the following features into my script (a minimal sketch of the idea follows the list):
- skip frames : specified with an argument, allowing to reduce the number of images generated
- start frame selection : specified by the user with the 's' key
- end frame selection : specified by the user with the 'e' key
- ROI selection : specified by the user with the mouse
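The following is a minimal sketch of that pre-processing idea using OpenCV (not the actual samurai_step01.py code; the file names and frame numbers are hypothetical):
import cv2

video = cv2.VideoCapture("climb.mp4")         # hypothetical input video
start_frame, end_frame, skip = 100, 400, 2    # hypothetical user selections

# Jump to the start frame and let the user draw the ROI with the mouse.
video.set(cv2.CAP_PROP_POS_FRAMES, start_frame)
ok, frame = video.read()
x, y, w, h = cv2.selectROI("samurai_step01", frame)
with open("work/first_frame_bbox.txt", "w") as f:
    f.write(f"{x},{y},{w},{h}\n")             # initial bbox for --txt_path

# Extract frames from start to end, keeping one frame in every (skip + 1).
idx, n = 0, start_frame
while ok and n <= end_frame:
    if (n - start_frame) % (skip + 1) == 0:
        cv2.imwrite(f"work/{idx:08d}.jpg", frame)
        idx += 1
    ok, frame = video.read()
    n += 1
video.release()
cv2.destroyAllWindows()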
The "samurai_step01.py" script can be used to extract images from the input video. It can be used as follows:
The script will launch two windows:
- samurai_step01 - Controls
- samurai_step01
The "samurai_step01 - Controls" window allows you to adjust the size of the video content (scaleFactor), and navigate through the video (frameNum).
The first step is to navigate to a frame that will be the start frame for the extracted images, select it with the 's' key, then select the ROI for the object to track using the mouse. In my use case, this is a climber, but you can select any object.
The script will print verbose "[INFO] ..." messages to confirm your interaction with the GUI.
The next step is to navigate to the end frame, select it with the 'e' key, then press the 'w' key to extract the images.
This script will generate the following content in the specified work directory:
With the images extracted from the video, we can now run the SAMURAI algorithm, using the "samurai_step02.py" script:
This script will generate the following content in the specified work directory:
The three output videos (mp4) can be used to view the output of the SAMURAI algorithm with masks and/or bounding boxes.
Note that generating these videos reduces SAMURAI's processing frame rate. Feel free to comment out the code generating these videos if you want to speed things up a bit.
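For reference, the mask/bounding-box overlay step in such scripts typically looks something like the following sketch (not the exact code from samurai_step02.py):
import cv2
import numpy as np

def overlay(frame, mask, bbox, color=(0, 255, 0)):
    # Tint the masked pixels and draw the bounding box on a copy of the frame.
    out = frame.copy()
    out[mask > 0] = (0.5 * out[mask > 0] + 0.5 * np.array(color)).astype(np.uint8)
    x, y, w, h = bbox
    cv2.rectangle(out, (x, y), (x + w, y + h), color, 2)
    return out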
I have seen performance up to 6fps with my AMD Radeon Pro W7900 GPU.
Summary & Next Steps
Using the previous scripts, I was able to successfully perform single object tracking (SOT) in these challenging videos.
For my use case, my next steps are to find a way to align these videos in order to compare the technique of two climbers.
Here is a first glimpse at attempting to decipher patterns in the bounding boxes:
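As a sketch of how one might start looking for such patterns (assuming the per-frame bounding boxes were saved as comma-separated x,y,w,h lines, one file per climber; the file names are hypothetical), plotting the vertical position of the box center over time:
import matplotlib.pyplot as plt
import numpy as np

for name in ("climber_a", "climber_b"):                      # hypothetical box files
    boxes = np.loadtxt(f"{name}_boxes.txt", delimiter=",")   # rows of x, y, w, h
    plt.plot(boxes[:, 1] + boxes[:, 3] / 2, label=name)      # bbox center y per frame
plt.xlabel("frame")
plt.ylabel("bounding box center y (pixels)")
plt.legend()
plt.show()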
For which application would you use SAMURAI? Please share your use case in the comments...
Version History
- 2025/03/18 - Initial Version
State-of-the-Art (SOTA) References:
- SAM 2.1 : https://github.com/facebookresearch/sam2
- SAMURAI : https://github.com/yangchris11/samurai
- DAM4SAM : https://github.com/jovanavidenovic/DAM4SAM
My journey towards Applying AI to Climbing:
- [2021] The Long Road to the 2020 Tokyo Olympics (http://avnet.me/rock-climbing-ai-part1)
- [2021] Deep Learning meets the "odd human" dataset (http://avnet.me/rock-climbing-ai-part2)
- [2022] Detecting Climbers with YoloV7 (http://avnet.me/rock-climbing-ai-part3)
- [2023] The Mechanics of Climbing (http://avnet.me/rock-climbing-ai-part4)
- [2024] Modern SAMURAI tracks Flying Monkeys! (http://avnet.me/rock-climbing-ai-part5)
- [2025] Hackster Impact Spotlights - Fitness: Applying AI to Climbing (http://avnet.me/rock-climbing-ai-part6)