I have been attempting to track climbers in video footage for several years now.
My motivation is two-fold:
- explore algorithms on a subject that I am passionate about (sport climbing)
- analyze climbing technique between two climbers
During this exploration, I stumbled across an algorithm called "SAMURAI" that truly was a breakthrough for my specific use case.
I will not dwell on my specific use case in this project, but if this subject interests you, the following references may be of interest:
- [2021] The Long Road to the 2020 Tokyo Olympics (http://avnet.me/rock-climbing-ai-part1)
- [2021] Deep Learning meets the "odd human" dataset (http://avnet.me/rock-climbing-ai-part2)
- [2022] Detecting Climbers with YoloV7 (http://avnet.me/rock-climbing-ai-part3)
- [2023] The Mechanics of Climbing (http://avnet.me/rock-climbing-ai-part4)
- [2024] Modern SAMURAI tracks Flying Monkeys! (http://avnet.me/rock-climbing-ai-part5)
- [2025] Hackster Impact Spotlights - Applying AI to Climbing (http://avnet.me/rock-climbing-ai-part6)
This project will provide a Getting Started Guide for SAMURAI, specifically for AMD GPUs.
SAMURAI Overview
SAMURAI builds on top of the Segment Anything Model 2.1 (SAM 2.1), and improves its tracking capabilities.
The SAM and SAM2 models are revolutionary in the sense that they can segment "anything", even without prior application-specific training. These models have become very popular for auto-labelling images and video.
In addition to this, they can also "track" the objects identified in video content.
SAMURAI improves on this "tracking", as shown in their demonstration video:
https://yangchris11.github.io/samurai/website/videos/samurai_demo.mp4
There are other solutions that offer similar tracking improvements, with some claiming better tracking, even with identical objects (i.e. distractors). One of these solutions is Distractor-Aware Memory for SAM 2.1, or DAM4SAM.
https://jovanavidenovic.github.io/dam-4-sam/static/videos/kylie.mp4
DAM4SAM claims to have better tracking than SAMURAI, specifically in the case of distractors (i.e. identical objects):
https://jovanavidenovic.github.io/dam-4-sam/static/videos/monkey.mp4
I have evaluated both of these solutions to see which performed better for my specific use case: tracking the Climber of Interest (CoI). I invite you to perform similar comparisons for your specific use cases.
The following video illustrates why SAMURAI works better on my use case, compared to DAM4SAM.
The main takeaways of this experiment are the following:
- SAMURAI is 6X faster than DAM4SAM (with the AMD Radeon Pro W7900 GPU)
- DAM4SAM loses the climber (failing in its tracking task), whereas SAMURAI is successful
My setup includes an AMD Radeon Pro W7900 GPU, but these instructions are expected to work with any AMD GPU, using the AMD ROCm software stack.
For more information on how I set up my AMD GPU, including upgrading my power supply and adding a front chassis fan, please refer to the following project:
We should start by validating that the driver for our AMD GPU is installed, as follows:
rocm-smi -u
========================== ROCm System Management Interface ==========================
================================= % time GPU is busy =================================
GPU[0] : GPU use (%) : 1
======================================================================================
================================ End of ROCm SMI Log =================================
The latest version of ROCm, as of this writing, is version 6.2.4.
Installing SAMURAI for AMD GPUs
SAMURAI provides excellent installation instructions, but they only apply to NVIDIA GPUs.
SAMURAI can be installed from the original repository (https://github.com/yangchris11/samurai).
For the purpose of this project, I will refer instead to the following repo, which has a specific version of the SAMURAI repository linked as a git submodule:
git clone --recursive --branch samurai https://github.com/AlbertaBeef/pyClimbSegment
cd pyClimbSegment
The "samurai" sub-directory contains a specific version of the original repository, which can be installed as follows:
cd samurai
cd sam2
pip install -e .
pip install -e ".[notebooks]"
During installation, we can notice that, among others, the following packages are installed with NVIDIA CUDA support by default:
- torch>=2.3.1
- torchvision>=0.18.1
We therefore need to install the ROCm equivalents of these packages, as described here:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4
We can also validate the AMD GPU support in PyTorch, as follows in a Python session. Note that the ROCm build of PyTorch exposes the AMD GPU through the torch.cuda API (via HIP), which is why the calls below work unchanged:
python3
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.get_device_name(0)
'AMD Radeon Pro W7900'
>>> exit()
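We can further confirm that the ROCm (HIP) build of PyTorch is the one installed by checking torch.version.hip, which is set on ROCm builds and is None on CUDA builds:
>>> import torch
>>> torch.version.hip   # a ROCm/HIP version string on ROCm builds, None on CUDA builds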
We can also monitor the GPU usage with "nvtop":
nvtop
The following screenshot illustrates the output for nvtop. In my system, I have an AMD Radeon Pro W7900 GPU (48GB), as well as a smaller NVIDIA T400 GPU (4GB).
The demo script can be launched from the samurai directory, as follows:
albertabeef@albertabeef-HP-Z4-G4-Workstation:/media/albertabeef/Tycho/pyClimbSegment/samurai$ python3 ./scripts/demo.py --help
usage: demo.py [-h] --video_path VIDEO_PATH --txt_path TXT_PATH
[--model_path MODEL_PATH] [--video_output_path VIDEO_OUTPUT_PATH]
[--save_to_video SAVE_TO_VIDEO]
options:
-h, --help show this help message and exit
--video_path VIDEO_PATH
Input video path or directory of frames.
--txt_path TXT_PATH Path to ground truth text file.
--model_path MODEL_PATH
Path to the model checkpoint.
--video_output_path VIDEO_OUTPUT_PATH
Path to save the output video.
--save_to_video SAVE_TO_VIDEO
Save results to a video.
albertabeef@albertabeef-HP-Z4-G4-Workstation:/media/albertabeef/Tycho/pyClimbSegment/samurai$
The following optional argument specifies which model to use:
- model_path : model to use (default = sam2/checkpoints/sam2.1_hiera_base_plus.pt)
Other variations of the model are : tiny, small, base_plus (default), and large.
I have only experimented with the default base_plus model.
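For reference, the other variants follow the same checkpoint naming convention as the default; the following is a sketch of the expected paths (verify the exact filenames in your sam2/checkpoints directory):
# Expected checkpoint paths for the four SAM 2.1 variants, following the
# naming convention of the default above (verify against sam2/checkpoints).
SAM21_CHECKPOINTS = {
    "tiny":      "sam2/checkpoints/sam2.1_hiera_tiny.pt",
    "small":     "sam2/checkpoints/sam2.1_hiera_small.pt",
    "base_plus": "sam2/checkpoints/sam2.1_hiera_base_plus.pt",  # default
    "large":     "sam2/checkpoints/sam2.1_hiera_large.pt",
}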
The demo script has the following arguments specifying input:
- video_path : path to video file (mp4 only), or images directory (jpg only)
- txt_path : text file containing initial bounding box (x, y, w, h) of object to track
The demo script also has the following arguments for optional output:
- save_to_video : boolean indicating whether to save the output video file
- video_output_path : path to output video file
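As a minimal sketch of tying these arguments together, the following writes an initial bounding box file and launches the demo script with the documented arguments. The bounding box values and file names are hypothetical, and the comma-separated x,y,w,h first line follows the convention used by common tracking benchmarks, so verify against the SAMURAI README:
# Minimal sketch: write the initial (x, y, w, h) bounding box, then launch
# the demo script with its documented arguments. Values are hypothetical.
import subprocess

bbox = (480, 270, 120, 240)  # hypothetical x, y, w, h in pixels
with open("first_frame_bbox.txt", "w") as f:
    f.write(",".join(str(v) for v in bbox) + "\n")

subprocess.run([
    "python3", "./scripts/demo.py",
    "--video_path", "frames/",            # directory of jpg frames
    "--txt_path", "first_frame_bbox.txt",
    "--save_to_video", "True",
    "--video_output_path", "output.mp4",
], check=True)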
I have implemented my own scripts, based on the original demo.py, which will be described in the following sections.
- samurai_step01.py
- samurai_step02.py
The SAMURAI models use <2GB of VRAM (even for the large model), so they should work with any GPU.
The main loop, however, loads all the input images (from video or image files) into memory, which may require a significant amount of CPU memory.
The following screenshot illustrates how SAMURAI has allocated ~60% of my 64GB of CPU memory for a use case with 3000 images of size 1920x1080.
According to my calculations, this use case should only need ~30% of my 64GB of memory, since 3000 * (1920*1080*3) bytes ≈ 17.38 GiB. Therefore, we can conclude that SAMURAI requires roughly twice the memory of the input images for its execution.
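A quick sanity check of this raw-frame arithmetic:
# Back-of-the-envelope estimate for 3000 uint8 RGB frames at 1920x1080.
frames = 3000
bytes_per_frame = 1920 * 1080 * 3        # ~6.22 MB per frame
total_gib = frames * bytes_per_frame / 2**30
print(f"{total_gib:.2f} GiB")            # 17.38 GiB, ~27% of 64 GiB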
This will limit the number of input images you can provide to the SAMURAI algorithm.
To go beyond this limitation, it is possible to perform several runs, taking the bounding box of the last mask from the previous run as the initial bounding box for the next run, as sketched below.
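A minimal sketch of this chaining idea, assuming a hypothetical run_samurai(frames, bbox) wrapper around the demo pipeline that returns one binary mask per frame:
import numpy as np

def bbox_from_mask(mask):
    # Tight (x, y, w, h) bounding box around the non-zero pixels of a mask.
    ys, xs = np.nonzero(mask)
    x, y = int(xs.min()), int(ys.min())
    return (x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1)

def track_in_chunks(all_frames, init_bbox, chunk_size=1000):
    # Process the frames in chunks that fit in CPU memory, seeding each
    # chunk with the bounding box of the last mask of the previous chunk.
    results, bbox = [], init_bbox
    for start in range(0, len(all_frames), chunk_size):
        masks = run_samurai(all_frames[start:start + chunk_size], bbox)  # hypothetical wrapper
        results.extend(masks)
        bbox = bbox_from_mask(masks[-1])
    return results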
Specifying the Input
The demo script allows specifying the input as a video file or as a directory of images.
Using the default "demo.py" script, I was not successful in getting the video file input to work; it resulted in segmentation faults.
For this reason, I created my own pre-processing script to convert my input videos into output images. I also integrated the following features into my script (a minimal sketch of the idea follows the list):
- skip frames : specified with an argument, allowing to reduce the number of images generated
- start frame selection : specified by the user with the 's' key
- end frame selection : specified by the user with the 'e' key
- ROI selection : specified by the user with the mouse
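The following is a minimal sketch of that pre-processing idea using OpenCV (not the actual samurai_step01.py code; the file names and frame numbers are hypothetical):
import cv2

video = cv2.VideoCapture("climb.mp4")         # hypothetical input video
start_frame, end_frame, skip = 100, 400, 2    # hypothetical user selections

# Jump to the start frame and let the user draw the ROI with the mouse.
video.set(cv2.CAP_PROP_POS_FRAMES, start_frame)
ok, frame = video.read()
x, y, w, h = cv2.selectROI("samurai_step01", frame)
with open("work/first_frame_bbox.txt", "w") as f:
    f.write(f"{x},{y},{w},{h}\n")             # initial bbox for --txt_path

# Extract frames from start to end, keeping one frame in every (skip + 1).
idx, n = 0, start_frame
while ok and n <= end_frame:
    if (n - start_frame) % (skip + 1) == 0:
        cv2.imwrite(f"work/{idx:08d}.jpg", frame)
        idx += 1
    ok, frame = video.read()
    n += 1
video.release()
cv2.destroyAllWindows()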
The "samurai_step01.py" script can be used to extract images from the input video. It can be used as follows:
The script will launch two windows:
- samurai_step01 - Controls
- samurai_step01
The "samurai_step01 - Controls" window allows you to adjust the size of the video content (scaleFactor), and navigate through the video (frameNum).
The first step is to navigate to a frame that will be the start frame for the extracted images, select it with the 's' key, then select the ROI for the object to track using the mouse. In my use case, this is a climber, but you can select any object.
The script will print verbose "[INFO] ..." messages to confirm your interaction with the GUI.
The next step is to navigate to the end frame, select it with the 'e' key, then press the 'w' key to extract the images.
This script will generate the following content in the specified work directory:
With the images extracted from the video, we can now run the SAMURAI algorithm, using the "samurai_step02.py" script:
This script will generate the following content in the specified work directory:
The three output videos (mp4) can be used to view the output of the SAMURAI algorithm with masks and/or bounding boxes.
Note that generating these videos reduces SAMURAI's processing frame rate. Feel free to comment out the code generating these videos if you want to speed things up a bit.
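For reference, the mask/bounding-box overlay step in such scripts typically looks something like the following sketch (not the exact code from samurai_step02.py):
import cv2
import numpy as np

def overlay(frame, mask, bbox, color=(0, 255, 0)):
    # Tint the masked pixels and draw the bounding box on a copy of the frame.
    out = frame.copy()
    out[mask > 0] = (0.5 * out[mask > 0] + 0.5 * np.array(color)).astype(np.uint8)
    x, y, w, h = bbox
    cv2.rectangle(out, (x, y), (x + w, y + h), color, 2)
    return out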
I have seen performance up to 6fps with my AMD Radeon Pro W7900 GPU.
Summary & Next Steps
Using the previous scripts, I was able to successfully perform single object tracking (SOT) in these challenging videos.
For my use case, my next steps are to find a way to align these videos in order to compare the technique of two climbers.
Here is a first glimpse at attempting to decipher patterns in the bounding boxes:
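As a sketch of how one might start looking for such patterns (assuming the per-frame bounding boxes were saved as comma-separated x,y,w,h lines, one file per climber; the file names are hypothetical), plotting the vertical position of the box center over time:
import matplotlib.pyplot as plt
import numpy as np

for name in ("climber_a", "climber_b"):                      # hypothetical box files
    boxes = np.loadtxt(f"{name}_boxes.txt", delimiter=",")   # rows of x, y, w, h
    plt.plot(boxes[:, 1] + boxes[:, 3] / 2, label=name)      # bbox center y per frame
plt.xlabel("frame")
plt.ylabel("bounding box center y (pixels)")
plt.legend()
plt.show()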
For which application would you use SAMURAI? Please share your use case in the comments...
Version History
- 2025/03/18 - Initial Version
State-of-the-Art (SOTA) References:
- SAM 2.1 : https://github.com/facebookresearch/sam2
- SAMURAI : https://github.com/yangchris11/samurai
- DAM4SAM : https://github.com/jovanavidenovic/DAM4SAM
My journey towards Applying AI to Climbing:
- [2021] The Long Road to the 2020 Tokyo Olympics (http://avnet.me/rock-climbing-ai-part1)
- [2021] Deep Learning meets the "odd human" dataset (http://avnet.me/rock-climbing-ai-part2)
- [2022] Detecting Climbers with YoloV7 (http://avnet.me/rock-climbing-ai-part3)
- [2023] The Mechanics of Climbing (http://avnet.me/rock-climbing-ai-part4)
- [2024] Modern SAMURAI tracks Flying Monkeys! (http://avnet.me/rock-climbing-ai-part5)
- [2025] Hackster Impact Spotlights - Fitness: Applying AI to Climbing (http://avnet.me/rock-climbing-ai-part6)