Depth Anything presents a practical solution for robust monocular depth estimation by leveraging large-scale unlabeled data. It scales up the training set with a data engine that collects and automatically annotates 62 million diverse images, widening data coverage and reducing generalization error. During training, the model is challenged with strong data augmentations and guided by auxiliary semantic supervision, improving its robustness and zero-shot capability across diverse scenarios. Fine-tuning with metric depth information from datasets such as NYUv2 and KITTI yields new state-of-the-art results. This technology has wide-ranging applications in image processing, augmented reality (AR), virtual reality (VR), and robotic navigation.
This tutorial shares the process our team followed to deploy the Depth Anything monocular depth estimation algorithm on the edge device reComputer J4012, based on Jetson Orin NX. Our goals are to:
1. Deploy the Depth Anything monocular depth estimation algorithm on edge devices with a single click,
2. Allow selection of models and different input sizes through a visual interactive interface,
3. Enable one-click conversion of models to TensorRT,
4. Support real-time depth estimation on various inputs such as videos, images, and webcam feeds.
These steps aim to improve the deployment and development efficiency for both users and developers.
Hardware Setup

reComputer J4012 is a hand-sized edge AI box built around the NVIDIA Jetson Orin™ NX 16GB module, which delivers up to 100 TOPS of AI performance and offers a rich set of I/Os, including 4x USB 3.2 ports, HDMI 2.1, M.2 Key E for WiFi, M.2 Key M for an SSD, RTC, CAN, a 40-pin GPIO header, and more.
Features
● Brilliant AI performance for production: on-device processing with up to 100 TOPS of AI performance at low power and low latency
● Expandable with rich I/Os: 4x USB 3.2, HDMI 2.1, 2x CSI, 1x RJ45 for GbE, M.2 Key E, M.2 Key M, CAN, and GPIO
● Hand-sized edge AI device: compact 130mm x 120mm x 58.5mm footprint; supports desktop and wall mounting, fitting in almost anywhere
● Accelerate solutions to market: pre-installed NVIDIA JetPack™ 5.1.1 on the included 128GB NVMe SSD, Linux OS BSP, and support for the Jetson software architecture and leading software frameworks
● Comprehensive certifications: FCC, CE, RoHS, UKCA
Install the Operating System

To use the reComputer J4012, you need to install an operating system. This tutorial uses NVIDIA L4T 35.3.1 and JetPack 5.1.1, but the project also supports JetPack 5.1.2 and JetPack 6.0. You can follow the system flashing tutorial provided in our Wiki. After the initial system is installed, enter the following commands in the terminal:
sudo apt-get update
sudo apt-get install python3-pip # install pip
sudo pip3 install jetson-stats # install jtop
sudo apt-get install nvidia-jetpack # install jetpack
sudo pip3 install jetson-examples # install the jetson-examples package
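Optionally, you can sanity-check the installation before moving on. The jetson_release command ships with jetson-stats and prints the detected L4T/JetPack version, and pip3 show confirms that the jetson-examples package is present:
jetson_release # from jetson-stats: shows the L4T/JetPack version
pip3 show jetson-examples # confirm the package is installed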
Install Docker-CE
The one-click deployment of Depth Anything for this project is achieved through Docker containers. We provide a Docker image that has all the necessary environments configured for Depth Anything, reducing the deployment cost for developers. Here is the Docker installation tutorial:
- Install some necessary packages that allow apt to use repositories over HTTPS:
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
- Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
- Use the following command to set up the Docker repository:
sudo add-apt-repository \
"deb [arch=arm64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
- Install Docker Engine:
sudo apt-get update
sudo apt-get install docker-ce
- To ensure Docker starts automatically at boot, make sure it is enabled:
sudo systemctl enable docker
sudo systemctl start docker
- Add the user to the docker group:
sudo usermod -aG docker $USER
sudo reboot # restart the system
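After the reboot, you can verify the installation end to end. The hello-world image is Docker's standard smoke test, and running it without sudo also confirms that your user was successfully added to the docker group:
docker run hello-world # should pull and run Docker's test image without sudo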
Running Depth Anything with One Click

Please ensure that you have installed nvidia-jetpack and jetson-examples as part of the system initialization following the "Install the Operating System" instructions, and that Docker-CE has been successfully installed. Let's start the one-click experience with Depth Anything (the initial run may take some time for setup):
reComputer run depth-anything
When the project runs successfully, the terminal will display the URL of the WebUI served on the device's local network.
Open this URL in a browser to access the Depth Anything WebUI interactive interface.
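If you want to open the WebUI from another machine on the same network and need the Jetson's LAN IP address, you can query it directly in the terminal (a generic Linux command, not specific to this project):
hostname -I # print the device's local IP address(es)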
🗝️ WebUI Features
● Choose model: Select one of the depth_anything_vit{s, b, l}14 models (S, B, or L).
● Choose input size: Select the desired input size (308, 384, 406, or 518).
● Grayscale option: Optionally render the depth map in grayscale.
● Choose source: Select the input source (Video, Image, Camera).
● Export Model: Automatically download and convert the model from PyTorch (.pth) to TensorRT format.
● Start Estimation: Begin depth estimation using the selected model and input source.
● Stop Estimation: Stop the ongoing depth estimation process.
TensorRT Model Export

After successfully launching the WebUI, you need to export a TensorRT model before experiencing Depth Anything on the edge device. We provide a one-click export button for the model. The following steps demonstrate how to export it.
①: Select the model size. There are three options: depth_anything_vit{s, b, l}. These models come from the Depth Anything Model Zoo.
②: Select the input size. We offer four options: 308, 384, 406, and 518.
③: Click the model export button. Depending on the network environment and the hardware of the edge device, the model conversion may take some time.
When the model is successfully exported, the WebUI will display "Model export successful!". The converted model is saved in the `/weights` directory under the project root.
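Under the hood, an export like this typically converts the PyTorch checkpoint to ONNX and then builds a TensorRT engine from it. For reference, the engine-building step is conceptually similar to invoking TensorRT's stock trtexec tool; the file names below are illustrative placeholders, and the WebUI automates all of this for you:
# Illustrative only: build a TensorRT engine from an ONNX export of the model.
# File names are placeholders, not the project's actual paths.
/usr/src/tensorrt/bin/trtexec \
  --onnx=depth_anything_vits14.onnx \
  --saveEngine=depth_anything_vits14.trt \
  --fp16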
Once the model is successfully exported, you can start real-time inference for monocular depth estimation:
①: Select the input type: Camera, Image, or Video. When the input type is Camera, ensure that a USB camera is properly connected before starting Depth Anything.
②: Click "Start Estimation" to begin inference. The first run may take approximately 30 seconds while the model loads.
The visualization of the monocular depth estimation will be displayed in real-time on the right side, with the current FPS value and model file location shown at the bottom.
Application Scenarios

Monocular depth estimation technology has various applications in security surveillance, including crowd counting and management, intrusion detection, behavior analysis, object recognition and classification, virtual fencing, and intelligent video analysis. By providing more accurate depth information and multi-dimensional data analysis, this technology significantly enhances the intelligence and efficiency of surveillance systems, offering more comprehensive protection for public safety.
Monocular depth estimation technology has various applications in traffic monitoring, including vehicle detection and counting, speed measurement, distance detection, traffic accident analysis, pedestrian detection and protection, intelligent traffic light control, and illegal parking detection. By providing accurate depth information and real-time analysis, this technology significantly enhances the intelligence and efficiency of traffic monitoring systems, offering strong support for traffic safety and urban traffic management.
Monocular depth estimation technology has various applications in underwater scenarios, including underwater robot navigation, marine biology research, underwater terrain mapping, shipwreck and relic detection, environmental monitoring, and underwater rescue and search operations. By providing accurate depth information and 3D data, this technology significantly enhances the precision and efficiency of underwater operations and research, offering strong technical support for ocean exploration, environmental protection, and underwater safety.
Monocular depth estimation technology has widespread applications in autonomous driving, including obstacle detection and avoidance, vehicle distance maintenance, pedestrian and animal detection, traffic signal recognition, road and lane recognition, parking assistance, and driving in adverse weather conditions. By providing accurate depth information, this technology significantly enhances the perception capabilities and decision-making accuracy of autonomous driving systems, offering strong technical support for achieving safer and smarter autonomous driving.
Seeed Studio Team