This work presents Depth Anything V2, which significantly outperforms V1 in fine-grained detail & robustness. Compared with SD-based models, it offers faster inference, fewer parameters & higher depth accuracy. On top of that, this upgraded repo adds a robust Gradio WebUI as well as image & video .bat scripts for more intuitive CLI usage, if that is your preferred way of working.
- 2024-06-14: Paper, project page, code, models, demo, & benchmark are all released.
- 2024-06-20: The repo has been upgraded & is also now running on .safetensors models instead of .pth models.
- 2024-06-23: Updated the installation process to a simpler one_click_install.bat file. It automatically downloads the depth models into a 'checkpoints' folder & the triton wheel into the repo's main folder, & installs all of the needed dependencies. [Also updated this README.md file to provide more clarity!]
- 2024-06-24: pravdomil has provided a much-needed update to UDAV2 for 16bit image creation in order to make stunning 3D Bas-Reliefs! I am currently updating the gradio webui to include both 16bit single-image & 16bit batch-image creation, which will be pushed in the coming days.
- 2024-06-25: I'm currently working on a beta version of UDAV2 as an automatic1111 extension, which will be released next week, so stay tuned!
- 2024-06-27: A1111 extension released! sd-webui-udav2
- 2024-06-29: Released an updated Forge extension, sd-forge-udav2, to prevent conflicts w/ extensions already installed in Forge!
- 2024-07-01: sd-webui-udav2 has now been added to the extension index.json! You can now install the extension directly inside A1111.
- 2024-07-03: [v1.1.452] sd-webui-controlnet now has a depth_anything_v2 preprocessor🔥! Update transformers dependency to transformers-4.44.1 to use the new depth_anything_v2 controlnet preprocessor.
All you need to do is copy & paste (or right-click) each of the following lines, in order, into cmd & everything will be installed properly.
git clone https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2.git
cd Upgraded-Depth-Anything-V2
one_click_install.bat
That's it! All you have to do now is pick one of the run_-------.bat files, double-click & you're off to depthing!
Run the following commands in your terminal.
git clone https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2.git
cd Upgraded-Depth-Anything-V2
source one_click_install.sh
or
git clone https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2.git
cd Upgraded-Depth-Anything-V2
pip install -r requirements_macos.txt
Then manually download & place all 3 of the Depth Anything V2 models [download links found below] into a folder called checkpoints & you'll be good to go.
To use the upgraded gradio webui locally:
run_gradio.bat
or
python run_gradio.py
You can also try the online gradio demo, though it is FAR less capable than this Upgraded Depth Anything V2 repo.
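If you'd rather wire the same kind of depth demo into your own script, the minimal Gradio sketch below shows the general shape. The predict_depth helper here is only a placeholder (it is not the function used by run_gradio.py); swap in an actual Depth Anything V2 inference call like the Python example further down in this README.

```python
import gradio as gr
import numpy as np

def predict_depth(image: np.ndarray) -> np.ndarray:
    """Placeholder depth function - replace with a real Depth Anything V2 call."""
    depth = image.astype(np.float32).mean(axis=-1)
    depth = (depth - depth.min()) / max(float(depth.max() - depth.min()), 1e-8)
    return (depth * 255).astype(np.uint8)

demo = gr.Interface(
    fn=predict_depth,
    inputs=gr.Image(type="numpy", label="Input image"),
    outputs=gr.Image(type="numpy", label="Depth map"),
    title="Depth Anything V2 (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```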
It works for both single image depth processing & batch image depth processing.
run_image-depth_16bit.bat
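As a rough sketch of what 16bit output means in practice: the raw depth prediction is normalized into the uint16 range & written as a single-channel 16-bit PNG, which preserves far more gradation than an 8-bit map (this is what makes the bas-relief workflow viable). The snippet below is illustrative only & not the exact code in the 16bit scripts.

```python
import cv2
import numpy as np

def save_depth_16bit(depth: np.ndarray, out_path: str) -> None:
    # Normalize the float depth prediction into the full 16-bit range.
    d_min, d_max = float(depth.min()), float(depth.max())
    depth_16 = (depth - d_min) / max(d_max - d_min, 1e-8) * 65535.0
    # PNG supports single-channel 16-bit images, so the extra precision survives on disk.
    cv2.imwrite(out_path, depth_16.astype(np.uint16))

# e.g. save_depth_16bit(model.infer_image(raw_img), "depth_16bit.png")
```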
The images used to make the following depth maps were created using Dreamshaper Turbo.*
It works for both single image depth processing & batch image depth processing.
run_image-depth_8bit.bat
or
python run_image-depth.py --encoder <vits | vitb | vitl> --img-path <path> --outdir <outdir> [--input-size <size>] [--pred-only] [--grayscale]
Options:
- `--img-path`: You can either 1) point it to an image directory storing all images of interest, 2) point it to a single image, or 3) point it to a text file storing all image paths.
- `--input-size` (optional): By default, we use input size 518 for model inference. You can increase the size for even more fine-grained results.
- `--pred-only` (optional): Only save the predicted depth map, without the raw image.
- `--grayscale` (optional): Save the grayscale depth map, without applying the color palette.
For example:
python run_image-depth.py --encoder vitl --img-path assets/examples --outdir depth_vis
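If you prefer calling the model from Python rather than the CLI scripts, the upstream Depth Anything V2 class can be used directly. The sketch below assumes the repo's depth_anything_v2 package is on your path & that the large checkpoint is saved as checkpoints/depth_anything_v2_vitl.safetensors (adjust the filename to whatever one_click_install.bat actually downloaded):

```python
import cv2
import torch
from safetensors.torch import load_file
from depth_anything_v2.dpt import DepthAnythingV2

# ViT-L configuration; see the model table below for the smaller encoders.
model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
# Assumed checkpoint filename - match it to the file in your 'checkpoints' folder.
model.load_state_dict(load_file('checkpoints/depth_anything_v2_vitl.safetensors'))
model = model.to('cuda' if torch.cuda.is_available() else 'cpu').eval()

raw_img = cv2.imread('assets/examples/demo01.jpg')  # any image path works here
depth = model.infer_image(raw_img, input_size=518)  # HxW float32 relative depth map
```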
It works for both single video depth processing & batch video depth processing.
run_video-depth.bat
or
python run_video-depth.py --encoder vitl --video-path assets/examples_video --outdir video_depth_vis
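Under the hood, video depth is just per-frame image depth stitched back into a video. A minimal OpenCV loop for that is sketched below; it assumes a model loaded as in the Python example above & colourizes each frame before writing it (the actual run_video-depth.py may differ in its details):

```python
import cv2
import numpy as np

def video_to_depth(model, in_path: str, out_path: str) -> None:
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        depth = model.infer_image(frame)  # HxW float32 relative depth
        depth = (depth - depth.min()) / max(float(depth.max() - depth.min()), 1e-8)
        writer.write(cv2.applyColorMap((depth * 255).astype(np.uint8), cv2.COLORMAP_INFERNO))

    cap.release()
    writer.release()

# e.g. video_to_depth(model, 'assets/examples_video/clip.mp4', 'video_depth_vis/clip_depth.mp4')
```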
We provide three models of varying scales for robust relative depth estimation (the fourth model is still a WIP):
All three models are automatically downloaded to a 'checkpoints' folder in your repo when you run one_click_install.bat. (The download links are only provided here in case you want to download the models for use outside this repo.)
Models | Params | Checkpoints |
---|---|---|
Depth-Anything-V2-Small model | 48.4M | Download |
Depth-Anything-V2-Base model | 190.4M | Download |
Depth-Anything-V2-Large model | 654.9M | Download |
Depth-Anything-V2-Giant model | 1.3B | Coming soon |
Please note that the larger (vitl) model has better temporal consistency on videos.
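For reference, the three released encoders map to different DPT widths. The per-encoder configurations below follow the upstream Depth Anything V2 usage example (treat them as an assumption if the repo's code changes) & make it easy to switch models from the checkpoints folder:

```python
from safetensors.torch import load_file
from depth_anything_v2.dpt import DepthAnythingV2

# Per-encoder DPT configurations, following the upstream Depth Anything V2 usage example.
model_configs = {
    'vits': {'encoder': 'vits', 'features': 64,  'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
}

encoder = 'vitl'  # 'vits' | 'vitb' | 'vitl'
model = DepthAnythingV2(**model_configs[encoder])
# Assumed checkpoint naming - adjust to the filenames in your 'checkpoints' folder.
model.load_state_dict(load_file(f'checkpoints/depth_anything_v2_{encoder}.safetensors'))
model.eval()
```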
This dependency .whl is automatically downloaded to the repo's main folder when you run one_click_install.bat. (The download link is only provided here in case you want to download it for use outside this repo.)
Dependency | Size | Wheel |
---|---|---|
Triton==2.1.0 | 306.7M | Download |
(Once it has been installed & the gradio webui is running properly, you can delete it or use it elsewhere in a similar fashion.)
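If you do end up installing the wheel manually, it is a single pip command pointed at the downloaded file (the exact filename depends on the build you grabbed, so the one below is only illustrative):
pip install triton-2.1.0-cp310-cp310-win_amd64.whl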
- Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this issue). In V1, we unintentionally used features from the last four layers of DINOv2 for decoding. In V2, we use intermediate features instead. Although this modification did not improve details or accuracy, we decided to follow this common practice. (A small sketch of the difference follows these notes.)
- I will be updating the training scripts to support .safetensors output for pre-trained models in the coming weeks, so stay tuned for more UDAV2 depthing updates!
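To make the V1-vs-V2 feature choice concrete, here is a rough sketch using DINOv2's get_intermediate_layers. The ViT-L layer indices [4, 11, 17, 23] are my understanding of the upstream dpt.py defaults & should be treated as an assumption:

```python
import torch

# DINOv2 ViT-L/14 backbone from torch hub (requires network access on first run).
backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')
x = torch.randn(1, 3, 518, 518)  # 518 is divisible by the 14-pixel patch size

# V1-style: features from the last four transformer blocks.
last_four = backbone.get_intermediate_layers(x, n=4, return_class_token=True)

# V2-style: evenly spaced intermediate blocks instead (assumed ViT-L indices).
intermediate = backbone.get_intermediate_layers(x, n=[4, 11, 17, 23], return_class_token=True)
```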
Lihe Yang1 · Bingyi Kang2† · Zilong Huang2 · Zhen Zhao · Xiaogang Xu · Jiashi Feng2 · Hengshuang Zhao1*
Legend Keys - [ HKU 1 · TikTok 2 · project lead † · corresponding author * ]
Please refer to metric depth estimation &/or the DA-2K benchmark.
Depth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license.
If you find this project useful, please consider citing the work below, give this upgraded repo a star & share it w/ others in the community!
@article{depth_anything_v2,
title={Depth Anything V2},
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
journal={arXiv:2406.09414},
year={2024}
}
@inproceedings{depth_anything_v1,
title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
booktitle={CVPR},
year={2024}
}