This upgraded repo adds more capabilities: the cmd .py scripts have been reworked to function more intuitively, 147 depth-output colour map methods have been added, batch image & video processing has been introduced, everything is automatically saved to an outputs folder (w/ file-naming conventions), & the .pth models have been converted to .safetensors.


Upgraded Depth Anything V2 - UDAV2

This work presents Depth Anything V2. It significantly outperforms V1 in fine-grained detail & robustness. Compared with SD-based models, it offers faster inference, fewer parameters, & higher depth accuracy. This upgraded repo also adds a robust Gradio WebUI, plus image & video .bat scripts for more intuitive CLI usage (if that is your preferred method).

UDAV2 Outputs


Gradio Example


News

  • 2024-06-14: Paper, project page, code, models, demo, & benchmark are all released.
  • 2024-06-20: The repo has been upgraded & is also now running on .safetensors models instead of .pth models.
  • 2024-06-23: Updated installation process to be a simpler one_click_install.bat file. It automatically downloads the depth models into a 'checkpoints' folder, the triton wheel into the repo's main folder & installs all of the dependencies needed. [Also updated this README.md file to provide more clarity!]
  • 2024-06-24: pravdomil has provided a much-needed update to UDAV2 for 16bit image creation, in order to make stunning 3D Bas-Reliefs! I am currently updating the gradio webui to include both 16bit single image & 16bit batch image creation, which will be pushed in the coming days.
  • 2024-06-25: I'm currently working on a beta version of UDAV2 as an automatic1111 extension, which will be released next week, so stay tuned!
  • 2024-06-27: A1111 extension released! sd-webui-udav2
  • 2024-06-29: Updated Forge extension release sd-forge-udav2, to prevent conflicts w/ pre-existing installed extensions in Forge!
  • 2024-07-01: sd-webui-udav2 has now been added to the extension index.json! You can now install the extension directly inside A1111.
  • 2024-07-03: [v1.1.452] sd-webui-controlnet now has a depth_anything_v2 preprocessor🔥! Update transformers dependency to transformers-4.44.1 to use the new depth_anything_v2 controlnet preprocessor.

Windows Installation

All you need to do is copy & paste (or right-click) each of the following lines, in order, into cmd & everything will be installed properly.

git clone https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2.git
cd Upgraded-Depth-Anything-V2
one_click_install.bat

That's it! All you have to do now is pick one of the run_-------.bat files, double-click & you're off to depthing!

MacOS & Linux Installation

Run the following commands in your terminal.

git clone https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2.git
cd Upgraded-Depth-Anything-V2
source one_click_install.sh

or

git clone https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2.git
cd Upgraded-Depth-Anything-V2
pip install -r requirements_macos.txt

Then manually download & place all 3 of the Depth Anything V2 models [download links found below] into a folder called checkpoints & you'll be good to go.

Usage

Gradio WebUI

To use the upgraded gradio webui locally:

For Windows

run_gradio.bat

You can also try the online gradio demo, though it is FAR less capable than this Upgraded Depth Anything V2 repo.

For MacOS & Linux

python run_gradio.py

Running run_image-depth_16bit.py CLI script to make 16bit images for creating 3D Bas-Reliefs!

It works for both single image depth processing & batch image depth processing.

run_image-depth_16bit.bat
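
The reason 16bit output matters for bas-reliefs: an 8bit depth map has only 256 distinct levels, which shows up as visible stepping when carved into a relief, while 16 bits gives 65,536 levels. Below is a minimal numpy sketch of the kind of normalization involved; `depth_to_uint16` is a hypothetical helper name for illustration, and the repo's actual script may scale its output differently.

```python
import numpy as np

def depth_to_uint16(depth: np.ndarray) -> np.ndarray:
    """Normalize a raw float depth map to the full 16-bit range.

    8-bit output quantizes depth to 256 levels, which produces visible
    stepping in a carved relief; 16 bits gives 65,536 levels.
    """
    d_min, d_max = depth.min(), depth.max()
    if d_max == d_min:  # flat input: avoid division by zero
        return np.zeros(depth.shape, dtype=np.uint16)
    normalized = (depth - d_min) / (d_max - d_min)
    return (normalized * 65535.0).astype(np.uint16)

# Example: a tiny synthetic depth ramp
depth = np.linspace(0.0, 5.0, 16).reshape(4, 4).astype(np.float32)
out = depth_to_uint16(depth)
print(out.dtype, out.min(), out.max())  # uint16 0 65535
```

The resulting array can then be written as a 16bit PNG (e.g. with OpenCV or Pillow), which is what slicer/CNC tools expect for relief carving.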

3D Bas-Relief from 16bit Image Depth Maps Examples

The images used to make the following depth maps were created using Dreamshaper Turbo.

Running run_image-depth_8bit.py CLI script

It works for both single image depth processing & batch image depth processing.

run_image-depth_8bit.bat

or

python run_image-depth.py --encoder <vits | vitb | vitl> --img-path <path> --outdir <outdir> [--input-size <size>] [--pred-only] [--grayscale]

Options:

  • --img-path: You can either 1.) point it to a directory containing all the images of interest, 2.) point it to a single image, or 3.) point it to a text file listing all image paths.
  • --input-size (optional): By default, we use input size 518 for model inference. You can increase the size for even more fine-grained results.
  • --pred-only (optional): Only save the predicted depth map, without raw image.
  • --grayscale (optional): Save the grayscale depth map, without applying color palette.

For example:

python run_image-depth.py --encoder vitl --img-path assets/examples --outdir depth_vis

Running run_video-depth.py CLI script

It works for both single video depth processing & batch video depth processing.

run_video-depth.bat

or

python run_video-depth.py --encoder vitl --video-path assets/examples_video --outdir video_depth_vis

Pre-trained Models [.safetensors]

We provide three models of varying scales for robust relative depth estimation (the fourth model is still a WIP):

All three models are automatically downloaded to a 'checkpoints' folder in your repo when you run the one_click_install.bat. (The download links are only provided here in case you want to download them elsewhere for use outside this repo.)

Model | Params | Checkpoint
Depth-Anything-V2-Small | 48.4M | Download
Depth-Anything-V2-Base | 190.4M | Download
Depth-Anything-V2-Large | 654.9M | Download
Depth-Anything-V2-Giant | 1.3B | Coming soon

Please note that the larger (vitl) model has better temporal consistency on videos.
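
One nice property of the .safetensors format (versus pickle-based .pth files) is that the file layout is documented and safe to parse: it starts with an unsigned 64-bit little-endian header length, followed by that many bytes of JSON describing every tensor, so checkpoint metadata can be inspected without loading any weights or executing code. A small stdlib-only sketch that writes a minimal one-tensor file and reads its header back; `read_safetensors_header` is a hypothetical helper, and for actual inference you would load the checkpoints with the safetensors library instead.

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Read tensor metadata from a .safetensors file.

    Layout (per the safetensors spec): an unsigned 64-bit little-endian
    header length, then that many bytes of JSON describing each tensor's
    dtype, shape, and byte offsets into the data section that follows.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Write a minimal one-tensor file to demonstrate, then inspect it.
header = {"layer.weight": {"dtype": "F32", "shape": [2, 2],
                           "data_offsets": [0, 16]}}
blob = json.dumps(header).encode("utf-8")
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)))
    f.write(blob)
    f.write(b"\x00" * 16)  # 4 float32 values of tensor data

meta = read_safetensors_header("demo.safetensors")
print(list(meta))  # ['layer.weight']
```

This is why tools can list a checkpoint's tensor names and shapes instantly, even for the 654.9M-parameter Large model.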

Triton Dependency Wheel

This dependency .whl is automatically downloaded to the repo's main folder when you run the one_click_install.bat. (The download link is only provided here in case you want to download it elsewhere for use outside this repo.)

Dependency | Params | Wheel
Triton==2.1.0 | 306.7M | Download

(Once it has been installed & the gradio webui is running properly, you can delete it or use it elsewhere in a similar fashion.)

Notes:

  • Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this issue). In V1, we unintentionally used features from the last four layers of DINOv2 for decoding. In V2, we use intermediate features instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
  • I will be updating the training scripts to support .safetensors output pre-trained models in the coming weeks, so stay tuned for more UDAV2 depthing updates!

Original DAV2 Github Repo Creds

Lihe Yang1 · Bingyi Kang2† · Zilong Huang2 · Zhen Zhao · Xiaogang Xu · Jiashi Feng2 · Hengshuang Zhao1*

Legend Keys - [ HKU 1 · TikTok 2 · project-lead † · corresponding author * ]

Paper PDF Project Page Benchmark


Fine-tuned to Metric Depth Estimation & DA-2K Evaluation Benchmark

Please refer to metric depth estimation &/or to DA-2K benchmark.

LICENSE

Depth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license.

Citation

If you find this project useful, please consider citing the papers below, give this upgraded repo a star, & share it w/ others in the community!

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}
