Interested in building an autocast friendly version #472
Open
gchhablani opened this issue Oct 30, 2024 · 5 comments

gchhablani commented Oct 30, 2024

Use case:
I am running gsplat for inference at scale (32+ Gaussian-splat inferences in parallel on a single device) and want to wrap gsplat in something like torch.autocast so that inference runs in mixed precision or float16. This should reduce memory consumption and probably make things faster. A quality drop is not a significant concern, but speed is.
Currently my Gaussian splats take up about 1 GB each, so memory is not a significant concern at the moment, but I expect them to grow in size as I progress.

Potential solution:
I am not sure whether this already exists or is supported, but I would like to use fp16 for everything, or mixed precision. This would reduce GPU memory consumption, leaving room for other operations happening in parallel, and potentially speed things up as well.

When I try to wrap project_gaussians in torch.autocast (roughly the pattern sketched below), I get errors like Expected Float, got Half. I am using an older version (v0.1.11), so I am not sure whether that is the issue.
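
For reference, this is roughly the pattern I have in mind (a minimal sketch; the calls inside the block are placeholders, not the exact gsplat API):

```python
import torch

# Sketch of the intended usage: run the splatting forward pass under autocast
# so eligible ops execute in float16. With gsplat v0.1.11 this fails with
# "Expected Float, got Half" because the CUDA kernels expect float32 inputs.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    # project_gaussians(...) / rasterization calls would go here
    # (placeholder; exact signatures depend on the gsplat version)
    pass
```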

I am interested in working on this and helping build a solution, but I am looking for some guidance. I am open to other solutions to this problem as well. Curious to hear what the team thinks.

@gchhablani gchhablani changed the title Interested in building a autocast friendly version Interested in building an autocast friendly version Oct 30, 2024
maturk (Collaborator) commented Oct 30, 2024

The old version does not support mixed precision.

gchhablani (Author) commented Oct 30, 2024

@maturk Does the newer version support that?

Any thoughts on other possible ways to optimize speed/memory, including fp16?

maturk (Collaborator) commented Oct 31, 2024

It is not currently supported, I think.

kkaytekin commented

There is this functionality in autograd that allows you to make particular functions fp32, maybe that helps?
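
For context, a minimal sketch of that mechanism, using custom_fwd / custom_bwd from torch.cuda.amp (newer PyTorch also exposes them as torch.amp.custom_fwd(device_type="cuda", ...)) on a toy autograd Function:

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class ScaleByTwo(torch.autograd.Function):
    """Toy op: the decorators force it to run in float32 even when it is
    called inside a torch.autocast(dtype=torch.float16) region."""

    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)  # half inputs are cast back to fp32
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @custom_bwd  # backward runs under the same autocast state as the forward
    def backward(ctx, grad_out):
        return grad_out * 2

x = torch.randn(4, device="cuda", dtype=torch.float16, requires_grad=True)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = ScaleByTwo.apply(x)  # executes in fp32 despite the autocast region
print(y.dtype)  # torch.float32
```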

antonzub99 commented

There is this functionality in autograd that allows you to make particular functions fp32, maybe that helps?

Can confirm, though you will have to clone the repo first, add the custom_fwd(...) and custom_bwd decorators in gsplat/cuda/_wrapper.py, and then run pip install .

Other than that, it works like a charm; you can now train in fp16/bf16.
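
Roughly, the patch looks like this (a sketch only; _ProjectGaussians is an illustrative name, and the real autograd Function classes in gsplat/cuda/_wrapper.py differ by version):

```python
# Sketch of the change in gsplat/cuda/_wrapper.py (illustrative names only):
# decorate each autograd Function so its CUDA kernels always see float32,
# even when called inside a torch.autocast region.
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class _ProjectGaussians(torch.autograd.Function):  # hypothetical class name
    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)  # fp16/bf16 inputs cast back to fp32
    def forward(ctx, means, quats, scales, viewmats, Ks):
        ...  # unchanged: call into the CUDA forward kernel

    @staticmethod
    @custom_bwd  # keep backward consistent with the forward's autocast state
    def backward(ctx, *grads):
        ...  # unchanged: call into the CUDA backward kernel
```

Note that cast_inputs=torch.float32 means the CUDA kernels themselves still run in fp32; the memory/speed savings come from the rest of the pipeline running in half precision.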
