Interested in building an autocast friendly version #472
Open
gchhablani opened this issue Oct 30, 2024 · 5 comments

gchhablani commented Oct 30, 2024

Use case:
I am running gsplat for inference at scale (32+ Gaussian-splat inferences in parallel on a single device) and want to wrap gsplat in something like torch.autocast so that inference runs in mixed precision or float16. This should reduce memory consumption and probably make things faster. A quality drop is not a significant concern, but speed is.
Currently my Gaussian splats take up about 1 GB each, so memory is not a significant concern at the moment, but I expect them to grow in size as I progress.

Potential solution:
I am not sure whether this already exists or is supported, but I would like to use fp16 for everything, or mixed precision. This would reduce GPU memory consumption, leaving room for other operations happening in parallel, and potentially speed things up as well.

When I try to wrap project_gaussians in torch.autocast (roughly the pattern sketched below), I get errors like Expected Float, got Half. I am using an older version (v0.1.11), so I am not sure whether that is the issue.
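
For reference, this is roughly the pattern I have in mind (a minimal sketch; the calls inside the block are placeholders, not the exact gsplat API):

```python
import torch

# Sketch of the intended usage: run the splatting forward pass under autocast
# so eligible ops execute in float16. With gsplat v0.1.11 this fails with
# "Expected Float, got Half" because the CUDA kernels expect float32 inputs.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    # project_gaussians(...) / rasterization calls would go here
    # (placeholder; exact signatures depend on the gsplat version)
    pass
```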

I am interested in working on this and helping build a solution, but I am looking for some guidance. I am open to other solutions to this problem as well. Curious to hear what the team thinks.

@gchhablani gchhablani changed the title Interested in building a autocast friendly version Interested in building an autocast friendly version Oct 30, 2024
maturk (Collaborator) commented Oct 30, 2024

The old version does not support mixed precision.

gchhablani (Author) commented Oct 30, 2024

@maturk Does the newer version support that?

Any thoughts on other possible ways to optimize speed/memory, including fp16?

maturk (Collaborator) commented Oct 31, 2024

It is not currently supported, I think.

kkaytekin commented

There is this functionality in autograd that allows you to make particular functions fp32, maybe that helps?
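
For context, a minimal sketch of that mechanism, using custom_fwd / custom_bwd from torch.cuda.amp (newer PyTorch also exposes them as torch.amp.custom_fwd(device_type="cuda", ...)) on a toy autograd Function:

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class ScaleByTwo(torch.autograd.Function):
    """Toy op: the decorators force it to run in float32 even when it is
    called inside a torch.autocast(dtype=torch.float16) region."""

    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)  # half inputs are cast back to fp32
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @custom_bwd  # backward runs under the same autocast state as the forward
    def backward(ctx, grad_out):
        return grad_out * 2

x = torch.randn(4, device="cuda", dtype=torch.float16, requires_grad=True)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = ScaleByTwo.apply(x)  # executes in fp32 despite the autocast region
print(y.dtype)  # torch.float32
```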

antonzub99 commented

There is this functionality in autograd that allows you to make particular functions fp32, maybe that helps?

Can confirm, though you will have to clone the repo first, add the custom_fwd(...) and custom_bwd decorators in gsplat/cuda/_wrapper.py, and then run pip install .

Other than that, it works like a charm; you can now train in fp16/bf16.
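
Roughly, the patch looks like this (a sketch only; _ProjectGaussians is an illustrative name, and the real autograd Function classes in gsplat/cuda/_wrapper.py differ by version):

```python
# Sketch of the change in gsplat/cuda/_wrapper.py (illustrative names only):
# decorate each autograd Function so its CUDA kernels always see float32,
# even when called inside a torch.autocast region.
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class _ProjectGaussians(torch.autograd.Function):  # hypothetical class name
    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)  # fp16/bf16 inputs cast back to fp32
    def forward(ctx, means, quats, scales, viewmats, Ks):
        ...  # unchanged: call into the CUDA forward kernel

    @staticmethod
    @custom_bwd  # keep backward consistent with the forward's autocast state
    def backward(ctx, *grads):
        ...  # unchanged: call into the CUDA backward kernel
```

Note that cast_inputs=torch.float32 means the CUDA kernels themselves still run in fp32; the memory/speed savings come from the rest of the pipeline running in half precision.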
