Are there non-obvious best practices for saving memory and operations #1482
-
I've been running some GPU simulations that are very limited by the GPU's memory. Although it seems like we're getting close to being able to run multi-GPU simulations (thanks, @ali-ramadhan!), I think it's good practice to try to save memory in any case, and also to save on operations (i.e., to run as few operations as possible). I know there are obvious ways to do that (like running smaller simulations, limiting the number of tracers, etc.), but what are the non-obvious ways? For example, I noticed that in computed fields you can specify the … Thanks!
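For context on how tight the budget gets, here is a rough back-of-envelope estimate in plain Julia (the grid size and field count are illustrative assumptions, not Oceananigans internals) of how much memory a simulation's 3D fields occupy, and how the element type changes it:

```julia
# Back-of-envelope GPU memory estimate: Nx*Ny*Nz values per field
# (halos ignored), times an illustrative count of ~15 3D fields
# (velocities, tracers, tendencies, pressures).
Nx = Ny = Nz = 256
nfields = 15

field_gib(T) = Nx * Ny * Nz * nfields * sizeof(T) / 2^30

println("Float64: ", round(field_gib(Float64), digits=2), " GiB")
println("Float32: ", round(field_gib(Float32), digits=2), " GiB")
```

On a 256³ grid this already approaches 2 GiB at `Float64`, so halving the element size or dropping even one field is noticeable.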
-
I think `IncompressibleModel{MultiGPU}` might still be a little far off since we need distributed FFT support for GPUs from PencilFFTs.jl. Could happen soon but no ETA. `ShallowWaterModel{MultiGPU}` does work thanks to @francispoulin but might need some profiling to find bottlenecks.

I think you touched on the obvious ways! I guess another way is to find a bigger GPU. Some of the higher-end NVIDIA GPUs have 32 GB of memory, but I'm not sure there are any common ones with more.

A riskier way is to use `Float32` to halve your memory footprint, but then you might end up having to manage truncation errors as discussed in #1410.

I assume you're already using `advection = WENO5()`, but you could use a higher…

A dirty trick Oceananigans.jl used to do was to use … as scratch space. A cleaner solution is to do what @glwagner did for LESbrary.jl: use one scratch field per location, e.g. https://github.com/CliMA/LESbrary.jl/blob/cf31b0ec20219d5ad698af334811d448c27213b0/examples/three_layer_constant_fluxes.jl#L380-L385
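The general idea behind the scratch-field trick, sketched in plain Julia (no Oceananigans; the `compute!` helper below is hypothetical and only illustrates why reusing one preallocated buffer per location saves memory):

```julia
# Each diagnostic broadcasts its result into the same preallocated
# scratch array instead of allocating fresh storage per diagnostic.
scratch = zeros(Float64, 32, 32, 32)

# Hypothetical helper: broadcast an elementwise function `f` of some
# input arrays into the shared scratch buffer.
compute!(scratch, f, inputs...) = (scratch .= f.(inputs...); scratch)

u = rand(32, 32, 32)
v = rand(32, 32, 32)

ke = compute!(scratch, (u, v) -> (u^2 + v^2) / 2, u, v)  # kinetic energy
# `ke` aliases `scratch`: write it out before reusing the buffer for
# the next diagnostic at the same location.
```

In Oceananigans terms, "one scratch field per location" means one such buffer for each staggered-grid location, shared by every computed field living at that location, as in the linked LESbrary lines.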
-
Apart from OP @tomchor's recommendation, I think @ali-ramadhan mentioned the most important memory-saving technique for simulations with lots of diagnostics (using a single scratch space for …). I'll just say that there are two other possibly important techniques: 1) eliminating the hydrostatic pressure as an auxiliary variable, as discussed in #1443 (which reduces Oceananigans' memory footprint by one field), and 2) figuring out how to use … We could also implement a …
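For scale on point 1, a quick plain-Julia estimate (the grid size is illustrative) of what eliminating one auxiliary 3D field buys:

```julia
# One full 3D Float64 field on a 512^3 grid (halos ignored):
Nx = Ny = Nz = 512
one_field_bytes = Nx * Ny * Nz * sizeof(Float64)
println("One 3D field: ", one_field_bytes / 2^30, " GiB")  # exactly 1 GiB
```

So on large grids, each auxiliary field dropped frees on the order of a gigabyte of GPU memory.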