Possible memory allocations improvements #53

Closed
giordano opened this issue May 18, 2020 · 2 comments · Fixed by #57

Comments

giordano commented May 18, 2020

Now that the timer tooling is directly in the main code, it's a bit easier to play with it.
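
For context, the tables below have the shape of TimerOutputs.jl output; here is a minimal sketch of the @timeit instrumentation pattern presumably used in the main code. The section names are taken from the table, but the wrapper function, call signatures, and the add_noise! helper are hypothetical placeholders, not the actual TDAC code.

using TimerOutputs

const to = TimerOutput()

# Hypothetical wrapper: each labelled block appears as one "Section" row in the
# tables below, with its own call count, time, and allocation totals.
function update_states!(truth, particles, params)
    @timeit to "True State Update"     tsunami_update!(truth, params)      # signature assumed
    @timeit to "Particle State Update" tsunami_update!(particles, params)  # signature assumed
    @timeit to "Process Noise"         add_noise!(particles, params)       # placeholder name
    # ...the remaining sections are timed in the same way...
end

print_timer(to)  # prints a table like the ones shown here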

By looking at the output of this run:

julia> tdac(TDAC.tdac_params(;nprt = 1, n_time_step = 1, n_integration_step = 251, time_step = 251.0, nobs = 4, enable_timers = true));
────────────────────────────────────────────────────────────────────────────────
                                         Time                   Allocations      
                                 ──────────────────────   ───────────────────────
        Tot / % measured:             469ms / 100%             667MiB / 100%     

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────
 Particle State Update        1    185ms  39.5%   185ms    306MiB  46.0%   306MiB
 True State Update            1    168ms  35.9%   168ms    306MiB  46.0%   306MiB
 Initialization               1   92.8ms  19.8%  92.8ms   43.6MiB  6.54%  43.6MiB
 Process Noise                1   21.6ms  4.62%  21.6ms   8.55MiB  1.28%  8.55MiB
 Particle Variance            1    694μs  0.15%   694μs   1.83MiB  0.27%  1.83MiB
 Particle Mean                1    225μs  0.05%   225μs     0.00B  0.00%    0.00B
 Resample                     1    122μs  0.03%   122μs     96.0B  0.00%    96.0B
 State Copy                   1   81.3μs  0.02%  81.3μs     32.0B  0.00%    32.0B
 Weights                      1   14.5μs  0.00%  14.5μs   1.03KiB  0.00%  1.03KiB
 Observations                 2   11.2μs  0.00%  5.62μs      112B  0.00%    56.0B
 Observation Noise            1    828ns  0.00%   828ns     48.0B  0.00%    48.0B
 ────────────────────────────────────────────────────────────────────────────────

We can see that most of the memory allocations come from the updates of the states, which call the function tsunami_update!. We can save some of these allocations by storing state as a matrix of the right shape and slicing out the parts we need (eta, mm, nn, etc.), as sketched below.
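
A minimal sketch of this idea, assuming the state is stored as a single nx × ny × 3 array with eta, mm, nn as its slices (the layout and sizes here are assumptions for illustration, not the actual TDAC definitions):

nx, ny = 200, 200                   # example grid size, assumed for illustration
state  = zeros(Float64, nx, ny, 3)  # one preallocated array holding the whole state

eta = @view state[:, :, 1]          # surface height
mm  = @view state[:, :, 2]          # x-momentum
nn  = @view state[:, :, 3]          # y-momentum

# tsunami_update! can then mutate these views in place, so the state updates
# no longer allocate fresh arrays for eta, mm, nn on every call.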

An additional improvement is to pass preallocated buffers for dxeta, dyeta, dxM, and dyN to LLW2d.timestep!. This shouldn't give much of a speedup, but it would noticeably reduce memory allocations, and hence the pressure on the garbage collector.
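
A rough sketch of that pattern, reusing the names from the sketch above; the LLW2d.timestep! signature shown here is illustrative, not the actual API. The point is that the derivative buffers are allocated once, outside the integration loop, and reused on every call.

n_integration_step = 251              # matches the run above

# Allocate the finite-difference work buffers once, outside the integration loop...
dxeta = Matrix{Float64}(undef, nx, ny)
dyeta = similar(dxeta)
dxM   = similar(dxeta)
dyN   = similar(dxeta)

# ...and reuse them at every step (hypothetical signature).
for _ in 1:n_integration_step
    LLW2d.timestep!(eta, mm, nn, dxeta, dyeta, dxM, dyN)
end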


giordano commented May 19, 2020

I had a quick go at preallocating the buffers for LLW2d.timestep!:

julia> tdac(TDAC.tdac_params(;nprt = 1, n_time_step = 1, n_integration_step = 251, time_step = 251.0, nobs = 4, enable_timers = true));
 ────────────────────────────────────────────────────────────────────────────────
                                         Time                   Allocations      
                                 ──────────────────────   ───────────────────────
        Tot / % measured:             449ms / 100%            54.7MiB / 100%     

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────
 Particle State Update        1    174ms  38.9%   174ms   9.52KiB  0.02%  9.52KiB
 True State Update            1    154ms  34.4%   154ms   8.69KiB  0.02%  8.69KiB
 Initialization               1   99.1ms  22.1%  99.1ms   44.2MiB  81.0%  44.2MiB
 Process Noise                1   19.4ms  4.33%  19.4ms   8.55MiB  15.6%  8.55MiB
 Particle Variance            1    607μs  0.14%   607μs   1.83MiB  3.35%  1.83MiB
 Resample                     1    271μs  0.06%   271μs     96.0B  0.00%    96.0B
 Particle Mean                1    213μs  0.05%   213μs     0.00B  0.00%    0.00B
 State Copy                   1   61.9μs  0.01%  61.9μs     32.0B  0.00%    32.0B
 Observations                 2   17.7μs  0.00%  8.84μs      224B  0.00%     112B
 Weights                      1   14.9μs  0.00%  14.9μs   1.03KiB  0.00%  1.03KiB
 Observation Noise            1    884ns  0.00%   884ns     48.0B  0.00%    48.0B
 ────────────────────────────────────────────────────────────────────────────────

Total memory used is cut down by more than 90%, from 667 MiB to 54.7 MiB.


giordano commented May 19, 2020

It turned out that when we have many particles the contribution of "Process Noise" becomes very large; see the tables in #61. This has been addressed by #60 and #63. With these PRs the memory allocations are vastly dominated by the initialisation step, as we'd expect.
