Possible memory allocations improvements #53

Closed
giordano opened this issue May 18, 2020 · 2 comments · Fixed by #57

Comments

giordano commented May 18, 2020

Now that the timer tooling is directly in the main code, it's a bit easier to play with it.
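
For context, the tables below have the shape of TimerOutputs.jl output; here is a minimal sketch of the @timeit instrumentation pattern presumably used in the main code. The section names are taken from the table, but the wrapper function, call signatures, and the add_noise! helper are hypothetical placeholders, not the actual TDAC code.

using TimerOutputs

const to = TimerOutput()

# Hypothetical wrapper: each labelled block appears as one "Section" row in the
# tables below, with its own call count, time, and allocation totals.
function update_states!(truth, particles, params)
    @timeit to "True State Update"     tsunami_update!(truth, params)      # signature assumed
    @timeit to "Particle State Update" tsunami_update!(particles, params)  # signature assumed
    @timeit to "Process Noise"         add_noise!(particles, params)       # placeholder name
    # ...the remaining sections are timed in the same way...
end

print_timer(to)  # prints a table like the ones shown here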

By looking at the output of this run:

julia> tdac(TDAC.tdac_params(;nprt = 1, n_time_step = 1, n_integration_step = 251, time_step = 251.0, nobs = 4, enable_timers = true));
────────────────────────────────────────────────────────────────────────────────
                                         Time                   Allocations      
                                 ──────────────────────   ───────────────────────
        Tot / % measured:             469ms / 100%             667MiB / 100%     

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────
 Particle State Update        1    185ms  39.5%   185ms    306MiB  46.0%   306MiB
 True State Update            1    168ms  35.9%   168ms    306MiB  46.0%   306MiB
 Initialization               1   92.8ms  19.8%  92.8ms   43.6MiB  6.54%  43.6MiB
 Process Noise                1   21.6ms  4.62%  21.6ms   8.55MiB  1.28%  8.55MiB
 Particle Variance            1    694μs  0.15%   694μs   1.83MiB  0.27%  1.83MiB
 Particle Mean                1    225μs  0.05%   225μs     0.00B  0.00%    0.00B
 Resample                     1    122μs  0.03%   122μs     96.0B  0.00%    96.0B
 State Copy                   1   81.3μs  0.02%  81.3μs     32.0B  0.00%    32.0B
 Weights                      1   14.5μs  0.00%  14.5μs   1.03KiB  0.00%  1.03KiB
 Observations                 2   11.2μs  0.00%  5.62μs      112B  0.00%    56.0B
 Observation Noise            1    828ns  0.00%   828ns     48.0B  0.00%    48.0B
 ────────────────────────────────────────────────────────────────────────────────

We can see that most of the memory allocations come from the updates of the states, which call the function tsunami_update!. We can save some of these allocations by storing state as a matrix of the right shape and slicing out the parts we need (eta, mm, nn, etc.), as sketched below.
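
A minimal sketch of this idea, assuming the state is stored as a single nx × ny × 3 array with eta, mm, nn as its slices (the layout and sizes here are assumptions for illustration, not the actual TDAC definitions):

nx, ny = 200, 200                   # example grid size, assumed for illustration
state  = zeros(Float64, nx, ny, 3)  # one preallocated array holding the whole state

eta = @view state[:, :, 1]          # surface height
mm  = @view state[:, :, 2]          # x-momentum
nn  = @view state[:, :, 3]          # y-momentum

# tsunami_update! can then mutate these views in place, so the state updates
# no longer allocate fresh arrays for eta, mm, nn on every call.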

An additional improvement is to pass preallocated buffers for dxeta, dyeta, dxM, and dyN to LLW2d.timestep!. This shouldn't give much of a speedup, but it would noticeably reduce memory allocations, and hence the pressure on the garbage collector.
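
A rough sketch of that pattern, reusing the names from the sketch above; the LLW2d.timestep! signature shown here is illustrative, not the actual API. The point is that the derivative buffers are allocated once, outside the integration loop, and reused on every call.

n_integration_step = 251              # matches the run above

# Allocate the finite-difference work buffers once, outside the integration loop...
dxeta = Matrix{Float64}(undef, nx, ny)
dyeta = similar(dxeta)
dxM   = similar(dxeta)
dyN   = similar(dxeta)

# ...and reuse them at every step (hypothetical signature).
for _ in 1:n_integration_step
    LLW2d.timestep!(eta, mm, nn, dxeta, dyeta, dxM, dyN)
end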


giordano commented May 19, 2020

I had a quick go at preallocating the buffers for LLW2d.timestep!:

julia> tdac(TDAC.tdac_params(;nprt = 1, n_time_step = 1, n_integration_step = 251, time_step = 251.0, nobs = 4, enable_timers = true));
 ────────────────────────────────────────────────────────────────────────────────
                                         Time                   Allocations      
                                 ──────────────────────   ───────────────────────
        Tot / % measured:             449ms / 100%            54.7MiB / 100%     

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────
 Particle State Update        1    174ms  38.9%   174ms   9.52KiB  0.02%  9.52KiB
 True State Update            1    154ms  34.4%   154ms   8.69KiB  0.02%  8.69KiB
 Initialization               1   99.1ms  22.1%  99.1ms   44.2MiB  81.0%  44.2MiB
 Process Noise                1   19.4ms  4.33%  19.4ms   8.55MiB  15.6%  8.55MiB
 Particle Variance            1    607μs  0.14%   607μs   1.83MiB  3.35%  1.83MiB
 Resample                     1    271μs  0.06%   271μs     96.0B  0.00%    96.0B
 Particle Mean                1    213μs  0.05%   213μs     0.00B  0.00%    0.00B
 State Copy                   1   61.9μs  0.01%  61.9μs     32.0B  0.00%    32.0B
 Observations                 2   17.7μs  0.00%  8.84μs      224B  0.00%     112B
 Weights                      1   14.9μs  0.00%  14.9μs   1.03KiB  0.00%  1.03KiB
 Observation Noise            1    884ns  0.00%   884ns     48.0B  0.00%    48.0B
 ────────────────────────────────────────────────────────────────────────────────

Total memory used is cut down by more than 90%, from 667 MiB to 54.7 MiB.


giordano commented May 19, 2020

It turned out that when we have many particles the contribution of "Process Noise" becomes very large; see the tables in #61. This has been addressed by #60 and #63. With these PRs the memory allocations are vastly dominated by the initialisation step, as we'd expect.
