
Optimize State Copy #116

Open
tkoskela opened this issue Jul 24, 2020 · 2 comments

tkoskela commented Jul 24, 2020

The point-to-point communications in State Copy can cause a significant slowdown when going off-node; see the example below. The algorithm should be optimized to avoid copying duplicate particles. At present the algorithm works roughly as follows (see the sketch after the permalink):

1. Work out which particles rank i has
2. Work out which particles rank i needs after resampling
3. Work out which rank has each particle rank i needs
4. Work out which ranks need the particles rank i has
5. Send each particle rank i has to every process that needs it
6. Receive each particle rank i needs from the process that has it

https://github.com/Team-RADDISH/TDAC.jl/blob/90a318dbbbd4f80e88e96d2a80522150745705bb/src/TDAC.jl#L452
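For reference, a minimal sketch of this exchange pattern in Julia, written against the positional MPI.jl API of the time. All names here (`copy_states!`, `particles`, `nprt_per_rank`, the block layout of particles over ranks) are invented for illustration; this is not the actual TDAC.jl implementation behind the permalink above.

```julia
using MPI

# Hypothetical sketch of steps 1-6 above, not the TDAC.jl code. `particles`
# holds one particle per column; global particle j lives on rank
# (j - 1) ÷ nprt_per_rank, and resampling_indices[k] is the source particle
# for global slot k after resampling. The source index doubles as the
# message tag, purely for illustration.
function copy_states!(particles::Matrix{Float64}, resampling_indices::Vector{Int},
                      nprt_per_rank::Int, comm::MPI.Comm)
    my_rank = MPI.Comm_rank(comm)
    offset = my_rank * nprt_per_rank
    owner(j) = (j - 1) ÷ nprt_per_rank

    reqs = MPI.Request[]
    new_particles = similar(particles)
    pending = Tuple{Int,Vector{Float64}}[]  # (local slot, receive buffer)

    # Steps 2, 3 and 6: one receive per local slot -- even when several
    # slots need the very same source particle.
    for islot in 1:nprt_per_rank
        isrc = resampling_indices[offset + islot]
        if owner(isrc) == my_rank
            new_particles[:, islot] = particles[:, isrc - offset]
        else
            buf = Vector{Float64}(undef, size(particles, 1))
            push!(pending, (islot, buf))
            push!(reqs, MPI.Irecv!(buf, owner(isrc), isrc, comm))
        end
    end

    # Steps 1, 4 and 5: one send per copy -- the same particle goes over
    # the network again for every slot that resampled it.
    for (gslot, isrc) in enumerate(resampling_indices)
        if owner(isrc) == my_rank && owner(gslot) != my_rank
            push!(reqs, MPI.Isend(particles[:, isrc - offset], owner(gslot), isrc, comm))
        end
    end

    MPI.Waitall!(reqs)
    for (islot, buf) in pending
        new_particles[:, islot] = buf
    end
    particles .= new_particles
    return particles
end
```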

This is essentially a brute-force solution. In practice it often happens that a single particle has an extremely high weight and is copied many times to many processes. The current algorithm sends the same particle over the network for every copy, even multiple times to the same process. A step should be added that identifies duplicate particles and replicates them locally once a single copy has been received.

[figure: timing example of the off-node slowdown]
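A hedged sketch of the proposed extra step, reusing the invented names from the sketch above: group the local slots by the source particle they need, move each distinct remote particle over the network at most once per destination rank, and fan the duplicates out as local memory copies.

```julia
# Hypothetical dedup variant (illustration only, not the TDAC.jl code).
# A particle resampled k times onto one rank now costs one message, not k.
function copy_states_dedup!(particles::Matrix{Float64}, resampling_indices::Vector{Int},
                            nprt_per_rank::Int, comm::MPI.Comm)
    my_rank = MPI.Comm_rank(comm)
    offset = my_rank * nprt_per_rank
    owner(j) = (j - 1) ÷ nprt_per_rank

    # Group local slots by the source particle they need.
    slots_by_source = Dict{Int,Vector{Int}}()
    for islot in 1:nprt_per_rank
        isrc = resampling_indices[offset + islot]
        push!(get!(slots_by_source, isrc, Int[]), islot)
    end

    reqs = MPI.Request[]
    recv_bufs = Dict{Int,Vector{Float64}}()
    # One receive per distinct remote source particle, not one per slot.
    for isrc in keys(slots_by_source)
        owner(isrc) == my_rank && continue
        buf = Vector{Float64}(undef, size(particles, 1))
        recv_bufs[isrc] = buf
        push!(reqs, MPI.Irecv!(buf, owner(isrc), isrc, comm))
    end

    # One send per (owned particle, destination rank) pair, not one per copy.
    sent = Set{Tuple{Int,Int}}()
    for (gslot, isrc) in enumerate(resampling_indices)
        owner(isrc) == my_rank || continue
        dest = owner(gslot)
        (dest == my_rank || (isrc, dest) in sent) && continue
        push!(sent, (isrc, dest))
        push!(reqs, MPI.Isend(particles[:, isrc - offset], dest, isrc, comm))
    end
    MPI.Waitall!(reqs)

    # Local fan-out: duplicates become plain memory copies, not messages.
    new_particles = similar(particles)
    for (isrc, slots) in slots_by_source
        src = owner(isrc) == my_rank ? particles[:, isrc - offset] : recv_bufs[isrc]
        for islot in slots
            new_particles[:, islot] = src
        end
    end
    particles .= new_particles
    return particles
end
```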

@tkoskela tkoskela added the enhancement New feature or request label Jul 24, 2020
@tkoskela tkoskela self-assigned this Jul 24, 2020
@tkoskela commented:

The root of the issue is that, by the nature of the algorithm, state_copy creates wild load imbalances that are hard to fix. They only cause a slowdown at the next collective call, which happens to be the MPI_Reduce in get_mean_and_var. The idea above may help, but is unlikely to fix the issue completely.

@tkoskela commented:

https://github.com/Team-RADDISH/TDAC.jl/blob/8c67e8c6c3e492267983628f56406daac0c29dfe/src/TDAC.jl#L458-L465 could be done on master, with only the results sent via MPI. This would probably save some time compared to broadcasting the whole resampling_indices vector. A sketch of the idea follows.
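A minimal sketch of that idea, assuming (hypothetically) that rank 0 holds the full resampling_indices vector after resampling; function and variable names are invented. The master works out each rank's slice and sends just that, instead of an MPI.Bcast! of the whole vector.

```julia
using MPI

# Hypothetical sketch: rank 0 derives each rank's part of the resampling
# plan and ships only that slice; the other ranks never see the full vector.
function distribute_plan(resampling_indices, nprt_per_rank::Int, comm::MPI.Comm)
    my_rank = MPI.Comm_rank(comm)
    my_plan = Vector{Int}(undef, nprt_per_rank)
    if my_rank == 0
        for dest in 1:MPI.Comm_size(comm) - 1
            chunk = resampling_indices[dest * nprt_per_rank + 1 : (dest + 1) * nprt_per_rank]
            MPI.Send(chunk, dest, 0, comm)
        end
        my_plan .= resampling_indices[1:nprt_per_rank]
    else
        MPI.Recv!(my_plan, 0, 0, comm)
    end
    return my_plan
end
```

Since every slice has the same length, a scatter collective would do the same job in a single call.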
