review GA_Fence() et al for thread safety #27

Open
jeffdaily opened this issue Apr 7, 2017 · 3 comments

@jeffdaily
Member

The pattern of calling GA_Init_fence() and later ending the fence with GA_Fence() is not thread-safe by design, since state is stored globally between the two calls.

Should we consider an API change where GA_Init_fence() returns a handle that is then passed to GA_Fence()?
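To make the handle idea concrete, here is a minimal sketch of what it could look like. All names (`fence_handle`, `fence_begin`, `fence_note_target`, `fence_end`) are hypothetical and not part of the GA API, and `stub_fence` only stands in for a per-target fence such as ARMCI_Fence so the example runs without ARMCI.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a per-target fence (e.g. ARMCI_Fence); counts calls only. */
static int fences_issued = 0;
static void stub_fence(int proc) { (void)proc; fences_issued++; }

/* Hypothetical handle: all fence state lives here instead of in globals. */
typedef struct {
    int nproc;
    unsigned char *touched;  /* touched[p] != 0 if target p has pending ops */
} fence_handle;

/* Start a fence region; returns NULL on allocation failure. */
static fence_handle *fence_begin(int nproc) {
    fence_handle *h = malloc(sizeof *h);
    if (!h) return NULL;
    h->nproc = nproc;
    h->touched = calloc((size_t)nproc, 1);
    if (!h->touched) { free(h); return NULL; }
    return h;
}

/* Record that a put/get/acc was issued to target proc under this handle. */
static void fence_note_target(fence_handle *h, int proc) {
    assert(proc >= 0 && proc < h->nproc);
    h->touched[proc] = 1;
}

/* Fence every touched target, free the handle, return the fence count. */
static int fence_end(fence_handle *h) {
    int n = 0;
    for (int p = 0; p < h->nproc; p++) {
        if (h->touched[p]) { stub_fence(p); n++; }
    }
    free(h->touched);
    free(h);
    return n;
}
```

Each thread would own its handle, so two threads can have fence regions open at the same time without sharing state. The cost is that the put/get/acc calls would need some way to learn which handle is active, which is itself an API change.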

@jeffhammond
Member

jeffhammond commented Apr 7, 2017

Another possibility is to assume/require that the main thread calls GA_Init_fence and GA_Fence, during which it will allocate and free global state. You can make the intervening put/get/acc (PGA) calls thread-safe by having them atomically update the global state. If you go with O(nproc) state, the PGA calls just increment a counter associated with each target. You can also do O(ntarget) state with a linked list or similar and use slightly more expensive atomic operations to update the list. GA_Fence just walks the list and calls ARMCI_Fence for all the targets.
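A rough sketch of the O(nproc) counter variant, using C11 atomics. The function names (`init_fence`, `note_op`, `fence`) are made up for illustration, and `stub_fence` is a placeholder for ARMCI_Fence so the code compiles on its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Placeholder for ARMCI_Fence(proc); counts calls only. */
static int fences_issued = 0;
static void stub_fence(int proc) { (void)proc; fences_issued++; }

/* Global state, allocated and freed by the main thread only. */
static atomic_int *pending = NULL;  /* pending[p] = ops issued to target p */
static int pending_n = 0;

/* Main thread: allocate one counter per target. Returns 0 on success. */
static int init_fence(int nproc) {
    pending = malloc((size_t)nproc * sizeof *pending);
    if (!pending) return -1;
    for (int p = 0; p < nproc; p++) atomic_init(&pending[p], 0);
    pending_n = nproc;
    return 0;
}

/* Any thread: called from put/get/acc to record an op to a target. */
static void note_op(int target) {
    assert(target >= 0 && target < pending_n);
    atomic_fetch_add(&pending[target], 1);
}

/* Main thread: fence each target with pending ops, then free the state. */
static int fence(void) {
    int n = 0;
    for (int p = 0; p < pending_n; p++) {
        if (atomic_load(&pending[p]) > 0) { stub_fence(p); n++; }
    }
    free(pending);
    pending = NULL;
    pending_n = 0;
    return n;
}
```

This assumes the main thread has already synchronized with the worker threads (for example by joining them) before it calls `fence`, so no `note_op` can race with the free.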

The other option, which is what I did in my thread-safe branch, is to make GA_Init_fence a no-op and call ARMCI_AllFence in GA_Fence, which is thread-safe as long as ARMCI is. This is more expensive than necessary if ARMCI_AllFence is O(nproc), but in the cases where ARMCI already tracks the active target list, then it is basically equivalent. In the case where ARMCI_AllFence is essentially O(1) because all outstanding remote ops are tracked via a single counter, then the lazy approach is in fact optimal. We assume networks will implement the latter optimization, and that it can be exploited in e.g. MPI_Win_flush_all. I think dmapp_gsync is an example of this.
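For comparison, the lazy approach reduces to something like the following. `lazy_init_fence` and `lazy_fence` are illustrative names, not the actual code from the thread-safe branch, and `stub_allfence` stands in for ARMCI_AllFence so the sketch is self-contained.

```c
#include <assert.h>

/* Placeholder for ARMCI_AllFence(); counts calls only. */
static int allfence_calls = 0;
static void stub_allfence(void) { allfence_calls++; }

/* No state is recorded, so there is nothing for threads to race on. */
static void lazy_init_fence(void) { /* intentionally a no-op */ }

/* Fence every target; thread-safe as long as the underlying call is. */
static void lazy_fence(void) { stub_allfence(); }
```

With no state kept between the two calls, thread safety depends entirely on the underlying all-targets fence.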

In any case, I don't care that much what happens here, because I don't believe NWChem uses GA_Fence.

@abhinavvishnu
Contributor

@jeffhammond Good point. This needs more discussion, so for now it is safest to assume these calls are not thread-safe until we understand the implications.

@jeffhammond
Member

It might be prudent to start by adding a "sparse" fence to ARMCI, meaning an ARMCI fence routine that takes a list or array of targets to fence. Once you know what the most efficient implementation is inside of ARMCI, just map GA to that.
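As a strawman, the interface could look like the sketch below. `sparse_fence` is a hypothetical name and signature, not an existing ARMCI routine, and `stub_fence` stands in for ARMCI_Fence. The body is only the naive fallback of one fence per listed target; a real implementation inside ARMCI could presumably do better.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for ARMCI_Fence(proc); records how it was called. */
static int fences_issued = 0;
static int last_fenced = -1;
static void stub_fence(int proc) { last_fenced = proc; fences_issued++; }

/* Hypothetical sparse fence: complete outstanding ops to the listed
 * targets only. Returns the number of per-target fences issued. */
static int sparse_fence(const int *targets, int ntargets) {
    assert(ntargets >= 0);
    assert(ntargets == 0 || targets != NULL);
    for (int i = 0; i < ntargets; i++) {
        stub_fence(targets[i]);
    }
    return ntargets;
}
```

GA_Fence could then pass whatever target list it has tracked to this one routine, so the choice of tracking scheme stays internal to GA.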
