ch3: use group to build communicator vc tables #7242
Open
hzhou wants to merge 33 commits into pmodels:main from hzhou:2412_ch3_vcrt
hzhou force-pushed the 2412_ch3_vcrt branch 6 times, most recently from 1ab1165 to 9cc9027 on December 20, 2024 15:58
hzhou force-pushed the 2412_ch3_vcrt branch 5 times, most recently from 706bc6a to 033a063 on December 21, 2024 14:07
hzhou changed the title from "ch3: refactor to remove the usage of comm mapper" to "ch3: use group to build communicator vc tables" on Dec 21, 2024
hzhou force-pushed the 2412_ch3_vcrt branch 4 times, most recently from 3a44553 to 27cdee9 on December 22, 2024 17:29
test:mpich/ch3/most
Miscellaneous typo fixes to appease the spellchecker.
This test requires access to MPICH internals, and thus won't be usable with the current design.
We no longer use this file.
Hide the internal fields of MPIR_Group from unnecessary access. Outside group_util.c and group_impl.c, code only needs to assume the MPIR_Lpid integer type, the creation routines based on an lpid map or an lpid stride description, and the access routine that looks up the lpid from a group rank.
For most external usages, we only need MPIR_Group_rank_to_lpid.
Avoid accessing group internal fields.
Group similar functions together to facilitate refactoring. There are no changes in this commit other than moving functions around. The 4 incl/excl functions are very similar. The 3 difference/intersection/union functions are very similar.
Use MPIR_Group_{rank_to_lpid,lpid_to_rank} to avoid directly accessing MPIR_Group internal fields. For most group creation routines, just populate an lpid lookup map and call MPIR_Group_create_map to create the group.
* add option to use stride to describe group composition
* remove the linked list design
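The map and stride descriptions mentioned in the commits above can be illustrated with a minimal sketch. It is not the MPICH implementation: the type `demo_lpid_t`, the struct `demo_group_t`, and both lookup functions are hypothetical stand-ins for MPIR_Lpid, MPIR_Group, and MPIR_Group_{rank_to_lpid,lpid_to_rank}, and the sketch assumes a positive stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MPIR_Lpid; the real typedef lives in MPICH. */
typedef int64_t demo_lpid_t;

/* Hypothetical group: either an explicit rank -> lpid table, or an
 * (offset, stride) pair that generates lpids arithmetically. */
typedef struct {
    int size;
    int use_map;                /* nonzero: use map; zero: use offset/stride */
    const demo_lpid_t *map;     /* valid when use_map != 0 */
    demo_lpid_t offset;         /* valid when use_map == 0 */
    demo_lpid_t stride;         /* assumed > 0 in this sketch */
} demo_group_t;

/* Look up the lpid of a group rank. */
demo_lpid_t demo_rank_to_lpid(const demo_group_t *g, int rank)
{
    assert(rank >= 0 && rank < g->size);
    if (g->use_map)
        return g->map[rank];
    return g->offset + (demo_lpid_t) rank * g->stride;
}

/* Reverse lookup: return the group rank of an lpid, or -1 if absent. */
int demo_lpid_to_rank(const demo_group_t *g, demo_lpid_t lpid)
{
    if (g->use_map) {
        for (int i = 0; i < g->size; i++) {
            if (g->map[i] == lpid)
                return i;
        }
        return -1;
    }
    assert(g->stride > 0);
    demo_lpid_t delta = lpid - g->offset;
    if (delta < 0 || delta % g->stride != 0)
        return -1;
    demo_lpid_t rank = delta / g->stride;
    return rank < g->size ? (int) rank : -1;
}
```

With a stride description, both lookups are constant-time arithmetic and no per-rank table needs to be stored, which is presumably the appeal for regular groups such as a whole world.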
This is the same as MPID_Comm_get_lpid. NOTE: we will remove MPID_Comm_get_lpid as well once we move the ownership of lpid to the MPIR layer.
There is no real difference between lpid and gpid. Thus rename gpid in the device layer to lpid for clarity. Replace the usage of uint64_t as the type of lpid with MPIR_Lpid. This improves consistency.
We need a device-independent way of identifying processes. One way is to use the combination of (world_idx, world_rank). Thus, we need to maintain a list of worlds so that the world_idx points to the world record. This may not fit the concept of an MPI group, but since a group needs a way to identify processes, it seems the most closely related place. The first world, world_idx 0, is always initialized at init. Because of session re-init, we need to make sure to reset num_worlds to 0 at finalize. New worlds will be added upon spawning or connecting dynamic processes (to be implemented).
We need to reset num_worlds so that Session re-init will work.
Add builtin MPIR_GROUP_WORLD and MPIR_GROUP_SELF, so we can create builtin communicators from builtin groups.
Internally the only reason to duplicate a group is to copy from NULL session to a new session. Otherwise, we can just use the same group and increment the reference count.
Since builtin groups can be returned to users, users should be allowed to free them. They are reference counted anyway.
To make the MPI group a first-class citizen, we will always have a group before creating a communicator, so that when the device layer activates communicators, e.g. in MPID_Comm_commit_pre_hook, it can rely on the group to look up the involved processes. This also removes the need to maintain any other process addressing scheme.
In many places we just return MPIR_Group_empty without incrementing the ref_count. This is fixable. But for now, let's avoid freeing it.
The init_comm does the release manually.
Add assertions to make sure the local_group and remote_group (for inter communicators) are always set before MPID_Comm_commit_pre_hook.
Otherwise, the MPI_T functions may not be able to convert builtin datatypes.
When we run tests as functions, stray output in MPI_Finalize, such as the debug messages in debug builds, was not captured previously. This patch makes sure we report such stray output as a failure.
Now that we always have a group inside a communicator, we can simply return the lpid from the group. Because this will be used in the hot path, make it inline.
Add the following macros: MPIR_LPID_WORLD_INDEX, MPIR_LPID_WORLD_RANK, MPIR_LPID_FROM.
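To illustrate how an lpid could encode the (world_idx, world_rank) pair described earlier, here is a minimal sketch. The macro arguments and the bit layout (world index in the upper 32 bits, world rank in the lower 32 bits) are assumptions made for illustration, and the DEMO_ names are hypothetical; the actual MPIR_LPID_* definitions in this PR may differ.

```c
#include <stdint.h>

/* Hypothetical stand-in for MPIR_Lpid. */
typedef int64_t demo_lpid_t;

/* Assumed layout (not taken from the PR): upper 32 bits hold the world
 * index, lower 32 bits hold the rank within that world. */
#define DEMO_LPID_WORLD_SHIFT 32
#define DEMO_LPID_RANK_MASK   ((demo_lpid_t) 0xffffffff)

/* Compose an lpid from a world index and a rank within that world. */
#define DEMO_LPID_FROM(world_idx, world_rank) \
    ((((demo_lpid_t) (world_idx)) << DEMO_LPID_WORLD_SHIFT) | \
     (((demo_lpid_t) (world_rank)) & DEMO_LPID_RANK_MASK))

/* Extract the two components again. */
#define DEMO_LPID_WORLD_INDEX(lpid) ((int) ((lpid) >> DEMO_LPID_WORLD_SHIFT))
#define DEMO_LPID_WORLD_RANK(lpid)  ((int) ((lpid) & DEMO_LPID_RANK_MASK))
```

Under this assumed layout, processes in world 0 have an lpid equal to their world rank, so a single-world run would see no change in lpid values.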
Fix a typo in setting the size of MPIR_GROUP_SELF. Increment ref_count when we return MPIR_GROUP_EMPTY to prevent freeing the builtin when it is released internally. Unfortunately, since the user can directly use MPI_GROUP_EMPTY, we can't keep ref_count accurate. But at least we can keep it positive to prevent an actual free.
The builtin groups are in session NULL. We need to duplicate the groups in MPIR_Group_from_session_pset_impl to return a group in the correct session.
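The rule that a group is only duplicated when it has to move from the NULL session into a real session, and is otherwise shared by bumping its reference count, can be sketched as follows. The types `sess_t` and `sgroup_t` and the function `sgroup_for_session` are hypothetical; this is not the body of MPIR_Group_from_session_pset_impl.

```c
#include <stdlib.h>

/* Hypothetical session handle. */
typedef struct {
    int id;
} sess_t;

/* Hypothetical group: only the fields needed for this sketch. */
typedef struct {
    int ref_count;
    int size;
    const sess_t *session;      /* NULL for builtin groups */
} sgroup_t;

/* Return a group that belongs to `target`. If the group is already in
 * that session, share it and increment the reference count; otherwise
 * make a copy owned by the target session. Returns NULL on allocation
 * failure. */
sgroup_t *sgroup_for_session(sgroup_t *g, const sess_t *target)
{
    if (g->session == target) {
        g->ref_count++;
        return g;
    }
    sgroup_t *copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return NULL;
    *copy = *g;
    copy->ref_count = 1;
    copy->session = target;
    return copy;
}
```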
Groups are a natural place to host the vcrt (virtual connection reference table). When communicators are duplicated, groups are simply inherited and reference counted. Thus we won't end up with duplicated vcrts.
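A minimal sketch of why hosting the table in the group avoids duplication: duplicating a communicator only takes another reference on the group, so the table is allocated once and freed when the last reference goes away. All names here (`rc_group_t`, `rc_comm_t`, the `rc_` functions, and the `int` array standing in for a vcrt) are hypothetical and much simpler than the ch3 data structures.

```c
#include <stdlib.h>

/* Hypothetical group that owns the connection table. */
typedef struct {
    int ref_count;
    int size;
    int *vcrt;                  /* stand-in for the real vcrt */
} rc_group_t;

/* Hypothetical communicator that only references its group. */
typedef struct {
    rc_group_t *local_group;
} rc_comm_t;

/* Counts live tables so the sharing is observable. */
int rc_live_tables = 0;

/* Create a group holding one reference for the caller. */
rc_group_t *rc_group_create(int size)
{
    rc_group_t *g = malloc(sizeof(*g));
    if (g == NULL)
        return NULL;
    g->vcrt = calloc(size > 0 ? (size_t) size : 1, sizeof(int));
    if (g->vcrt == NULL) {
        free(g);
        return NULL;
    }
    g->ref_count = 1;
    g->size = size;
    rc_live_tables++;
    return g;
}

/* Drop one reference; free the group and its table on the last one. */
void rc_group_release(rc_group_t *g)
{
    if (--g->ref_count == 0) {
        free(g->vcrt);
        free(g);
        rc_live_tables--;
    }
}

/* Attach a group to a communicator, taking a new reference. */
void rc_comm_init(rc_comm_t *c, rc_group_t *g)
{
    g->ref_count++;
    c->local_group = g;
}

/* Duplicate a communicator: the group, and so the table, is shared. */
void rc_comm_dup(const rc_comm_t *src, rc_comm_t *dst)
{
    rc_comm_init(dst, src->local_group);
}

/* Free a communicator by releasing its group reference. */
void rc_comm_free(rc_comm_t *c)
{
    rc_group_release(c->local_group);
    c->local_group = NULL;
}
```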
Because the tmp_comm uses a temporary vc that doesn't belong to any pg, it is incompatible with the new comm init process (which relies on lpid lookup to construct vcrt tables). It turns out we only need tmp_comm to perform basic send/recv (MPIC_Sendrecv), and we don't need most of the facilities of a normal communicator. Shortcutting the tmp_comm construction and destruction greatly simplifies the code.
Replace the usage of mapper with comm->local_group and comm->remote_group in MPIDI_CH3I_Comm_commit_pre_hook.
The only criterion for whether to release a vc is whether this vc is for a dynamic process. It has nothing to do with whether MPI_Comm_disconnect is called. The semantics of MPI_Comm_disconnect is just to wait for all communication to complete. It is orthogonal to how the comm is destroyed.
In MPIR_Comm_create_inter, we know whether the remote group is empty after the exchange, thus it is unnecessary to create and commit the intercomm then delete it later. Simply don't create it in the first place. The device layer is not necessarily equipped to handle intercomm commit with empty groups.
Pull Request Description
Based on #7235, #7237
Now that the local_group and remote_group in MPIR_Comm can fully replace the functions of mapper, refactor ch3 to use group instead of mapper in MPIDI_CH3I_Comm_commit_pre_hook.
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.