2016 08 08 webex
- Attendees
  - Jeff Squyres
- Discussion points for today:
  - Continue to discuss feedback from the June 2016 MPI Forum meeting (Bellevue, WA, USA)
Continue discussing feedback from Forum meeting.
- Continue discussion about how to handle errors for APIs not involving sessions (e.g., info, op, errhandler, datatype)
  - See slide 56
- Dan: Can we have a function to translate a group from one session to another?
  - See new slide 48
  - Pavan: `MPI_Session_finalize` as presented is kind of collective. We don't want that. Who would it be collective with?
    - Tony: If we say that send cancel with sessions is illegal, does that make `MPI_Session_finalize` non-collective?
    - Hubert: `MPI_Request_free`
    - Everyone: Crap...
    - Martin: What if we say that all communication taking place in the session must be done?
    - Aurelien: What about sends where the data is buffered but not transferred?
    - 2016-07-25: We want to be analogous to MPI_FINALIZE. The only problems we have with the MPI-3.1 definition of MPI_FINALIZE are the scope of "collective" and the reference to MPI_COMM_WORLD, which needs to be fixed: "MPI_FINALIZE is collective over all connected processes. If no processes were spawned, accepted or connected then this means over MPI_COMM_WORLD; otherwise it is collective over the union of all processes that have been and continue to be connected, as explained in Section 10.5.4."
    - 2016-07-25: Dan's insight: yes, session is local, but just define "collective" to be over all the communication objects that still exist that were derived from the session (comms, files, windows). Yes! This seems like a good way to view it / move forward.
    - 2016-08-08: Still like Dan's answer. Done with this one.
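One way to picture Dan's definition of "collective" for session finalization is the toy model below. It is only a sketch: `Session`, `CommObject`, `derive`, and `finalize_scope` are hypothetical names made up for illustration, not part of any MPI proposal or implementation. It assumes the scope is simply the union of the members of every derived object (comm, file, window) that has not been freed yet.

```python
from dataclasses import dataclass, field


@dataclass
class CommObject:
    """Hypothetical stand-in for a communicator, file, or window."""
    name: str
    ranks: frozenset  # processes that are members of this object
    freed: bool = False


@dataclass
class Session:
    """Hypothetical model of a session: a purely local handle."""
    derived: list = field(default_factory=list)

    def derive(self, name, ranks):
        # Record every object created from this session.
        obj = CommObject(name, frozenset(ranks))
        self.derived.append(obj)
        return obj

    def finalize_scope(self):
        """Processes that finalize would be 'collective' over: the union
        of the members of derived objects that still exist."""
        scope = set()
        for obj in self.derived:
            if not obj.freed:
                scope |= obj.ranks
        return scope


session = Session()
comm = session.derive("comm", {0, 1, 2})
win = session.derive("window", {2, 3})

scope_before = session.finalize_scope()
comm.freed = True
scope_after_comm_free = session.finalize_scope()
win.freed = True
scope_all_freed = session.finalize_scope()
```

In this model, once every derived object has been freed the scope is empty, which matches the view that the session itself is local.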
- Jeff: How do you abort "all connected processes" when you may not have connected to all processes in `mpi://WORLD`?
  - Wesley: This would make the new error handler definitions very gross (leverages "all connected processes" to mean everyone in `MPI_COMM_WORLD` + connected dynamics when defining `MPI_ERRORS_ARE_FATAL`).
  - 2016-08-08 - Dan: Should we ask the runtime to abort everyone for us? Probably not, as this is a different semantic (kill the job) from the current behavior (abort connected processes).
    - Let's just keep the existing behavior where aborting will cascade through all connected processes. Most likely, this will abort everyone anyway if the error handler was not changed.
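The "cascade through all connected processes" behavior can be sketched as a reachability search. This is a toy model, not how any MPI implementation performs an abort: `connected_processes` and the `links` pairs are hypothetical, and the sketch assumes that "connected" means a direct or indirect communication path exists between two processes.

```python
from collections import deque


def connected_processes(start, links):
    """Return every process reachable from `start` over `links`.

    `links` is an iterable of (a, b) pairs meaning that processes a and b
    share a communication path (an assumption made for this sketch).
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search: an abort spreads to direct peers first,
    # then to their peers, and so on.
    seen = {start}
    queue = deque([start])
    while queue:
        proc = queue.popleft()
        for other in neighbors.get(proc, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


links = [(0, 1), (1, 2), (3, 4)]
aborted_from_0 = connected_processes(0, links)
aborted_from_3 = connected_processes(3, links)
```

Here processes 3 and 4 are untouched by an abort that starts at process 0, which is the difference from a "kill the job" semantic.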
- Martin/Pavan: If you can't create the global address table at init time, that could make the common case of address tracking expensive because you may have to have per-communicator arrays to track all addressing info.
  - Pavan: You may be able to recreate this by allocating the big array to potentially hold all procs at `MPI_Session_create` time.
  - Jeff: This already isn't a problem for OMPI because it uses a dynamically growing array of pointers to proc structs.
  - 2016-08-08 - Wesley: We're fine with going with the answers that Pavan & Jeff gave.
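The idea of a dynamically growing array of pointers to proc structs can be illustrated roughly as follows. This is not Open MPI's actual code: `Proc`, `ProcTable`, and `Communicator` are hypothetical names, and the doubling growth policy and lazy creation of entries are assumptions made for the sketch.

```python
class Proc:
    """Hypothetical per-peer record (address info would live here)."""
    def __init__(self, rank):
        self.rank = rank
        self.address = f"addr-of-{rank}"  # placeholder, not a real address


class ProcTable:
    """Process-wide table that grows on demand instead of being sized
    for every possible peer up front."""
    def __init__(self, initial_capacity=4):
        self.slots = [None] * initial_capacity

    def get(self, rank):
        # Double the capacity until the requested rank fits.
        while rank >= len(self.slots):
            self.slots.extend([None] * len(self.slots))
        if self.slots[rank] is None:
            self.slots[rank] = Proc(rank)  # created lazily on first use
        return self.slots[rank]


class Communicator:
    """Holds references to shared Proc records, not copies of addresses."""
    def __init__(self, table, ranks):
        self.procs = [table.get(r) for r in ranks]


table = ProcTable()
comm_a = Communicator(table, [0, 9])
comm_b = Communicator(table, [9, 2])
```

Both communicators refer to the same record for rank 9, so addressing info is stored once rather than per communicator.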
- Martin: In MPI 3.1, does `MPI_Init` still need to be collective?
  - 2016-08-08 - Dan: Maybe, maybe not. But let's just leave the MPI 3.1 functions as they are.
- Pavan: `MPI_IO` can't be the same on all communicators. In fact, many of the built-in attribute keys may not want to be the same on all communicators.
  - All: Should we make the special attributes be allowed to be different per communicator? Probably, especially for `MPI_TAG_UB` and `MPI_IO`.
  - 2016-08-08 - Wesley: We don't specify that the attributes need to be the same everywhere, just defined everywhere. We'll definitely need to allow them to be different on different processes.
    - Dan: e.g., if we set `only_any_tag`, then `TAG_UB` may be set to `0` for a particular communicator.
- Aurelien: Instead of using a `parent_comm` for `MPI_Exec`, why not use a group and tag like other communicator creation functions?
  - 2016-08-08: Agree. Make it symmetric to the new definitions for inter-communicators (use group + tag).
- Wesley: The new runtime sets from `MPI_Exec` will not be visible everywhere (can only see the sets you're in). Any one involved process will see at most two out of three.
  - You can construct the other with group subtraction.
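The group subtraction mentioned above can be sketched on plain lists. The helper below mirrors the ordering rule of `MPI_Group_difference` (members of the first group that are not in the second, in the first group's order); the process names and the assumption that the combined set lists parents followed by children are made up for illustration.

```python
def group_difference(group1, group2):
    """Members of group1 that are not in group2, in group1's order."""
    excluded = set(group2)
    return [p for p in group1 if p not in excluded]


parents = ["p0", "p1"]
children = ["c0", "c1", "c2"]
everyone = parents + children  # the combined set that all processes can see

# A parent sees `parents` and `everyone`, and derives the children.
children_seen_by_parent = group_difference(everyone, parents)

# A child sees `children` and `everyone`, and derives the parents.
parents_seen_by_child = group_difference(everyone, children)
```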
- All: Is there a good use case for needing all three exec sets anyway? We can derive the set we are in (parent vs. children). We can't get the other one (because we're not in it).
  - The only one we need is the new big set that includes all processes in parent and children.
  - 2016-08-08: It may not be required, but it's a nice optimization that the implementation can provide more cheaply than users can do it themselves.
- Pavan: How do we know when processes are done so it's safe to spawn again?
  - 2016-08-08: The application should handle this itself. Just do a collective to know when it's safe.
- Pavan: MPI doesn't need replace because it can `MPI_Session_finalize` and `execvp`.
  - Anh: That doesn't exist in Windows.
  - 2016-08-08 - Dan: Just wrap everything up in a script that launches both the app and the follow-on job (e.g., data analysis).
- Jeff: Add thread safety to `MPI_Session_init_comm`.
  - Wesley: What about error handler and info?
- Pavan: Multithreading may be a problem where the tag isn't enough because the threads can be executed in any order.
  - Pavan: However, one MPI call can't block the entire stack so maybe it's ok.
- Martin: The wording around `set_name` on `MPI_Session_init_comm` needs to get cleaned up.