2017 03 14
Aurelien Bouteiller edited this page Mar 14, 2017 · 1 revision
- Aurelien Bouteiller (UTK)
- Murali Emani (LBNL)
- Keita Teranishi (Sandia)
- Nawrin Sultana (Auburn)
- Alexander Calvert (Auburn)
https://docs.google.com/presentation/d/1YOccLbrHd42vUtgt0KZWymXME0HtlVg8BndnIe3n6jc/edit?usp=sharing
The doodle poll has yielded results. Proposed: biweekly meetings at 2pm CST, starting March 29.
Wesley was absent, so this topic has been left dormant.
- As discussed in the WG f2f, auto jumping is problematic. Keita notes that longjmp can only return to a stack frame that is still active (a parent of the current frame), which limits where the setjmp can be placed (basically not inside a function call; in most cases only main is sane).
- Murali to investigate how MPI_Reinit deals with the issue. If they have a nice solution, we can reuse it, but as it seems now, auto jumping is out of the question and must be delegated to users.
- As it stands, it seems that we can support longjmp only with language support for it, which is not available in Fortran/C.
- Note, however, that we have implemented transactions with macros that do setjmp/longjmp, and have used longjmp from error handlers, all under the user's control, where function call nesting is not MPI's problem.
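The stack-frame constraint in the longjmp discussion can be illustrated with a minimal sketch in plain C. No MPI is involved; `run_with_recovery`, `do_work`, and `on_failure` are hypothetical names standing in for the application's recovery scope, its nested work, and a user error handler. The jump is legal only because the frame that called setjmp is still active when longjmp runs.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical recovery point, owned by the application, not by MPI. */
static jmp_buf recovery_point;
static int failures_seen = 0;

/* Stand-in for a user error handler: jumps back to the recovery point. */
static void on_failure(void)
{
    failures_seen++;
    longjmp(recovery_point, 1);
}

/* Stand-in for application work nested below the recovery point. */
static void do_work(int attempt)
{
    if (attempt == 0)
        on_failure();           /* simulated failure on the first attempt */
}

/* Returns the number of attempts needed to complete the work. */
int run_with_recovery(void)
{
    /* volatile keeps the value well defined after a longjmp */
    volatile int attempt = 0;

    /* The frame calling setjmp must stay active until the last longjmp. */
    if (setjmp(recovery_point) != 0)
        attempt++;              /* we arrived here through longjmp */

    assert(attempt < 2);        /* the simulated failure happens only once */
    do_work(attempt);
    return attempt + 1;
}
```

Calling `run_with_recovery()` from main fails once inside `do_work`, jumps back, and completes on the second attempt. If the setjmp were instead hidden inside a helper that had already returned, the longjmp would have undefined behavior, which is the reason an MPI library cannot place the jump target on the user's behalf.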
- One may be able to set a "reinit" error handler on some communicator. The application could thus alternate between "reinit" phases and local recovery phases, or execute reinit on a section, and local recovery on another (or multiple independent instances of reinit, even).
- Still subject to the longjmp complication, plus the question of where we jump to if not after MPI_Init. Proposed that we return out of MPI_COMM_DUP; this could work, but as long as we are unsure it can be cleanly implemented, this is tentative.
- Keita confirms this function would ease Fenix implementation.
- Exploring the idea of "descendants_revoke", which would revoke all communicators created from the revoked communicator (i.e., calling it on MPI_COMM_WORLD would revoke everything but SELF and get_parent()).
- The idea is amusing; not sure yet whether it can be made to work, or how well the forum would receive these new concepts of descendant communicators/windows/files.
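One way to picture the "descendants_revoke" idea is as a walk over a tree that records which communicator each object was created from. The sketch below is local bookkeeping in plain C only; `comm_node`, `comm_node_add_child`, and the `descendants_revoke` function shown here are hypothetical illustrations, not MPI or ULFM calls, and the sketch does not model how revocation would propagate between processes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical bookkeeping record for one communicator; not an MPI type. */
typedef struct comm_node {
    int revoked;
    size_t nchildren;
    struct comm_node *children[MAX_CHILDREN];
} comm_node;

/* Record that child was created from parent (for example by a dup or split). */
void comm_node_add_child(comm_node *parent, comm_node *child)
{
    assert(parent->nchildren < MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
}

/* Mark node and everything derived from it as revoked.
 * Returns how many nodes were newly revoked by this call. */
int descendants_revoke(comm_node *node)
{
    int count = 0;

    if (!node->revoked) {
        node->revoked = 1;
        count = 1;
    }
    for (size_t i = 0; i < node->nchildren; i++)
        count += descendants_revoke(node->children[i]);
    return count;
}
```

With a node standing in for MPI_COMM_WORLD as the root, revoking it marks every communicator derived from it, while a node that was never registered as a child (such as one standing in for SELF) is left untouched.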