Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPI communicator generalizations #1277

Open
wants to merge 18 commits into
base: develop
Choose a base branch
from

Conversation

jmsexton03
Copy link

Pull Request Summary

Description

Summary of changes:

  • add mpicomm module to track split communicator
  • use mpicomm module to replace MPI_COMM_WORLD statements

These changes extend some of the OASIS related changes by using a separate communicator that is specifically WW3-related for all routines. This makes communication safer by avoiding conflicting tags, and makes collective communications more maintainable.

Issue(s) addressed

This PR is associated with issue #1261
This PR addresses changes to make the use of MPI more general to allow for easier coupling.

Commit Message

MPI communicator generalizations

Check list

Testing

  • How were these changes tested?
    • These were tested by the github actions and were tested locally on a subset of the regtests
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
    • These are covered by the regression tests, although they won't significantly affect some exe if the regtests don't launch with MPI such as using mpirun or mpiexec
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)?
    • The matrix regression tests have not been run
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.)
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):

@MatthewMasarik-NOAA
Copy link
Collaborator

Hi @jmsexton03, thank you for submitting this fix to #1261. I've started the regression testing for it. Just a note that @JessicaMeixner-NOAA is on travel this week, and we'd also like to allow time for trusted repo managers to do their own tests, so please allow some extra processing time for that. Early next week is a good tentative time to hear back.

Copy link
Collaborator

@JessicaMeixner-NOAA JessicaMeixner-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few comments for consideration now. Additional testing and further review is likely necessary as I have only been able to have a quick look, but wanted to provide feedback as soon as possible.

@@ -282,7 +282,7 @@
},
{
"name": "NL2",
"build_files": ["w3snl2md.F90", "mod_xnl4v5.f90", "serv_xnl4v5.f90", "mod_fileio.f90", "mod_constants.f90"],
"build_files": ["w3snl2md.F90", "mod_xnl4v5.f90", "serv_xnl4v5.f90", "mod_fileio.f90", "mod_constants.f90", "mod_mpicomm.f90"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a file mod_mpicomm.f90.

Also I think mpicomm.F90 should be adde with the MPI switch not NL2.

Copy link
Author

@jmsexton03 jmsexton03 Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can revert this change, I was initially confused about whether mod files were separately needed
(I will wait on this fix for further clarity on whether a separate module is needed)

!> @author J. M. Sexton @date 19-Jul-2024
!>
!
#ifndef ENDIANNESS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why this is added here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was modelled off of constants, this ifdef is probably not needed.

@@ -410,6 +410,7 @@ SUBROUTINE W3INIT ( IMOD, IsMulti, FEXT, MDS, MTRACE, ODAT, FLGRD, FLGR2, FLGD,
UA, UD, U10, U10D, AS
#ifdef W3_MPI
USE W3ADATMD, ONLY: MPI_COMM_WAVE, MPI_COMM_WCMP
USE MPICOMM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of these USE MPICOM, if we can add the "only" statement that would be nice. I know it's not uniformly done in WW3 right now, but since this is an addition, it would be good to add it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll push a commit to make these all have an only statement

@@ -46,7 +46,7 @@ fi
#convert main_dir to absolute path
main_dir="`cd $main_dir 1>/dev/null 2>&1 && pwd`"

cmplr=datarmor_intel
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mickaelaccensi is this change okay?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was for testing only, I will push a commit reverting it, or if you want to do "Applying suggestions" through github I can also do that

!/ ------------------------------------------------------------------- /
!/
!
!#ifdef W3_MPI
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clean up this commented out code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you referring to the ifdef W3_MPI here? I was unsure if I followed the header convention correctly

@@ -3,6 +3,7 @@ set(c_src w3getmem.c)
# Core files always built
set(ftn_src
constants.F90
mpicomm.F90
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me if this should only be included if you have the MPI switch and therefore be added in model/src/cmake/switches.json instead.

Copy link
Author

@jmsexton03 jmsexton03 Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try to include ifdefs to make mpicomm.F90 not affect anything if it was compiled without MPI, but I can try to change this to model/src/cmake/switches.json since that sounds cleaner (I will wait on this fix for further clarity on whether a separate module is needed)

#endif
!
!/ ------------------------------------------------------------------- /
MODULE MPICOMM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly lightweight routine, should we add this to an existing file such as constants?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the ramifications would be here, I'll consult with one of my collaborators who'd originally suggested having it as a separate file. I believe the intent was to make it more portable and obviously defined. Would w3parall.F90 be appropriate? I do see that IS_ESMF_COMPONENT is in constants.F90

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move things to another location if that's your preference. When you know, can you point me to where specifically so that I can push an update?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w3parall is mostly for unstructured grids, where as this is for all grids, so I think constants might be a better choice. Others can weigh in on this thought too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, and I'm also interested in others thoughts 🙂 I was originally looking at w3parall because the SYNCHRONIZE_GLOBAL_ARRAY has been useful in our coupling tests so far, although I do see now that the header for this file indicates it's "Parallel routines for implicit solver"

@@ -102,10 +102,10 @@ jobs:
export CC=mpicc
export FC=mpif90
export OASISDIR=${GITHUB_WORKSPACE}/work_oasis3-mct
# mkdir build && cd build
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes intentional?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes were from attempting to test with debug flags on, but I should revert them (particularly as that seemed to not be a good way to build with debug)

Copy link
Collaborator

@mickaelaccensi mickaelaccensi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to add it into constants.ftn.
I approve this PR with the condition it must pass all the regtests done by Matt

@MatthewMasarik-NOAA
Copy link
Collaborator

I agree to add it into constants.ftn. I approve this PR with the condition it must pass all the regtests done by Matt

@mickaelaccensi it looks like there are still many unresolved conversations above. I'll be posting this shortly, but we will be taking a pause on this and other PR processing temporarily. This is a significant enough of a PR that we'll want to have all code managers test on their respective machines when we resume work on this.

@MatthewMasarik-NOAA
Copy link
Collaborator

Hi @jmsexton03, thank you for your patience on this, and I apologize I haven't given a status update sooner. Our team has had some temporary changes that affect our ability to process PR's, so thanks for bearing with us while we worked behind the scenes to adjust. I just posted our group statement regarding this on the Commit Queue: Temporary PR Policy. We tentatively hope to resume mid to late November. This PR has some great advantages, we'll look forward to being able to look at it carefully when we pick back up our processing in November.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants