
Implemented Gathering/Scattering of Data with MPI_GATHERV/MPI_SCATTERV #40

Open
wants to merge 7 commits into base: develop

Conversation

@hga007 (Collaborator) commented Oct 15, 2024

In distributed-memory parallel applications with serial I/O files, data is scattered from global to tiled arrays during reading and gathered from tiled to global arrays during writing. The ROMS distribute.F module includes routines mp_scatter2d, mp_scatter3d, mp_gather2d, and mp_gather3d for such I/O processing.

  • For scattering, mp_scatter2d and mp_scatter3d use the MPI_BCAST routine if the ROMS C-preprocessing option SCATTER_BCAST is activated. Otherwise, they default to the more efficient collective communication routine MPI_SCATTERV (see the sketch after this list).

  • For gathering, mp_gather2d and mp_gather3d use the MPI_ISEND/MPI_IRECV routines if the ROMS C-preprocessing option GATHER_SENDRECV is activated. Otherwise, they default to calling the collective communication routine MPI_GATHERV.
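
A rough sketch (not verbatim ROMS code) of the scatter-side selection between the two paths follows, reusing the buffers, counts, and displs that appear in the excerpts below; Npts here is only a placeholder for the global vector length:

# ifdef SCATTER_BCAST
!  Broadcast the entire global vector; every task extracts its own tile.
      CALL mpi_bcast (Vglobal, Npts, MP_FLOAT, MyMaster,                &
     &                OCN_COMM_WORLD, MyError)
# else
!  Send each task only its own contiguous segment of the reordered
!  global vector, using per-task counts and displacements.
      CALL mpi_scatterv (Vreset, counts, displs, MP_FLOAT,              &
     &                   Vrecv, MySize, MP_FLOAT,                       &
     &                   MyMaster, OCN_COMM_WORLD, MyError)
# endif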

MPI_SCATTERV and MPI_GATHERV are complicated to implement when the grid size is not an exact multiple of the processor layout (NtileI x NtileJ), which leads to parallel tiles of different sizes. In addition, each tile's data segment is not contiguous in Fortran memory. Strategies for manipulating the chunks of parallel data with MPI derived types and subarrays (say, MPI_TYPE_CREATE_SUBARRAY) are available, but they are inefficient for multidimensional data, so achieving a generic solution for any ROMS grid can be challenging.
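
For contrast, here is a minimal, hypothetical sketch of that derived-type route for a single 2-D field, assuming 1-based global indices, with Isize/Jsize the global extents and iLB:iUB, jLB:jUB one task's tile bounds (as in the loops below). One such type, with its own commit and bookkeeping, would be needed for every task and every array shape:

!  Hypothetical sketch only: describe one task's (i,j) tile of the
!  global 2-D array as an MPI subarray type.
      integer :: tile_type
      integer :: gsizes(2), lsizes(2), starts(2)
!
      gsizes=(/ Isize, Jsize /)                  ! global extents
      lsizes=(/ iUB-iLB+1, jUB-jLB+1 /)          ! tile extents
      starts=(/ iLB-1, jLB-1 /)                  ! zero-based start offsets
!
      CALL mpi_type_create_subarray (2, gsizes, lsizes, starts,         &
     &                               MPI_ORDER_FORTRAN, MP_FLOAT,       &
     &                               tile_type, MyError)
      CALL mpi_type_commit (tile_type, MyError)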

After some experimentation, an alternative, efficient, and generic solution was achieved. For example, in mp_scatter3d we now have:

!
!  If master node, set the (i,j,k) index map from global array to vector.
!
      IF (MyRank.eq.MyMaster) THEN
        allocate ( ijk_global(Io:Ie,Jo:Je,LBk:UBk) )
!
        DO k=LBk,UBk
          kc=(k-Koff)*IJsize
          DO j=Jo,Je
            jc=(j-Joff)*Isize
            DO i=Io,Ie
              ijk=i+Ioff+jc+kc
              ijk_global(i,j,k)=ijk
            END DO
          END DO
        END DO
!
!  Reorganize the input global vector so that each tile's data is
!  contiguous in memory to facilitate "SCATTERV" with different-size
!  sections.
!
        nc=0
        DO rank=0,Ntasks-1
          iLB=BOUNDS(ng) % Imin(Cgrid,ghost,rank)
          iUB=BOUNDS(ng) % Imax(Cgrid,ghost,rank)
          jLB=BOUNDS(ng) % Jmin(Cgrid,ghost,rank)
          jUB=BOUNDS(ng) % Jmax(Cgrid,ghost,rank)
          DO k=LBk,UBk
            DO j=jLB,jUB
              DO i=iLB,iUB
                ijk=ijk_global(i,j,k)
                nc=nc+1
                Vreset(nc)=Vglobal(ijk)
              END DO
            END DO
          END DO
        END DO
        deallocate (ijk_global)
      END IF
!
!  Distribute global data into local tiled arrays.
!
      MySize=(Imax-Imin+1)*(Jmax-Jmin+1)*(UBk-LBk+1)
      allocate ( Vrecv(MySize) )
      Vrecv=0.0_r8
!
      CALL mpi_scatterv (Vreset, counts, displs, MP_FLOAT,              &
     &                   Vrecv, MySize, MP_FLOAT,                       &
     &                   MyMaster, OCN_COMM_WORLD, MyError)
      IF (MyError.ne.MPI_SUCCESS) THEN
        CALL mpi_error_string (MyError, string, Lstr, Serror)
        WRITE (stdout,20) 'MPI_SCATTERV', MyRank, MyError,              &
     &                    TRIM(string)
 20     FORMAT (/,' MP_SCATTER3D - error during ',a,                    &
     &          ' call, Task = ', i3.3, ' Error = ',i3,/,15x,a)
        exit_flag=2
        RETURN
      END IF
!
!  Unpack data buffer and load into tiled array
!
      nc=0
      DO k=LBk,UBk
        DO j=Jmin,Jmax
          DO i=Imin,Imax
            nc=nc+1
            Awrk(i,j,k)=Vrecv(nc)
          END DO
        END DO
      END DO
      deallocate ( Vrecv )

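As a concrete, hypothetical illustration of why the master-side reordering makes MPI_SCATTERV straightforward, consider a 4x3 interior grid partitioned with NtileI=2 and NtileJ=1, a single vertical level, and no ghost points. Task 0 owns i=1:2 and task 1 owns i=3:4, each with j=1:3, so after the reordering Vreset holds task 0's six points followed by task 1's six points:

      counts = (/ 6, 6 /)
      displs = (/ 0, 6 /)

Each task therefore receives exactly its own tile in Vrecv, already in the (i,j,k) order expected by the unpacking loop above.
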
The inverse gathering communication is more straightforward in mp_gather3d:

!
!  Gather local tiled data into a global array.
!
      allocate ( Arecv(IJsize*Ksize) )
      Arecv=0.0_r8
!
      CALL mpi_gatherv (Asend, Asize, MP_FLOAT,                         &
     &                  Arecv, counts, displs, MP_FLOAT,                &
     &                  MyMaster, OCN_COMM_WORLD, MyError)
      IF (MyError.ne.MPI_SUCCESS) THEN
        CALL mpi_error_string (MyError, string, Lstr, Serror)
        WRITE (stdout,10) 'MPI_GATHERV', MyRank, MyError, TRIM(string)
 10     FORMAT (/,' MP_GATHER3D - error during ',a,' call, Task = ',    &
     &          i3.3, ' Error = ',i3,/,15x,a)
        exit_flag=2
        RETURN
      END IF
!
!  Unpack gathered data in contiguous memory order and remap every
!  task segment.
!
      nc=0
      DO rank=0,Ntasks-1
        iLB=BOUNDS(ng) % Imin(Cgrid,ghost,rank)
        iUB=BOUNDS(ng) % Imax(Cgrid,ghost,rank)
        jLB=BOUNDS(ng) % Jmin(Cgrid,ghost,rank)
        jUB=BOUNDS(ng) % Jmax(Cgrid,ghost,rank)
        DO k=LBk,UBk
          kc=(k-Koff)*IJsize
          DO j=jLB,jUB
            jc=(j-Joff)*Isize
            DO i=iLB,iUB
              ijk=i+Ioff+jc+kc
              nc=nc+1
              Awrk(ijk)=Arecv(nc)
            END DO
          END DO
        END DO
      END DO
      deallocate (Arecv)

The counts and displacement vectors (counts, displs) for mpi_scatterv and mpi_gatherv are computed as follows:

      Ntasks=NtileI(ng)*NtileJ(ng)
      nc=0
      DO rank=0,Ntasks-1
        iLB=BOUNDS(ng) % Imin(Cgrid,ghost,rank)
        iUB=BOUNDS(ng) % Imax(Cgrid,ghost,rank)
        jLB=BOUNDS(ng) % Jmin(Cgrid,ghost,rank)
        jUB=BOUNDS(ng) % Jmax(Cgrid,ghost,rank)
        displs(rank)=nc
        DO k=LBk,UBk
          DO j=jLB,jUB
            DO i=iLB,iUB
              nc=nc+1
            END DO
          END DO
        END DO
        counts(rank)=nc-displs(rank)
      END DO

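Since each task's segment is a full (iLB:iUB, jLB:jUB, LBk:UBk) box, the inner triple loop above only counts points. The same vectors could also be computed directly from the tile extents, as in this equivalent sketch using the same BOUNDS metadata:

      Ntasks=NtileI(ng)*NtileJ(ng)
      nc=0
      DO rank=0,Ntasks-1
        iLB=BOUNDS(ng) % Imin(Cgrid,ghost,rank)
        iUB=BOUNDS(ng) % Imax(Cgrid,ghost,rank)
        jLB=BOUNDS(ng) % Jmin(Cgrid,ghost,rank)
        jUB=BOUNDS(ng) % Jmax(Cgrid,ghost,rank)
        displs(rank)=nc
        counts(rank)=(iUB-iLB+1)*(jUB-jLB+1)*(UBk-LBk+1)
        nc=nc+counts(rank)
      END DO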