
Implemented Gathering/Scattering of Data with MPI_GATHERV/MPI_SCATTERV #40

Open
wants to merge 7 commits into base: develop

Conversation

@hga007 (Collaborator) commented Oct 15, 2024

In distributed-memory parallel applications with serial I/O files, data is scattered from global to tiled arrays during reading and gathered from tiled to global arrays during writing. The ROMS distribute.F module includes routines mp_scatter2d, mp_scatter3d, mp_gather2d, and mp_gather3d for such I/O processing.

  • For scattering, mp_scatter2d and mp_scatter3d use the MPI_BCAST routine if the ROMS C-preprocessing option SCATTER_BCAST is activated. Otherwise, they default to the more efficient collective communication routine MPI_SCATTERV (see the sketch after this list).

  • For gathering, mp_gather2d and mp_gather3d use the MPI_ISEND/MPI_IRECV routines if the ROMS C-preprocessing option GATHER_SENDRECV is activated. Otherwise, they default to calling the collective communication routine MPI_GATHERV.
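
A rough sketch (not verbatim ROMS code) of the scatter-side selection between the two paths follows, reusing the buffers, counts, and displs that appear in the excerpts below; Npts here is only a placeholder for the global vector length:

# ifdef SCATTER_BCAST
!  Broadcast the entire global vector; every task extracts its own tile.
      CALL mpi_bcast (Vglobal, Npts, MP_FLOAT, MyMaster,                &
     &                OCN_COMM_WORLD, MyError)
# else
!  Send each task only its own contiguous segment of the reordered
!  global vector, using per-task counts and displacements.
      CALL mpi_scatterv (Vreset, counts, displs, MP_FLOAT,              &
     &                   Vrecv, MySize, MP_FLOAT,                       &
     &                   MyMaster, OCN_COMM_WORLD, MyError)
# endif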

MPI_SCATTERV and MPI_GATHERV are complicated to implement when the grid size is not an exact multiple of the processor layout (NtileI x NtileJ), which leads to parallel tiles of different sizes. In addition, each tile's data segment is not contiguous in Fortran memory. Strategies for manipulating the chunks of parallel data with MPI derived types and subarrays (say, MPI_TYPE_CREATE_SUBARRAY) are available, but they are inefficient for multidimensional data, so achieving a generic solution for any ROMS grid can be challenging.
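
For contrast, here is a minimal, hypothetical sketch of that derived-type route for a single 2-D field, assuming 1-based global indices, with Isize/Jsize the global extents and iLB:iUB, jLB:jUB one task's tile bounds (as in the loops below). One such type, with its own commit and bookkeeping, would be needed for every task and every array shape:

!  Hypothetical sketch only: describe one task's (i,j) tile of the
!  global 2-D array as an MPI subarray type.
      integer :: tile_type
      integer :: gsizes(2), lsizes(2), starts(2)
!
      gsizes=(/ Isize, Jsize /)                  ! global extents
      lsizes=(/ iUB-iLB+1, jUB-jLB+1 /)          ! tile extents
      starts=(/ iLB-1, jLB-1 /)                  ! zero-based start offsets
!
      CALL mpi_type_create_subarray (2, gsizes, lsizes, starts,         &
     &                               MPI_ORDER_FORTRAN, MP_FLOAT,       &
     &                               tile_type, MyError)
      CALL mpi_type_commit (tile_type, MyError)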

After some experimentation, an alternative, efficient, and generic solution was achieved. For example, in mp_scatter3d we now have:

!
!  If master node, set the (i,j,k) index map from global array to vector.
!
      IF (MyRank.eq.MyMaster) THEN
        allocate ( ijk_global(Io:Ie,Jo:Je,LBk:UBk) )
!
        DO k=LBk,UBk
          kc=(k-Koff)*IJsize
          DO j=Jo,Je
            jc=(j-Joff)*Isize
            DO i=Io,Ie
              ijk=i+Ioff+jc+kc
              ijk_global(i,j,k)=ijk
            END DO
          END DO
        END DO
!
!  Reorganize the input global vector so that each tile's data is
!  contiguous in memory to facilitate "SCATTERV" with different-size
!  sections.
!
        nc=0
        DO rank=0,Ntasks-1
          iLB=BOUNDS(ng) % Imin(Cgrid,ghost,rank)
          iUB=BOUNDS(ng) % Imax(Cgrid,ghost,rank)
          jLB=BOUNDS(ng) % Jmin(Cgrid,ghost,rank)
          jUB=BOUNDS(ng) % Jmax(Cgrid,ghost,rank)
          DO k=LBk,UBk
            DO j=jLB,jUB
              DO i=iLB,iUB
                ijk=ijk_global(i,j,k)
                nc=nc+1
                Vreset(nc)=Vglobal(ijk)
              END DO
            END DO
          END DO
        END DO
        deallocate (ijk_global)
      END IF
!
!  Distribute global data into local tiled arrays.
!
      MySize=(Imax-Imin+1)*(Jmax-Jmin+1)*(UBk-LBk+1)
      allocate ( Vrecv(MySize) )
      Vrecv=0.0_r8
!
      CALL mpi_scatterv (Vreset, counts, displs, MP_FLOAT,              &
     &                   Vrecv, MySize, MP_FLOAT,                       &
     &                   MyMaster, OCN_COMM_WORLD, MyError)
      IF (MyError.ne.MPI_SUCCESS) THEN
        CALL mpi_error_string (MyError, string, Lstr, Serror)
        WRITE (stdout,20) 'MPI_SCATTERV', MyRank, MyError,              &
     &                    TRIM(string)
 20     FORMAT (/,' MP_SCATTER3D - error during ',a,                    &
     &          ' call, Task = ', i3.3, ' Error = ',i3,/,15x,a)
        exit_flag=2
        RETURN
      END IF
!
!  Unpack data buffer and load into tiled array
!
      nc=0
      DO k=LBk,UBk
        DO j=Jmin,Jmax
          DO i=Imin,Imax
            nc=nc+1
            Awrk(i,j,k)=Vrecv(nc)
          END DO
        END DO
      END DO
      deallocate ( Vrecv )

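As a concrete, hypothetical illustration of why the master-side reordering makes MPI_SCATTERV straightforward, consider a 4x3 interior grid partitioned with NtileI=2 and NtileJ=1, a single vertical level, and no ghost points. Task 0 owns i=1:2 and task 1 owns i=3:4, each with j=1:3, so after the reordering Vreset holds task 0's six points followed by task 1's six points:

      counts = (/ 6, 6 /)
      displs = (/ 0, 6 /)

Each task therefore receives exactly its own tile in Vrecv, already in the (i,j,k) order expected by the unpacking loop above.
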
The inverse gathering communication is more straightforward in mp_gather3d:

!
!  Gather local tiled data into a global array.
!
      allocate ( Arecv(IJsize*Ksize) )
      Arecv=0.0_r8
!
      CALL mpi_gatherv (Asend, Asize, MP_FLOAT,                         &
     &                  Arecv, counts, displs, MP_FLOAT,                &
     &                  MyMaster, OCN_COMM_WORLD, MyError)
      IF (MyError.ne.MPI_SUCCESS) THEN
        CALL mpi_error_string (MyError, string, Lstr, Serror)
        WRITE (stdout,10) 'MPI_GATHERV', MyRank, MyError, TRIM(string)
 10     FORMAT (/,' MP_GATHER3D - error during ',a,' call, Task = ',    &
     &          i3.3, ' Error = ',i3,/,15x,a)
        exit_flag=2
        RETURN
      END IF
!
!  Unpack gathered data in contiguous memory order and remap every
!  task segment.
!
      nc=0
      DO rank=0,Ntasks-1
        iLB=BOUNDS(ng) % Imin(Cgrid,ghost,rank)
        iUB=BOUNDS(ng) % Imax(Cgrid,ghost,rank)
        jLB=BOUNDS(ng) % Jmin(Cgrid,ghost,rank)
        jUB=BOUNDS(ng) % Jmax(Cgrid,ghost,rank)
        DO k=LBk,UBk
          kc=(k-Koff)*IJsize
          DO j=jLB,jUB
            jc=(j-Joff)*Isize
            DO i=iLB,iUB
              ijk=i+Ioff+jc+kc
              nc=nc+1
              Awrk(ijk)=Arecv(nc)
            END DO
          END DO
        END DO
      END DO
      deallocate (Arecv)

The counts and displacement vectors (counts, displs) for mpi_scatterv and mpi_gatherv are computed as follows:

      Ntasks=NtileI(ng)*NtileJ(ng)
      nc=0
      DO rank=0,Ntasks-1
        iLB=BOUNDS(ng) % Imin(Cgrid,ghost,rank)
        iUB=BOUNDS(ng) % Imax(Cgrid,ghost,rank)
        jLB=BOUNDS(ng) % Jmin(Cgrid,ghost,rank)
        jUB=BOUNDS(ng) % Jmax(Cgrid,ghost,rank)
        displs(rank)=nc
        DO k=LBk,UBk
          DO j=jLB,jUB
            DO i=iLB,iUB
              nc=nc+1
            END DO
          END DO
        END DO
        counts(rank)=nc-displs(rank)
      END DO

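Since each task's segment is a full (iLB:iUB, jLB:jUB, LBk:UBk) box, the inner triple loop above only counts points. The same vectors could also be computed directly from the tile extents, as in this equivalent sketch using the same BOUNDS metadata:

      Ntasks=NtileI(ng)*NtileJ(ng)
      nc=0
      DO rank=0,Ntasks-1
        iLB=BOUNDS(ng) % Imin(Cgrid,ghost,rank)
        iUB=BOUNDS(ng) % Imax(Cgrid,ghost,rank)
        jLB=BOUNDS(ng) % Jmin(Cgrid,ghost,rank)
        jUB=BOUNDS(ng) % Jmax(Cgrid,ghost,rank)
        displs(rank)=nc
        counts(rank)=(iUB-iLB+1)*(jUB-jLB+1)*(UBk-LBk+1)
        nc=nc+counts(rank)
      END DO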