Commit

Merge pull request #65 from stfc/64_simple_mpi_support

64 simple mpi support

arporter authored Feb 12, 2024
2 parents ad209e9 + 18fed02 commit e9c24f9
Showing 10 changed files with 447 additions and 44 deletions.
1 change: 1 addition & 0 deletions doc/.gitignore
@@ -0,0 +1 @@
_build
29 changes: 23 additions & 6 deletions doc/api.rst
@@ -49,13 +49,19 @@ Three types of boundary condition are currently supported:
Name Description
=============== =========================================
GO_BC_NONE No boundary conditions are applied.
GO_BC_EXTERNAL Some external forcing is applied. This must be implemented by a kernel. The domain must be defined with a T-point mask (see :ref:`gocean1.0-grid-init`).
GO_BC_EXTERNAL Some external forcing is applied. This must be implemented by a kernel.
The domain can be specified using a ``tmask``, but if no ``tmask`` is
specified, a dummy ``tmask`` is created that will define an all ocean
domain.
GO_BC_PERIODIC Periodic boundary conditions are applied.
=============== =========================================

The infrastructure requires this information in order to determine the
extent of the model grid.

Note that at this stage ``GO_BC_PERIODIC`` is not supported when
using distributed memory. This is tracked in issue #54.

The index offset is required because a model (kernel) developer has
choice in how they actually implement the staggering of variables on a
grid. This comes down to a choice of which grid points in the vicinity
@@ -98,11 +104,12 @@ object. This is done via a call to the ``grid_init`` subroutine::
!! wet (1), dry (0) or external (-1).
integer, dimension(m,n), intent(in), optional :: tmask


If no T-mask is supplied then this routine configures the grid
appropriately for an all-wet domain with periodic boundary conditions
in both the *x*- and *y*-dimensions. It should also be noted that
currently only grids with constant resolution in *x* and *y* are
supported by this routine.
appropriately for an all-wet domain by allocating a default
T-mask. It should also be noted that currently only grids with
constant resolution in *x* and *y* are supported by this routine.
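
A minimal sketch of calling ``grid_init`` with and without a T-mask
(assuming, purely for illustration, that the routine takes the grid
object, the global dimensions and the grid spacings ahead of the
optional ``tmask``; these argument names are not taken from this diff)::

    ! All-wet domain: omitting tmask causes a default T-mask to be allocated.
    call grid_init(model_grid, jpiglo, jpjglo, dx, dy)

    ! Supply an explicit T-point mask marking wet (1), dry (0) and
    ! external (-1) points.
    call grid_init(model_grid, jpiglo, jpjglo, dx, dy, tmask)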


.. _gocean1.0-fields:

@@ -128,11 +135,21 @@ constructor::
sshn_v = r2d_field(model_grid, GO_V_POINTS)
sshn_t = r2d_field(model_grid, GO_T_POINTS)

The constructor takes two arguments:
The constructor takes two mandatory and two optional arguments:

1. The grid on which the field exists
2. The type of grid point at which the field is defined
(``GO_U_POINTS``, ``GO_V_POINTS``, ``GO_T_POINTS`` or ``GO_F_POINTS``)
3. ``do_tile``: an optional flag specifying whether the field should be
   tiled among all threads or whether only a single field should be
   allocated (which is not currently supported by PSyclone).
4. ``init_global_data``: an optional global 2D Fortran array, which must be
provided on each rank. On each rank the field will be initialised
with the data from the corresponding subdomain. This is just a convenience
for users with a small problem size, since typically for large data sets
using a global array will create scalability problems. In general, it is
the responsibility of the user to initialise an array with the required
local data.
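
As a sketch of how the optional arguments might be used (``global_ssh``,
``un`` and the dimension names are hypothetical, user-supplied values,
not taken from this diff)::

    real(go_wp), dimension(jpiglo, jpjglo) :: global_ssh

    ! Initialise the local subdomain of the field from a global array
    ! that is available on every rank.
    sshn_t = r2d_field(model_grid, GO_T_POINTS, &
                       init_global_data=global_ssh)

    ! Request an untiled (single-array) field.
    un = r2d_field(model_grid, GO_U_POINTS, do_tile=.false.)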

Note that the grid object must have been fully configured (by a
call to ``grid_init`` for instance) before it is passed into this
6 changes: 3 additions & 3 deletions doc/conf.py
@@ -62,7 +62,7 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -89,7 +89,7 @@
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# html_static_path = ['_static']

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
@@ -182,7 +182,7 @@
# -- Options for intersphinx extension ---------------------------------------

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'https://docs.python.org/': None}
intersphinx_mapping = {'python': ('https://docs.python.org/', None)}

# -- Options for todo extension ----------------------------------------------

2 changes: 1 addition & 1 deletion finite_difference/src/Makefile
@@ -89,7 +89,7 @@ all: ${API_LIB}

# Create the archive.
${API_LIB}: ${MODULES}
${AR} ${ARFLAGS} ${API_LIB} *.o
${AR} ${ARFLAGS} ${API_LIB} $(MODULES)

install:
${MAKE} -C .. install
112 changes: 108 additions & 4 deletions finite_difference/src/field_mod.f90
@@ -26,6 +26,7 @@
! OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
!------------------------------------------------------------------------------
! Author: A. R. Porter, STFC Daresbury Laboratory
! Modified: J. Henrichs, Australian Bureau of Meteorology

!> Module for describing all aspects of a field (which exists on some
!! grid).
@@ -161,6 +162,7 @@ end subroutine write_to_device_f_interface
procedure, pass :: read_from_device
procedure, pass :: write_to_device
procedure, public :: halo_exchange
procedure, public :: gather_inner_data
end type r2d_field

!> Interface for the copy_field operation. Overloaded to take
@@ -237,9 +239,10 @@ end subroutine write_to_device_f_interface

!===================================================

function r2d_field_constructor(grid, &
function r2d_field_constructor(grid, &
grid_points, &
do_tile) result(self)
do_tile, &
init_global_data) result(self)
use parallel_mod, only: go_decompose, on_master
!$ use omp_lib, only : omp_get_max_threads
implicit none
@@ -252,6 +255,9 @@ function r2d_field_constructor(grid, &
!> a single field should be allocated (which is not currently
!> supported by PSyclone)
logical, intent(in), optional :: do_tile
!> An optional global array with which the field in each domain
!> will be initialised
real(go_wp), dimension(:,:), intent(in), optional :: init_global_data
! Local declarations
type(r2d_field), target :: self
logical :: local_do_tiling = .false.
Expand All @@ -261,7 +267,7 @@ function r2d_field_constructor(grid, &
!> The upper bounds actually used to allocate arrays (as opposed
!! to the limits carried around with the field)
integer :: upper_x_bound, upper_y_bound
integer :: itile, nthreads, ntilex, ntiley
integer :: itile, nthreads, ntilex, ntiley, dx, dy

if (present(do_tile)) then
local_do_tiling = do_tile
@@ -349,7 +355,7 @@ end if
end if

! Since we're allocating the arrays to be larger than strictly
! required we explicitly set all elements to -999 in case the code
! required we explicitly set all elements to 0 in case the code
! does access 'out-of-bounds' elements during speculative
! execution. If we're running with OpenMP this also gives
! us the opportunity to do a 'first touch' policy to aid with
@@ -368,6 +374,19 @@
else
self%data(:,:) = 0.0_go_wp
end if

if (present(init_global_data)) then
dx = grid%subdomain%global%xstart - self%internal%xstart
dy = grid%subdomain%global%ystart - self%internal%ystart

!$OMP PARALLEL DO schedule(runtime), default(none), &
!$OMP private(ji,jj), shared(self, grid, init_global_data, dx, dy)
do jj = grid%subdomain%internal%ystart, grid%subdomain%internal%ystop
do ji = grid%subdomain%internal%xstart, grid%subdomain%internal%xstop
self%data(ji, jj) = init_global_data(ji+dx, jj+dy)
end do
end do
end if
end function r2d_field_constructor

!===================================================
@@ -506,6 +525,8 @@ subroutine write_to_device(self, startx, starty, nx, ny, blocking)
end subroutine write_to_device


!===================================================

function get_data(self) result(dptr)
!> Getter for the data associated with a field. If the data is on a
! device it ensures that the host copy is up-to-date with that on
@@ -1287,6 +1308,89 @@ end function array_checksum

!===================================================

!> Collect a (distributed) field into a global array
!! on the master node.
subroutine gather_inner_data(self, global_data)
use parallel_utils_mod, only: get_num_ranks, gather
use parallel_mod, only: on_master

class(r2d_field), intent(in) :: self
real(go_wp), dimension(:,:), &
allocatable, intent(out) :: global_data

real(go_wp), dimension(:), allocatable :: send_buffer, recv_buffer

integer :: dx, dy, ji, jj, i, n, rank, halo_x, halo_y
integer :: x_start, x_stop, y_start, y_stop, ierr

allocate(global_data(self%grid%global_nx, self%grid%global_ny), &
stat=ierr)
if(ierr /= 0)then
call gocean_stop('gather_inner_data failed to allocate global result array')
end if

! No MPI (or single process), just copy the data out
if (get_num_ranks() == 1) then
! Compute size of inner area only
dx = self%internal%xstart - 1
dy = self%internal%ystart - 1
do jj= self%internal%ystart, self%internal%ystop
do ji = self%internal%xstart, self%internal%xstop
global_data(ji-dx,jj-dy) = self%data(ji,jj)
end do
end do
return
endif

! Determine maximum size of data to be sent. We don't need
! to send the halo, so reduce max_width and max_height by
! 2*halo size.
halo_x = self%internal%xstart-1
halo_y = self%internal%ystart-1
n = (self%grid%decomp%max_width - 2*halo_x) * &
(self%grid%decomp%max_height - 2*halo_y)
allocate(send_buffer(n), stat=ierr)
if(ierr /= 0)then
call gocean_stop('gather_inner_data failed to allocate send buffer')
end if
allocate(recv_buffer(n*get_num_ranks()), stat=ierr)
if(ierr /= 0)then
call gocean_stop('gather_inner_data failed to allocate receive buffer')
end if

! Copy data into 1D send buffer.
i = 0
do jj= self%internal%ystart, self%internal%ystop
do ji = self%internal%xstart, self%internal%xstop
i = i + 1
send_buffer(i) = self%data(ji,jj)
end do
end do

! Collect all send_buffers on the master:
call gather(send_buffer, recv_buffer)

if (on_master()) then
! Copy the data from each process into the global array
do rank=1, get_num_ranks()
x_start = self%grid%decomp%subdomains(rank)%global%xstart
x_stop = self%grid%decomp%subdomains(rank)%global%xstop
y_start = self%grid%decomp%subdomains(rank)%global%ystart
y_stop = self%grid%decomp%subdomains(rank)%global%ystop
i = (rank-1) * n
do jj= y_start, y_stop
do ji = x_start, x_stop
i = i + 1
global_data(ji, jj) = recv_buffer(i)
end do
end do
enddo
endif

end subroutine gather_inner_data
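
! Illustrative usage sketch (not part of this commit): 'fld' and
! 'global_fld' are hypothetical names, and the call is assumed to be
! needed on every rank since the underlying gather is collective.
!
!   use parallel_mod, only: on_master
!   real(go_wp), dimension(:,:), allocatable :: global_fld
!
!   call fld%gather_inner_data(global_fld)
!   if (on_master()) then
!      write(*,*) 'Global field sum = ', sum(global_fld)
!   end if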

!===================================================

subroutine init_periodic_bc_halos(fld)
implicit none
class(field_type), intent(inout) :: fld