-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FFT for OMP backend (via 2decomp&fft) #113
base: main
Are you sure you want to change the base?
Conversation
@CFD-Xing I feel like you added a lot of unnecessary files in the last commit. Can you remove them? The build worked before, you just needed to build 2decomp&fft and then setup an environment variable named |
…mp2d_INCLUDE_DIRS=/path-to-2decomp-fft/build
@CFD-Xing How does it work now compared to what we have in the ADIOS2 branch? https://github.com/xcompact3d/x3d2/blob/jq/implement-io/.gitmodules |
src/CMakeLists.txt
Outdated
@@ -78,7 +78,6 @@ if (${POISSON_SOLVER} STREQUAL "FFT" AND ${BACKEND} STREQUAL "OMP") | |||
message(STATUS "Using the FFT poisson solver with 2decomp&fft") | |||
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake") | |||
find_package(decomp2d REQUIRED) | |||
include_directories(${decomp2d_INCLUDE_DIRS}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is required to set the include paths for the 2decomp modules when building x3d2
…ecomp-integration
Finally ready for reviews |
else(decomp2d_FOUND) | ||
message(STATUS "2decomp-fft PATH not available we'll try to download and install") | ||
configure_file(${CMAKE_SOURCE_DIR}/cmake/decomp2d/downloadBuild2decomp.cmake.in decomp2d-build/CMakeLists.txt) | ||
#message("Second CMAKE_GENERATOR ${CMAKE_GENERATOR}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented code
#ExternalProject_Add(downloadBuild2decomp | ||
# GIT_REPOSITORY "https://github.com/xcompact3d/2decomp-fft" | ||
# GIT_TAG "main" | ||
# CONFIGURE_COMMAND "cmake -S ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-src " | ||
# BUILD_COMMAND "" | ||
# INSTALL_COMMAND "" | ||
# TEST_COMMAND "" | ||
# SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/decomp2d-src" | ||
# BINARY_DIR "" | ||
# INSTALL_DIR "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented code?
!dims = size(x%data) | ||
! Fix for size being stored wrongly into dims | ||
dims = self%mesh%get_padded_dims(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove comment and/or replace with explanation (to prevent people reverting back if non-obvious)
type(allocator_t), pointer :: host_allocator | ||
type(solver_t) :: solver | ||
type(mesh_t), target :: mesh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be class
based on src/allocator.f90
?
@@ -47,7 +47,7 @@ program test_omp_transeq | |||
L_global = [2*pi, 2*pi, 2*pi] | |||
|
|||
! Domain decomposition in each direction | |||
nproc_dir = [nproc, 1, 1] | |||
nproc_dir = [1, 1, nproc] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for this change? Should we consider running variations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is because 2decomp&fft only allows 2D decomposition and as it is implemented at the moment it is in y and z. I think that's also a limitation of the cuda backend, but I am not sure anymore.
That could likely be changed in the future though, whether this should be part of this PR or not I am not sure.
if (${POISSON_SOLVER} STREQUAL "FFT" AND ${BACKEND} STREQUAL "OMP") | ||
list(APPEND SRC ${2DECOMPFFTSRC}) | ||
else() | ||
list(APPEND SRC ${GENERICDECOMPSRC}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does GENERIC
do nothing? Will this be an issue for CUDA code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic does the decomposition fine, and should be used when using CUDA (or ITER). It may not be consistent with what 2decomp&fft does internally hence the need to have a separate one for it calling 2decomp directly.
@@ -64,7 +64,7 @@ module m_allocator | |||
contains | |||
|
|||
function allocator_init(mesh, sz) result(allocator) | |||
type(mesh_t), target, intent(inout) :: mesh | |||
class(mesh_t), target, intent(inout) :: mesh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have mesh subtypes now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no indeed, that was a reminiscence of a previous implementation
@@ -0,0 +1,54 @@ | |||
module m_decomp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Module could use some comments
div_r = real(div_u(i, j, k), kind=dp)/(nx*ny*nz) | ||
div_c = aimag(div_u(i, j, k))/(nx*ny*nz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case the mesh is large, either
num / nx / ny / nz
or
num / (int(nx, int64) * int(ny, int64) * int(nz, int64)
to avoid overflows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, the CUDA implementation might need the same change as I took most of this from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like when x3d2 is set up with 2DECOMP&FFT support the decomposition depends on 2DECOMP&FFT and can't be changed at runtime. When the iterative solver is in the mainline we'll have cases where a Cartesian decompostion would't work, and currenly the only option to run such a case would be recompiling without 2DECOMP&FFT. There would be a similar issue when CUDA backend supports 2DECOMP&FFT as well.
It would be better to have an option like -DWITH_2DECOMP and then if present compiling and making it possible to use 2DECOMP&FFT optionally with runtime parameters. My suggestion below would be a solution I think.
p_row = par%nproc_dir(2) | ||
p_col = par%nproc_dir(3) | ||
if (p_row*p_col /= par%nproc) then | ||
error stop "Decomposition in X not supported by 2decomp&fft backend" | ||
end if | ||
periodic_bc(:) = grid%periodic_BC(:) | ||
call decomp_2d_init(nx, ny, nz, p_row, p_col, periodic_bc) | ||
|
||
! Get global_ranks | ||
allocate(global_ranks(1, p_row, p_col)) | ||
allocate(global_ranks_lin(p_row*p_col)) | ||
global_ranks_lin(:) = 0 | ||
|
||
call MPI_Comm_rank(DECOMP_2D_COMM_CART_X, cart_rank, ierr) | ||
call MPI_Cart_coords(DECOMP_2D_COMM_CART_X, cart_rank, 2, coords, ierr) | ||
|
||
global_ranks_lin(coords(1)+1 + p_row*(coords(2))) = par%nrank | ||
|
||
call MPI_Allreduce(MPI_IN_PLACE, global_ranks_lin, p_row*p_col, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr) | ||
|
||
global_ranks = reshape(global_ranks_lin, shape=[1, p_row, p_col]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit is the only difference in the decomposition strategy we need to sort out when using 2DECOMP&FFT. The custom strategy we follow is very straightforward as in
Lines 226 to 227 in 9c39613
global_ranks = reshape([(i, i=0, mesh%par%nproc - 1)], & | |
shape=[nproc_x, nproc_y, nproc_z]) |
All we need from 2DECOMP&FFT is the rank mapping so that we can use it instead of our custom rank mapping, and this can be obtained after instantiating the library with global grid dimensions, decomposition as user sets in the input file, and the periodicity in BCs.
Because all this prerequisites to instantiate 2DECOMP&FFT are known from the very beginning of the program (from the input file), there is a possibility that we instantiate the library somewhere else, before the mesh class is instantiated.
Instantiating 2DECOMP&FFT outside of mesh class would make things easier, because instead of working out the rank mapping inside the mesh class which requires a relatively complex structure, we can pass the rank mapping as an input argument and then here in the higlighted bit use this simple input array to carry on with all we do in the mesh class.
I think this would simply the structure quite a lot, what do you think?
integer, private :: sz | ||
type(geo_t), allocatable :: geo ! object containing geometry information | ||
type(parallel_t), allocatable :: par ! object containing parallel domain decomposition information | ||
class(grid_t), allocatable :: grid ! object containing grid information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this new grid is great, neatly packs all the relevant stuff. It doesn't need to be class though, type should be enough.
And what do you think about defining these types inside mesh.f90 instead of its own new file? I think it would be tidier in mesh.f90.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with defining these inside mesh.f90, but that isn't possible due to limitations of the compilers. They flag circular dependencies when doing so (even if there aren't any) just because they compile mesh.f90 at once and can't compile the modules independently. This is the primary reason why all the contents of the mesh object have to be in a separate module (for actual circular dependency) and in a separate file (compiler limitation).
global_ranks = reshape([(i, i=0, mesh%par%nproc - 1)], & | ||
shape=[nproc_x, nproc_y, nproc_z]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we pass the rank mapping as an input to mesh type we can set the global_ranks
to this input and the rest should be fine.
You can turn on/off the option by using
Feel free to include this in your PR if you are happy with the behaviour. |
closes #54