Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFT for OMP backend (via 2decomp&fft) #113

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

Nanoseb
Copy link
Collaborator

@Nanoseb Nanoseb commented Jul 11, 2024

closes #54

@Nanoseb Nanoseb added core Issue affecting core mechanisms of the software omp Related to openMP backend labels Jul 11, 2024
@Nanoseb Nanoseb self-assigned this Jul 11, 2024
@Nanoseb
Copy link
Collaborator Author

Nanoseb commented Oct 22, 2024

@CFD-Xing I feel like you added a lot of unnecessary files in the last commit. Can you remove them?
CMakeCache.txt shouldn't be in the repository, same as CMakeDetermineCompilerABI_Fortran.bin or anything inside cmake/CMakeFiles/3.28.3 or cmake/CMakeFiles/CMakeConfigureLog.yaml and maybe more.

The build worked before, you just needed to build 2decomp&fft and then setup an environment variable named decomp2d_INCLUDE_DIRS or something like that where the installation folder was and it was picked up fine by x3d2. We also want to make sure 2decomp is still an optional dependency of x3d2 which doesn't seem to be the case anymore.

Jacques Xing and others added 2 commits October 22, 2024 12:02
@semi-h
Copy link
Member

semi-h commented Oct 22, 2024

@@ -78,7 +78,6 @@ if (${POISSON_SOLVER} STREQUAL "FFT" AND ${BACKEND} STREQUAL "OMP")
message(STATUS "Using the FFT poisson solver with 2decomp&fft")
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake")
find_package(decomp2d REQUIRED)
include_directories(${decomp2d_INCLUDE_DIRS})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is required to set the include paths for the 2decomp modules when building x3d2

@Nanoseb Nanoseb marked this pull request as ready for review October 28, 2024 16:11
@Nanoseb
Copy link
Collaborator Author

Nanoseb commented Oct 28, 2024

Finally ready for reviews

else(decomp2d_FOUND)
message(STATUS "2decomp-fft PATH not available we'll try to download and install")
configure_file(${CMAKE_SOURCE_DIR}/cmake/decomp2d/downloadBuild2decomp.cmake.in decomp2d-build/CMakeLists.txt)
#message("Second CMAKE_GENERATOR ${CMAKE_GENERATOR}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented code

Comment on lines +17 to +26
#ExternalProject_Add(downloadBuild2decomp
# GIT_REPOSITORY "https://github.com/xcompact3d/2decomp-fft"
# GIT_TAG "main"
# CONFIGURE_COMMAND "cmake -S ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-src "
# BUILD_COMMAND ""
# INSTALL_COMMAND ""
# TEST_COMMAND ""
# SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/decomp2d-src"
# BINARY_DIR ""
# INSTALL_DIR ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented code?

Comment on lines +421 to +423
!dims = size(x%data)
! Fix for size being stored wrongly into dims
dims = self%mesh%get_padded_dims(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove comment and/or replace with explanation (to prevent people reverting back if non-obvious)

type(allocator_t), pointer :: host_allocator
type(solver_t) :: solver
type(mesh_t), target :: mesh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be class based on src/allocator.f90?

@@ -47,7 +47,7 @@ program test_omp_transeq
L_global = [2*pi, 2*pi, 2*pi]

! Domain decomposition in each direction
nproc_dir = [nproc, 1, 1]
nproc_dir = [1, 1, nproc]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for this change? Should we consider running variations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is because 2decomp&fft only allows 2D decomposition and as it is implemented at the moment it is in y and z. I think that's also a limitation of the cuda backend, but I am not sure anymore.
That could likely be changed in the future though, whether this should be part of this PR or not I am not sure.

if (${POISSON_SOLVER} STREQUAL "FFT" AND ${BACKEND} STREQUAL "OMP")
list(APPEND SRC ${2DECOMPFFTSRC})
else()
list(APPEND SRC ${GENERICDECOMPSRC})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does GENERIC do nothing? Will this be an issue for CUDA code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic does the decomposition fine, and should be used when using CUDA (or ITER). It may not be consistent with what 2decomp&fft does internally hence the need to have a separate one for it calling 2decomp directly.

@@ -64,7 +64,7 @@ module m_allocator
contains

function allocator_init(mesh, sz) result(allocator)
type(mesh_t), target, intent(inout) :: mesh
class(mesh_t), target, intent(inout) :: mesh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have mesh subtypes now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no indeed, that was a reminiscence of a previous implementation

@@ -0,0 +1,54 @@
module m_decomp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module could use some comments

Comment on lines +25 to +26
div_r = real(div_u(i, j, k), kind=dp)/(nx*ny*nz)
div_c = aimag(div_u(i, j, k))/(nx*ny*nz)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the mesh is large, either

num / nx / ny / nz

or

num / (int(nx, int64) * int(ny, int64) * int(nz, int64)

to avoid overflows

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, the CUDA implementation might need the same change as I took most of this from there.

src/poisson_fft.f90 Show resolved Hide resolved
Copy link
Member

@semi-h semi-h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like when x3d2 is set up with 2DECOMP&FFT support the decomposition depends on 2DECOMP&FFT and can't be changed at runtime. When the iterative solver is in the mainline we'll have cases where a Cartesian decompostion would't work, and currenly the only option to run such a case would be recompiling without 2DECOMP&FFT. There would be a similar issue when CUDA backend supports 2DECOMP&FFT as well.

It would be better to have an option like -DWITH_2DECOMP and then if present compiling and making it possible to use 2DECOMP&FFT optionally with runtime parameters. My suggestion below would be a solution I think.

Comment on lines +52 to +72
p_row = par%nproc_dir(2)
p_col = par%nproc_dir(3)
if (p_row*p_col /= par%nproc) then
error stop "Decomposition in X not supported by 2decomp&fft backend"
end if
periodic_bc(:) = grid%periodic_BC(:)
call decomp_2d_init(nx, ny, nz, p_row, p_col, periodic_bc)

! Get global_ranks
allocate(global_ranks(1, p_row, p_col))
allocate(global_ranks_lin(p_row*p_col))
global_ranks_lin(:) = 0

call MPI_Comm_rank(DECOMP_2D_COMM_CART_X, cart_rank, ierr)
call MPI_Cart_coords(DECOMP_2D_COMM_CART_X, cart_rank, 2, coords, ierr)

global_ranks_lin(coords(1)+1 + p_row*(coords(2))) = par%nrank

call MPI_Allreduce(MPI_IN_PLACE, global_ranks_lin, p_row*p_col, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)

global_ranks = reshape(global_ranks_lin, shape=[1, p_row, p_col])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is the only difference in the decomposition strategy we need to sort out when using 2DECOMP&FFT. The custom strategy we follow is very straightforward as in

x3d2/src/mesh.f90

Lines 226 to 227 in 9c39613

global_ranks = reshape([(i, i=0, mesh%par%nproc - 1)], &
shape=[nproc_x, nproc_y, nproc_z])

All we need from 2DECOMP&FFT is the rank mapping so that we can use it instead of our custom rank mapping, and this can be obtained after instantiating the library with global grid dimensions, decomposition as user sets in the input file, and the periodicity in BCs.

Because all this prerequisites to instantiate 2DECOMP&FFT are known from the very beginning of the program (from the input file), there is a possibility that we instantiate the library somewhere else, before the mesh class is instantiated.

Instantiating 2DECOMP&FFT outside of mesh class would make things easier, because instead of working out the rank mapping inside the mesh class which requires a relatively complex structure, we can pass the rank mapping as an input argument and then here in the higlighted bit use this simple input array to carry on with all we do in the mesh class.

I think this would simply the structure quite a lot, what do you think?

integer, private :: sz
type(geo_t), allocatable :: geo ! object containing geometry information
type(parallel_t), allocatable :: par ! object containing parallel domain decomposition information
class(grid_t), allocatable :: grid ! object containing grid information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this new grid is great, neatly packs all the relevant stuff. It doesn't need to be class though, type should be enough.
And what do you think about defining these types inside mesh.f90 instead of its own new file? I think it would be tidier in mesh.f90.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with defining these inside mesh.f90, but that isn't possible due to limitations of the compilers. They flag circular dependencies when doing so (even if there aren't any) just because they compile mesh.f90 at once and can't compile the modules independently. This is the primary reason why all the contents of the mesh object have to be in a separate module (for actual circular dependency) and in a separate file (compiler limitation).

Comment on lines -226 to -227
global_ranks = reshape([(i, i=0, mesh%par%nproc - 1)], &
shape=[nproc_x, nproc_y, nproc_z])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we pass the rank mapping as an input to mesh type we can set the global_ranks to this input and the rest should be fine.

src/poisson_fft.f90 Show resolved Hide resolved
@CFD-Xing
Copy link
Contributor

@Nanoseb

I think I have a strategy to force the re-build of decomp2d-fft, if desired. Here is the cmake

# - Find the 2decomp-fft library
if (rebuild_decom2d)
    message("Re-build 2decomp-fft")
    execute_process(COMMAND rm -rf ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-build)
    set(decomp2d_FOUND FALSE)
else()
    find_package(decomp2d
                 PATHS ${CMAKE_SOURCE_DIR}/decomp2d/build)
endif()

if (decomp2d_FOUND)
  message(STATUS "2decomp-fft FOUND")
else(decomp2d_FOUND)
  message(STATUS "2decomp-fft PATH not available we'll try to download and install")
  configure_file(${CMAKE_SOURCE_DIR}/cmake/decomp2d/downloadBuild2decomp.cmake.in decomp2d-build/CMakeLists.txt)
  execute_process(COMMAND ${CMAKE_COMMAND} -G "${CMAKE_GENERATOR}" .
          RESULT_VARIABLE result
          WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-build )
  if(result)
      message(FATAL_ERROR "CMake step for 2decomp-fft failed: ${result}")
  else()
      message("CMake step for 2decomp-fft completed (${result}).")
  endif()
  execute_process(COMMAND ${CMAKE_COMMAND} --build .
         RESULT_VARIABLE result
          WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-build )
  if(result)
      message(FATAL_ERROR "Build step for 2decomp-fft failed: ${result}")
  endif()
  set(D2D_ROOT ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-build/downloadBuild2decomp-prefix/src/downloadBuild2decomp-build)
  find_package(decomp2d REQUIRED
          PATHS ${D2D_ROOT})
endif(decomp2d_FOUND)

You can turn on/off the option by using

cmake .. -DCMAKE_BUILD_TYPE=Debug -Drebuild_decom2d=ON

cmake .. -DCMAKE_BUILD_TYPE=Debug -Drebuild_decom2d=OFF

Feel free to include this in your PR if you are happy with the behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Issue affecting core mechanisms of the software omp Related to openMP backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement FFT Poisson solver (OMP)
4 participants