FFT for OMP backend (via 2decomp&fft) #113

Nanoseb · 2024-07-11T16:02:05Z

closes #54

Nanoseb · 2024-10-22T10:32:41Z

@CFD-Xing I feel like you added a lot of unnecessary files in the last commit. Can you remove them?
CMakeCache.txt shouldn't be in the repository, same as CMakeDetermineCompilerABI_Fortran.bin or anything inside cmake/CMakeFiles/3.28.3 or cmake/CMakeFiles/CMakeConfigureLog.yaml and maybe more.

The build worked before: you just needed to build 2decomp&fft and then set an environment variable named decomp2d_INCLUDE_DIRS (or something like that) pointing at the installation folder, and it was picked up fine by x3d2. We also want to make sure 2decomp remains an optional dependency of x3d2, which no longer seems to be the case.

semi-h · 2024-10-22T12:39:09Z

@CFD-Xing How does it work now compared to what we have in the ADIOS2 branch?

https://github.com/xcompact3d/x3d2/blob/jq/implement-io/.gitmodules
https://github.com/xcompact3d/x3d2/blob/jq/implement-io/CMakeLists.txt

pbartholomew08 · 2024-10-22T12:40:58Z

src/CMakeLists.txt

@@ -78,7 +78,6 @@ if (${POISSON_SOLVER} STREQUAL "FFT" AND ${BACKEND} STREQUAL "OMP")
  message(STATUS "Using the FFT poisson solver with 2decomp&fft")
  set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake")
  find_package(decomp2d REQUIRED)
-  include_directories(${decomp2d_INCLUDE_DIRS})


This line is required to set the include paths for the 2decomp modules when building x3d2

Nanoseb · 2024-10-28T16:11:41Z

Finally ready for reviews

pbartholomew08 · 2024-10-28T16:14:03Z

cmake/Finddecomp2d.cmake

+else(decomp2d_FOUND)
+  message(STATUS "2decomp-fft PATH not available we'll try to download and install")
+  configure_file(${CMAKE_SOURCE_DIR}/cmake/decomp2d/downloadBuild2decomp.cmake.in decomp2d-build/CMakeLists.txt)
+  #message("Second CMAKE_GENERATOR ${CMAKE_GENERATOR}") 


Remove commented code

pbartholomew08 · 2024-10-28T16:14:43Z

cmake/decomp2d/downloadBuild2decomp.cmake.in

+#ExternalProject_Add(downloadBuild2decomp
+#    GIT_REPOSITORY    "https://github.com/xcompact3d/2decomp-fft"
+#    GIT_TAG           "main"
+#    CONFIGURE_COMMAND "cmake -S ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-src "
+#    BUILD_COMMAND     ""
+#    INSTALL_COMMAND   ""
+#    TEST_COMMAND      ""
+#    SOURCE_DIR        "${CMAKE_CURRENT_BINARY_DIR}/decomp2d-src"
+#    BINARY_DIR        ""
+#    INSTALL_DIR       ""


Remove commented code?

pbartholomew08 · 2024-10-28T16:16:11Z

src/omp/backend.f90

+    !dims = size(x%data)
+    ! Fix for size being stored wrongly into dims
+    dims = self%mesh%get_padded_dims(x)


Remove comment and/or replace with explanation (to prevent people reverting back if non-obvious)
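One plausible reading of the original problem, offered as a sketch rather than a statement about the x3d2 code: if `dims` is a rank-1 integer array, then `size()` without a `dim` argument returns a single scalar (the total element count) and that scalar is broadcast to every entry. The array name `field` and the extents below are made up for illustration; `get_padded_dims` is not reproduced here.

```fortran
program size_vs_shape
  implicit none
  real, allocatable :: field(:, :, :)   ! illustrative stand-in for x%data
  integer :: dims(3)

  allocate (field(4, 5, 6))

  ! size() without a dim argument returns the total element count (120 here).
  ! Assigning that scalar to an array sets every element to the same value.
  dims = size(field)
  print *, 'dims from size(): ', dims
  if (any(dims /= 120)) error stop 'expected the total count in every entry'

  ! shape() returns the extent along each dimension instead.
  dims = shape(field)
  print *, 'dims from shape():', dims
  if (any(dims /= [4, 5, 6])) error stop 'expected per-dimension extents'
end program size_vs_shape
```

A one-line comment next to the fix stating which of these two behaviours was intended would be enough to stop someone reverting it.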

pbartholomew08 · 2024-10-28T16:17:02Z

src/xcompact.f90

  type(allocator_t), pointer :: host_allocator
  type(solver_t) :: solver
+  type(mesh_t), target :: mesh


Should this be class based on src/allocator.f90?

pbartholomew08 · 2024-10-28T16:17:41Z

tests/omp/test_omp_transeq.f90

@@ -47,7 +47,7 @@ program test_omp_transeq
  L_global = [2*pi, 2*pi, 2*pi]

  ! Domain decomposition in each direction
-  nproc_dir = [nproc, 1, 1]
+  nproc_dir = [1, 1, nproc]


Any particular reason for this change? Should we consider running variations?

Nanoseb · 2024-10-28

Yes, this is because 2decomp&fft only allows a 2D decomposition and, as currently implemented, it is in y and z. I think that is also a limitation of the CUDA backend, but I am no longer sure.
That could likely be changed in the future, though I am not sure whether it should be part of this PR.

pbartholomew08 · 2024-10-28T16:22:57Z

src/CMakeLists.txt

+if (${POISSON_SOLVER} STREQUAL "FFT" AND ${BACKEND} STREQUAL "OMP")
+  list(APPEND SRC ${2DECOMPFFTSRC})
+else()
+  list(APPEND SRC ${GENERICDECOMPSRC})


Does GENERIC do nothing? Will this be an issue for CUDA code?

Nanoseb · 2024-10-28

The generic version does the decomposition fine, and should be used with CUDA (or ITER). It may not be consistent with what 2decomp&fft does internally, hence the need for a separate version that calls 2decomp directly.

pbartholomew08 · 2024-10-28T16:23:44Z

src/allocator.f90

@@ -64,7 +64,7 @@ module m_allocator
 contains

  function allocator_init(mesh, sz) result(allocator)
-    type(mesh_t), target, intent(inout) :: mesh
+    class(mesh_t), target, intent(inout) :: mesh


Do we have mesh subtypes now?

Nanoseb · 2024-10-28

No, indeed; that was a remnant of a previous implementation.
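For readers following the `type` versus `class` question, here is a minimal sketch of the difference. The subtype `padded_mesh_t` and both subroutines are hypothetical and exist only for this illustration; the source confirms that x3d2 currently has no mesh subtypes.

```fortran
module m_mesh_sketch
  implicit none

  type :: mesh_t
    integer :: n = 0
  end type mesh_t

  ! Hypothetical extension, used only to show what class() would allow.
  type, extends(mesh_t) :: padded_mesh_t
    integer :: pad = 0
  end type padded_mesh_t

contains

  ! Non-polymorphic dummy: accepts exactly mesh_t.
  subroutine takes_type(mesh)
    type(mesh_t), intent(in) :: mesh
    print *, 'type(mesh_t) dummy, n =', mesh%n
  end subroutine takes_type

  ! Polymorphic dummy: accepts mesh_t and any type extending it.
  subroutine takes_class(mesh)
    class(mesh_t), intent(in) :: mesh
    select type (mesh)
    type is (mesh_t)
      print *, 'class(mesh_t) dummy received the base type'
    type is (padded_mesh_t)
      print *, 'class(mesh_t) dummy received an extension, pad =', mesh%pad
    end select
  end subroutine takes_class

end module m_mesh_sketch

program class_vs_type
  use m_mesh_sketch
  implicit none
  type(mesh_t) :: base
  type(padded_mesh_t) :: ext

  call takes_type(base)
  call takes_class(base)
  call takes_class(ext)
  ! call takes_type(ext)   ! rejected at compile time: the declared types differ
end program class_vs_type
```

With no extensions of `mesh_t` in the code base, `type(mesh_t)` expresses the intent more directly and avoids the polymorphic machinery.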

pbartholomew08 · 2024-10-28T16:38:06Z

src/decomp.f90

@@ -0,0 +1,54 @@
+module m_decomp


Module could use some comments

pbartholomew08 · 2024-10-28T16:40:32Z

src/omp/kernels/spectral_processing.f90

+          div_r = real(div_u(i, j, k), kind=dp)/(nx*ny*nz)
+          div_c = aimag(div_u(i, j, k))/(nx*ny*nz)


In case the mesh is large, use either

num / nx / ny / nz

or

num / (int(nx, int64) * int(ny, int64) * int(nz, int64))

to avoid overflows

Nanoseb · 2024-10-28

Good point. The CUDA implementation might need the same change, as I took most of this from there.
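The two alternatives suggested above can be sketched as follows. This is a standalone illustration, not the kernel from the PR: the grid size is chosen so that the point count (2048**3 = 2**33) no longer fits in a default 32-bit integer, and the variable names `val`, `n_total`, `scaled_a` and `scaled_b` are made up.

```fortran
program fft_normalisation
  use iso_fortran_env, only: int32, int64, real64
  implicit none
  integer, parameter :: dp = real64
  integer(int32) :: nx, ny, nz
  integer(int64) :: n_total
  real(dp) :: val, scaled_a, scaled_b

  ! 2048**3 = 2**33, which exceeds huge(1_int32) = 2**31 - 1,
  ! so the product nx*ny*nz must not be formed in 32-bit arithmetic.
  nx = 2048; ny = 2048; nz = 2048
  val = 1.0_dp

  ! Option 1: divide successively so that no integer product is formed.
  scaled_a = val/nx/ny/nz

  ! Option 2: promote each extent to 64 bits before multiplying.
  n_total = int(nx, int64)*int(ny, int64)*int(nz, int64)
  scaled_b = val/n_total

  print *, 'total points: ', n_total
  print *, 'scaled values:', scaled_a, scaled_b

  if (n_total /= 8589934592_int64) error stop 'unexpected point count'
  if (abs(scaled_a - scaled_b) > epsilon(scaled_a)*abs(scaled_b)) &
    error stop 'the two normalisations disagree'
end program fft_normalisation
```

Option 2 keeps a single division per value, which may matter inside a hot loop; option 1 needs no extra kind imports.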

src/poisson_fft.f90

semi-h

It looks like when x3d2 is set up with 2DECOMP&FFT support, the decomposition depends on 2DECOMP&FFT and can't be changed at runtime. When the iterative solver is in the mainline we'll have cases where a Cartesian decomposition wouldn't work, and currently the only option to run such a case would be recompiling without 2DECOMP&FFT. There would be a similar issue when the CUDA backend supports 2DECOMP&FFT as well.

It would be better to have an option like -DWITH_2DECOMP and, when it is set, compile 2DECOMP&FFT in and make its use selectable through runtime parameters. I think my suggestion below would be a solution.

semi-h · 2024-10-29T11:01:38Z

src/2decompfft/decomp.f90

+    p_row = par%nproc_dir(2)
+    p_col = par%nproc_dir(3)
+    if (p_row*p_col /= par%nproc) then
+      error stop "Decomposition in X not supported by 2decomp&fft backend"
+    end if
+    periodic_bc(:) = grid%periodic_BC(:)
+    call decomp_2d_init(nx, ny, nz, p_row, p_col, periodic_bc)
+
+    ! Get global_ranks
+    allocate(global_ranks(1, p_row, p_col))
+    allocate(global_ranks_lin(p_row*p_col))
+    global_ranks_lin(:) = 0
+
+    call MPI_Comm_rank(DECOMP_2D_COMM_CART_X, cart_rank, ierr)
+    call MPI_Cart_coords(DECOMP_2D_COMM_CART_X, cart_rank, 2, coords, ierr)
+
+    global_ranks_lin(coords(1)+1 + p_row*(coords(2))) = par%nrank
+
+    call MPI_Allreduce(MPI_IN_PLACE, global_ranks_lin, p_row*p_col, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)
+
+    global_ranks = reshape(global_ranks_lin, shape=[1, p_row, p_col])


This bit is the only difference in the decomposition strategy we need to sort out when using 2DECOMP&FFT. The custom strategy we follow is very straightforward as in

x3d2/src/mesh.f90

Lines 226 to 227 in 9c39613

global_ranks = reshape([(i, i=0, mesh%par%nproc - 1)], &
                       shape=[nproc_x, nproc_y, nproc_z])

All we need from 2DECOMP&FFT is the rank mapping so that we can use it instead of our custom rank mapping, and this can be obtained after instantiating the library with global grid dimensions, decomposition as user sets in the input file, and the periodicity in BCs.

Because all these prerequisites for instantiating 2DECOMP&FFT are known from the very beginning of the program (from the input file), we could instantiate the library somewhere else, before the mesh class is instantiated.

Instantiating 2DECOMP&FFT outside of the mesh class would make things easier: instead of working out the rank mapping inside the mesh class, which requires a relatively complex structure, we can pass the rank mapping as an input argument and then, in the highlighted bit here, use this simple input array to carry on with everything else we do in the mesh class.

I think this would simplify the structure quite a lot. What do you think?
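A rough sketch of the idea, under stated assumptions: the module, `default_rank_map` and `find_coords` are hypothetical names, not part of x3d2, and `find_coords` only stands in for whatever the mesh class does with the mapping. The default mapping mirrors the `reshape` expression quoted above; a mapping obtained from 2DECOMP&FFT would simply be passed in its place.

```fortran
module m_rank_map_sketch
  implicit none
contains

  ! Hypothetical helper: the default mapping, ranks laid out in column-major order.
  function default_rank_map(nproc_dir) result(global_ranks)
    integer, intent(in) :: nproc_dir(3)
    integer, allocatable :: global_ranks(:, :, :)
    integer :: i

    global_ranks = reshape([(i, i=0, product(nproc_dir) - 1)], shape=nproc_dir)
  end function default_rank_map

  ! Hypothetical stand-in for the mesh setup: it does not care where the
  ! mapping came from, it only looks up this rank's position in it.
  function find_coords(global_ranks, nrank) result(coords)
    integer, intent(in) :: global_ranks(:, :, :)
    integer, intent(in) :: nrank
    integer :: coords(3)

    coords = findloc(global_ranks, nrank)   ! Fortran 2008 intrinsic
  end function find_coords

end module m_rank_map_sketch

program rank_map_demo
  use m_rank_map_sketch
  implicit none
  integer, allocatable :: ranks(:, :, :)
  integer :: coords(3)

  ! No decomposition in x, 2 ranks in y, 3 ranks in z.
  ranks = default_rank_map([1, 2, 3])

  ! Rank 3 sits at (1, 2, 2) because rank = (j - 1) + 2*(k - 1).
  coords = find_coords(ranks, 3)
  print *, 'coords of rank 3:', coords
  if (any(coords /= [1, 2, 2])) error stop 'unexpected coordinates'
end program rank_map_demo
```

The point of the design is that the consumer of `global_ranks` stays identical whichever component produced the array.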

semi-h · 2024-10-29T11:06:54Z

src/mesh.f90

    integer, private :: sz
    type(geo_t), allocatable :: geo ! object containing geometry information
-    type(parallel_t), allocatable :: par ! object containing parallel domain decomposition information
+    class(grid_t), allocatable :: grid ! object containing grid information


I think this new grid is great, neatly packs all the relevant stuff. It doesn't need to be class though, type should be enough.
And what do you think about defining these types inside mesh.f90 instead of its own new file? I think it would be tidier in mesh.f90.

Nanoseb · 2024-10-29

I agree with defining these inside mesh.f90, but that isn't possible due to compiler limitations. Compilers flag circular dependencies when doing so (even if there aren't any) because they compile mesh.f90 as a whole and can't compile the modules in it independently. This is the primary reason why all the contents of the mesh object have to be in a separate module (because of an actual circular dependency) and in a separate file (because of the compiler limitation).

semi-h · 2024-10-29T11:08:36Z

src/mesh.f90

-    global_ranks = reshape([(i, i=0, mesh%par%nproc - 1)], &
-                           shape=[nproc_x, nproc_y, nproc_z])


So if we pass the rank mapping as an input to mesh type we can set the global_ranks to this input and the rest should be fine.

src/poisson_fft.f90

CFD-Xing · 2024-10-30T15:53:32Z

@Nanoseb

I think I have a strategy to force a rebuild of 2decomp-fft, if desired. Here is the CMake:

# - Find the 2decomp-fft library
if (rebuild_decom2d)
    message("Re-build 2decomp-fft")
    execute_process(COMMAND rm -rf ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-build)
    set(decomp2d_FOUND FALSE)
else()
    find_package(decomp2d
                 PATHS ${CMAKE_SOURCE_DIR}/decomp2d/build)
endif()

if (decomp2d_FOUND)
  message(STATUS "2decomp-fft FOUND")
else(decomp2d_FOUND)
  message(STATUS "2decomp-fft PATH not available we'll try to download and install")
  configure_file(${CMAKE_SOURCE_DIR}/cmake/decomp2d/downloadBuild2decomp.cmake.in decomp2d-build/CMakeLists.txt)
  execute_process(COMMAND ${CMAKE_COMMAND} -G "${CMAKE_GENERATOR}" .
          RESULT_VARIABLE result
          WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-build )
  if(result)
      message(FATAL_ERROR "CMake step for 2decomp-fft failed: ${result}")
  else()
      message("CMake step for 2decomp-fft completed (${result}).")
  endif()
  execute_process(COMMAND ${CMAKE_COMMAND} --build .
         RESULT_VARIABLE result
          WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-build )
  if(result)
      message(FATAL_ERROR "Build step for 2decomp-fft failed: ${result}")
  endif()
  set(D2D_ROOT ${CMAKE_CURRENT_BINARY_DIR}/decomp2d-build/downloadBuild2decomp-prefix/src/downloadBuild2decomp-build)
  find_package(decomp2d REQUIRED
          PATHS ${D2D_ROOT})
endif(decomp2d_FOUND)

You can turn on/off the option by using

cmake .. -DCMAKE_BUILD_TYPE=Debug -Drebuild_decom2d=ON

cmake .. -DCMAKE_BUILD_TYPE=Debug -Drebuild_decom2d=OFF

Feel free to include this in your PR if you are happy with the behaviour.

Nanoseb added 4 commits June 24, 2024 14:04

initial tests with 2decomp&fft

00b8780

Merge github.com:xcompact3d/x3d2 into 2decomp-integration

00630df

use 2decomp for mesh parallel decomposition

afcd22c

add omp/mesh.f90 to cmakelists.txt

8a513a3

Nanoseb added core Issue affecting core mechanisms of the software omp Related to openMP backend labels Jul 11, 2024

Nanoseb self-assigned this Jul 11, 2024

Nanoseb mentioned this pull request Jul 11, 2024

Could we use MPI Cartesian Communicators? #90

Open

Nanoseb and others added 11 commits July 16, 2024 13:58

cleanup and fix compilation

c0fb05a

use number of cells as 2decomp input

9c113a4

fix size of field issue in vecadd_omp

2e6f86c

add spectral processing to omp poisson solver

8d1b528

restructure code to have 2decomp&fft as optional dependency

eb3883b

Merge github.com:xcompact3d/x3d2 into 2decomp-integration

9b9bc35

rename par_grid file to mesh_content

8132b2b

move geo to mesh_content

7901035

disable poisson solver in CI

af250cc

revert back ci config

31a2035

Add decomp2d to cmake

fc19718

Jacques Xing and others added 2 commits October 22, 2024 12:02

Remove unecessary file

151eea7

Add provision to use pre-built decomp2d library using cmake .. -Ddeco…

c0649d2

…mp2d_INCLUDE_DIRS=/path-to-2decomp-fft/build

pbartholomew08 reviewed Oct 22, 2024

Jacques Xing and others added 6 commits October 22, 2024 13:52

partially revert last commit

3bd7540

Add back "include_directories(${decomp2d_INCLUDE_DIRS})"

56f2b44

fix build of 2decomp module

c1637a1

fix tests to remove decomposition in x

01403a9

Merge branch '2decomp-integration' of github.com:Nanoseb/x3d2 into 2d…

ea2218e

…ecomp-integration

throw error when using decomposition in X

b75fde0

Nanoseb added 3 commits October 24, 2024 18:29

remove debug print

846fffc

add test of FFT

51838dc

bump ubuntu version to 22.04

8a71c8f

Nanoseb marked this pull request as ready for review October 28, 2024 16:11

Nanoseb requested a review from pbartholomew08 October 28, 2024 16:11

add comment and code cleanup

b36cb75

pbartholomew08 reviewed Oct 28, 2024

semi-h reviewed Oct 29, 2024
