Poisson solver #32

semi-h · 2024-02-09T10:52:36Z

Poisson solver is the final part we need for running simulations with x3d2. The plan is to have an FFT based direct solver and use it whenever we can, and also have a CG based iterative solver. So I came up with this structure to have both methdos and decide which one to run at run time.

Few important considerations so that the structure makes more sense:

FFT based solver is implemented in a dedicated class/module
- This will make it easy to switch to 2DECOMP&FFT later on
- Current pencil arrangement is already compatible with 2DECOMP&FFT
FFT based poisson solver class is instantiated by the backend.
- Different backends will use different libraries for running the FFT, I think this makes everything easy.
- Alternative would be instantiating the poisson_fft class in solver, but I think it would make things uglier.
A matrix free CG based solver requires to evaluate a Laplacian operation.
- I think it is a good idea to implement a laplacian subroutine in the solver class. We might require this in the solver later on just like gradient and divergence operations.
- So a CG solver will basically call a laplacian subroutine in solver class, and few other stuff provided by backends. This is why it doesn't require its own class like the fft based solver.
- @pbartholomew08, I actually focused on a standard CG method, can you comment on a preconditioned CG method and how similar or different it is compared to a standard CG implementation?

Closes #55

pbartholomew08 · 2024-02-09T10:58:22Z

To get good performance (at scale) you need a preconditioner. The preconditioner is typically based on an actual matrix, e.g. a low order discretisation. I'm working on an implementation of this here 3decomp/poissbox#20, once I've got the preconditioner sorted I'll port it over to x3d2. The high-order Laplacian is working and a star-based preconditioner is implemented, but it's not currently effective.

I don't think we want to implement CG ourselves though!

Note that the "serial" aspect of the linked PR is only due to the TDMA implementation I'm using to keep things simple, the CG and AMG preconditioner are parallel, and the solver can be run using the low-order discretisation in parallel.

pbartholomew08 · 2024-02-09T11:03:10Z

* A matrix free CG based solver requires to evaluate a Laplacian operation.
  
  * I think it is a good idea to implement a laplacian subroutine in the solver class. We might require this in the solver later on just like gradient and divergence operations.

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

src/cuda/backend.f90

src/poisson_fft.f90

src/common.f90

JamieJQuinn · 2024-02-09T11:21:19Z

The overall structure looks good. I like the simple calling code in the solve. Seems unlikely that the CG solver won't require some backend-specific stuff (so might need its own class) but the generic procedure can handle that case well, and can extend to other options if needed.

semi-h · 2024-02-09T11:54:56Z

I don't think we want to implement CG ourselves though!

At the current stage the codebase is only able to pick between two dummy functions, poisson_fft or poisson_cg at runtime via the function pointers. I think poisson_cg can then issue more sophisticated calls to PETSc for example. If you think this looks good I'm happy to leave poisson_cg as is so that you can port your stuff when ready. My plan was to complete only the fft solver in this PR so that we can test the codebase in full, but just wanted to implement a way to pick between two methods we'll definitely have in the codebase soon.

semi-h · 2024-02-09T11:59:22Z

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But I think the gradient and divergence subroutines in solver is not the exact ones you need, right? Because they switch between velocity grid and pressure grid. I think the iterative solver will require both gradient and divergence starting from the pressure grid and ending in the pressure grid. If so we'll need a new set of tdsops to support this operation and add two new grad and div subroutines later on.

semi-h · 2024-02-09T12:10:24Z

The overall structure looks good. I like the simple calling code in the solve. Seems unlikely that the CG solver won't require some backend-specific stuff (so might need its own class) but the generic procedure can handle that case well, and can extend to other options if needed.

Actually we can access the backend stuff from the solver, all methods in solver do it already. For example a naive CG implementation would only require a laplacian and vector squares I guess.
And technically we could add the FFT related stuff into the backends and not require seperate class as well, but I didn't do it for two reasons.

FFT based solver requires lots of parameters, settings, complex arrays..., it would make the backends complicated. Because it comes with bunch of complicated stuff that only relates to an FFT based approach, it made sense to me to separate it into its own class and keep the complexity away from backend. (all this is not in the PR yet)
If we switch to a third-party library (2DECOMP&FFT), then we obviously can't call things from our backend, and all those stuff would have to be removed in case of a switch, and also having a separate class now should help later on when switching I think.

But if we're doing the CG stuff via PETSc, then its very likely that we'll need a class to keep PETSc related complexity away from backends too.

pbartholomew08 · 2024-02-09T12:52:20Z

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But I think the gradient and divergence subroutines in solver is not the exact ones you need, right? Because they switch between velocity grid and pressure grid. I think the iterative solver will require both gradient and divergence starting from the pressure grid and ending in the pressure grid. If so we'll need a new set of tdsops to support this operation and add two new grad and div subroutines later on.

The pattern of execution is p (p grid) -> grad p (v grid) -> lapl p (p grid), because then this is consistent with the use of grad(p) in the momentum equations and the evaluation of div(u). I'll try just evaluating the Laplacian directly using 2nd derivatives, maybe this will work better.

semi-h · 2024-02-21T17:58:23Z

The only remaining part is the reorder functions operate on complex arrays. I think I can get them working tomorrow.

JamieJQuinn · 2024-02-26T12:15:33Z

This fixes #55

semi-h · 2024-03-01T11:36:01Z

Just to let you know when reviewing, all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp.
Also, the plans in the init function will be adopted to cuFFTMp, and the fft_forward and fft_backward functions will be simplified down to single line calls from cuFFTMp.

Only thing I want to add before we merge is a periodic call to curl and divergence functions to report enstrophy and divergence of velocity field as the simulation goes on. We need a simple inner product function in the backends to finalise that.

JamieJQuinn · 2024-03-01T11:39:31Z

Just to let you know when reviewing, all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp.

Can you mark these with a comment like "TODO remove after cuFFTMp" and create an issue for it?

semi-h · 2024-03-05T16:24:22Z

A call for last round of review folks!

The input settings in the main file is arranged so that it produces the result I shared in the slack channel. Basically a TGV from 0 to 20 on a 256^3 grid.

The output subroutine is not very good but for the time being useful to see if there is anything going bad with the TGV, because it prints the enstrophy. I suppose this will evolve a lot into more useful output stuff later.

Just to let you know when reviewing, all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp.

Can you mark these with a comment like "TODO remove after cuFFTMp" and create an issue for it?

They won't be being called once the cuFFTMp stuff is in and then like any redundant function they'll need to be removed. However we might want to compare performance of the two approaches so for a short while both the current FFT solver and cuFFTMp may coexist too. I think it'll be more clear with the cuFFTMp PR so not going ahead with TODO comments for now.

JamieJQuinn

Overall excellent. Needs comments + possible name changes for clarity & consistency. Might want tests but this PR is already large. Should be an issue if we decide not to put tests in here.

src/cuda/kernels/complex.f90

src/poisson_fft.f90

JamieJQuinn · 2024-03-05T17:37:21Z

src/cuda/poisson_fft.f90

+
+   end function init
+
+   subroutine fft_forward_cuda(self, f)


Would be good to have working tests for these before trying to implement cuFFTMP.

@semi-h implement in this PR or create an issue?

Actually, cuFFTMp provides 3D forward and inverse FFT transforms, and our functions here will only carry out the calls to cuFFTMp library so probably there is no value testing them.

For the forward/backward FFTs sure, no need to test individually. But we're currently not unit testing the Poisson solver at all. Since we have a working implementation, I think it would be easiest to implement a correct test of this working solver, then apply the same test to cuFFTMP.

If you don't already have one kicking around, the analytical test case I usually use for Poisson solve is solving $\nabla^2 p = -\sin(\pi x) \sin(\pi y)$ and comparing to analytical solution $p_{analytical}(x,y) = \sin (\pi x) \sin(\pi y)/(2\pi^2)$.

pbartholomew08

A few minor comments for consideration. Otherwise I agree with Jamie's points.

src/cuda/poisson_fft.f90

src/omp/poisson_fft.f90

src/solver.f90

pbartholomew08 · 2024-03-06T12:01:11Z

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But I think the gradient and divergence subroutines in solver is not the exact ones you need, right? Because they switch between velocity grid and pressure grid. I think the iterative solver will require both gradient and divergence starting from the pressure grid and ending in the pressure grid. If so we'll need a new set of tdsops to support this operation and add two new grad and div subroutines later on.

The pattern of execution is p (p grid) -> grad p (v grid) -> lapl p (p grid), because then this is consistent with the use of grad(p) in the momentum equations and the evaluation of div(u). I'll try just evaluating the Laplacian directly using 2nd derivatives, maybe this will work better.

Adding for reference (I think we've discussed this in meetings) the direct evaluation of the Laplacian using high-order methods for 2nd derivatives on the pressure grid seems to work well (as a standalone solver).

Nanoseb

I agree with Paul and Jamie's comments, just added a couple minor ones.

src/cuda/kernels/complex.f90

src/poisson_fft.f90

src/solver.f90

semi-h · 2024-03-06T17:28:57Z

Last commit fixed most issues raised.

There are a few comments about the reshape and complex_reorder functions, I admit that their naming scheme is terrible and there is a lot to do make them better, but there is a very good chance that cuFFTMp will work fine and they'll all be removed from the codebase entirely very soon. Because we expect their lifespan to be very short, I don't want to invest time to think about a proper naming scheme, adding comments, change the way we set threads/blocks for calling them, or testing them and willing to merge the PR as is. If we decide to keep them longer for any reason I'll be happy to implement all the suggestions.

JamieJQuinn

This is nearly done! Looks like there are a few minor things still needing addressed, and a proper test of the Poisson solver implemented.

src/solver.f90

JamieJQuinn · 2024-03-07T11:39:53Z

src/cuda/poisson_fft.f90

+
+   end function init
+
+   subroutine fft_forward_cuda(self, f)


For the forward/backward FFTs sure, no need to test individually. But we're currently not unit testing the Poisson solver at all. Since we have a working implementation, I think it would be easiest to implement a correct test of this working solver, then apply the same test to cuFFTMP.

If you don't already have one kicking around, the analytical test case I usually use for Poisson solve is solving $\nabla^2 p = -\sin(\pi x) \sin(\pi y)$ and comparing to analytical solution $p_{analytical}(x,y) = \sin (\pi x) \sin(\pi y)/(2\pi^2)$.

semi-h · 2024-03-07T12:01:33Z

Its a good idea to have a unit test for the Poisson solver, but I don't want to implement it here as the PR is already large. Opened an issue to deal with it later on #74.

JamieJQuinn

OK, I'm happy with the TGV showing the Poisson solver works at this stage.

semi-h requested review from pbartholomew08, JamieJQuinn, Nanoseb and slaizet February 9, 2024 10:53

JamieJQuinn requested changes Feb 9, 2024

View reviewed changes

src/cuda/backend.f90 Outdated Show resolved Hide resolved

src/poisson_fft.f90 Outdated Show resolved Hide resolved

src/common.f90 Outdated Show resolved Hide resolved

semi-h force-pushed the poisson branch 2 times, most recently from ecdb773 to bcbaf18 Compare February 19, 2024 16:51

semi-h force-pushed the poisson branch 2 times, most recently from adbc5ab to a251fa9 Compare February 22, 2024 17:31

semi-h force-pushed the poisson branch from a251fa9 to 6399546 Compare March 1, 2024 11:26

semi-h added 9 commits March 5, 2024 09:14

Add a Poisson solver selection mechanism.

56688de

Implement a dedicated class for the FFT based Poisson solver.

f037d11

Store compact scheme coefficients in tdsops class.

2fdb1b0

Store domain length and dx in dirps_t.

a306ee4

Change the way we initialise poisson_fft class.

c7ff92a

Construct the waves array for all periodic BCs.

fc3810f

Create cufft plans.

991bb92

Add postprocessing in spectral space.

2756ee7

Store plans at poisson_fft class level.

d5c159a

semi-h added 12 commits March 5, 2024 09:16

Fix allocation size of the waves array on GPU.

a7c25d9

Change shape of the c_x/y/z arrays.

95c1b36

Add postprocessing kernel call.

0c5d456

Add complex reordering functions.

aac5af2

Call complex reordering functions in forward and backward FFTs.

279666d

No need to initialise backends with dirps any more.

eaaf2c2

Cleanups.

a262347

Cleanup and style fix.

9824841

Add reshape subrotines for transposing pencil groups.

a5e2a1a

Use reshapes to carry out local transposes.

cbb2dcd

Change the Poisson solver selection mechanism.

95164c3

Add initial conditions for TGV.

68b47d0

pbartholomew08 mentioned this pull request Mar 5, 2024

Implement an iterative Poisson solver #69

Draft

Output enstrophy and divergence of u.

0a4be35

semi-h force-pushed the poisson branch from c424f12 to 0a4be35 Compare March 5, 2024 16:15

JamieJQuinn self-requested a review March 5, 2024 17:08

JamieJQuinn requested changes Mar 5, 2024

View reviewed changes

pbartholomew08 reviewed Mar 5, 2024

View reviewed changes

src/cuda/poisson_fft.f90 Show resolved Hide resolved

src/cuda/poisson_fft.f90 Show resolved Hide resolved

src/cuda/poisson_fft.f90 Show resolved Hide resolved

src/omp/poisson_fft.f90 Show resolved Hide resolved

src/solver.f90 Show resolved Hide resolved

Nanoseb reviewed Mar 6, 2024

View reviewed changes

src/cuda/kernels/complex.f90 Show resolved Hide resolved

src/poisson_fft.f90 Show resolved Hide resolved

src/solver.f90 Show resolved Hide resolved

Cleanups and comments.

f8a1ecd

JamieJQuinn requested changes Mar 7, 2024

View reviewed changes

Comment.

370d1c4

semi-h force-pushed the poisson branch from 1962734 to 370d1c4 Compare March 7, 2024 12:06

JamieJQuinn approved these changes Mar 7, 2024

View reviewed changes

semi-h merged commit e916fe8 into xcompact3d:main Mar 7, 2024
2 checks passed

semi-h mentioned this pull request Mar 29, 2024

Use 3D FFT transforms directly and drop the current 1D decomposition capable strategy #83

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Poisson solver #32

Poisson solver #32

semi-h commented Feb 9, 2024 •

edited by Nanoseb

Loading

pbartholomew08 commented Feb 9, 2024 •

edited

Loading

pbartholomew08 commented Feb 9, 2024

JamieJQuinn commented Feb 9, 2024

semi-h commented Feb 9, 2024

semi-h commented Feb 9, 2024

semi-h commented Feb 9, 2024

pbartholomew08 commented Feb 9, 2024

semi-h commented Feb 21, 2024

JamieJQuinn commented Feb 26, 2024

semi-h commented Mar 1, 2024

JamieJQuinn commented Mar 1, 2024

semi-h commented Mar 5, 2024

JamieJQuinn left a comment

JamieJQuinn Mar 5, 2024

JamieJQuinn Mar 6, 2024

semi-h Mar 7, 2024

JamieJQuinn Mar 7, 2024

pbartholomew08 left a comment

pbartholomew08 commented Mar 6, 2024

Nanoseb left a comment

semi-h commented Mar 6, 2024

JamieJQuinn left a comment

JamieJQuinn Mar 7, 2024

semi-h commented Mar 7, 2024

JamieJQuinn left a comment

Poisson solver #32

Poisson solver #32

Conversation

semi-h commented Feb 9, 2024 • edited by Nanoseb Loading

pbartholomew08 commented Feb 9, 2024 • edited Loading

pbartholomew08 commented Feb 9, 2024

JamieJQuinn commented Feb 9, 2024

semi-h commented Feb 9, 2024

semi-h commented Feb 9, 2024

semi-h commented Feb 9, 2024

pbartholomew08 commented Feb 9, 2024

semi-h commented Feb 21, 2024

JamieJQuinn commented Feb 26, 2024

semi-h commented Mar 1, 2024

JamieJQuinn commented Mar 1, 2024

semi-h commented Mar 5, 2024

JamieJQuinn left a comment

Choose a reason for hiding this comment

JamieJQuinn Mar 5, 2024

Choose a reason for hiding this comment

JamieJQuinn Mar 6, 2024

Choose a reason for hiding this comment

semi-h Mar 7, 2024

Choose a reason for hiding this comment

JamieJQuinn Mar 7, 2024

Choose a reason for hiding this comment

pbartholomew08 left a comment

Choose a reason for hiding this comment

pbartholomew08 commented Mar 6, 2024

Nanoseb left a comment

Choose a reason for hiding this comment

semi-h commented Mar 6, 2024

JamieJQuinn left a comment

Choose a reason for hiding this comment

JamieJQuinn Mar 7, 2024

Choose a reason for hiding this comment

semi-h commented Mar 7, 2024

JamieJQuinn left a comment

Choose a reason for hiding this comment

semi-h commented Feb 9, 2024 •

edited by Nanoseb

Loading

pbartholomew08 commented Feb 9, 2024 •

edited

Loading