Porting assembly to the GPU #696
The most complex part to resolve will be the well assembly, which includes solving the well equations in isolation. It may be a good idea to do the reservoir assembly on the GPU first, as it is simpler in structure but more time-consuming. For that, I think your idea applies. Still, it looks like a very large refactoring, with a complete change to the way assembly is done. What about starting from the other end, by evaluating cell properties (i.e. the intensive quantities cache) on the GPU? That is an important subsystem, and there are no dependencies between cells, so it parallelizes perfectly.
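To illustrate why cell-property evaluation parallelizes so well, here is a minimal C++ sketch under assumed, heavily simplified types. `PrimaryVars`, `IntensiveQuantities` and `evalCell` are hypothetical, and the relative-permeability and viscosity laws are placeholders. Each cell's derived quantities depend only on that cell's primary variables, so every loop iteration is independent and could become one GPU thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified stand-ins for per-cell primary variables
// and the derived "intensive quantities"; real simulator types hold far more.
struct PrimaryVars {
    double pressure;  // [Pa]
    double waterSat;  // water saturation, 0..1
};

struct IntensiveQuantities {
    double oilSat;    // derived: 1 - water saturation (two-phase assumption)
    double waterMob;  // derived: relative permeability / viscosity
};

// Evaluates one cell. It reads only that cell's primary variables,
// which is what makes the whole update embarrassingly parallel.
IntensiveQuantities evalCell(const PrimaryVars& pv) {
    const double waterViscosity = 1.0e-3;          // [Pa s], assumed constant
    const double krw = pv.waterSat * pv.waterSat;  // assumed quadratic rel-perm
    return {1.0 - pv.waterSat, krw / waterViscosity};
}

// Updates every cell independently. On a CPU the loop can be split across
// OpenMP threads (the pragma is ignored without -fopenmp); on a GPU each
// iteration would map to one thread.
void updateAllCells(const std::vector<PrimaryVars>& pvs,
                    std::vector<IntensiveQuantities>& iqs) {
    assert(pvs.size() == iqs.size());
    const long n = static_cast<long>(pvs.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const std::size_t idx = static_cast<std::size_t>(i);
        iqs[idx] = evalCell(pvs[idx]);
    }
}
```

For example, a cell with water saturation 0.5 gets an oil saturation of 0.5 under this two-phase assumption, regardless of what its neighbours contain.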
Is the reservoir assembly the whole grid and the resulting large matrix? Where is that cache located/computed? Also, is it named 'cache' because it stores a lot of data that is used often, or because it takes the CPU cache hierarchy into account?
Yes.
For a multisegment well, these and many other functions are used. Basically all of the WellInterface*, StandardWell* and MultisegmentWell* classes are involved in well assembly (on an individual well basis), which is managed at a higher level (collecting the contributions of all wells) by the BlackoilWellModel class.
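The division of labour described above (per-well assembly in the well classes, aggregation in the well model) can be sketched roughly as follows. All names here are hypothetical and the physics is a placeholder; the sketch only mirrors the overall shape of the design, not the OPM interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical common interface, playing the role of the well base classes.
class WellSketch {
public:
    virtual ~WellSketch() = default;
    // Adds this well's contribution to the reservoir residual.
    virtual void assemble(std::vector<double>& residual) const = 0;
};

class StandardWellSketch : public WellSketch {
public:
    StandardWellSketch(std::size_t cell, double rate) : cell_(cell), rate_(rate) {}
    void assemble(std::vector<double>& residual) const override {
        assert(cell_ < residual.size());
        residual[cell_] -= rate_;  // single perforation: whole rate in one cell
    }
private:
    std::size_t cell_;
    double rate_;
};

class MultisegmentWellSketch : public WellSketch {
public:
    MultisegmentWellSketch(std::vector<std::size_t> cells, double rate)
        : cells_(std::move(cells)), rate_(rate) {
        assert(!cells_.empty());
    }
    void assemble(std::vector<double>& residual) const override {
        // Placeholder: split the rate evenly over the perforated cells.
        // A real multisegment well would first solve its segment equations.
        const double share = rate_ / static_cast<double>(cells_.size());
        for (std::size_t c : cells_) residual[c] -= share;
    }
private:
    std::vector<std::size_t> cells_;
    double rate_;
};

// Plays the role of the well model: loops over all wells and
// accumulates their contributions.
class WellModelSketch {
public:
    void add(std::unique_ptr<WellSketch> w) { wells_.push_back(std::move(w)); }
    void assembleAll(std::vector<double>& residual) const {
        for (const auto& w : wells_) w->assemble(residual);
    }
private:
    std::vector<std::unique_ptr<WellSketch>> wells_;
};
```

One plausible reason this part is harder to port is visible even in the sketch: wells are heterogeneous, heap-allocated objects reached through virtual calls, whereas the reservoir part is a uniform loop over cells.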
The main function call that triggers recalculation of all the values is `invalidateAndUpdateIntensiveQuantities()`. The function is in
The first: it stores data that is used often. It does not take CPU cache considerations into account.
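In the sense used here, a cache means computing each cell's derived quantities once per change of the primary variables and reusing them many times during assembly. A minimal sketch of that pattern, with hypothetical names and a placeholder density law (it is not the code behind `invalidateAndUpdateIntensiveQuantities()`):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical intensive-quantities cache: derived per-cell values are
// computed once per update of the primary variables and then read many
// times during assembly instead of being recomputed on every access.
class IntensiveQuantityCacheSketch {
public:
    explicit IntensiveQuantityCacheSketch(std::size_t numCells)
        : values_(numCells, 0.0), valid_(false) {}

    // Called when the primary variables change (e.g. after a Newton update):
    // mark everything stale, then recompute every cell.
    void invalidateAndUpdate(const std::vector<double>& pressure) {
        assert(pressure.size() == values_.size());
        valid_ = false;
        for (std::size_t i = 0; i < pressure.size(); ++i) {
            values_[i] = compute(pressure[i]);
            ++numEvaluations_;
        }
        valid_ = true;
    }

    // Cheap read used repeatedly by the assembly loop.
    double get(std::size_t cell) const {
        assert(valid_ && cell < values_.size());
        return values_[cell];
    }

    std::size_t numEvaluations() const { return numEvaluations_; }

private:
    // Stand-in for an expensive property evaluation (PVT, rel-perm, ...):
    // an assumed, weakly compressible density law.
    static double compute(double p) { return 1000.0 * (1.0 + 1.0e-9 * p); }

    std::vector<double> values_;
    bool valid_;
    std::size_t numEvaluations_ = 0;
};
```

Reading `get(cell)` repeatedly does not trigger new evaluations; only `invalidateAndUpdate` does.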
We might start trying to accelerate (parts of) the assembly with a GPU at some point.
My initial plan was to:
However, we should avoid having two different codebases with the same functionality. Changes to one would need to be made to the other.
Maybe code generation could help us here.
CUDA does support templates, but OpenCL C does not.
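Because CUDA compiles ordinary C++ templates, one way to keep a single codebase is to annotate shared code so that nvcc builds it for both host and device, while a plain C++ compiler ignores the annotation. The sketch below assumes a hypothetical `SKETCH_HOST_DEVICE` macro and a much simplified forward-mode automatic-differentiation type `EvalSketch` (a value plus N derivatives); it is not OPM's `Evaluation` class.

```cpp
#include <cassert>

// Hypothetical macro: under nvcc the annotated functions are compiled for
// both CPU and GPU; in a plain C++ build the annotation disappears.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Simplified forward-mode AD type: a value plus N partial derivatives.
template <class Scalar, int N>
struct EvalSketch {
    Scalar value;
    Scalar deriv[N];

    // Independent variable number i: its derivative w.r.t. itself is 1.
    static SKETCH_HOST_DEVICE EvalSketch variable(Scalar v, int i) {
        assert(i >= 0 && i < N);
        EvalSketch e{};  // value-initialized: all derivatives start at zero
        e.value = v;
        e.deriv[i] = 1;
        return e;
    }
};

template <class Scalar, int N>
SKETCH_HOST_DEVICE EvalSketch<Scalar, N> operator+(const EvalSketch<Scalar, N>& a,
                                                    const EvalSketch<Scalar, N>& b) {
    EvalSketch<Scalar, N> r{};
    r.value = a.value + b.value;
    for (int i = 0; i < N; ++i) r.deriv[i] = a.deriv[i] + b.deriv[i];
    return r;
}

template <class Scalar, int N>
SKETCH_HOST_DEVICE EvalSketch<Scalar, N> operator*(const EvalSketch<Scalar, N>& a,
                                                    const EvalSketch<Scalar, N>& b) {
    EvalSketch<Scalar, N> r{};
    r.value = a.value * b.value;
    // Product rule: (ab)' = a'b + ab'.
    for (int i = 0; i < N; ++i) r.deriv[i] = a.deriv[i] * b.value + a.value * b.deriv[i];
    return r;
}
```

For f(x, y) = x*y + x at x = 3, y = 4, the result carries value 15 and derivatives (5, 3). Since OpenCL C has no templates, an OpenCL port would presumably need one explicit version per scalar type and derivative count, which is where code generation might help.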
Template support matters most for `Evaluation`, which is used a lot.

I did some profiling on an older version (Jan 2021) and found this:
The numbers are timings in seconds, for a small sample of NORNE.
Some of these functions are combined into `updateAll()`. The local linearizer is called for every element, in every assembly.
Each OpenMP thread has one element context, which 'shifts' to the next element, requiring its internal variables to be resized. Updating the stencil may not be needed on the GPU if a context could be kept for every element.
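A minimal sketch of the "one context per element" idea, assuming a toy 1D grid and a two-point flux residual (`StencilSketch`, `buildStencils` and `assembleResidual` are hypothetical). Stencils are built once and then only read, so no per-element resizing happens inside the assembly loop, and each element writes only its own residual entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element stencil, built once and kept for the whole run
// instead of re-filling one shared context per thread.
struct StencilSketch {
    std::size_t cell;                    // the element itself
    std::vector<std::size_t> neighbors;  // cells sharing a face with it
};

// Builds stencils for a 1D chain of cells (assumed toy grid).
std::vector<StencilSketch> buildStencils(std::size_t numCells) {
    std::vector<StencilSketch> stencils(numCells);
    for (std::size_t c = 0; c < numCells; ++c) {
        stencils[c].cell = c;
        if (c > 0) stencils[c].neighbors.push_back(c - 1);
        if (c + 1 < numCells) stencils[c].neighbors.push_back(c + 1);
    }
    return stencils;
}

// Toy local linearization: residual of element c is
// sum over neighbors n of trans * (p[c] - p[n]).
std::vector<double> assembleResidual(const std::vector<StencilSketch>& stencils,
                                     const std::vector<double>& pressure,
                                     double trans) {
    assert(stencils.size() == pressure.size());
    std::vector<double> residual(stencils.size(), 0.0);
    for (const StencilSketch& s : stencils) {
        double r = 0.0;
        for (std::size_t n : s.neighbors) r += trans * (pressure[s.cell] - pressure[n]);
        residual[s.cell] = r;  // each element writes only its own entry
    }
    return residual;
}
```

With pressures (3, 2, 0) and unit transmissibility the residual is (1, 1, -2), which sums to zero as a conservative scheme should. The trade-off is memory: a stored stencil per element costs more than one reusable context per thread.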
It is probably not worthwhile to port only some of these functions to the GPU; it should be all of them.
There are also many optional features to consider, like the extra modules, or options like `enableDissolvedGas()` and `enableVaporizedOil()`.

@atgeirr