Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use libCEED for element-wise error estimate integrals #219

Merged
merged 9 commits into from
Apr 4, 2024

Conversation

sebastiangrimberg
Copy link
Contributor

@sebastiangrimberg sebastiangrimberg commented Mar 25, 2024

Resolves #213. Full error estimate calculation now takes place on GPU without host-device transfer.

Performance comparison for estimation time (the loop over mesh elements) on a particularly large example problem (on CPU):

  • main: 1209.6 s
  • This PR: 6.8 s

@sebastiangrimberg sebastiangrimberg added enhancement New feature or request performance Related to performance amr Related to adaptive mesh refinement (AMR) labels Mar 25, 2024
@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-solve-dev branch from e952666 to 9db564b Compare March 26, 2024 21:34
@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-libceed-dev branch from a845072 to 3b875d4 Compare March 26, 2024 21:34
@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-solve-dev branch from 9db564b to 48be81b Compare March 26, 2024 22:30
@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-libceed-dev branch from 3b875d4 to 2eb4ebe Compare March 26, 2024 22:30
@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-solve-dev branch from 48be81b to 5c8863f Compare March 28, 2024 17:26
@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-libceed-dev branch 2 times, most recently from 77f473a to b03405f Compare March 28, 2024 17:44
Base automatically changed from sjg/estimator-solve-dev to main April 3, 2024 20:39
@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-libceed-dev branch from b03405f to f6a30b7 Compare April 3, 2024 21:00
Copy link
Collaborator

@hughcars hughcars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Some minor suggestions on function signatures in integrator.cpp, pushed to hughcars/estimator-libceed-dev if of interest. There's some libceed function/files I don't think needed.

Comment on lines 126 to 135
if (input)
{
PalaceCeedCall(ceed,
CeedOperatorSetField(op, name.c_str(), restr, CEED_BASIS_NONE, v));
}
else
{
PalaceCeedCall(ceed,
CeedOperatorSetField(op, name.c_str(), restr, CEED_BASIS_NONE, v));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each branch of these input if is the same, so I think the branch can be removed. The only changing on input seems to occur on the name above. The same thing can be done with the true/false argument on the other method, with a double dispatch, and dropping the input bool from AddOperatorActiveFields:

void AddOperatorActiveInputFields(unsigned int ops, bool input, Ceed ceed,
                             CeedElemRestriction restr, CeedBasis basis, CeedOperator op,
                             std::string name = "u", CeedVector v = CEED_VECTOR_ACTIVE)
{
 AddOperatorActiveFields(ops, ceed, restr, basis, op, name, v);
}
void AddOperatorActiveOutputFields(unsigned int ops, bool input, Ceed ceed,
                             CeedElemRestriction restr, CeedBasis basis, CeedOperator op,
                             std::string name = "v", CeedVector v = CEED_VECTOR_ACTIVE)
{
 AddOperatorActiveFields(ops, ceed, restr, basis, op, name, v);
}

the call sites then drop the true/false similarly.

Copy link
Contributor Author

@sebastiangrimberg sebastiangrimberg Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted these in 70aabf6.

Comment on lines 25 to 27
void AddQFunctionActiveInputsOutputs(unsigned int ops, bool input, Ceed ceed,
CeedBasis basis, CeedQFunction qf,
std::string name = "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every call for these I can see has a compile time known input value, so this could be split to avoid the nested if statements. Then the call sites are more explicit about what's going on, without needing to check the signature for what the true/false is doing.

void AddQFunctionActiveInputs(unsigned int ops, Ceed ceed, CeedBasis basis,
                              CeedQFunction qf, std::string name = "u")
{
  // Add inputs or outputs with evaluation modes for the active vector of a QFunction.
  CeedInt num_comp;
  PalaceCeedCall(ceed, CeedBasisGetNumComponents(basis, &num_comp));
  if (ops & EvalMode::None)
  {
    PalaceCeedCall(ceed, CeedQFunctionAddInput(qf, name.c_str(), num_comp, CEED_EVAL_NONE));
  }
  if (ops & EvalMode::Interp)
  {
    CeedInt q_comp;
    PalaceCeedCall(ceed,
                   CeedBasisGetNumQuadratureComponents(basis, CEED_EVAL_INTERP, &q_comp));
    PalaceCeedCall(
        ceed, CeedQFunctionAddInput(qf, name.c_str(), num_comp * q_comp, CEED_EVAL_INTERP));
  }
  if (ops & EvalMode::Grad)
  {
    CeedInt q_comp;
    PalaceCeedCall(ceed,
                   CeedBasisGetNumQuadratureComponents(basis, CEED_EVAL_GRAD, &q_comp));
    PalaceCeedCall(ceed, CeedQFunctionAddInput(qf, (std::string("grad_") + name).c_str(),
                                               num_comp * q_comp, CEED_EVAL_GRAD));
  }
  if (ops & EvalMode::Div)
  {
    CeedInt q_comp;
    PalaceCeedCall(ceed,
                   CeedBasisGetNumQuadratureComponents(basis, CEED_EVAL_DIV, &q_comp));
    PalaceCeedCall(ceed, CeedQFunctionAddInput(qf, (std::string("div_") + name).c_str(),
                                               num_comp * q_comp, CEED_EVAL_DIV));
  }
  if (ops & EvalMode::Curl)
  {
    CeedInt q_comp;
    PalaceCeedCall(ceed,
                   CeedBasisGetNumQuadratureComponents(basis, CEED_EVAL_CURL, &q_comp));
    PalaceCeedCall(ceed, CeedQFunctionAddInput(qf, (std::string("curl_") + name).c_str(),
                                               num_comp * q_comp, CEED_EVAL_CURL));
  }
}

void AddQFunctionActiveOutputs(unsigned int ops, Ceed ceed, CeedBasis basis,
                               CeedQFunction qf, std::string name = "v")
{
  // Add inputs or outputs with evaluation modes for the active vector of a QFunction.
  CeedInt num_comp;
  PalaceCeedCall(ceed, CeedBasisGetNumComponents(basis, &num_comp));
  if (ops & EvalMode::None)
  {
    PalaceCeedCall(ceed,
                   CeedQFunctionAddOutput(qf, name.c_str(), num_comp, CEED_EVAL_NONE));
  }
  if (ops & EvalMode::Interp)
  {
    CeedInt q_comp;
    PalaceCeedCall(ceed,
                   CeedBasisGetNumQuadratureComponents(basis, CEED_EVAL_INTERP, &q_comp));
    PalaceCeedCall(ceed, CeedQFunctionAddOutput(qf, name.c_str(), num_comp * q_comp,
                                                CEED_EVAL_INTERP));
  }
  if (ops & EvalMode::Grad)
  {
    CeedInt q_comp;
    PalaceCeedCall(ceed,
                   CeedBasisGetNumQuadratureComponents(basis, CEED_EVAL_GRAD, &q_comp));
    PalaceCeedCall(ceed, CeedQFunctionAddOutput(qf, (std::string("grad_") + name).c_str(),
                                                num_comp * q_comp, CEED_EVAL_GRAD));
  }
  if (ops & EvalMode::Div)
  {
    CeedInt q_comp;
    PalaceCeedCall(ceed,
                   CeedBasisGetNumQuadratureComponents(basis, CEED_EVAL_DIV, &q_comp));
    PalaceCeedCall(ceed, CeedQFunctionAddOutput(qf, (std::string("div_") + name).c_str(),
                                                num_comp * q_comp, CEED_EVAL_DIV));
  }
  if (ops & EvalMode::Curl)
  {
    CeedInt q_comp;
    PalaceCeedCall(ceed,
                   CeedBasisGetNumQuadratureComponents(basis, CEED_EVAL_CURL, &q_comp));
    PalaceCeedCall(ceed, CeedQFunctionAddOutput(qf, (std::string("curl_") + name).c_str(),
                                                num_comp * q_comp, CEED_EVAL_CURL));
  }
}

Copy link
Contributor Author

@sebastiangrimberg sebastiangrimberg Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted these in 70aabf6.

}
{
const CeedScalar u2_loc[3] = {u2[i + Q * 0], u2[i + Q * 1], u2[i + Q * 2]};
CeedScalar coeff[6], J_loc[9];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can coeff[6] be hoisted too? Provided the MultBAx33 calls are done before each load.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "hoisted" do you mean put into the outer scope? I would think any compiler would not stack allocate two coeff[6] objects here since only one is in scope at a time. Is that not true? I hadn't thought about it but certainly moving it out could reduce local stack/register usage if that's not true.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a sane compiler would notice, you're right with the scope. I was being dumb and thinking of a heap variable. Should be fine as is.

Comment on lines +46 to +48
CEED_QFUNCTION(f_apply_hdivhcurl_error_33)(void *__restrict__ ctx, CeedInt Q,
const CeedScalar *const *in,
CeedScalar *const *out)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these two functions in here, given the coefficients are allowed to be discontinuous, I think the domain spaces are both L2d rather than hdiv and hcurl, for CurlFlux and GradFlux respectively. Judging by how these functions are all written however, I don't believe this matters? I.e. the discontinuity is coming from the coefficient which is evaluated after the basis at the quadrature points. Am I understanding that correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is addressing not the space of integration but just the space of the input functions for the interpolation/differentiation operations. That is, interpolate into physical space (requires the right Piola transformation), and only then multiply by the coefficient. Does that make sense? So the naming addresses the u space, not the space of C * u.

Comment on lines +10 to +12
CEED_QFUNCTION(f_apply_hcurlh1d_error_33)(void *__restrict__ ctx, CeedInt Q,
const CeedScalar *const *in,
CeedScalar *const *out)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these functions are holdovers from when the grad flux estimation was h1d yes? Not sure they're still needed given they'd only be applicable if there was only a constant material property coefficient. I don't think specializing the estimation class on that assumption seems worth it.

Copy link
Contributor Author

@sebastiangrimberg sebastiangrimberg Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They aren't needed but I figured they might be worth keeping in case down the line this type of element-wise integration is desired. There are a few other QFunctions that are also currently unused but left just for helpfulness later on.

CeedVector estimates_vec;
ceed::InitCeedVector(estimates, ceed, &estimates_vec);

// Do the integration (both input vectors are passive). For the complex case, add sum of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passive here means that neither is modified by the operations correct? so there's no possibility of race conditions/can be easily vectorized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, inputs are always read-only anyway. We have an operator d = f(u, v) which isn't nicely representable with a single active input vector. So both are just labeled as "passive" inputs.

@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-libceed-dev branch 2 times, most recently from 889860f to ebc2326 Compare April 4, 2024 18:14
@sebastiangrimberg sebastiangrimberg force-pushed the sjg/estimator-libceed-dev branch from ebc2326 to 3678622 Compare April 4, 2024 18:19
@sebastiangrimberg sebastiangrimberg merged commit f066315 into main Apr 4, 2024
17 checks passed
@sebastiangrimberg sebastiangrimberg deleted the sjg/estimator-libceed-dev branch April 4, 2024 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
amr Related to adaptive mesh refinement (AMR) enhancement New feature or request performance Related to performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants