Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Add runner with Ubuntu on ARM64 #634

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

mmuetzel
Copy link
Contributor

GitHub started hosting runners with Ubuntu on ARM64 processors for open source projects for free:
https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/

Add one configuration that is using these runners to the build matrix.

According to their blog post, the arm64 runners are more efficient and potentially faster than the x86_64 runners. (But it is still in a preview phase and maybe it will take some time for them to better scale and balance the load.) If that turns out to be true, it should be easy to switch more configurations in that workflow to the arm64 runners (and maybe keep only one or two running on x86_64).

@mmuetzel
Copy link
Contributor Author

Oof. More tests are failing on that runner than I would have hoped.
At least some of the failing tests (H1BasisEvaluation, SD_H1BasisEvaluation, pointload2) also fail on macOS (Apple Silicon). Maybe, there are some assumptions somewhere that only hold for Intel/AMD processors?

I don't have physical access to ARM64 hardware. Not sure if I could help track down any of that.
Maybe, valgrind or some sanitizers would be able to find something that is odd also on Intel/AMD?

@juharu
Copy link
Contributor

juharu commented Jan 23, 2025 via email

@juharu
Copy link
Contributor

juharu commented Jan 23, 2025 via email

@mmuetzel
Copy link
Contributor Author

Have you checked it's not a stack size problem ?

I might be wrong. But wouldn't that crash the program?
It looks like the numerics don't behave as expected. E.g., the last time steps before the test ConstantParamFunc failed:

  MAIN: 
  MAIN: -------------------------------------
  MAIN: Time: 34/50:   6.800E+00
  MAIN: -------------------------------------
  MAIN: 
  HeatSolver: Solving the energy equation for temperature
  HeatSolve: 
  HeatSolve: 
  HeatSolve: -------------------------------------
  HeatSolve:  TEMPERATURE ITERATION           1
  HeatSolve: -------------------------------------
  HeatSolve: 
  HeatSolve: Starting Assembly...
  HeatSolve: Assembly:
  HeatSolve: Assembly done
  ComputeChange: NS (ITER=1) (NRM,RELC): ( 0.16009565     0.62252546E-01 ) :: heat equation
  HeatSolve: iter:    1 Assembly: (s)    0.00    0.00
  HeatSolve: iter:    1 Solve:    (s)    0.00    0.00
  HeatSolve:  Result Norm   :   0.16009565005742657
  HeatSolve:  Relative Change :    6.2252546152800785E-002
  MAIN: 
  MAIN: -------------------------------------
  MAIN: Time: 35/50:   7.000E+00
  MAIN: -------------------------------------
  MAIN: 
  HeatSolver: Solving the energy equation for temperature
  HeatSolve: 
  HeatSolve: 
  HeatSolve: -------------------------------------
  HeatSolve:  TEMPERATURE ITERATION           1
  HeatSolve: -------------------------------------
  HeatSolve: 
  HeatSolve: Starting Assembly...
  HeatSolve: Assembly:
  HeatSolve: Assembly done
  ERROR:: IterSolve: Numerical Error: System diverged over maximum tolerance.

I don't know what could be causing that though...

@juharu
Copy link
Contributor

juharu commented Jan 23, 2025 via email

@juharu
Copy link
Contributor

juharu commented Jan 23, 2025 via email

@juharu
Copy link
Contributor

juharu commented Jan 23, 2025 via email

GitHub started hosting runners with Ubuntu on ARM64 processors for open
source projects for free:
https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/

Add one configuration that is using these runners to the build matrix.
@mmuetzel
Copy link
Contributor Author

I switched the iterator on the ConstParamFunc, ConvergenceControl, HeatControlExplicit, TransientCostFourHeater and fsi_box tests (in devel now).

Thank you for looking into this.

I rebased this PR on top of your latest changes. Let's see if this will make a difference.

@mmuetzel
Copy link
Contributor Author

mmuetzel commented Jan 23, 2025

That doesn't seem to have made much difference.
Still the following test failures:

The following tests FAILED:
	126 - ConstantParamFunc (Failed)                        serial
	155 - ConvergenceControl (Failed)                       serial transient
	292 - H1BasisEvaluation (Failed)                        benchmark serial
	301 - HeatControlExplicit (Failed)                      control quick serial
	433 - OptimizeSimplexFourHeaters (Failed)               control serial
	434 - OptimizeSimplexFourHeatersInt (Failed)            control serial
	540 - SD_H1BasisEvaluation (Failed)                     benchmark serendipity serial
	680 - TransientCostFourHeaters (Failed)                 serial
	801 - fsi_box (Failed)                                  elasticsolve fsi serial transient
	920 - pointload2 (Failed)                               serial

@juharu
Copy link
Contributor

juharu commented Jan 23, 2025 via email

@juharu
Copy link
Contributor

juharu commented Jan 23, 2025 via email

@mmuetzel
Copy link
Contributor Author

Can I somehow misuse the git "push" triggered "Action" to do debugging (without waiting for too long) ?

Debugging using only GitHub actions is pretty tedious unfortunately. Unfortunately, I'm not aware of any option to log into the runners and run commands manually.
I would be very interested if someone finds out how to do that.

The next "best" thing (but still pretty bad) what I mostly resort to is:

  • Fork the repository to my own user account.
  • Enable running actions on that fork.
  • Optionally, disable any workflows that I'm currently not interested in. (This can be done on the "three-dot-menu" after selecting the respective workflow on the "Actions" tab.)
  • Optionally, modify the "configure" step of the workflow file in that fork to disable some build options (e.g., the GUI) to reduce the build time.
  • Optionally, modify the "check" step of the workflow file in that fork to reduce the number of tests that are being run.
  • Optionally, modify the source files in that fork to output more intermediate results.
  • Then iterate until there is some useful information.

That is still quite tedious and time consuming. It would be much easier if I knew how to stop in a debugger or anything like that in a GitHub action.

@juharu
Copy link
Contributor

juharu commented Jan 24, 2025 via email

@juharu
Copy link
Contributor

juharu commented Jan 24, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants