Add Buildkite GPU pipeline #411

Merged: 25 commits merged into master from buildkite on Jun 20, 2024

Conversation

@rkierulf (Collaborator) commented:

This is an initial implementation of a Buildkite pipeline that tests all four GPU backends (CUDA, AMDGPU, Metal, and oneAPI). As discussed with Carlos, I simplified the KomaMRICore tests so there are no longer separate tests for CPU single-thread, CPU multi-thread, and GPU. Instead, the tests check whether a test argument for "AMDGPU", "CUDA", "Metal", or "oneAPI" was passed, and if so load the corresponding package, which triggers loading the corresponding ext module from KomaMRICore. The tests no longer set the "Nthreads" parameter, so they can be run with a specific number of CPU threads by passing the Julia argument --threads= to Pkg.test(). For now, the SimpleMotion and ArbitraryMotion tests still have the :skipci tag since there is at least one unresolved issue with SimpleMotion on Metal; I still need to check whether it has been fixed by #408.
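
A rough sketch of how the reworked tests can be driven (assumed usage based on the description above, not code copied from the PR):

```julia
using Pkg

# Select the GPU backend via test_args and the CPU thread count via julia_args;
# Pkg.test forwards test_args to ARGS in the test process.
Pkg.test("KomaMRICore"; test_args = ["CUDA"], julia_args = ["--threads=4"])

# For a CPU-only run, omit the backend argument:
Pkg.test("KomaMRICore"; julia_args = ["--threads=4"])
```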

@cncastillo (Member) left a comment:

Amazing! I love the PRs that remove code 😄, well done!!

I added a few code suggestions so we can filter with :skipbuildkite, as some tests are pretty GPU-agnostic. Also, I think "KomaMRIFiles" and "KomaMRIPlots" are not used for these tests, so I suggested removing them.

EDIT: I changed my mind about :skipbuildkite; the test code coverage would be harder to calculate if one part runs on GitHub and the other on Buildkite.

(Review threads on .buildkite/pipeline.yml and KomaMRICore/test/runtests.jl; all resolved.)

@cncastillo (Member) commented, Jun 17, 2024:

Any way we can select which backend to use in VSCode?

@rkierulf (Collaborator, Author) replied:

I don't think so, since there doesn't appear to be a way to pass test arguments through the VSCode test panel. If a user wants to run an individual test on a specific backend, the easiest way might be to add `ARGS = ["CUDA"]` before the line with `include("initialize.jl")`.

@cncastillo (Member) replied:

Mmm maybe using Preferences.jl?

@rkierulf (Collaborator, Author) replied:

I think Preferences.jl requires being inside a module that corresponds to a loaded package, and TestItems.jl creates a separate module for each test item. If I try to use Preferences.jl inside runtests.jl or initialize.jl, I get an error: 'ArgumentError: Module Main does not correspond to a loaded package!'. There might be a way to get this to work, but I'm not sure how at the moment.
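
A small reproduction of the error described above (assuming the standard Preferences.jl API; not code from this PR):

```julia
using Preferences

# Preferences are keyed to a package's UUID, so querying one from a plain
# script or test module that is not itself a loaded package fails:
load_preference(@__MODULE__, "gpu_backend")
# ERROR: ArgumentError: Module Main does not correspond to a loaded package!
```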

@cncastillo (Member) commented, Jun 18, 2024:

Oh, it is fine then. If it tries to load CUDA by default, I think that would be enough, something like `isempty(ARGS) && ARGS = ["CUDA"]`.

EDIT: Ideally, there would be a way to control this locally for the VSCode tests, but I am not sure what the best way is.
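
A minimal sketch of that default (an assumption about how it could be written, not code from the PR). Note that the assignment form needs parentheses to parse inside `&&`; mutating the existing ARGS vector avoids that altogether:

```julia
# Default to the CUDA backend when no test argument was passed.
isempty(ARGS) && push!(ARGS, "CUDA")
```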

Removed KomaFiles & Plots and added `|` to `command` in pipeline.yml
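
For readers unfamiliar with the file being discussed: a Buildkite step with a multi-line `command: |` block might look roughly like the sketch below. This is based on common JuliaGPU Buildkite conventions; the plugin, agent queue, and tags are assumptions, not copied from this repository's pipeline.yml.

```yaml
steps:
  - label: "CUDA: KomaMRICore tests"
    plugins:
      - JuliaCI/julia#v1:
          version: "1.10"
    command: |
      julia --project=KomaMRICore -e '
        using Pkg
        Pkg.test(; test_args = ["CUDA"])'
    agents:
      queue: "juliagpu"   # JuliaGPU-hosted CI agents
      cuda: "*"           # request a machine with an NVIDIA GPU
    timeout_in_minutes: 60
```
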
@cncastillo (Member) left a comment:

I applied the code review changes, but some of the AMDGPU functionality needs Julia 1.10, so unfortunately I think we will need to use Julia 1.10 for that job. oneAPI seems to have the same problem with `cumsum` as Metal.

CUDA and Metal are working! https://buildkite.com/julialang/komamri-dot-jl/builds/593

I saw the CUDA ext being loaded in the AMD job; maybe we need to put the backends as weakdeps in the test Project.toml?

@rkierulf linked an issue (Jun 17, 2024) that may be closed by this pull request
@rkierulf (Collaborator, Author) commented:

I changed the Julia version for the AMD tests to 1.10 and added the same workaround we're using for Metal into oneAPIExt.jl.

For the test Project.toml dependencies, I tried moving the GPU packages to [weakdeps], but Pkg complained about them not being loaded when I ran the tests, even if I loaded them with `using` first. DiffEqGPU.jl adds the packages in CI and then deletes test/Manifest.toml so that none of them become explicit dependencies (https://github.com/SciML/DiffEqGPU.jl/blob/master/.buildkite/runtests.yml). We could follow what they're doing; however, I'm not sure how a user would then run tests on their own GPU without editing test/Project.toml themselves, which may not be desirable.

It looks like AMD now has an issue with the device name function (should be easy to fix), and Intel has an issue with 'unsupported use of double value'.

@rkierulf (Collaborator, Author) commented:

Nice, AMD is now passing!

The Intel error may take a bit more time to resolve; I'll look into it tomorrow.

@cncastillo (Member) commented, Jun 18, 2024:

This PR could be relevant for the test environment:

Also, this: https://fluxml.ai/Flux.jl/stable/guide/gpu/#Selecting-GPU-backend.
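
For reference, the Flux approach linked above stores the backend choice as a package-scoped preference, roughly like this (based on the linked Flux documentation, not on KomaMRI code):

```julia
using Flux

# Persists the choice in LocalPreferences.toml via Preferences.jl;
# it takes effect the next time Flux is loaded.
Flux.gpu_backend!("AMDGPU")
```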

@rkierulf (Collaborator, Author) commented:

I copied the approach from that pull request, so we are no longer installing all GPU backends. I also added a new `gpu` method without a backend parameter that uses the logic in `get_backend()` to determine which backend to use. The oneAPI tests are still failing on this line in BlochSimulationMethod.jl:

Mxy = [M.xy M.xy .* exp.(1im .* ϕ .- tp' ./ p.T2)]

I think a Float64 value is sneaking in somewhere, based on the 'unsupported use of double value' message.
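
A hedged sketch of the "install only the requested backend" idea mentioned in the first paragraph above (assumed shape, not this PR's exact test code):

```julia
using Pkg

# Add and load only the backend requested via test_args, so the other three
# GPU packages are never installed in the test environment.
requested = intersect(ARGS, ("AMDGPU", "CUDA", "Metal", "oneAPI"))
if !isempty(requested)
    backend = first(requested)
    Pkg.add(backend)                 # installed on demand for this test run only
    @eval using $(Symbol(backend))   # triggers the matching KomaMRICore extension
end
```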

@rkierulf (Collaborator, Author) commented:

oneAPI is passing now that I switched the line with:

exp.(exp_argument)

to the equivalent:

exp.(real.(exp_argument)) .* (cos.(imag.(exp_argument)) .+ Complex{T}(0,1) * sin.(imag.(exp_argument)))

I'm pretty sure this is a oneAPI bug, but it's nice that we can work around it for now.
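
The workaround is just Euler's formula, exp(a + bi) = exp(a)(cos(b) + i*sin(b)), written so that only real-valued exp/cos/sin are emitted. A standalone sketch (the helper name is hypothetical, not from the PR):

```julia
# Evaluate exp(z) for complex z using only real-valued functions, avoiding the
# double-precision intrinsics that oneAPI rejected above.
function exp_complex(z::Complex{T}) where {T<:Real}
    return exp(real(z)) * Complex{T}(cos(imag(z)), sin(imag(z)))
end

exp_complex(ComplexF32(0.5f0, 1.0f0))  # ≈ exp(0.5f0 + 1.0f0im)
```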


codecov bot commented Jun 18, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 88.22%. Comparing base (515d915) to head (9068796).

@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
+ Coverage   87.40%   88.22%   +0.82%     
==========================================
  Files          49       49              
  Lines        2802     2811       +9     
==========================================
+ Hits         2449     2480      +31     
+ Misses        353      331      -22     

| Flag | Coverage Δ |
|------|------------|
| base | 86.42% <ø> (ø) |
| core | 79.39% <50.00%> (+5.70%) ⬆️ |
| files | 93.70% <ø> (ø) |
| komamri | 93.98% <ø> (ø) |
| plots | 89.27% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|-------|------------|
| KomaMRICore/ext/KomaAMDGPUExt.jl | 41.66% <100.00%> (+41.66%) ⬆️ |
| KomaMRICore/ext/KomaoneAPIExt.jl | 60.00% <100.00%> (+60.00%) ⬆️ |
| KomaMRICore/src/KomaMRICore.jl | 100.00% <ø> (ø) |
| .../src/simulation/Bloch/BlochDictSimulationMethod.jl | 87.50% <100.00%> (ø) |
| ...Core/src/simulation/Bloch/BlochSimulationMethod.jl | 100.00% <100.00%> (ø) |
| KomaMRICore/ext/KomaMetalExt.jl | 0.00% <0.00%> (ø) |
| KomaMRICore/src/simulation/Functors.jl | 32.43% <0.00%> (-20.70%) ⬇️ |

... and 3 files with indirect coverage changes

@cncastillo (Member) commented:

We will be able to use `exp` when this is merged :)

@cncastillo (Member) commented, Jun 19, 2024:

We also got a passing badge!

[Build status badge]

We can specify `?step=` to get a badge for each backend, but I haven't looked into this.

@rkierulf requested a review from cncastillo, June 19, 2024 18:42
@cncastillo mentioned this pull request, Jun 19, 2024
@cncastillo (Member) left a comment:

Due to problems with the codecov keys for Metal, we decided to remove Metal's codecov 😞.

Fixed codecov key, removed Metal codecov
@cncastillo changed the title from "Add Buildkite pipeline" to "Add Buildkite GPU pipeline", Jun 20, 2024
@cncastillo merged commit 3175e71 into master, Jun 20, 2024
16 of 18 checks passed
@cncastillo deleted the buildkite branch, June 20, 2024 17:21
Linked issue that may be closed by this pull request: Using BuildKite for GPU related CI?