Add Buildkite GPU pipeline #411
Conversation
Amazing! I love the PRs that remove code 😄, well done!!
I added a few code suggestions so we can filter with `:skipbuildkite`, as some tests are pretty GPU agnostic. Also, I think that "KomaMRIFiles" and "KomaMRIPlots" are not used for these tests, so I suggested removing them.
EDIT: I changed my mind about `:skipbuildkite`; the test code coverage would be harder to calculate if one part is run on GitHub and the other on Buildkite.
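For context, this is roughly how such a tag would be attached and filtered with TestItems.jl/TestItemRunner.jl (illustrative only; as noted in the edit above, the `:skipbuildkite` tag was ultimately not adopted):

```julia
# In a test file: attach the proposed tag to a test item.
using TestItems

@testitem "GPU-agnostic example" tags=[:skipbuildkite] begin
    @test 1 + 1 == 2
end

# In runtests.jl, tagged items could then be skipped on Buildkite with:
#   using TestItemRunner
#   @run_package_tests filter = ti -> !(:skipbuildkite in ti.tags)
```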
KomaMRICore/test/initialize.jl (outdated)
Any way we can select which backend to use in VSCode?
I don't think so, since there doesn't appear to be a way to pass test arguments through the VSCode test panel. If a user wants to run an individual test on a specific backend, the easiest way might be to add this:
ARGS = ["CUDA"]
before the line with `include("initialize.jl")`.
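A minimal sketch of that workaround, using `empty!`/`push!` so the imported `ARGS` binding is mutated rather than reassigned (illustrative; it assumes initialize.jl looks for the backend name in ARGS):

```julia
# Force the CUDA backend before running a test file by hand in the REPL/VSCode.
empty!(ARGS)
push!(ARGS, "CUDA")          # same effect as the suggested ARGS = ["CUDA"]

include("initialize.jl")     # loads CUDA.jl and the matching KomaMRICore extension
```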
Mmm maybe using Preferences.jl?
I think Preferences.jl requires being inside a module that corresponds to a loaded package, and TestItems.jl creates a separate module for each test item. If I try to do anything with Preferences.jl inside run_tests.jl or initialize.jl I get the error 'ArgumentError: Module Main does not correspond to a loaded package!'. There might be a way to get this to work, but I'm not sure how at the moment.
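For reference, this is roughly how Preferences.jl is normally used: the preference has to be scoped to a loaded package's module (or UUID), which is why calling it from Main fails with the error above. The "gpu_backend" key below is made up for illustration:

```julia
using Preferences, KomaMRICore

# Preferences are attached to a package, not to Main.
set_preferences!(KomaMRICore, "gpu_backend" => "CUDA")         # writes LocalPreferences.toml
backend = load_preference(KomaMRICore, "gpu_backend", "CPU")   # reads it back, with a default
```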
Oh, it is fine then. If by default it tries to load CUDA, I think that would be enough, like `isempty(ARGS) && (ARGS = ["CUDA"])`.
EDIT: Ideally, there could be a way to control it locally for the local VSCode tests, but I am not sure what the best way is.
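In runnable form that default could look like the sketch below (push! instead of reassignment, so it also works when ARGS is the imported Base binding; not the PR's final code):

```julia
# Fall back to the CUDA backend when no backend argument was provided.
isempty(ARGS) && push!(ARGS, "CUDA")
```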
Removed KomaFiles & Plots and added `|` to `command` in pipeline.yml
I applied the code review changes, but some stuff from AMD needs Julia 1.10, so unfortunately I think we will need to use Julia 1.10 for that one. oneAPI seems to have the same problem with cumsum as Metal.
CUDA and Metal are working! https://buildkite.com/julialang/komamri-dot-jl/builds/593
I saw the CUDA ext being loaded in AMD; maybe we need to put them as weakdeps in the test/Project.toml?
I changed the Julia version for the AMD tests to 1.10 and added the same workaround we're using for Metal into oneAPIExt.jl.
For the test Project.toml dependencies, I tried moving the GPU packages to [weakdeps], but it complained about them not being loaded when I tried to run the tests, even when I included them with `using` beforehand. DiffEqGPU.jl adds the packages and then deletes test/Manifest.toml to avoid having any of them as explicit dependencies (https://github.com/SciML/DiffEqGPU.jl/blob/master/.buildkite/runtests.yml), so we could try to follow what they're doing. However, the problem I see with this is that I'm not sure how a user would be able to run tests on their own GPU without changing test/Project.toml themselves, which may not be desirable.
It looks like AMD now has an issue with the device name function (should be easy to fix), and Intel has an issue with 'unsupported use of double value'.
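Roughly, the CI side of that DiffEqGPU-style approach could look like the sketch below; this is an assumption for illustration (the TEST_GPU_BACKEND variable, paths, and cleanup step are not taken from this PR):

```julia
using Pkg

# Add only the requested GPU backend to the test environment at CI time,
# so test/Project.toml does not need to list every backend.
backend = get(ENV, "TEST_GPU_BACKEND", "CUDA")   # e.g. "CUDA", "AMDGPU", "Metal", "oneAPI"
Pkg.activate("KomaMRICore/test")
Pkg.add(backend)

# Run the tests, forwarding the backend choice to ARGS in the test scripts.
Pkg.activate("KomaMRICore")
Pkg.test(; test_args=[backend])

# Discard the test manifest so the backend never becomes a committed dependency.
rm("KomaMRICore/test/Manifest.toml"; force=true)
```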
Nice, AMD is now passing! The Intel error may take a bit more time to resolve; I'll look into it tomorrow.
This PR could be relevant for the test environment. Also, this: https://fluxml.ai/Flux.jl/stable/guide/gpu/#Selecting-GPU-backend.
I copied the approach from that pull request, so we are no longer installing all GPU backends. I also added a new gpu method without any backend parameter that uses the logic in get_backend() to determine which backend to use.
The oneAPI tests are still failing on this line in BlochSimulationMethod.jl:
Mxy = [M.xy M.xy .* exp.(1im .* ϕ .- tp' ./ p.T2)]
I think there is a Float64 value sneaking in somehow, based on the 'unsupported use of double value' message.
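As a side note on that error, the snippet below shows the kind of silent promotion that usually causes it on Float32-only GPU backends (illustrative only, not code from the PR):

```julia
x = rand(Float32, 4)

y = x .* 2.0      # 2.0 is a Float64 literal, so eltype(y) == Float64 → "double" on the device
z = x .* 2.0f0    # Float32 literal keeps eltype(z) == Float32
```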
oneAPI is passing now that I switched the line with:
exp.(exp_argument)
to the equivalent:
exp.(real.(exp_argument)) .* (cos.(imag.(exp_argument)) .+ Complex{T}(0,1) * sin.(imag.(exp_argument)))
I'm pretty sure this is a oneAPI bug, but nice that we can work around it for now.
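For reference, the rewrite is just Euler's formula, exp(a + bi) = exp(a)(cos b + i sin b), which can be sanity-checked on the CPU (illustrative snippet):

```julia
T = Float32
z = Complex{T}(0.3, 1.2)

lhs = exp(z)
rhs = exp(real(z)) * (cos(imag(z)) + Complex{T}(0, 1) * sin(imag(z)))

lhs ≈ rhs   # true
```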
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
+ Coverage   87.40%   88.22%   +0.82%
==========================================
  Files          49       49
  Lines        2802     2811       +9
==========================================
+ Hits         2449     2480      +31
+ Misses        353      331      -22

Flags with carried forward coverage won't be shown.
We will be able to use exp when this is merged :)
Due to problems with the codecov keys for Metal, we decided to remove Metal's codecov 😞.
This is an initial implementation of a Buildkite pipeline to test on all 4 GPU backends (CUDA, AMD, Metal, and oneAPI).
As discussed with Carlos, I simplified the KomaMRICore tests so there are no longer separate tests for CPU single-thread, CPU multi-thread, and GPU. Instead, the tests now check whether a test argument for "AMDGPU", "CUDA", "Metal", or "oneAPI" was passed, and if so load the corresponding package, which triggers loading the corresponding ext module from KomaMRICore. The tests no longer set anything for the "Nthreads" parameter, so they can be made to run with a specific number of CPU threads by passing the julia argument --threads=<N> to Pkg.test().
For now, the SimpleMotion and ArbitraryMotion tests still have the :skipci tag, since there is at least one issue with SimpleMotion on Metal that has not yet been resolved; I still need to check whether it has been fixed by #408.
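For example, running the KomaMRICore tests on a specific backend or with a given thread count could look like this (illustrative invocations; test_args ends up in ARGS inside the test scripts and julia_args is forwarded to the test process):

```julia
using Pkg

# GPU run: the "CUDA" test argument makes the tests load CUDA.jl and the
# corresponding KomaMRICore extension.
Pkg.test("KomaMRICore"; test_args=["CUDA"])

# CPU run with 4 threads: no backend argument, thread count set via julia_args.
Pkg.test("KomaMRICore"; julia_args=["--threads=4"])
```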