
Adding device objects for selecting GPU backends (and defaulting to CPU if none exists). #2297

Merged · 22 commits · Aug 4, 2023

Conversation

@codetalker7 (Contributor) commented on Jul 22, 2023

This PR addresses issue #2293 by creating a device object to be used instead of the gpu function. This method was proposed by @CarloLucibello in the mentioned issue, and has a few advantages. The implementation has been inspired by Lux's approach to handling GPU backends.

Currently, this is just a draft PR, containing the high-level idea. The main addition is the AbstractDevice type, along with four concrete types representing devices for different GPU backends (and a device representing the CPU).

As an example, we can now do the following (for the examples below, I had stored AMD as my gpu_backend preference):

# example without GPU
julia> using Flux;

julia> model = Dense(2 => 3)
Dense(2 => 3)       # 9 parameters

julia> device = Flux.get_device()           # this will just load the CPU device
[ Info: Using backend set in preferences: AMD.
┌ Warning: Trying to use backend AMD but package AMDGPU [21141c5a-9bdb-4563-92ae-f87d6854732e] is not loaded.
│ Please load the package and call this function again to respect the preferences backend.
└ @ Flux ~/fluxml/Flux.jl/src/functor.jl:496
[ Info: Running automatic device selection...
(::Flux.FluxCPUDevice) (generic function with 1 method)

julia> model = model |> device
Dense(2 => 3)       # 9 parameters

julia> model.weight
3×2 Matrix{Float32}:
 -0.304362  -0.700477
 -0.861201   0.67825
 -0.176017   0.234188

Here is the same example, now using CUDA:

julia> using Flux, CUDA;

julia> model = Dense(2 => 3)
Dense(2 => 3)       # 9 parameters

julia> device = Flux.get_device()
[ Info: Using backend set in preferences: AMD.
┌ Warning: Trying to use backend AMD but package AMDGPU [21141c5a-9bdb-4563-92ae-f87d6854732e] is not loaded.
│ Please load the package and call this function again to respect the preferences backend.
└ @ Flux ~/fluxml/Flux.jl/src/functor.jl:496
[ Info: Running automatic device selection...
(::Flux.FluxCUDADevice) (generic function with 1 method)

julia> model = model |> device
Dense(2 => 3)       # 9 parameters

julia> model.weight
3×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
  0.820013   0.527131
 -0.915589   0.549048
  0.290744  -0.0592499

PR Checklist

  • Finalize the implementation of get_device.
  • Add documentation for the new device objects, and possibly some docstrings.
  • Decide and finalize the warning messages; right now they are inspired by Lux.
  • Since Flux.get_device directly loads the preferences, are the references CUDA_LOADED, AMDGPU_LOADED and METAL_LOADED needed? On a similar note, the gpu(x) function and the GPUBACKEND global aren't really needed anymore.
  • Implement DataLoader support for device objects.
  • Add/update relevant tests.

Closes #2293.

@CarloLucibello (Member)

Looks very good already.

Since Flux.get_device directly loads the preferences, are the references CUDA_LOADED, AMDGPU_LOADED and METAL_LOADED needed? On a similar note, the gpu(x) function and the GPUBACKEND global aren't really needed anymore.

Since gpu is widely used, I would avoid deprecating it for some time. Let's just introduce the new features in this PR; we can think later about possible deprecation paths.

@ToucheSir (Member)

Instead of relying on pkg IDs, can we try to reuse some of the device and backend machinery from GPUArrays or KernelAbstractions?

@codetalker7 (Contributor, Author)

Instead of relying on pkg IDs, can we try to reuse some of the device and backend machinery from GPUArrays or KernelAbstractions?

Hello @ToucheSir. Do you have any specific functionality from GPUArrays/KernelAbstractions in mind that we could use here? Also, I guess the only use of Pkg IDs here is to see if the package has been loaded; I can drop that and use the CUDA_LOADED, AMDGPU_LOADED and METAL_LOADED flags?

@ToucheSir (Member)

Those two libraries have device and backend types already. I think we should try to use them directly or wrap them if we can.

I can drop that and use the CUDA_LOADED, AMDGPU_LOADED and METAL_LOADED flags?

That works. Another route would be to define a function on a backend which returns whether that backend is loaded. Each extension package then adds a method to that function, which means you can use dispatch and maybe save a few conditionals.

@codetalker7 (Contributor, Author) commented on Jul 24, 2023

Those two libraries have device and backend types already. I think we should try to use them directly or wrap them if we can.

I can drop that and use the CUDA_LOADED, AMDGPU_LOADED and METAL_LOADED flags?

That works. Another route would be to define a function on a backend which returns whether that backend is loaded. Each extension package then adds a method to that function, which means you can use dispatch and maybe save a few conditionals.

Hi @ToucheSir. I went through KernelAbstractions and GPUArrays. KernelAbstractions has backend types (namely Backend, GPU and CPU) and GPUArrays has just one backend type (AbstractGPUBackend). I couldn't find device types in either package (hopefully I haven't missed anything).

It seems logical to me that device types should be separate from backend types; I could wrap a backend type in a device type (i.e. by giving each device a backend property), but that wouldn't help much in our case, since I only need to check whether a package has been loaded, which I can do via the package extensions as you suggested.

Regarding your suggestion about having a method to check whether a device is loaded: this works nicely. I am thinking of doing the following (after removing the pkgid field from the four device types):

# Inside src/functor.jl
isavailable(device::AbstractDevice) = false
isfunctional(device::AbstractDevice) = false

# CPU is always functional and available
isavailable(device::FluxCPUDevice) = true
isfunctional(device::FluxCPUDevice) = true

Then, for example in ext/FluxCUDAExt/FluxCUDAExt.jl, I add the following:

Flux.isavailable(device::Flux.FluxCUDADevice) = true
Flux.isfunctional(device::Flux.FluxCUDADevice) = CUDA.functional()

After this, in all the conditionals in Flux.get_device, I can simply use isavailable(device) and isfunctional(device) instead of the pkgids.
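
Roughly, the automatic selection could then collapse into something like this (just a sketch, with candidate_devices as a placeholder for however the backend-ordered devices end up being stored):

# Sketch only -- not the final Flux.get_device implementation.
function select_device(candidate_devices)
    for device in candidate_devices              # e.g. CUDA, AMD, Metal devices, in preference order
        if isavailable(device) && isfunctional(device)
            return device
        end
    end
    return FluxCPUDevice()                       # fall back to the CPU device
end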

Does this sound fine? If so, I'll update the PR.

@CarloLucibello (Member)

Does this sound fine? If so, I'll update the PR.

Definitely a better solution than the pkgid one.

@codetalker7 (Contributor, Author)

Does this sound fine? If so, I'll update the PR.

Definitely a better solution than the pkgid one.

Sure, I have updated the PR with the new implementation. Also, for the documentation: since we haven't removed any old functionality in this PR, I'm just planning to add a new section on devices, device types, and the get_device method. Is there anything more I should be adding?

And for tests, I'm planning to add basic tests which just verify that the correct device is loaded for the required case. Also, I have access to a machine with NVIDIA GPUs. Is there a way to run tests without AMD/Metal GPUs?

@CarloLucibello (Member)

Sure, I have updated the PR with the new implementation. Also, for the documentation: since we haven't removed any old functionality in this PR, I'm just planning to add a new section on devices, device types, and the get_device method. Is there anything more I should be adding?

That's it, I guess. In a follow-up PR we should then start to deprecate gpu, at the documentation level only.

And for tests, I'm planning to add basic tests which just verify that the correct device is loaded for the required case. Also, I have access to a machine with NVIDIA GPUs. Is there a way to run tests without AMD/Metal GPUs?

I'm not sure I understand the question. In any case, we have CI running tests on the different devices through Buildkite. If you look at the structure of test/runtests.jl, you will see where to place the new tests for the various devices.

src/functor.jl Outdated
A type representing `device` objects for the `"Metal"` backend for Flux.
"""
Base.@kwdef struct FluxMetalDevice <: AbstractDevice
name::String = "Metal"
Member

Can we use dispatch to get these fixed names, and instead use the fields to store info about the actual device, e.g. the ordinal number or a wrapper around the actual device type(s) from each GPU package?

Contributor Author

Yes, I'll try to add this to the structs.

Contributor Author

Hi @ToucheSir. I've added a deviceID to each device struct, whose type is the device type from the corresponding GPU package. Since neither KernelAbstractions nor GPUArrays has a type hierarchy for device objects, I've moved the struct definitions to the package extensions. The device types are CUDA.CuDevice, AMDGPU.HIPDevice and Metal.MTLDevice respectively.
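
For concreteness, here is a rough sketch of the CUDA case (illustrative only, not the exact code in this PR):

# e.g. inside ext/FluxCUDAExt/FluxCUDAExt.jl -- a sketch of the idea, not the exact PR code
struct FluxCUDADevice <: Flux.AbstractDevice
    deviceID::CUDA.CuDevice                        # wraps the device type from CUDA.jl
end

Flux._get_device_name(::FluxCUDADevice) = "CUDA"   # fixed backend name via dispatch instead of a field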

One disadvantage of this approach: from what I understand, Flux leaves the work of managing devices to the GPU packages. So, if the user switches devices using functions from the GPU package, then our device object will also have to be updated (which currently doesn't happen). But if Flux users don't care which device is allocated to them, I think this works fine.

What do you think about this?

Member

In my mind, the whole point of calling this a device instead of a backend is that we'd allow users to choose which device they want their model to be transferred onto. If that's not feasible because of limitations in the way GPU packages must be used, I'd rather just call these backends instead. Others might have differing opinions on this, however, cc @CarloLucibello from earlier.

Contributor Author

In my mind, the whole point of calling this a device instead of a backend is that we'd allow users to choose which device they want their model to be transferred onto. If that's not feasible because of limitations in the way GPU packages must be used, I'd rather just call these backends instead. Others might have differing opinions on this, however, cc @CarloLucibello from earlier.

Yes, I agree. Also, if a user wants finer control over which device to use, isn't it better for them to just rely on, say, CUDA.jl directly?

If not, I think it won't be hard to add a device selection capability within Flux as well. But ultimately, we will be calling functions from GPU packages, which the user can just call themselves.

Contributor Author

Sure. I'm fine with either; also, if we are to implement an interface for handling multiple devices, wouldn't it be a good idea to first discuss the overall API we want, and the specific implementation details we need? (I'm asking because I'm not completely sure what I'll have to implement to handle multiple devices.)

For instance, when we talk about "multiple devices", do we mean letting the user work with just one device at a time, but with the ability to choose which one? Or do we mean using multiple devices simultaneously to train models? For the latter, I was going through DaggerFlux.jl, and it seems considerably more involved. The first idea seems easier to implement.

Member

Somewhere in the middle I think. Training on multiple GPUs is out of scope for this PR (we have other efforts looking into that), but allowing users to transfer models to any active GPU without calling device! beforehand every time would be great for ergonomics.

Contributor Author

Somewhere in the middle I think. Training on multiple GPUs is out of scope for this PR (we have other efforts looking into that), but allowing users to transfer models to any active GPU without calling device! beforehand every time would be great for ergonomics.

Sure, I think this shouldn't be too hard to implement. I have one idea for this.

Device methods

We will have the following methods:

function get_device()
    # this will be what we have right now
    # this returns an `AbstractDevice` whose deviceID
    # is the device with which the GPU package has been
    # initialized automatically
end

function get_device(backend::Type{<:KA.GPU}, ordinal::UInt)   # KA = KernelAbstractions
    # this will return an `AbstractDevice` from the given backend whose deviceID
    # is the device with the given ordinal number. These methods will be defined
    # in the corresponding package extensions.
end

With these functions, users can then specify the backend + ordinal of the GPU device which they want to work with.
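
For illustration, the proposed methods might be used like this (purely hypothetical at this point; CUDA.CUDABackend is just one example of a KA.GPU backend type):

using Flux, CUDA

# Hypothetical usage of the proposed methods -- the two-argument form does not exist yet.
device0 = Flux.get_device()                           # automatic selection, as in this PR
device1 = Flux.get_device(CUDA.CUDABackend, UInt(1))  # proposed: the CUDA device with ordinal 1
model   = Dense(2 => 3) |> device1                    # parameters would land on that specific device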

Model transfer between devices

Next, suppose we have a model m which is bound to an AbstractDevice, say device1, which has a backend1::Type{<:KA.GPU} and an ordinal1::UInt. Suppose device2 is another device object with backend2::Type{<:KA.GPU} and ordinal2::UInt.

Then, a call to device2(m) will do the following: if backend1 == backend2 and ordinal1 == ordinal2, then nothing happens and m is returned. Otherwise, device1 is "freed" of m (we'll have to do some device memory management here) and m is bound to device2.

In the above, the tricky part is how to identify the GPU backend + ordinal which m is bound to, and how to free the memory taken by m on the device. For simple models like Dense, I can do the following:

# suppose the backend is CUDA
julia> using Flux, CUDA;

julia> m = Dense(2 => 3) |> gpu;

julia> CUDA.device(m.weight)    # this gives me the device to which m is bound
CuDevice(0): NVIDIA GeForce GTX 1650

julia> CUDA.unsafe_free!(m.weight);   # just an idea, but something similar

Now clearly, I can't do something similar if m is a complex model. So we'll probably have to add some property to models which stores the device backend + ordinal to which they are bound.

Regarding freeing the GPU device memory: for CUDA, for example, we can probably use the CUDA.unsafe_free! method. But it's probably marked unsafe for a reason.

How does this idea sound, @ToucheSir @CarloLucibello? Any pointers/suggestions on how to track which device a model is bound to and how to do the memory management?

Member

If the actual data movement adaptors (e.g. FluxCUDAAdaptor) receive the device ID as an argument, then you only need to apply your detect + free logic at the level of individual parameters. fmap will take care of mapping the logic over a complex model.

In the simple case we are talking about, every parameter in the model should be bound to the same device. In general, model parallelism means that a model could be across multiple devices.
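
For example, a sketch of that shape with CUDA and a single device, just to show where fmap fits in:

using Flux, CUDA, Functors

# The transfer rule only needs to handle one leaf array at a time; fmap walks the
# nested model structure and applies it to every parameter.
to_device(x::AbstractArray{<:Number}) = CuArray(x)
to_device(x) = x                                  # leave non-array leaves (e.g. activation functions) alone

m = Chain(Dense(2 => 3, relu), Dense(3 => 1))
m_gpu = fmap(to_device, m)                        # every weight and bias is now a CuArray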

Member

Yes, the only thing we need to worry about is "can I move this array to the device the user asked for?", which sounds simple but might be tricky in practice if the GPU packages don't provide a way to do that directly. I hope there's a relatively straightforward way for most of them, but if not we can save that for future work and/or bug upstream to add it in for us :)

@CarloLucibello (Member)

Let's add some tests and get this PR merged; discussions on device selection can happen somewhere else.

@codetalker7 (Contributor, Author)

Let's add some tests and get this PR merged; discussions on device selection can happen somewhere else.

@CarloLucibello sure, I'm fine with that. I was trying to implement @darsnack's and @ToucheSir's ideas on data transfer, but if it's better, I can open a new issue to discuss those ideas and a new PR to implement them.

Also, I haven't touched the old code (except for minor changes), and haven't added any new tests either. But Nightly CI is still failing. How do I fix that?

@CarloLucibello (Member)

Nightly CI has been failing for a while, ignore it.

@ToucheSir (Member)

After talking with Tim and thinking it over, I think per-device movement should be as simple as the following (sketched in code below):

  1. Saving the current device
  2. Calling device!(new device ID)
  3. Allocating the destination array
  4. copy!-ing the data into the new array
  5. Switching back to the original device with device!

If that turns out to be too much work, I agree with Carlo's suggestion.
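
A minimal sketch of those five steps with CUDA.jl (illustrative only; it assumes copyto! between arrays on different devices works, possibly by staging through the host):

using CUDA

function copy_to_device(x::CuArray, dev::CuDevice)
    old_dev = CUDA.device()          # 1. save the current device
    CUDA.device!(dev)                # 2. switch to the destination device
    y = similar(x)                   # 3. allocate the destination array there
    copyto!(y, x)                    # 4. copy the data into the new array
    CUDA.device!(old_dev)            # 5. switch back to the original device
    return y
end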

docs/src/gpu.md: two resolved review threads (outdated)
@codetalker7 marked this pull request as ready for review on July 31, 2023, 20:59
@codetalker7 (Contributor, Author)

I've added a few device selection tests. Also, I found a bug in test/functors.jl: it should be AMDGPU_LOADED instead of AMD_LOADED (I've fixed it now). If I'm not wrong, some CI tests here were not catching that.

test/runtests.jl Outdated
Comment on lines 114 to 122
@test typeof(Flux.DEVICES[][Flux.GPU_BACKEND_ORDER["Metal"]]) <: Flux.FluxMetalDevice
device = Flux.get_device()

if Metal.functional()
@test typeof(Flux.DEVICES[][Flux.GPU_BACKEND_ORDER["Metal"]].deviceID) <: Metal.MTLDevice
@test typeof(device) <: Flux.FluxMetalDevice
@test typeof(device.deviceID) <: Metal.MTLDevice
@test Flux._get_device_name(device) in Flux.supported_devices()

Member

Let's not clutter test/runtests.jl with these tests. They can go within the ext_* folder.
Also, we need tests checking that x |> device transfers data correctly.

Contributor Author

Okay, I'll move the tests to the extension test files.

Regarding x |> device tests: the extensions have a massive test suite for the gpu function. Under the hood, x |> device also calls a gpu function; do I need to write all the same test cases for x |> device as well, or do simple tests like checking GPU array types suffice?
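
For instance, something as simple as this (a sketch for the CUDA case, assuming a functional CUDA setup):

using Flux, CUDA, Test

device = Flux.get_device()           # FluxCUDADevice when CUDA is loaded and functional
x = randn(Float32, 5, 5)
cx = x |> device
@test cx isa CuArray
@test Array(cx) ≈ x                  # the data itself is unchanged by the transfer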

Member

simple tests are enough

test/runtests.jl Outdated
@testset "CUDA" begin
include("ext_cuda/device_selection.jl")
Member

Suggested change (remove this line):
include("ext_cuda/device_selection.jl")

Let's group these tests under a single file, "get_devices.jl".

Contributor Author

Done. Please let me know if other tests need to be added (like more tests for x |> device on other types).

@CarloLucibello merged commit c2bd39d into FluxML:master on Aug 4, 2023
5 of 6 checks passed
@CarloLucibello (Member)

Fantastic work @codetalker7, thanks

@codetalker7 (Contributor, Author)

Fantastic work @codetalker7, thanks

Thank you!
