Adding device objects for selecting GPU backends (and defaulting to CPU if none exists). #2297
Conversation
… appropriate GPU backend (or CPU, if nothing is available).
Looks very good already.
Since … changes in `gpu(x)`. Adding more details in docstring of `get_device`.
Instead of relying on pkg IDs, can we try to reuse some of the device and backend machinery from GPUArrays or KernelAbstractions?

Hello @ToucheSir. Do you have any specific functionalities from GPUArrays/KernelAbstractions in mind which we can use here? Also, I guess the only use of Pkg IDs here is to see if the package has been loaded; I can drop that and use the …
Those two libraries have device and backend types already. I think we should try to use them directly or wrap them if we can.
That works. Another route would be to define a function on a backend which returns whether that backend is loaded. Each extension package then adds a method to that function, which means you can use dispatch and maybe save a few conditionals.
Hi @ToucheSir. I went through … It seems logical to me that device types should be separate from backend types; I can wrap a backend type around a device type (i.e. by having a …).

Regarding your suggestion about having a method to check whether a device is loaded: this works nicely. I am thinking of doing the following (after removing the …):

```julia
# Inside src/functors.jl
isavailable(device::AbstractDevice) = false
isfunctional(device::AbstractDevice) = false

# CPU is always functional and available
isavailable(device::FluxCPUDevice) = true
isfunctional(device::FluxCPUDevice) = true
```

Then, for example in the CUDA extension:

```julia
Flux.isavailable(device::Flux.FluxCUDADevice) = true
Flux.isfunctional(device::Flux.FluxCUDADevice) = CUDA.functional()
```

After this, in all the conditionals in …, we can use these checks instead. Does this sound fine? If so, I'll update the PR.
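To make that concrete, here is a hedged sketch (not the actual Flux code) of how such checks could replace the package-ID conditionals when picking a backend:

```julia
# Hypothetical helper, assuming the `isavailable`/`isfunctional` methods above;
# `devices` would be the candidate GPU device objects in order of preference.
function first_working_device(devices)
    for device in devices
        if isavailable(device) && isfunctional(device)
            return device
        end
    end
    return FluxCPUDevice()   # CPU fallback: always available and functional
end
```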
definitely a better solution than the pkgid one
Sure, I have updated the PR with the new implementation. Also, for the documentation: since we haven't removed any old functionalities in this PR, I'm just planning to add a new section on devices, device types and the new `get_device` function.

And for tests, I'm planning to add basic tests which just verify that the correct device is loaded for the required case. Also, I have access to a machine with NVIDIA GPUs. Is there a way to run tests without AMD/Metal GPUs?
that's it I guess. In a follow-up PR we should then start to deprecate …
I'm not sure I understand the question. In any case, we have CI running tests on the different devices through buildkite. If you look at the structure of …
src/functor.jl (outdated)
```julia
A type representing `device` objects for the `"Metal"` backend for Flux.
"""
Base.@kwdef struct FluxMetalDevice <: AbstractDevice
    name::String = "Metal"
```
Can we use dispatch to get these fixed names and use the fields to instead store info about the actual device? e.g. ordinal number or wrapping the actual device type(s) from each GPU package.
Yes, I'll try to add this to the structs.
Hi @ToucheSir. I've added a `deviceID` field to each device struct, whose type is the device type from the corresponding GPU package. Since `KernelAbstractions` and `GPUArrays` don't have any type hierarchy for device objects, I've moved the struct definitions to the package extensions. The device types are `CUDA.CuDevice`, `AMDGPU.HIPDevice` and `Metal.MTLDevice` respectively.

One disadvantage of this approach: from what I understand, Flux leaves the work of managing devices to the GPU packages. So, if the user chooses to switch devices by using functions from the GPU package, then our `device` object will also have to be updated (which currently isn't the case). But if users of Flux don't care about which device is allocated to them, I think this works fine.

What do you think about this?
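For reference, a rough sketch of what this could look like inside, e.g., the CUDA extension (field and helper names follow this thread; the final code may differ):

```julia
# Inside the CUDA package extension, where both Flux and CUDA are available.
struct FluxCUDADevice <: Flux.AbstractDevice
    deviceID::CUDA.CuDevice                        # actual device handle from CUDA.jl
end

Flux._get_device_name(::FluxCUDADevice) = "CUDA"   # fixed name via dispatch, not a field
Flux.isavailable(::FluxCUDADevice) = true
Flux.isfunctional(::FluxCUDADevice) = CUDA.functional()
```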
In my mind, the whole point of calling this a device instead of a backend is that we'd allow users to choose which device they want their model to be transferred onto. If that's not feasible because of limitations in the way GPU packages must be used, I'd rather just call these backends instead. Others might have differing opinions on this, however, cc @CarloLucibello from earlier.
> In my mind, the whole point of calling this a device instead of a backend is that we'd allow users to choose which device they want their model to be transferred onto. If that's not feasible because of limitations in the way GPU packages must be used, I'd rather just call these backends instead. Others might have differing opinions on this, however, cc @CarloLucibello from earlier.
Yes, I agree. Also, if a user wants to have finer control over which device they want to use, isn't it better for them to just rely on CUDA.jl for example?
If not, I think it won't be hard to add a device selection capability within Flux as well. But ultimately, we will be calling functions from GPU packages, which the user can just call themselves.
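For context, the "just rely on CUDA.jl" route looks like this today (standard CUDA.jl calls, shown here only for comparison):

```julia
using Flux, CUDA

CUDA.device!(1)              # the user picks the second GPU themselves
m = Dense(2 => 3) |> gpu     # subsequent transfers land on the currently active device
```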
Sure. I'm fine with either; also, if we are to implement an interface for handling multiple devices, wouldn't it be a good idea to first discuss the overall API we want, and the specific implementation details we need (asking because I'm not completely aware of everything I'll have to implement to handle multiple devices)?

For instance, when we are talking about "multiple devices", do we mean providing the user the functionality to use just one device, but with the ability to choose which one? Or do we mean using multiple devices simultaneously to train models? For the latter, I was going through DaggerFlux.jl, and it seems fairly non-trivial. The first idea seems easier to implement.
Somewhere in the middle I think. Training on multiple GPUs is out of scope for this PR (we have other efforts looking into that), but allowing users to transfer models to any active GPU without calling `device!` beforehand every time would be great for ergonomics.
> Somewhere in the middle I think. Training on multiple GPUs is out of scope for this PR (we have other efforts looking into that), but allowing users to transfer models to any active GPU without calling `device!` beforehand every time would be great for ergonomics.
Sure, I think this shouldn't be too hard to implement. I have one idea for this.
Device methods
We will have the following methods:
```julia
function get_device()
    # this will be what we have right now:
    # it returns an `AbstractDevice` whose deviceID
    # is the device with which the GPU package has been
    # initialized automatically
end

function get_device(backend::Type{<:KA.GPU}, ordinal::UInt)
    # this will return an `AbstractDevice` from the given backend whose deviceID
    # is the device with the given ordinal number. These methods will be defined
    # in the corresponding package extensions.
end
```
With these functions, users can then specify the backend + ordinal of the GPU device which they want to work with.
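Assuming CUDA.jl's `CUDABackend` (a `KernelAbstractions.GPU` subtype), usage could look roughly like this (a sketch of the proposal above, not a finalized API):

```julia
using Flux, CUDA, KernelAbstractions

# request the CUDA device with ordinal 1 (i.e. the second GPU on the machine)
device = Flux.get_device(CUDABackend, UInt(1))

m = Dense(2 => 3) |> device   # transfer the model onto that specific device
```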
Model transfer between devices
Next, suppose we have a model `m` which is bound to an `AbstractDevice`, say `device1`, which has a `backend1::Type{<:KA.GPU}` and an `ordinal1::UInt`. Suppose `device2` is another device object with `backend2::Type{<:KA.GPU}` and `ordinal2::UInt`.

Then, a call to `device2(m)` will do the following: if `backend1 == backend2` and `ordinal1 == ordinal2`, then nothing happens and `m` is returned. Otherwise, `device1` is "freed" of `m` (we'll have to do some device memory management here) and `m` is bound to `device2`.
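Schematically, with hypothetical helper names (`backend_of`, `ordinal_of` and `free_and_move` are placeholders for whatever we end up implementing, not real API):

```julia
function (device2::AbstractDevice)(m)
    if backend_of(m) == backend_of(device2) && ordinal_of(m) == ordinal_of(device2)
        return m                          # already on the requested device; do nothing
    else
        return free_and_move(m, device2)  # release m's old buffers and copy it to device2
    end
end
```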
In the above, the tricky part is how to identify the GPU backend + ordinal which `m` is bound to, and how to free the memory taken by `m` on the device. For simple models like `Dense`, I can do the following:
```julia
# suppose the backend is CUDA
julia> using Flux, CUDA;

julia> m = Dense(2 => 3) |> gpu;

julia> CUDA.device(m.weight) # this gives me the device to which m is bound
CuDevice(0): NVIDIA GeForce GTX 1650

julia> CUDA.unsafe_free!(m.weight); # just an idea, but something similar
```
Now clearly, I can't do something similar if `m` is a complex model. So we'll probably have to add some property to models which stores the device backend + ordinal to which they are bound.

Regarding freeing the GPU device memory: for CUDA, for example, we can probably use the `CUDA.unsafe_free!` method. But it might be unsafe for a reason.

How does this idea sound, @ToucheSir @CarloLucibello? Any pointers/suggestions on how to track which device a model is bound to and how to do the memory management?
If the actual data movement adaptors (e.g. `FluxCUDAAdaptor`) receive the device ID as an argument, then you only need to apply your detect + free logic at the level of individual parameters. `fmap` will take care of mapping the logic over a complex model.

In the simple case we are talking about, every parameter in the model should be bound to the same device. In general, model parallelism means that a model could be spread across multiple devices.
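A minimal sketch of that idea for the CUDA case (the adaptor name and field here are placeholders, not the real `FluxCUDAAdaptor` internals):

```julia
using Adapt, Functors, CUDA

struct DeviceCUDAAdaptor
    ordinal::Int                          # which CUDA device to place arrays on
end

function Adapt.adapt_storage(to::DeviceCUDAAdaptor, x::AbstractArray)
    CUDA.device!(to.ordinal) do
        CuArray(x)                        # the copy is allocated on the requested device
    end
end

# `fmap` walks an arbitrarily nested model and applies the adaptor at each
# parameter array, so the per-device logic only lives at the leaf level.
move_to_device(m, ordinal::Int) = fmap(x -> Adapt.adapt(DeviceCUDAAdaptor(ordinal), x), m)
```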
Yes, the only thing we need to worry about is "can I move this array to the device the user asked for?" Which sounds simple, but might be tricky in practice if the GPU packages don't provide a way to do that directly. I hope there's a relatively straightforward way for most of them, but if not we can save that for future work and/or bug upstream to add it in for us :)
… to package extensions.
Let's add some tests and get this PR merged; discussions on device selection can be done somewhere else.

@CarloLucibello sure, I'm fine with it. I was trying to implement @darsnack's and @ToucheSir's idea on data transfer, but if it's better I can make a new issue for discussing those ideas and a new PR to implement it. Also, I haven't touched the old code (except for minor changes), and haven't added any new tests either. But Nightly CI is still failing. How do I fix that?

Nightly CI has been failing for a while, ignore it.
After talking with Tim and thinking it over, I think per-device movement should be as simple as …

If that turns out to be too much work, I agree with Carlo's suggestion.
I've added a few device selection tests. Also, I found a bug in …
test/runtests.jl (outdated)
```julia
@test typeof(Flux.DEVICES[][Flux.GPU_BACKEND_ORDER["Metal"]]) <: Flux.FluxMetalDevice
device = Flux.get_device()

if Metal.functional()
    @test typeof(Flux.DEVICES[][Flux.GPU_BACKEND_ORDER["Metal"]].deviceID) <: Metal.MTLDevice
    @test typeof(device) <: Flux.FluxMetalDevice
    @test typeof(device.deviceID) <: Metal.MTLDevice
    @test Flux._get_device_name(device) in Flux.supported_devices()
```
Let's not clutter `test/runtests.jl` with these tests. They can go within the `ext_*` folders.

Also, we need tests checking that `x |> device` transfers data correctly.
Okay, I'll move the tests to the extension files.

Regarding the `x |> device` tests: the extensions have a massive test suite for the `gpu` function. Under the hood, `x |> device` is also calling a `gpu` function; do I need to write all the same test cases for `x |> device` as well? Or do simple tests like checking GPU array types suffice?
simple tests are enough
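For instance, something along these lines for the CUDA extension (an illustrative sketch of such a "simple test", exact file layout aside):

```julia
using Flux, CUDA, Test

device = Flux.get_device()
if CUDA.functional()
    x = rand(Float32, 4, 4)
    @test (x |> device) isa CuArray      # data actually lands on the GPU
    @test cpu(x |> device) ≈ x           # round trip preserves the values
end
```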
test/runtests.jl (outdated)
@testset "CUDA" begin | ||
include("ext_cuda/device_selection.jl") |
include("ext_cuda/device_selection.jl") |
let's group these tests under the single file "get_devices.jl"
Done. Please let me know if other tests need to be added (like more tests for `x |> device` on other types).
Fantastic work @codetalker7, thanks

Thank you!
This PR addresses issue #2293 by creating a `device` object to be used instead of the `gpu` function. This method was proposed by @CarloLucibello in the mentioned issue, and has a few advantages. The implementation has been inspired by Lux's approach to handling GPU backends.

Currently, this is just a draft PR, containing the high-level idea. The main addition is the `AbstractDevice` type, along with four concrete types representing devices for different GPU backends (and a device representing the CPU).

As an example, we can now do the following (for the below examples, I had stored `AMD` as my `gpu_backend` preference). Here is the same example, now using `CUDA`:

PR Checklist

- … `get_device`.
- `Flux.get_device` directly loads the preferences; are the references `CUDA_LOADED`, `AMDGPU_LOADED` and `METAL_LOADED` needed? On a similar note, the `gpu(x)` function and the `GPUBACKEND` global aren't really needed anymore.
- `DataLoader` support for device objects.

Closes #2293.