Cleanup for v0.14 release #2283

Merged · 11 commits · Jul 12, 2023
2 changes: 1 addition & 1 deletion README.md
@@ -18,7 +18,7 @@

Flux is an elegant approach to machine learning. It's a 100% pure-Julia stack, and provides lightweight abstractions on top of Julia's native GPU and AD support. Flux makes the easy things easy while remaining fully hackable.

Works best with [Julia 1.8](https://julialang.org/downloads/) or later. Here's a very short example to try it out:
Works best with [Julia 1.9](https://julialang.org/downloads/) or later. Here's a very short example to try it out:
```julia
using Flux, Plots
data = [([x], 2x-x^3) for x in -2:0.1f0:2]
13 changes: 12 additions & 1 deletion docs/src/gpu.md
@@ -1,11 +1,22 @@
# GPU Support

NVIDIA GPU support should work out of the box on systems with CUDA and CUDNN installed. For more details see the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) readme.
Starting with v0.14, Flux no longer forces a specific GPU backend and its package dependencies on users.
Thanks to the [package extension mechanism](https://pkgdocs.julialang.org/v1/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)) introduced in Julia v1.9, Flux conditionally loads GPU-specific code once a GPU package is made available (e.g. through `using CUDA`).

NVIDIA GPU support requires the packages `CUDA.jl` and `cuDNN.jl` to be installed in the environment. In the Julia REPL, type `] add CUDA, cuDNN` to install them. For more details see the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) readme.

AMD GPU support is available since Julia 1.9 on systems with ROCm and MIOpen installed. For more details refer to the [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl) repository.

Metal GPU acceleration is available on Apple Silicon hardware. For more details refer to the [Metal.jl](https://github.com/JuliaGPU/Metal.jl) repository. Metal support in Flux is experimental and many features are not yet available.

To trigger GPU support in Flux, you need to call `using CUDA`, `using AMDGPU` or `using Metal`
in your code. Note that for CUDA you do not need to explicitly load `cuDNN` as well, but the package has to be installed in the environment.
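
As a rough sketch (the layer and array here are arbitrary placeholders, not part of the original page), loading a backend package and moving a model to the GPU looks like this:
```julia
using Flux, CUDA  # loading CUDA.jl activates Flux's CUDA support (cuDNN.jl must also be in the environment)

model = Dense(2 => 3) |> gpu       # parameters move to the GPU, if one is available
x = rand(Float32, 2, 5) |> gpu     # move the input data as well
y = model(x)                       # runs on the GPU; `y |> cpu` copies the result back
```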


!!! compat "Flux ≤ 0.13"
Old versions of Flux automatically installed CUDA.jl to provide GPU support. Starting from Flux v0.14, CUDA.jl is not a dependency anymore and has to be installed manually.

## Checking GPU Availability

By default, Flux will run the checks on your system to see if it can support GPU functionality. You can check if Flux identified a valid GPU setup by typing the following:
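
For instance, with the CUDA backend, a minimal check (a sketch; `CUDA.functional()` is provided by CUDA.jl, not Flux) might be:
```julia
using CUDA

CUDA.functional()   # true if a usable NVIDIA GPU was found
```
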
3 changes: 2 additions & 1 deletion docs/src/index.md
@@ -8,7 +8,8 @@ Flux is a library for machine learning. It comes "batteries-included" with many

### Installation

Download [Julia 1.9](https://julialang.org/downloads/) or later, preferably the current stable release. You can add Flux using Julia's package manager, by typing `] add Flux` in the Julia prompt. This will automatically install several other packages, including [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) for Nvidia GPU support.
Download [Julia 1.9](https://julialang.org/downloads/) or later, preferably the current stable release. You can add Flux using Julia's package manager, by typing `] add Flux` in the Julia prompt.
For Nvidia GPU support, you will also need to install the `CUDA` and `cuDNN` packages. For AMD GPU support, install the `AMDGPU` package. For acceleration on Apple Silicon, install the `Metal` package.
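
Equivalently, using the Pkg API directly (a sketch; add only the backend package you actually need):
```julia
using Pkg
Pkg.add("Flux")               # same as `] add Flux` at the REPL prompt
Pkg.add(["CUDA", "cuDNN"])    # only needed for Nvidia GPU support
# Pkg.add("AMDGPU")           # only needed for AMD GPU support
# Pkg.add("Metal")            # only needed for Apple Silicon acceleration
```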

### Learning Flux

4 changes: 2 additions & 2 deletions docs/src/models/advanced.md
@@ -69,9 +69,9 @@ However, doing this requires the `struct` to have a corresponding constructor th

When we do not want to include all the model parameters (e.g. for transfer learning), we can simply omit those layers from our call to `params`.

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
The mechanism described here is for Flux's old "implicit" training style.
When upgrading for Flux 0.14, it should be replaced by [`freeze!`](@ref Flux.freeze!) and `thaw!`.
When upgrading for Flux 0.15, it should be replaced by [`freeze!`](@ref Flux.freeze!) and `thaw!`.

Consider a simple multi-layer perceptron model where we want to avoid optimising the first two `Dense` layers. We can obtain
this using the slicing features `Chain` provides:
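
The concrete snippet is collapsed here, but the idea (a sketch in the old implicit style, with arbitrary layer sizes) is to index into the `Chain` before collecting parameters:
```julia
using Flux

model = Chain(Dense(4 => 8, relu), Dense(8 => 8, relu), Dense(8 => 2))

# Collect parameters from layer 3 onwards only; the first two Dense layers stay fixed.
ps = Flux.params(model[3:end])
```
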
2 changes: 1 addition & 1 deletion docs/src/models/layers.md
@@ -29,7 +29,7 @@ Perhaps `Scale` isn't quite fully connected, but it may be thought of as `Dense(

!!! compat "Flux ≤ 0.12"
Old versions of Flux accepted only `Dense(in, out, act)` and not `Dense(in => out, act)`.
This notation makes a `Pair` object. If you get an error like `MethodError: no method matching Dense(::Pair{Int64,Int64})`, this means that you should upgrade to Flux 0.13.
This notation makes a `Pair` object. If you get an error like `MethodError: no method matching Dense(::Pair{Int64,Int64})`, this means that you should upgrade to a newer version of Flux.
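
For instance (a small illustration, not part of the original page):
```julia
using Flux

Dense(2 => 3, relu)   # current notation: `2 => 3` is a `Pair`
Dense(2, 3, relu)     # older notation, still accepted but softly deprecated
```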


## Convolution Models
8 changes: 4 additions & 4 deletions docs/src/models/quickstart.md
@@ -5,8 +5,8 @@ If you have used neural networks before, then this simple example might be helpf
If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page.

```julia
# With Julia 1.7+, this will prompt if neccessary to install everything, including CUDA:
using Flux, Statistics, ProgressMeter
# This will prompt if necessary to install everything, including CUDA:
using Flux, CUDA, Statistics, ProgressMeter

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
@@ -102,7 +102,7 @@ for epoch in 1:1_000
end
```

!!! compat "Implicit-style training, Flux ≤ 0.13"
!!! compat "Implicit-style training, Flux ≤ 0.14"
Until recently Flux's training worked a bit differently.
Any code which looks like
```
@@ -113,5 +113,5 @@
train!((x,y) -> loss(model, x, y), Flux.params(model), loader, opt)
```
(with `Flux.params`) is in the old "implicit" style.
This still works on Flux 0.13, but will be removed from Flux 0.14.
This still works on Flux 0.14, but will be removed from Flux 0.15.
See the [training section](@ref man-training) for more details.
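
Its explicit replacement, sketched here with an illustrative loss (`model` and `loader` as in the example above), looks like:
```julia
opt_state = Flux.setup(Flux.Adam(0.01), model)
Flux.train!(model, loader, opt_state) do m, x, y
    Flux.logitcrossentropy(m(x), y)   # the loss now receives the model explicitly
end
```
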
6 changes: 3 additions & 3 deletions docs/src/training/reference.md
@@ -10,7 +10,7 @@ Because of this:
* Flux defines its own version of `setup` which checks this assumption.
(Using instead `Optimisers.setup` will also work, they return the same thing.)

The new implementation of rules such as Adam in the Optimisers is quite different from the old one in `Flux.Optimise`. In Flux 0.13, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
The new implementation of rules such as Adam in the Optimisers is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
The available rules are listed on the [optimisation rules](@ref man-optimisers) page;
see the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.
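
A small sketch of what this looks like in practice (the model and learning rate are arbitrary):
```julia
using Flux

model = Chain(Dense(2 => 1))
opt_state = Flux.setup(Flux.Adam(0.001), model)   # the old-style `Flux.Adam` is translated to its Optimisers.jl counterpart
```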

@@ -37,11 +37,11 @@ Optimisers.freeze!
Optimisers.thaw!
```

## Implicit style (Flux ≤ 0.13)
## Implicit style (Flux ≤ 0.14)

Flux used to handle gradients, training, and optimisation rules quite differently.
The new style described above is called "explicit" by Zygote, and the old style "implicit".
Flux 0.13 is the transitional version which supports both; Flux 0.14 will remove the old.
Flux 0.13 and 0.14 are the transitional versions which support both; Flux 0.15 will remove the old.

!!! compat "How to upgrade"
The blue-green boxes in the [training section](@ref man-training) describe
14 changes: 7 additions & 7 deletions docs/src/training/training.md
@@ -65,14 +65,14 @@ It is also important that every `update!` step receives a newly gradient compute
as this will change whenever the model's parameters are changed, and for each new data point.

!!! compat "Implicit gradients"
Flux ≤ 0.13 used Zygote's "implicit" mode, in which `gradient` takes a zero-argument function.
Flux ≤ 0.14 used Zygote's "implicit" mode, in which `gradient` takes a zero-argument function.
It looks like this:
```
pars = Flux.params(model)
grad = gradient(() -> loss(model(input), label), pars)
```
Here `pars::Params` and `grad::Grads` are two dictionary-like structures.
Support for this will be removed from Flux 0.14, and these blue (teal?) boxes
Support for this will be removed from Flux 0.15, and these blue (teal?) boxes
explain what needs to change.
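
The explicit counterpart, for comparison (a sketch assuming `loss`, `model`, `input` and `label` as above), takes the model itself as the differentiated argument:
```julia
grad = Flux.gradient(m -> loss(m(input), label), model)[1]   # a NamedTuple mirroring the model's structure
```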

## Loss Functions
@@ -90,7 +90,7 @@ like [`mse`](@ref Flux.Losses.mse) for mean-squared error or [`crossentropy`](@r
are available from the [`Flux.Losses`](../models/losses.md) module.

!!! compat "Implicit-style loss functions"
Flux ≤ 0.13 needed a loss function which closed over a reference to the model,
Flux ≤ 0.14 needed a loss function which closed over a reference to the model,
instead of being a pure function. Thus in old code you may see something like
```
loss(x, y) = sum((model(x) .- y).^2)
@@ -211,7 +211,7 @@ Or explicitly writing the anonymous function which this `do` block creates,
!!! compat "Implicit-style `train!`"
This is a new method of `train!`, which takes the result of `setup` as its 4th argument.
The 1st argument is a function which accepts the model itself.
Flux versions ≤ 0.13 provided a method of `train!` for "implicit" parameters,
Flux versions ≤ 0.14 provided a method of `train!` for "implicit" parameters,
which works like this:
```
train!((x,y) -> loss(model(x), y), Flux.params(model), train_set, Adam())
@@ -342,7 +342,7 @@ for epoch in 1:1000
end
```

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
With the old "implicit" optimiser, `opt = Adam(0.1)`, the equivalent was to
directly mutate the `Adam` struct, `opt.eta = 0.001`.

@@ -374,7 +374,7 @@ train!(loss, bimodel, data, opt_state)
Flux.thaw!(opt_state)
```

!!! compat "Flux ≤ 0.13"
!!! compat "Flux ≤ 0.14"
The earlier "implicit" equivalent was to pass to `gradient` an object referencing only
part of the model, such as `Flux.params(bimodel.layers.enc)`.
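
A sketch of the explicit pattern (assuming `bimodel` is a `Chain` with named `enc` and `dec` layers, matching the `Flux.params(bimodel.layers.enc)` path used above):
```julia
Flux.freeze!(opt_state.layers.enc)       # exclude the encoder's parameters from updates
train!(loss, bimodel, data, opt_state)   # only the remaining parameters are trained
Flux.thaw!(opt_state)                    # re-enable training of every parameter
```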

@@ -383,7 +383,7 @@

Flux used to handle gradients, training, and optimisation rules quite differently.
The new style described above is called "explicit" by Zygote, and the old style "implicit".
Flux 0.13 is the transitional version which supports both.
Flux 0.13 and 0.14 are the transitional versions which support both.

The blue-green boxes above describe the changes.
For more details on training in the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
4 changes: 2 additions & 2 deletions docs/src/training/zygote.md
@@ -18,10 +18,10 @@ Zygote.hessian_reverse
Zygote.diaghessian
```

## Implicit style (Flux ≤ 0.13)
## Implicit style (Flux ≤ 0.14)

Flux used to use what Zygote calls "implicit" gradients, [described here](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1) in its documentation.
However, support for this will be removed from Flux 0.14.
However, support for this will be removed from Flux 0.15.

!!! compat "Training"
The blue-green boxes in the [training section](@ref man-training) describe
1 change: 0 additions & 1 deletion docs/src/utilities.md
@@ -49,7 +49,6 @@ These functions call:

```@docs
Flux.rng_from_array
Flux.default_rng_value
Flux.nfan
```

2 changes: 1 addition & 1 deletion src/Flux.jl
@@ -10,7 +10,7 @@ using MacroTools: @forward
using MLUtils
import Optimisers: Optimisers, trainable, destructure # before v0.13, Flux owned these functions
using Optimisers: freeze!, thaw!, adjust!

using Random: default_rng
using Zygote, ChainRulesCore
using Zygote: Params, @adjoint, gradient, pullback
using Zygote.ForwardDiff: value
28 changes: 8 additions & 20 deletions src/deprecations.jl
@@ -1,19 +1,3 @@
# v0.12 deprecations

function ones(dims...)
Base.depwarn("Flux.ones(size...) is deprecated, please use Flux.ones32(size...) or Base.ones(Float32, size...)", :ones, force=true)
Base.ones(Float32, dims...)
end
ones(T::Type, dims...) = Base.ones(T, dims...)

function zeros(dims...)
Base.depwarn("Flux.zeros(size...) is deprecated, please use Flux.zeros32(size...) or Base.zeros(Float32, size...)", :zeros, force=true)
Base.zeros(Float32, dims...)
end
zeros(T::Type, dims...) = Base.zeros(T, dims...)

ones32(::Type, dims...) = throw(ArgumentError("Flux.ones32 is always Float32, use Base.ones to specify the element type"))
zeros32(::Type, dims...) = throw(ArgumentError("Flux.zeros32 is always Float32, use Base.zeros to specify the element type"))

# v0.13 deprecations

@@ -59,7 +43,7 @@ function loadparams!(m, xs)
end

# Channel notation: Changed to match Conv, but very softly deprecated!
# Perhaps change to @deprecate for v0.14, but there is no plan to remove these.
# Perhaps change to @deprecate for v0.15, but there is no plan to remove these.
Dense(in::Integer, out::Integer, σ = identity; kw...) =
Dense(in => out, σ; kw...)
Bilinear(in1::Integer, in2::Integer, out::Integer, σ = identity; kw...) =
@@ -86,7 +70,7 @@ Base.@deprecate_binding Data Flux false "Sub-module Flux.Data has been removed.

@deprecate paramtype(T,m) _paramtype(T,m) false # internal method, renamed to make this clear

@deprecate rng_from_array() default_rng_value()
@deprecate rng_from_array() Random.default_rng()

function istraining()
Base.depwarn("Flux.istraining() is deprecated, use NNlib.within_gradient(x) instead", :istraining)
@@ -216,13 +200,17 @@ ChainRulesCore.@non_differentiable _greek_ascii_depwarn(::Any...)


# v0.14 deprecations
@deprecate default_rng_value() Random.default_rng()


# v0.15 deprecations

# Enable these when 0.14 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Enable these when 0.15 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Base.@deprecate_binding Optimiser OptimiserChain
# Base.@deprecate_binding ClipValue ClipGrad

# train!(loss::Function, ps::Zygote.Params, data, opt) = throw(ArgumentError(
# """On Flux 0.14, `train!` no longer accepts implicit `Zygote.Params`.
# """On Flux 0.15, `train!` no longer accepts implicit `Zygote.Params`.
# Instead of `train!(loss_xy, Flux.params(model), data, Adam())`
# it now needs `opt = Flux.setup(Adam(), model); train!(loss_mxy, model, data, opt)`
# where `loss_mxy` accepts the model as its first argument.
10 changes: 5 additions & 5 deletions src/layers/normalise.jl
@@ -71,9 +71,9 @@ mutable struct Dropout{F<:Real,D,R<:AbstractRNG}
active::Union{Bool, Nothing}
rng::R
end
Dropout(p::Real, dims, active) = Dropout(p, dims, active, default_rng_value())
Dropout(p::Real, dims, active) = Dropout(p, dims, active, default_rng())

function Dropout(p::Real; dims=:, active::Union{Bool,Nothing} = nothing, rng = default_rng_value())
function Dropout(p::Real; dims=:, active::Union{Bool,Nothing} = nothing, rng = default_rng())
0 ≤ p ≤ 1 || throw(ArgumentError("Dropout expects 0 ≤ p ≤ 1, got p = $p"))
Dropout(p, dims, active, rng)
end
@@ -125,8 +125,8 @@ mutable struct AlphaDropout{F,R<:AbstractRNG}
rng::R
end

AlphaDropout(p, active) = AlphaDropout(p, active, default_rng_value())
function AlphaDropout(p; rng = default_rng_value(), active::Union{Bool,Nothing} = nothing)
AlphaDropout(p, active) = AlphaDropout(p, active, default_rng())
function AlphaDropout(p; rng = default_rng(), active::Union{Bool,Nothing} = nothing)
0 ≤ p ≤ 1 || throw(ArgumentError("AlphaDropout expects 0 ≤ p ≤ 1, got p = $p"))
AlphaDropout(p, active, rng)
end
@@ -520,7 +520,7 @@ function GroupNorm(chs::Int, G::Int, λ=identity;
eps::Real=1f-5, momentum::Real=0.1f0, ϵ=nothing)

if track_stats
Base.depwarn("`track_stats=true` will be removed from GroupNorm in Flux 0.14. The default value is `track_stats=false`, which will work as before.", :GroupNorm)
Base.depwarn("`track_stats=true` will be removed from GroupNorm in Flux 0.15. The default value is `track_stats=false`, which will work as before.", :GroupNorm)
end
ε = _greek_ascii_depwarn(ϵ => eps, :GroupNorm, "ϵ" => "eps")

4 changes: 2 additions & 2 deletions src/optimise/optimisers.jl
@@ -566,7 +566,7 @@ that will be fed into the next, and this is finally applied to the parameter as
usual.

!!! note
This will be replaced by `Optimisers.OptimiserChain` in Flux 0.14.
This will be replaced by `Optimisers.OptimiserChain` in Flux 0.15.
"""
mutable struct Optimiser <: AbstractOptimiser
os::Vector{Any}
@@ -704,7 +704,7 @@ end
Clip gradients when their absolute value exceeds `thresh`.

!!! note
This will be replaced by `Optimisers.ClipGrad` in Flux 0.14.
This will be replaced by `Optimisers.ClipGrad` in Flux 0.15.
"""
mutable struct ClipValue{T} <: AbstractOptimiser
thresh::T
16 changes: 8 additions & 8 deletions src/optimise/train.jl
@@ -16,7 +16,7 @@ As a result, the parameters are mutated and the optimiser's internal state may c
The gradient could be mutated as well.

!!! compat "Deprecated"
This method for implicit `Params` (and `AbstractOptimiser`) will be removed from Flux 0.14.
This method for implicit `Params` (and `AbstractOptimiser`) will be removed from Flux 0.15.
The explicit method `update!(opt, model, grad)` from Optimisers.jl will remain.
"""
function update!(opt::AbstractOptimiser, x::AbstractArray, x̄)
@@ -46,7 +46,7 @@ Call `Flux.skip()` in a callback to indicate when a callback condition is met.
This will trigger the train loop to skip the current data point and not update with the calculated gradient.

!!! note
`Flux.skip()` will be removed from Flux 0.14
`Flux.skip()` will be removed from Flux 0.15

# Examples
```julia
@@ -56,7 +56,7 @@
```
"""
function skip()
Base.depwarn("""Flux.skip() will be removed from Flux 0.14.
Base.depwarn("""Flux.skip() will be removed from Flux 0.15.
and should be replaced with `continue` in an ordinary `for` loop.""", :skip)
throw(SkipException())
end
@@ -71,7 +71,7 @@ Call `Flux.stop()` in a callback to indicate when a callback condition is met.
This will trigger the train loop to stop and exit.

!!! note
`Flux.stop()` will be removed from Flux 0.14. It should be replaced with `break` in an ordinary `for` loop.
`Flux.stop()` will be removed from Flux 0.15. It should be replaced with `break` in an ordinary `for` loop.

# Examples
```julia
@@ -81,7 +81,7 @@
```
"""
function stop()
Base.depwarn("""Flux.stop() will be removed from Flux 0.14.
Base.depwarn("""Flux.stop() will be removed from Flux 0.15.
It should be replaced with `break` in an ordinary `for` loop.""", :stop)
throw(StopException())
end
@@ -96,7 +96,7 @@ Uses a `loss` function and training `data` to improve the
model's parameters according to a particular optimisation rule `opt`.

!!! compat "Deprecated"
This method with implicit `Params` will be removed from Flux 0.14.
This method with implicit `Params` will be removed from Flux 0.15.
It should be replaced with the explicit method `train!(loss, model, data, opt)`.

For each `d in data`, first the gradient of the `loss` is computed like this:
@@ -167,7 +167,7 @@ Run `body` `N` times. Mainly useful for quickly doing multiple epochs of
training in a REPL.

!!! note
The macro `@epochs` will be removed from Flux 0.14. Please just write an ordinary `for` loop.
The macro `@epochs` will be removed from Flux 0.15. Please just write an ordinary `for` loop.

# Examples
```julia
Expand All @@ -179,7 +179,7 @@ hello
```
"""
macro epochs(n, ex)
Base.depwarn("""The macro `@epochs` will be removed from Flux 0.14.
Base.depwarn("""The macro `@epochs` will be removed from Flux 0.15.
As an alternative, you can write a simple `for i in 1:epochs` loop.""", Symbol("@epochs"), force=true)
:(@progress for i = 1:$(esc(n))
@info "Epoch $i"