Failures on some backends for `pullback` on GPU #373

lassepe · 2024-07-21T20:52:35Z

As discussed on Slack, here is an MWE that reproduces the failures I hit with some backends on the GPU. Most of these failures closely match what I have seen when using each backend's low-level API directly, so I believe they are largely not DI's fault. As suggested, I am sharing them anyway:

MWE

using Flux: Flux
using CUDA: CUDA
using Chairmarks: @be
using DifferentiationInterface: DifferentiationInterface as DI
# various DI backends that support pullback
using Zygote: Zygote
using Enzyme: Enzyme
using Tracker: Tracker
using ReverseDiff: ReverseDiff
using Diffractor: Diffractor
using FiniteDiff: FiniteDiff
using Tapir: Tapir

# Without this, Enzyme errors out with "You may be using a constant variable as temporary storage for active memory"
Enzyme.API.runtimeActivity!(true)

function main(device = Flux.gpu, test_forward_pass = false)
    # this is just a toy model; in the real setting, the model is a fully-convolutional network with ~1M parameters and ~512 inputs and outputs
    model = Flux.Chain(Flux.Dense(10, 32, Flux.relu), Flux.Dense(32, 10)) |> device
    data = randn(Float32, 10, 100) |> device
    labels = randn(Float32, 10, 100) |> device
    σ = 1.0f0
    dy1 = randn(size(data)) |> device
    dy2 = randn(size(data)) |> device
    original_size = size(data)

    function f(x_t)
        (model(x_t) - labels) .^ 2 / σ^2
    end

    # some of the backends require a flat input so this is a helper wrapper
    function f_flat(x_t)
        f(reshape(x_t, original_size))[:]
    end

    # testing forward pass to give a reference point
    @info "ForwardPass..."
    @time "✅ initial run" f(data)
    println("detailed benchmark:")
    @be(f(data)) |> display

    # NOTE: only tested on the GPU
    backends = [
        DI.AutoZygote(), # ~0.05s
        DI.AutoEnzyme(), # compiles for quite long, then errors out due to augmented forward pass custom rule type mismatch.
        DI.AutoTracker(), # errors with `Tracker.TrackedReal{Float32} is a mutable type`
        DI.AutoReverseDiff(), # errors because it generates code that tries to multiply CPU and GPU arrays
        DI.AutoDiffractor(), # ERROR: MethodError: no method matching ndims(::Tuple{Int64, Int64})
        DI.AutoFiniteDiff(), # ~18s
        DI.AutoTapir(), # ERROR: CUDA.CuPtr{Nothing} is a primitive type. Implement a method of `tangent_type` for it.
    ]
    # computing a vector-Jacobian product (pullback) with each backend
    for backend in backends
        try
            @info "VJP with DI.jl and $backend"
            local pullback_prep
            @time "✅ initial run" let
                @time "preparing pullback object..." pullback_prep =
                    DI.prepare_pullback_same_point(f_flat, backend, data[:], dy1[:])

                @time "using it once" DI.pullback(f_flat, backend, data[:], dy1[:], pullback_prep)
            end
            println("detailed benchmark:")
            @be(DI.pullback(f_flat, backend, data[:], dy2[:], pullback_prep)) |> display
        catch e
            @error "Error with $backend: $e"
            Base.show_backtrace(stdout, backtrace())
        end
    end

    nothing
end
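
One detail of the MWE that may be worth ruling out: `dy1` and `dy2` are created with `randn(size(data))`, which produces `Float64` values, while `data` and `labels` are `Float32`. The CPU-only sketch below shows only the element types involved. The names `dy_default` and `dy_matched` are hypothetical and not part of the MWE, and whether this mismatch contributes to any of the failures listed above has not been checked here.

```julia
# CPU-only sketch; needs neither a GPU nor any AD backend.
data = randn(Float32, 10, 100)

# As in the MWE: randn with only a size tuple defaults to Float64.
dy_default = randn(size(data))

# Hypothetical variant: request the element type explicitly.
dy_matched = randn(Float32, size(data))

# Print the three element types side by side.
@show eltype(data) eltype(dy_default) eltype(dy_matched)
```

If the mismatch turns out to matter, passing `Float32` explicitly when creating `dy1` and `dy2` would keep the seed and the input in the same precision.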

Program Output
https://pastebin.com/9kf9qqhG

Version Info

This is the output I get with the latest registered version of every backend, combined with JuliaGPU/CUDA.jl#2422 to enable Enzyme reverse-mode differentiation on the GPU.

Output of `]status -m`:

Status `~/worktree/BugReports/DifferentiationInterface.jl-vjp-failures/Manifest.toml`
  [47edcb42] ADTypes v1.6.1
  [c29ec348] AbstractDifferentiation v0.6.2
  [621f4979] AbstractFFTs v1.5.0
  [1520ce14] AbstractTrees v0.4.5
  [7d9f7c33] Accessors v0.1.36
  [79e6a3ab] Adapt v4.0.4
  [dce04be8] ArgCheck v2.3.0
  [ec485272] ArnoldiMethod v0.4.0
  [4fba245c] ArrayInterface v7.12.0
  [a9b6321e] Atomix v0.1.0
  [ab4f0b2a] BFloat16s v0.5.0
  [198e06fe] BangBang v0.4.2
  [9718e550] Baselet v0.1.1
  [fa961155] CEnum v0.5.0
  [052768ef] CUDA v5.4.2 `https://github.com/wsmoses/CUDA.jl#renz`
  [1af6417a] CUDA_Runtime_Discovery v0.3.4
  [082447d4] ChainRules v1.69.0
  [d360d2e6] ChainRulesCore v1.24.0
  [0ca39b1e] Chairmarks v1.2.1
  [da1fd8a2] CodeTracking v1.3.5
  [3da002f7] ColorTypes v0.11.5
  [5ae59095] Colors v0.12.11
  [861a8166] Combinatorics v1.0.2
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v4.15.0
  [a33af91c] CompositionsBase v0.1.2
  [187b0558] ConstructionBase v1.5.5
  [6add18c4] ContextVariablesX v0.1.3
  [a8cc5b0e] Crayons v4.1.1
  [f68482b8] Cthulhu v2.12.7
  [9a962f9c] DataAPI v1.16.0
  [a93c6f00] DataFrames v1.6.1
  [864edb3b] DataStructures v0.18.20
  [e2d170a0] DataValueInterfaces v1.0.0
  [244e2a9f] DefineSingletons v0.1.2
  [8bb1440f] DelimitedFiles v1.9.1
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
  [de460e47] DiffTests v0.1.2
  [a0c0ee7d] DifferentiationInterface v0.5.9
⌃ [9f5e2b26] Diffractor v0.2.6
  [ffbed154] DocStringExtensions v0.9.3
  [7da242da] Enzyme v0.12.23
  [f151be2c] EnzymeCore v0.7.7
  [e2ba6199] ExprTools v0.1.10
  [cc61a311] FLoops v0.2.2
  [b9860ae5] FLoopsBase v0.1.1
  [1a297f60] FillArrays v1.11.0
  [6a86dc24] FiniteDiff v2.23.1
  [53c48c17] FixedPointNumbers v0.8.5
  [587475ba] Flux v0.14.16
  [1eca21be] FoldingTrees v1.2.1
  [f6369f11] ForwardDiff v0.10.36
  [069b7b12] FunctionWrappers v1.1.3
  [d9f16b24] Functors v0.4.11
  [0c68f7d7] GPUArrays v10.3.0
  [46192b85] GPUArraysCore v0.1.6
⌃ [61eb1bfa] GPUCompiler v0.26.5
  [86223c79] Graphs v1.11.2
  [7869d1d1] IRTools v0.4.14
  [d25df0c9] Inflate v0.1.5
  [22cec73e] InitialValues v0.3.1
  [842dd82b] InlineStrings v1.4.2
  [3587e190] InverseFunctions v0.1.15
  [41ab1584] InvertedIndices v1.3.0
  [92d709cd] IrrationalConstants v0.2.2
  [82899510] IteratorInterfaceExtensions v1.0.0
  [c3a54625] JET v0.9.6
  [692b3bcd] JLLWrappers v1.5.0
  [aa1ae85d] JuliaInterpreter v0.9.32
  [70703baa] JuliaSyntax v0.4.8
  [b14d175d] JuliaVariables v0.2.4
  [63c18a36] KernelAbstractions v0.9.22
⌅ [929cbde3] LLVM v7.2.1
  [8b046642] LLVMLoopInfo v1.0.0
  [b964fa9f] LaTeXStrings v1.3.1
  [2ab3a3ac] LogExpFunctions v0.3.28
  [6f1432cf] LoweredCodeUtils v2.4.8
  [d8e11817] MLStyle v0.4.17
  [f1d291b0] MLUtils v0.4.4
  [1914dd2f] MacroTools v0.5.13
  [128add7d] MicroCollections v0.2.0
  [e1d29d7a] Missings v1.2.0
  [dbe65cb8] MistyClosures v1.0.1
  [872c559c] NNlib v0.9.20
  [5da4648a] NVTX v0.3.4
  [77ba4419] NaNMath v1.0.2
  [71a1bf82] NameResolution v0.1.5
  [d8793406] ObjectFile v0.4.1
  [6fe1bfb0] OffsetArrays v1.14.1
  [0b1bfda6] OneHotArrays v0.2.5
  [3bd65402] Optimisers v0.3.3
  [bac558e1] OrderedCollections v1.6.3
  [65ce6f38] PackageExtensionCompat v1.0.2
  [2dfb63ee] PooledArrays v1.4.3
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [8162dcfd] PrettyPrint v0.2.0
  [08abe8d2] PrettyTables v2.3.2
  [33c8b6b6] ProgressLogging v0.1.4
  [74087812] Random123 v1.7.0
  [e6cf234a] RandomNumbers v1.5.3
  [c1ae055f] RealDot v0.1.0
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [37e2e3b7] ReverseDiff v1.15.3
  [6c6a2e73] Scratch v1.2.1
  [91c51154] SentinelArrays v1.4.5
  [efcf1570] Setfield v1.1.1
  [605ecd9f] ShowCases v0.1.0
  [699a6c99] SimpleTraits v0.9.4
  [a2af1166] SortingAlgorithms v1.2.1
  [dc90abb0] SparseInverseSubset v0.1.2
  [0a514795] SparseMatrixColorings v0.3.5
  [276daf66] SpecialFunctions v2.4.0
  [171d559e] SplittablesBase v0.1.15
  [90137ffa] StaticArrays v1.9.7
  [1e83bf80] StaticArraysCore v1.4.3
  [82ae8749] StatsAPI v1.7.0
  [2913bbd2] StatsBase v0.34.3
  [892a3eda] StringManipulation v0.3.4
  [09ab397b] StructArrays v0.6.18
  [53d494c1] StructIO v0.3.0
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.12.0
  [07d77754] Tapir v0.2.24
  [a759f4b9] TimerOutputs v0.5.24
  [9f7883ad] Tracker v0.2.34
  [28d57a85] Transducers v0.4.82
  [d265eb64] TypedSyntax v1.3.1
  [013be700] UnsafeAtomics v0.2.1
  [d80eeb9a] UnsafeAtomicsLLVM v0.1.5
  [b8c1c048] WidthLimitedIO v1.0.1
  [e88e6eb3] Zygote v0.6.70
  [700de1a5] ZygoteRules v0.2.5
  [02a925ec] cuDNN v1.3.2
  [4ee394cb] CUDA_Driver_jll v0.9.1+1
  [76a88914] CUDA_Runtime_jll v0.14.1+0
⌅ [62b44479] CUDNN_jll v9.0.0+1
⌅ [7cc45869] Enzyme_jll v0.0.134+0
  [9c1d0b0a] JuliaNVTXCallbacks_jll v0.2.1+0
⌅ [dad2f222] LLVMExtra_jll v0.0.29+0
  [e98f9f5b] NVTX_jll v3.1.0+2
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.10.0
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays v1.10.0
  [10745b16] Statistics v1.10.0
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.4.0+0
  [e37daf67] LibGit2_jll v1.6.4+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.2+1
  [14a3606d] MozillaCACerts_jll v2023.1.10
  [4536629a] OpenBLAS_jll v0.3.23+4
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.2.1+1
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.8.0+1
  [8e850ede] nghttp2_jll v1.52.0+1
  [3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`

gdalle · 2024-07-31T05:23:27Z

Thanks for reporting this @lassepe! I don't have a way to test on GPU during CI yet, so a fix will have to wait, but I will keep this in mind.

gdalle added the labels bug (Something isn't working) and backend (Related to one or more autodiff backends) · Jul 24, 2024
