Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using DifferentiationInterface in SymbolicRegression #319

Open
MilesCranmer opened this issue Jun 17, 2024 · 25 comments
Open

Using DifferentiationInterface in SymbolicRegression #319

MilesCranmer opened this issue Jun 17, 2024 · 25 comments
Labels
downstream Related to downstream compatibility

Comments

@MilesCranmer
Copy link

MilesCranmer commented Jun 17, 2024

Hi @gdalle,

Awesome package, thanks for making it!

I am considering whether to use this as the backend interface of SymbolicRegression.jl to allow the user to pick an AD backend (current attempt). However, one thing I was wondering if you would consider is designating a mapping from Symbol to the correct AbstractADType? I could implement this within my package, but it seems like something that should live here for others as well.

Basically, I would prefer that my users do not need to install both Zygote.jl and DifferentiationInterface.jl (or ADTypes) in their environment to set Zygote as the backend. I would rather they just pass, e.g., :Zygote and then have my package, which depends on DifferentiationInterface.jl, be able to internally map to the right AbstractADType.

Within DifferentiationInterface.jl this could look like:

function lookup_backend(key::Symbol)
    return ADTYPE_MAP[key]
end

const ADTYPE_MAP = Dict{Symbol,AbstractADType}([
    :Zygote => AutoZygote(),
    :Enzyme => AutoEnzyme(),
    :ForwardDiff => AutoForwardDiff(),
])

Then within my package, I can have my Options constructor look like this:

using DifferentiationInterface: lookup_backend, AbstractADType

function Options(autodiff_backend::Union{AbstractADType,Symbol})
    backend = autodiff_backend isa Symbol ? lookup_backend(autodiff_backend) : autodiff_backend

    return Options{typeof(backend)}(backend)
end

which means the user has a more lightweight interface; they can just pass the symbol:

using SymbolicRegression, Zygote

options = Options(:Zygote)

And then I map to the right type internally.

I guess I could also re-export all the abstract types but it feels a bit overkill. That would also mean I would be unable to provide helpful error messages if they forget the name of the backend and pass in the wrong symbol.

What do you think?

Cheers,
Miles

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

However, one thing I was wondering if you would consider is designating a mapping from Symbol to the correct AbstractADType?

Indeed that used to be a widespread approach, but a Symbol is not enough to fully represent a choice of AD backend. That's why e.g. Optimization.jl switched to ADTypes.jl. The reasons are described in more detail in the ADTypes.jl README.
I fear that adding a layer of translation from Symbols to ADTypes.AbstractADType would only result in competing standards, as well as suboptimal default choices for all the things a Symbol cannot express.

Basically, I would prefer that my users do not need to install both Zygote.jl and DifferentiationInterface.jl (or ADTypes) in their environment to set Zygote as the backend.

Why is that? If they use Zygote.jl as the backend, it will be installed anyway. And AD backends are infinitely heavier than ADTypes.jl or DifferentiationInterface.jl. ADTypes.jl has zero dependency on Julia >= 1.9, and DifferentiationInterface.jl is very lightweight too. Besides, if you use them internally, they'll be in the Manifest.toml anyway.

which means the user has a more lightweight interface; they can just pass the symbol:

To me, the following interface is even better, because it does not introduce a separate Symbol mapping, and instead leverages ADTypes.jl directly. Keep in mind that users may already be familiar with this system if they have ever used a SciML package.

using ADTypes, SymbolicRegression, Zygote

options = Options(AutoZygote())

@gdalle gdalle added downstream Related to downstream compatibility core Related to the core utilities of the package labels Jun 17, 2024
@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

And by the way, I'd be thrilled to help you get DifferentiationInterface.jl up and running with SymbolicRegression.jl. Don't hesitate to ping me for reviews or advice in that PR

@MilesCranmer
Copy link
Author

MilesCranmer commented Jun 17, 2024

Thanks!

It's not so much the extra dependency as SymbolicRegression.jl would of course already depend on DifferentiationInterface.jl if this goes through. The issue is I would be asking beginner users to install a third package in their top level environment, even after I have already asked them to install the AD backend package itself, just so they can set one of the parameters of the search.

Indeed that used to be a widespread approach, but a Symbol is not enough to fully represent a choice of AD backend.

I agree with this, in the example I gave above I allow both AbstractADType and a Symbol.

using DifferentiationInterface: lookup_backend, AbstractADType

function Options(autodiff_backend::Union{AbstractADType,Symbol})
    backend = autodiff_backend isa Symbol ? lookup_backend(autodiff_backend) : autodiff_backend

    return Options{typeof(backend)}(backend)
end

So advanced users can install ADTypes and customize the backend as they desire, but beginner users (which SymbolicRegression.jl gets a lot of from the Python frontend PySR) still can just pass in a :Zygote – which is probably good enough in 95% of cases.

(Even package extensions themselves I find to be a distraction for beginners, which is why PySR just automatically installs stuff as the user needs it – https://github.com/MilesCranmer/PySR/blob/89e991d9d7a5e34025bdbbe9076a8fd0892f04cc/pysr/julia_extensions.py#L8-L36 – rather than asking the user to install each package themselves. I just find it to be another barrier for people using my package.)

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

I understand your point, and I do think it makes a lot of sense to offer an easier setting, especially for people coming from Python. But I don't think this Symbol mapping should be part of DifferentiationInterface.jl, if only because I try to make as few assumptions as possible.

The best of both worlds might be to automatically turn a Symbol into an ADTypes.AbstractADType as follows: :Package -> ADTypes.AutoPackage(). In most cases this will work (except for a few minor backends which always require arguments) and yield a reasonable backend object.
That way you don't have to maintain an explicit mapping, and you automatically benefit from e.g. new backends that might be added to ADTypes.jl.

julia> using ADTypes

julia> s = :Zygote
:Zygote

julia> auto_s = Symbol(:Auto, s)
:AutoZygote

julia> backend = @eval $auto_s()
AutoZygote()

julia> backend isa AbstractADType
true

There might be some subtleties as to which eval you should use / in which scope (I'm not very confident in metaprogramming) but you get the gist.

@MilesCranmer
Copy link
Author

MilesCranmer commented Jun 17, 2024

The reason I thought it might make sense to offer such a mechanism in this package is so that we can avoid having different ways of solving this. I am guessing that other packages may want to offer similar functionality in the future. So it would be nice to have a unified mapping from symbol to backend type.

Even something like

function Auto(key::Symbol; options...)
    backend_type = lookup_type(key)
    return backend_type(; options...)
end

Which would let me pass extra backend configuration options from the user.

Then I can import this function internally in my package, and allow the user to pass a symbol and autodiff configuration to my options constructor, without needing the user to manually install ADTypes.

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

I don't think a unified mapping from Symbol to AbstractADType always makes sense. For all I know, other people may want a different mapping, or they may use symbols that don't correspond to an Auto struct at all, like :best_backend_for_linear_regression.
For DifferentiationInterface.jl, the ground truth is ADTypes.jl, and I want to avoid opinionated decisions on how to interpret a Symbol. If there are more people that need this mapping I might reconsider (maybe @Vaibhavdixit02 or @avikpal have opinions on this), but as a first step, I think parsing the Symbol as the second part of AutoPackage and adding kwargs is a reasonable thing to do on your end.
@adrhill thoughts?

@MilesCranmer
Copy link
Author

I guess I feel like (symbol; kws...) -> Auto{symbol}(; kws...) is quite an innocent choice?

You could even make it like

function Auto{key}(; options...)
    backend_type = lookup_type(key)
    return backend_type(; options...)
end

rather than needing AutoEnzyme, AutoZygote, etc., which need to be explicitly imported.

So, to summarise my thoughts:

The issue with having the user install ADTypes.jl themselves is that it presents a small barrier to usage and more mental overhead. While it wouldn't slow down performance, it would mean another line in the user's Project.toml file, the extra effort of needing to run add ADTypes, as well as simply the user needing to figure this out in the first place, rather than simply passing in :Zygote which I can document pretty easily.

The two workarounds are:

  1. Re-export ADTypes myself.
    • The issue with this is that I would either need to re-export all types (which is dangerous, in case of any future conflicts), or re-export the current set of types (meaning I would need to keep this up-to-date manually).
  2. Write my own mapping from symbol => type.
    • This has the same issue of needing to keep this up-to-date manually with all of the types in ADTypes
    • This also introduces the problem of proliferating multiple standards for doing this, leading to a similar problem to the one this package was designed to solve.
    • If I get around this by using the ADType.eval(symbol("Auto", ad_backend)) trick, it is a bit dangerous (eval in general runs into https://en.wikipedia.org/wiki/Arbitrary_code_execution), and also might lead to issues with precompilation due to manual calls of eval

@adrhill
Copy link
Collaborator

adrhill commented Jun 17, 2024

The reason I thought it might make sense to offer such a mechanism in this package is so that we can avoid having different ways of solving this. I am guessing that other packages may want to offer similar functionality in the future.

From my POV, this is why ADTypes exists in the first place–it is the standard you are asking for. ADTypes are basically symbols with optional configuration options:

A natural approach is to use a keyword argument with e.g. Bool or Symbol values.
Let's see a few examples to understand why this is not enough:

  • autodiff = true: ambiguous, we don't know which AD package should be used
  • autodiff = :forward: ambiguous, there are several AD packages implementing both forward and reverse mode (and there are other modes beyond that)
  • autodiff = :Enzyme: ambiguous, some AD packages can work both in forward and reverse mode
  • autodiff = (:Enzyme, :forward): not too bad, but many AD packages require additional configuration (number of chunks, tape compilation, etc.)

A more involved struct is thus required, with package-specific parameters.
If every AD user develops their own version of said struct, it will ruin interoperability.
This is why ADTypes.jl provides a single set of shared types for this task, as an extremely lightweight dependency.
They are types and not enums because we need AD choice information statically to use it for dispatch.

As Guillaume mentioned, there are some very large packages putting their weight behind ADTypes as a unified interface to select backends:

Keep in mind that users may already be familiar with this system if they have ever used a SciML package.

Maybe reexporting ADTypes is an option if you don't want users to have to load the package explicitly? (EDIT: just saw you addressed this simultaneously.)

@MilesCranmer
Copy link
Author

MilesCranmer commented Jun 17, 2024

@adrhill oops, our messages crossed. Regarding this:

Maybe reexporting ADTypes is an option if you don't want users to have to load the package explicitly?

It's a bit tricky, because either I:

  1. re-export everything in the package (which is a bit dangerous for a few reasons – including symbol conflicts, and the "spooky action at a distance" effect of not being explicit about what symbol comes from where), or
  2. I keep a list of currently exported AD types myself, and regularly update it. (But this a bit of a pain)

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

My issue is that if I make this translation available in DI (or in ADTypes itself for that matter), I somehow endorse it and encourage people to use it. But I don't want to, because there should only be one standard. Additionally, starting from a Symbol makes everything else type-unstable, unlike a properly defined AD backend.
Thus keeping a single standard seems worth the small mental overhead to me.

@adrhill
Copy link
Collaborator

adrhill commented Jun 17, 2024

Additionally, if you ever want to use our second order or sparse AD functionality1, the amount of symbols will grow combinatorially.

Footnotes

  1. Would be interesting to discuss whether this could have some applications in symbolic regression. We're still on the look-out for possible applications!

@MilesCranmer
Copy link
Author

MilesCranmer commented Jun 17, 2024

Additionally, starting from a Symbol makes everything else type-unstable, unlike a properly defined AD backend.

Actually this wouldn't be an issue.

First, this wouldn't redefine anything, it's just making an mapping that packages can use for easy user-side APIs. But a user would always be able to pass in a full ADType instead to the options if they wish.

Second, if the symbol is fixed in some code, Julia would just inline it and it would be stable.

Additionally, if you ever want to use our sparse AD functionality1, the amount of symbols will grow combinatorially.

This also wouldn't be an issue – the symbol here would only be the type specification. Like this:

function Auto(symbol, options...; kws...)

    if symbol == :Zygote
        AutoZygote(options...; kws...)
    elseif symbol == :Enzyme
        AutoEnzyme(options...; kws...)
    end
end

So, in your example, it would be

Auto(
    :Sparse,
    Auto(:ForwardDiff);
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

which could be done without needing to re-export all the ADTypes.

@MilesCranmer
Copy link
Author

MilesCranmer commented Jun 17, 2024

By the way, just curious, why not have

const AutoZygote = Auto{:Zygote}
const AutoSparse = Auto{:Sparse}
const AutoEnzyme = Auto{:Enzyme}

instead of AutoZygote, AutoSparse, etc? This would also solve the problem as I can simply map the user's passed symbol to Auto{symbol} inside my constructor, rather than need the user to install and import AutoZygote themselves. They can of course pass options here as well, like

Auto{:Sparse}(
    Auto{:ForwardDiff}();
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

so you don't lose extensibility.

I mean in general I should say I don't feel too strongly about this. It just seems like it would be pretty useful for designing user-facing APIs.

I note that will have to do some symbol => type mapping anyways, because of PySR (users can't specify Julia types there). I felt like it would be bad practice to come up with my own AD symbol => AD type mapping when others may want to use one as well (not to mention there might be more AD types in the future, so should probably be centralized and kept up-to-date).

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

This also wouldn't be an issue – the symbol here would only be the type specification. Like this:

function Auto(symbol, options...; kws...)

   if symbol == :Zygote
       AutoZygote(options...; kws...)
   elseif symbol == :Enzyme
       AutoEnzyme(options...; kws...)
   end
end

So, in your example, it would be

Auto(
   :Sparse,
   Auto(:ForwardDiff);
   sparsity_detector=TracerSparsityDetector(),
   coloring_algorithm=GreedyColoringAlgorithm(),
)

You do realize that this is not exactly diminishing the mental load of the user ;) This is essentially reinventing ADTypes with symbols.

By the way, just curious, why not have

const AutoZygote = Auto{:Zygote}
const AutoSparse = Auto{:Sparse}
const AutoEnzyme = Auto{:Enzyme}

instead of AutoZygote, AutoSparse, etc?

For the same reason that symbols are not specific enough, symbol wrappers won't work either. I encourage you to take a look at the ADTypes documentation if you want to understand what we actually put inside each backend struct, and why it matters.

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

I note that will have to do some symbol => type mapping anyways, because of PySR (users can't specify Julia types there).

To me this is the main (or only) argument in favor of symbols: calling Julia autodiff from Python or other languages. Everything else can be solved from within Julia with minimal efforts, but this seems to be a hard limit.

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

Although even that limit can be overcome with juliacall, right?

from juliacall import Main as jl
jl.seval("using ADTypes")
backend = jl.AutoZygote()

@MilesCranmer
Copy link
Author

MilesCranmer commented Jun 17, 2024

You do realize that this is not exactly diminishing the mental load of the user ;) This is essentially reinventing ADTypes with symbols.

Sorry I was just giving that as an example in regards to the comment that "amount of symbols will grow combinatorially". For that example in particular you would probably just want the user to install ADTypes, so that they can get really custom with it. Which my request is not opposed to!

Advanced users are not as sensitive to mental overheads :) It's really just the newbies where it would be great to have shortcuts for.

In the cases where the user is just selecting a backend like enzyme or zygote or forward diff, just allowing a symbol would be nice. And then for the reasons to solve this here rather than elsewhere, I reference my comments in #319 (comment).

Although even that limit can be overcome with juliacall, right?

Indeed it can. But,

  1. I want my beginner users to try out multiple backends as well – for them, I don't want to ask them to write Julia code unless they are really doing custom stuff. Picking a backend can 95% of the time be a single symbol, so I think that would be nice.
  2. As your example shows, asking new users to do custom stuff with juliacall can result in some unexpected issues: Importing juliacall before PySR skips the default configuration that PySR imposes: https://github.com/MilesCranmer/PySR/blob/master/pysr/julia_import.py. (This is technically an issue with juliacall itself, in that there's no way to provide env-specific configuration, but it also shows the reason I don't want to ask pure-Python users to mess around with juliacall :))

@MilesCranmer
Copy link
Author

For the same reason that symbols are not specific enough, symbol wrappers won't work either. I encourage you to take a look at the ADTypes documentation if you want to understand what we actually put inside each backend struct, and why it matters.

Sorry I am going off topic here but couldn't you just do this as follows?

struct Auto{S,C} <: AbstractADType
    options::C
    Auto{S}(; options...) = (nt = NamedTuple(options); new{S,typeof(nt)}(nt))
end
Base.getproperty(a::Auto, k::Symbol) = getproperty(getfield(a, :options), k)

const AutoZygote = Auto{:Zygote}
const AutoEnzyme = Auto{:Enzyme}
...

Since options are stored in a named tuple, all the relevant info is typed, no? This should even be backwards compatible I think.

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

That's actually not a bad idea for ADTypes.jl in the long run. In the meantime I opened SciML/ADTypes.jl#62 to see what other maintainers think. Can you play around with that branch in your DI integration attempt, and let me know how it works? Set ADTypes compat to 1.3.1 to be sure that you get the right version.

@MilesCranmer
Copy link
Author

Sounds good to me

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

Apart from this issue, which is getting solved, any other troubles when trying to integrate DifferentiationInterface.jl? What are you using it for?

@MilesCranmer
Copy link
Author

Basically just to let the user choose what AD backend is used during constant optimization. The only issues left are just from the different backends being incompatible (only Zygote/ChainRules works at the moment), but this is the fault of other backends, not the interface.

(I’m not sure if Enzyme will work though, as usually I would have to declare what variables are being used as mutable storage?)

@gdalle
Copy link
Owner

gdalle commented Jun 17, 2024

Basically just to let the user choose what AD backend is used during constant optimization.

Note that for some backends you'll get an epic performance boost by using preparation, especially in small dimension. What is the typical number of constants you optimize?

The only issues left are just from the different backends being incompatible (only Zygote/ChainRules works at the moment)

Does ForwardDiff work at least?

I’m not sure if Enzyme will work though, as usually I would have to declare what variables are being used as mutable storage?

Well, the whole point of DifferentiationInterface is that you get to try it out for free!
Using Enzyme optimally from DI is very hard without making DI just a frontend for Enzyme. But if you have an example for which it fails, don't hesitate to open an issue here.

Related discussions:

@MilesCranmer
Copy link
Author

Note that for some backends you'll get an epic performance boost by using preparation, especially in small dimension. What is the typical number of constants you optimize?

I usually can't use preparation because the gradients change for every expression. But the number of constants is typically ~1-10. For parametrized expressions (the reason I'm adding Zygote support) it can be up to 100.

Does ForwardDiff work at least?

For ForwardDiff I unfortunately have to use a separate function because it requires changing the element type.

Well, the whole point of DifferentiationInterface is that you get to try it out for free!
Using Enzyme optimally from DI is very hard without making DI just a frontend for Enzyme.

Yeah, indeed it seems hard to do so. I get the following when I try –

nested task error: Enzyme execution failed.
            Mismatched activity for:   %value_phi10 = phi {} addrspace(10)* [ %17, %L49 ], [ %getfield, %L46 ] const val:   %getfield = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %16 unordered, align 8, !dbg !211, !tbaa !215, !alias.scope !189, !noalias !210, !dereferenceable_or_null !216, !align !217
            Type tree: {[-1]:Pointer}
             llvalue=  %getfield = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %16 unordered, align 8, !dbg !211, !tbaa !215, !alias.scope !189, !noalias !210, !dereferenceable_or_null !216, !align !217
            You may be using a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Activity-of-temporary-storage). If not, please open an issue, and either rewrite this variable to not be conditionally active or use Enzyme.API.runtimeActivity!(true) as a workaround for now
            
            Stacktrace:
             [1] getproperty
               @ ./Base.jl:37
             [2] getindex
               @ ./subarray.jl:323
             [3] set_constants!
               @ ~/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/NodeUtils.jl:93

I wonder if there's some generic way to declare storage in a way that would pass the required info to Enzyme as well as any future backend that might require it? Otherwise will need to have an if-statement over backend which seems non-optimal. Especially because performance is so key to my use-case.

@gdalle gdalle changed the title Default mapping from Symbol? Using DifferentiationInterface in SymbolicRegression Jun 18, 2024
@gdalle gdalle removed the core Related to the core utilities of the package label Jun 18, 2024
@gdalle
Copy link
Owner

gdalle commented Jun 18, 2024

For ForwardDiff I unfortunately have to use a separate function because it requires changing the element type.

I'm curious now, why can't your code be type-agnostic?

Yeah, indeed it seems hard to do so. I get the following when I try –

Do you have a full reproducible example?

I wonder if there's some generic way to declare storage in a way that would pass the required info to Enzyme as well as any future backend that might require it?

This is related to, but different from, the issue about supporting active and inactive variables. And it's exactly what I said about Enzyme earlier: it is so different from other backends, and offers so many more options, that if I want to include every one I'll turn DI into a shallow frontend for Enzyme only. It's a question I haven't solved yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
downstream Related to downstream compatibility
Projects
None yet
Development

No branches or pull requests

3 participants