
Add a macro to opt-in to fancy printing, and to everything else #1932

Merged: 7 commits, merged into FluxML:master from the showmacro branch on Mar 5, 2024

Conversation

@mcabbott (Member) commented Apr 7, 2022

Originally, @big_show

This adds a macro which lets you tell Flux to treat some container type much the same way as it treats Chain, starting the recursive printing walk over all children.
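For illustration, a minimal sketch of the opt-in using the final `@layer :expand` syntax (the container type here is hypothetical; under the original name this would have been `Flux.@big_show TwoPaths`):

```julia
using Flux

# Hypothetical two-branch container, defined only for this example.
struct TwoPaths{A,B}
    up::A
    down::B
end
(m::TwoPaths)(x) = vcat(m.up(x), m.down(x))

Flux.@layer :expand TwoPaths   # opt in: print recursively, like Chain

TwoPaths(Dense(2 => 3), Dense(2 => 4))  # now shows one child per line, with parameter counts
```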

Prompted by #1929. Closes #2044

Metalhead also has show methods for top-level layers that enable the recursive printing; these could be replaced by this macro:

https://github.com/FluxML/Metalhead.jl/blob/aba6fb832093d88dc2d2b4d5b1d2d63a0f21eb9c/src/Metalhead.jl#L53-L57

https://github.com/FluxML/Metalhead.jl/blob/c8f0a88e4d24274c53b62c2b740822aa6a781709/src/utilities.jl#L49-L52

Searching for other uses on GitHub, I see only two:

https://github.com/avik-pal/ExplicitFluxLayers.jl/blob/8e1cff447afda225bc12144ff27ae1370e8fc3da/src/show_layers.jl

https://github.com/maxfreu/SegmentationModels.jl/blob/7bfdbaa438910baf49543f03f2931de765dbd761/src/unet.jl#L99-L112

Later, @layer

This now also aims to replace @functor, in addition to handling show.

Discussed at #2028 -- a layer supertype is another way to make opting into pretty printing easier. (We wouldn't do both, so this closes #2028 .)

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable

src/layers/show.jl (outdated review thread, resolved)
@ToucheSir (Member):

Looks reasonable. So, if you'd let me indulge in some light bikeshedding: how about @show_expanded over @big_show?

@mcabbott (Member, Author) commented Apr 7, 2022

Not at all set on the name.

Another question is whether there should be a matching way to disable recursion. Overloading _show_leaflike(::Diagonal) to make LayerNorm not expand is slightly weird. It's a hack that re-uses the fact that layers whose children are all leaflike aren't expanded, and it has side-effects like this:

julia> Chain(Flux.Scale(2), Flux.Scale(2))
Chain(Scale(2), Scale(2))  # 8 parameters

julia> Chain(Flux.Scale(2), Flux.Scale(2), Dense(2=>2))
Chain(
  Scale(2),                             # 4 parameters
  Scale(2),                             # 4 parameters
  Dense(2 => 2),                        # 6 parameters
)                   # Total: 6 arrays, 14 parameters, 440 bytes.

If this needs a public face, then perhaps it should be more like @show_noexpand LayerNorm i.e. a property of the container, not the contents.
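For reference, the hack described above is roughly this one-line overload of an internal function (a sketch; Diagonal is the older name of what is now `Flux.Scale`, and the internal signature may differ between versions):

```julia
# Internal Flux function, not a public interface: declaring Scale leaf-like
# means LayerNorm, whose children are then all leaf-like, prints on one line.
Flux._show_leaflike(::Flux.Scale) = true
```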

@CarloLucibello (Member):

show_leaflike(x::MyType) = true, allowing customisation for the specific type, seems more flexible, and it feels OK to me as an interface; I would prefer that over a macro.

We could also merge this PR now, since it seems ready, and expose that interface later.

@mcabbott (Member, Author) commented Aug 23, 2022

6370374 upgrades this to a replacement for @functor. The idea is to have just one magic macro do several things for us; maybe that's less confusing. It would also allow back-end changes later, once Flux owns the macro. Discussed at #2028 (as an alternative to the supertype idea there).

Still WIP, but roughly works, I think.

Comment on lines 16 to 17
* In fact you can provide an arbitrary keyword with this syntax, and it will
overload this function alla `trainable`... that might be a terrible idea.
Member:

As a start, we could limit the allowable keyword list.

Member Author:

Yes. Nice to pick a notation that can grow if desired though.

@layer Dense
@layer :expand Chain
@layer BatchNorm trainable=(β,γ)
@layer Struct functor=(α,β) trainable=(β,)
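For orientation, the keyword form amounts roughly to writing the overload by hand; a sketch, not the macro's literal expansion:

```julia
# Roughly what `@layer BatchNorm trainable=(β,γ)` provides:
Flux.@functor BatchNorm
Flux.trainable(bn::BatchNorm) = (β = bn.β, γ = bn.γ)
```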
Member:

While we're at it, perhaps functor could be aliased to something less obscure? params might be a name conflict but is the most familiar (perhaps parameters? Too cheeky?). buffers is close to PyTorch but still kind of obscure.

Member:

Why not leaves or children?

Member Author:

Could do. I guess I wanted to match the function name, and if this one is obscure, that's OK, you should almost never use it. children does match a function name.

@ToucheSir (Member) commented Aug 24, 2022:

children also has the benefit of having the same signature as trainable: () -> Union{Tuple,NamedTuple,Array,...} instead of () -> Tuple{Union{Tuple,NamedTuple,Array,...}, Function} for functor.
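For comparison, a sketch of the two return shapes on a toy struct (the type here is illustrative only):

```julia
using Functors

struct Toy{W,B}
    weight::W
    bias::B
end
Functors.@functor Toy

t = Toy(rand(3, 3), rand(3))

Functors.children(t)  # (weight = ..., bias = ...): a bare NamedTuple, same shape as trainable
Functors.functor(t)   # ((weight = ..., bias = ...), re): the NamedTuple plus a reconstructor
```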

Member Author:

Latest version uses children as the keyword.

@mcabbott force-pushed the showmacro branch 2 times, most recently from 85252bf to 8981283 on August 24, 2022 21:21
@mcabbott changed the title from "Add a macro to opt-in to fancy printing" to "Add a macro to opt-in to fancy printing, and to everything else" on Aug 24, 2022
@mcabbott marked this pull request as ready for review on August 25, 2022 14:57
function _check_new_macro(x::T) where T
  Functors.isleaf(x) && return
  Base.depwarn("This type should probably now use `Flux.@layer` instead of `@functor`: $T", Symbol("@functor"))
end
Member Author:

I toned this warning down from @warn. I believe this makes the PR non-breaking, so it could go in a 0.13 release. Possibly 0.14 should always warn?

Member Author:

A related question. Currently many people use Flux.@functor. We could define such a macro as a pass-through to @layer, which will print a depwarn once on the macro call. Should we?

src/layers/macro.jl (outdated review thread, resolved)
else
  # Reconstruct a T from the NamedTuple `nt`: fields listed in `which` come
  # from `nt`, the rest are read from the original object `x`. Capturing `x`
  # in this closure is what makes the custom-children path slower.
  remake(nt) = Base.typename(T).wrapper(map(f -> f in which ? getfield(nt, f) : getfield(x, f), fieldnames(T))...)
  NamedTuple{which}(map(s -> getfield(x, s), which)), remake
end
Member Author:

The path when you specify children isn't as fast. Maybe this doesn't matter for now, since no layer within Flux does so.

@ToucheSir (Member) commented Aug 26, 2022:

This may be why Functors.jl uses @eval. A generated block seems fine though.

Member Author:

For sure it can be done with a @generated block. I was just too lazy to actually write, check, and benchmark it.

Member:

Can this be done with @generated? It seems you can't define closures in generated functions, and we need to close over x.

Member Author:

I guess that's true. But perhaps it can instead return a struct like Remake{(:some, :fields)}(x, constructor) which gets some fields from x and some from what it's called on. But I didn't try.
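A sketch of that idea (hypothetical; this `Remake` type was never part of the PR): a callable struct parameterised by the field names it takes from the new NamedTuple, with the rest read from the stored original:

```julia
# Hypothetical Remake, sketched from the comment above; not code that landed in Flux.
struct Remake{fields,T,C}
    x::T          # original object, supplies the fields not being replaced
    construct::C  # e.g. Base.typename(T).wrapper
end
Remake{fields}(x, construct) where {fields} =
    Remake{fields,typeof(x),typeof(construct)}(x, construct)

function (r::Remake{fields})(nt::NamedTuple) where {fields}
    args = map(fieldnames(typeof(r.x))) do f
        f in fields ? getfield(nt, f) : getfield(r.x, f)  # pick nt's field if listed
    end
    r.construct(args...)
end
```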

Member Author:

I'm also horrified by the number of ways to customise recursive behaviour we have, and I wonder if this one should just be forbidden.

Member:

Clever, let me try that.

Member:

Wait, isn't returning a Remake struct the same as returning a closure (in terms of performance, not whether it can be in a generated function)? I thought the benefit here was that you can manually write out the iteration accessing the fields.

Member Author:

Yes, I would guess so. If necessary, its call could itself be a generated thing, but quite likely some ntuple / map would just compile down.

I'm not precisely sure which list of fields this thing ought to be passed: the ones you get, or the ones you don't, or some mix such that merge sorts it out?

Comment on lines -355 to +346
@functor BatchNorm
trainable(bn::BatchNorm) = hasaffine(bn) ? (β = bn.β, γ = bn.γ) : (;)
@layer BatchNorm trainable=(β,γ)
Member Author:

Normalisation layers all have a permanent trainable, but the fields are nothing when not in use, so this should be safe?
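A sketch of why this should be safe, assuming the current behaviour of affine=false:

```julia
bn = BatchNorm(3, affine=false)
bn.β === nothing    # true: the field always exists, but holds nothing
Flux.trainable(bn)  # (β = nothing, γ = nothing); nothing contributes no parameters
```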

Member:

Yes. Really, the affine params should be wrapped up in a Scale sublayer like LayerNorm has, but that's a PR for another day.

src/layers/recurrent.jl (outdated review thread, resolved)
@@ -1,22 +1,30 @@
@nospecialize # just for this file, for startup time
Member Author:

Sometimes show is visibly slow on first loading. This doesn't really help: a test with @time show(stdout, MIME"text/plain"(), model); goes from 2.28s to 2.15s.

@@ -216,7 +216,7 @@ m(5) # => 26
Flux provides a set of helpers for custom layers, which you can enable by calling

```julia
Flux.@functor Affine
Flux.@layer Affine
```
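For context, Affine here is the custom-layer example from the Flux basics docs, roughly as follows (a sketch; details in the docs may differ):

```julia
struct Affine
    W
    b
end
Affine(in::Integer, out::Integer) = Affine(randn(out, in), zeros(out))
(m::Affine)(x) = m.W * x .+ m.b

Flux.@layer Affine   # registers the fields as children; enables printing, gpu/cpu, training
```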
Member Author:

"set of helpers for custom layers" is more accurate after than before. Where else would docs need changing?

Member Author:

I had a go at editing that. It's quite a crazy scheme we have going, where trainable, functor, and adapt define 3 different recursions. Do we really need two ways to customise functor?

@ToucheSir (Member):

Doctest failure looks real, but I'm not sure why it's only showing up now.

@mcabbott (Member, Author) commented Aug 26, 2022

Doc failure seems to be that this doesn't work on 1.6, where the first argument is a vector:

julia> Dense([3.3;;], false)
Dense(1 => 1; bias=false)  # 1 parameters
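The eventual fix (see the later commit "macro docstring, incl. hcat(3.3)") writes the 1×1 matrix in a form Julia 1.6 can parse:

```julia
julia> Dense(hcat(3.3), false)   # hcat(3.3) builds the 1×1 Matrix; [3.3;;] needs Julia ≥ 1.7
Dense(1 => 1; bias=false)  # 1 parameters
```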

Downstream tests didn't run, not sure why.

@ToucheSir closed this on Aug 26, 2022
@ToucheSir reopened this on Aug 26, 2022
@mcabbott (Member, Author):

Failures look unrelated:

  • GeometricFlux.jl: I see only create_bias failures (see "Make create_bias a public API?" #2049)
  • FastAI.jl has failures like "Expression: isempty(stderr_content) [1105] Evaluated: isempty("┌ Warning: ignore(f) is deprecated, use ChainRulesCore.ignore_derivatives(f) instead.\n│ caller ="
  • AtomicGraphNets.jl has failures like "MethodError: no method matching needs_concrete_A(::Nothing) Closest candidates are: needs_concrete_A(::LinearSolve.AbstractFactorization) at ~/.julia/packages/LinearSolve/AoYJI/src/LinearSolve.jl:34"
  • OperatorLearning.jl has create_bias failures only

@ToucheSir (Member):

So just the doctest fixup for 1.6?

Commits:
  • fixup
  • tidy up, add NEWS
  • review suggestions
  • macro docstring, incl. hcat(3.3)

codecov bot commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 70.19231%, with 31 lines in your changes missing coverage. Please review.

Project coverage is 73.91%. Comparing base (20d516b) to head (63e10f6).

File                       Patch %   Lines missing
src/layers/attention.jl     9.52%    19 ⚠️
src/layers/macro.jl        81.66%    11 ⚠️
src/layers/normalise.jl     0.00%     1 ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1932       +/-   ##
===========================================
+ Coverage   42.55%   73.91%   +31.35%     
===========================================
  Files          31       32        +1     
  Lines        1786     1909      +123     
===========================================
+ Hits          760     1411      +651     
+ Misses       1026      498      -528     


@mcabbott (Member, Author):

Rebased in 2024. We should just do this, right?

The biggest question is: should this support children at all? Or can we simply forbid that... do we really need 4 ways to customise things, à la #2385? Maybe if you want to hack things with Functors.jl directly you are welcome to, but Flux.jl can pretend this doesn't exist.

@darsnack (Member):

I agree. Given that you almost never want to set children, having to support and explain children vs trainable is just confusing for users. Anyone wanting to restrict children probably needs to take a minute to understand @functor first.

@mcabbott (Member, Author) commented Mar 2, 2024

OK, children is gone.

This should be ready to merge, I think. As always there may be further cleaning up of the docs possible.

@ToucheSir (Member) left a comment:

Let's do this.

@mcabbott merged commit 5e80211 into FluxML:master on Mar 5, 2024 (18 of 20 checks passed)
@mcabbott deleted the showmacro branch on March 5, 2024 20:38
isentropic pushed a commit to isentropic/Flux.jl that referenced this pull request Mar 13, 2024
Successfully merging this pull request may close these issues:

  • Macro to display model struct the way Flux does