
freezingdocs #2397

Open · wants to merge 2 commits into master

Conversation


@isentropic isentropic commented Mar 13, 2024

Rewrite of the previous pull request #2385. God, I hate git.
@ToucheSir


codecov bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.94%. Comparing base (9061b79) to head (d3b800b).

Current head d3b800b differs from pull request most recent head 4331dc8

Please upload reports for the commit 4331dc8 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2397       +/-   ##
===========================================
+ Coverage   45.59%   73.94%   +28.34%     
===========================================
  Files          32       32               
  Lines        1895     1911       +16     
===========================================
+ Hits          864     1413      +549     
+ Misses       1031      498      -533     

☔ View full report in Codecov by Sentry.

Comment on lines 45 to 89
## Static freezing per model definition
Sometimes parts of a model ([`Flux.@layer`](@ref)) need not be trained at all, but their parameters
still have to reside on the GPU, since they are still used in the forward
and/or backward pass.
```julia
struct MaskedLayer{T}
    chain::Chain
    mask::T
end
Flux.@layer MaskedLayer trainable=(chain,)
# The `mask` field will not be updated in the training loop.

function (m::MaskedLayer)(x)
    # The `mask` field will still move to the GPU for efficient operations:
    return m.chain(x) + x + m.mask
end

model = MaskedLayer(...) # this model will not have the `mask` field trained
```
Note that this approach permanently excludes the unlisted fields from training at
definition time; the choice cannot be changed on the fly.
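As a quick sanity check (a hypothetical snippet, assuming a `MaskedLayer` built from a small `Chain` and a random mask), `Flux.trainable` should report only the `chain` field, so an optimiser set up with `Flux.setup` will never update `mask`:
```julia
# Hypothetical construction, purely to illustrate the effect of `trainable=(chain,)`:
mask = rand(Float32, 4)
m = MaskedLayer(Chain(Dense(4 => 4)), mask)

keys(Flux.trainable(m))            # == (:chain,), so only the chain's parameters are trainable
opt_state = Flux.setup(Adam(), m)  # the resulting optimiser state never updates `mask`
```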

## Excluding from model definition
Sometimes some fields aren't just non-trainable: they shouldn't even be
transferred to the GPU (or be part of the functor at all). All scalar fields behave this
way by default, so things like learning-rate multipliers are neither trainable nor
transferred to the GPU.
```julia
using Statistics: mean  # `mean` is used in the forward pass below

struct CustomLayer{F}
    chain::Chain
    activation_results::Vector{F}
    lr_multiplier::Float32
end
Flux.@functor CustomLayer (chain,) # explicitly leaving out `activation_results` and `lr_multiplier`

function (m::CustomLayer)(x)
    result = m.chain(x) + x

    # `activation_results` is not part of the functor, hence never moved to the GPU,
    # so we can freely `push!` to it:
    push!(m.activation_results, mean(result))
    return result
end
```
See more about this in [`Flux.@functor`](@ref).
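As a rough illustration (the constructor arguments below are made up for the sketch), `gpu` only recurses into the fields listed in `Flux.@functor`, so `activation_results` remains an ordinary CPU `Vector` and the scalar `lr_multiplier` is left untouched:
```julia
# Hypothetical instance; only `chain` participates in the functor traversal.
m = CustomLayer(Chain(Dense(8 => 8)), Float32[], 0.1f0)

m_gpu = gpu(m)            # moves `m.chain` if a GPU backend is loaded; other fields are carried over unchanged
m_gpu.lr_multiplier       # still a plain Float32
m_gpu.activation_results  # still a CPU Vector, safe to `push!` to during the forward pass
```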
Member

I think this is great. I also wonder if it'd belong better in advanced.md. This docs page can then have a big, prominent link to the relevant sections in that file with a one or two-sentence explanation to look there for methods of including and excluding parts of the model at definition time.

To elaborate, I still think there is utility in separating out concepts and tools which apply at definition time vs those which are dynamic and apply at "runtime". That's not to say they can't be mentioned on the same page, but they should be introduced separately. One way to bridge the gap might be to retain reduced code snippets on this page alongside the link(s) I mentioned earlier. e.g.

# struct CustomLayer chain; activation_results; end
Flux.@functor CustomLayer (chain, ) # Explicitly leaving out `activation_results`

@isentropic
Author

@ToucheSir @mcabbott sorry for the delay, I got carried away. I've updated the docs as you requested; let me know your new comments. Hopefully this section can help with some of the frustrations I had as a new user.

@ToucheSir
Member

Thanks for the update. I'd say my comment at #2397 (comment) still applies, but if you feel strongly about it I'd recommend leaving that section out for now and we can look at merging the rest.

@isentropic
Author

Oh, I meant to incorporate your comment; maybe I didn't completely understand what you meant. Does the "big, prominent link" part mean adding links in the subheadings directly?

  • I feel like I did the "move to advanced" part of your comments.

Could you state more clearly what you suggested? I'll be happy to incorporate it.

@ToucheSir
Member

I think what happened is a couple of file names changed, so what should've been a clear ask became slightly more confusing. Just to put it in words though: the ask is to move the content under the sections "Static freezing per model definition" and "Excluding from model definition" to https://github.com/FluxML/Flux.jl/blob/master/docs/src/guide/models/custom_layers.md. That way, we co-locate code snippets that talk about model definition instead of making users look across multiple pages for them.

To ensure users know that these examples are also useful for excluding parameters from training, the freezing docs page can then link to the two headings under custom_layers.md. Kind of a "see also", with maybe a couple of sentences explaining why the linked sections are relevant.

@isentropic
Author

I tried to do this the way you suggested, but it feels even more haphazard that way. custom_layers.md doesn't have any models that are defined via struct ... end; all the models there are chains and other blocks provided by Flux, so I cannot reuse any definitions. I really tried to merge it somehow, but I feel it's just bad.

Another reason is that the file I contributed, 'misc-model-tweaking', is then partitioned again, and there is no single place to understand the differences between the various ways to freeze. The whole point seems gone, because I tried to summarize this already scattered information into a single page, and your suggestion seems to be to split it up again.

I suggest merging it as is. Adding a new page shouldn't hurt much, because it is very focused and specific to model freezing.

@ToucheSir
Member

> custom_layers.md doesn't have any models that are defined via struct ... end; all the models there are chains and other blocks provided by Flux, so I cannot reuse any definitions.

That doesn't sound like https://fluxml.ai/Flux.jl/stable/guide/models/custom_layers/. Every section on that page has an example layer defined with struct ... end.

> Another reason is that the file I contributed, 'misc-model-tweaking', is then partitioned again, and there is no single place to understand the differences between the various ways to freeze.

The fundamental tension here is about what "freezing" means. In my mind, freezing parameters of a layer or model implies the ability to "thaw" those parameters later. Dynamic freezing using freeze! and thaw! does this, for example. This is also in line with the language and expectations around freezing in PyTorch and JAX (which links to https://optax.readthedocs.io/en/latest/api/optimizer_wrappers.html#masked-update).

In contrast, excluding model fields from training is treated differently. In this case, the non-trainable-ness or non-parameter-ness (depending on the library) of an array is an inherent property of it. One cannot thaw it, nor is it stopped from updating (as "frozen" might imply), because it can still be mutable (think BatchNorm running stats). It's illustrative that the APIs libraries use here are named and documented distinctly from the ones they have for freezing during training. See for example PyTorch's register_buffer or the parameter vs state dichotomy in Flax. Flux follows a similar philosophy with Flux.@layer MyLayer trainable=(fields...,).
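A minimal sketch of that contrast (assuming Optimisers.jl is available in the environment for `freeze!`/`thaw!`, alongside the `trainable` keyword of `Flux.@layer` mentioned above):

```julia
using Flux
import Optimisers

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))
opt_state = Flux.setup(Adam(), model)

# Dynamic freezing: stop updating the first layer's parameters for now...
Optimisers.freeze!(opt_state.layers[1])
# ...and "thaw" them again later in training.
Optimisers.thaw!(opt_state.layers[1])

# Static exclusion: `mask` is never a trainable parameter and there is nothing
# to thaw, but it can still be mutated and still moves with `gpu`/`cpu`.
struct Masked{T}
    dense::Dense
    mask::T
end
Flux.@layer Masked trainable=(dense,)
```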

@isentropic
Author

isentropic commented Jul 17, 2024

I honestly tried to do it your way, but I hated what it looked like in the end. I just thought having one more standalone docs page is okay as long as it's all related, plus you can delete this one page later with minimal effort.

I am in the shoes of a learner of the library, and to me this PR feels cleaner and better than trying to move things all over the place. I even indicate at the top of the page that the topics are a bit disconnected. If you still feel otherwise, please change it the way you deem appropriate.

You can take the commits partially or as a whole, or close the PR altogether. At this point I don't want to argue about a few written paragraphs anymore.

I thank both of you (@ToucheSir @mcabbott) for the feedback given in the previous thread, and also for the tips you've given me in the Slack channel.
