freezingdocs #2397

isentropic · 2024-03-13T05:07:48Z

Rewrite of the prev pull #2385 , god i hate git
@ToucheSir

codecov · 2024-03-13T05:26:59Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.94%. Comparing base (9061b79) to head (d3b800b).

❗ Current head d3b800b differs from pull request most recent head 4331dc8

Please upload reports for the commit 4331dc8 to get more accurate results.

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #2397       +/-   ##
===========================================
+ Coverage   45.59%   73.94%   +28.34%     
===========================================
  Files          32       32               
  Lines        1895     1911       +16     
===========================================
+ Hits          864     1413      +549     
+ Misses       1031      498      -533

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ToucheSir · 2024-03-15T03:23:44Z

docs/src/tutorials/misc-model-tweaking.md

+## Static freezing per model definition
+Sometimes some parts of the model ([`Flux.@layer`](@ref)) needn't to be trained at all but these params
+still need to reside on the GPU (these params are still needed in the forward
+and/or backward pass).
+```julia
+struct MaskedLayer{T}
+    chain::Chain
+    mask::T
+end
+Flux.@layer MyLayer trainable=(chain,)
+# mask field will not be updated in the training loop
+
+function (m::MaskedLayer)(x)
+    # mask field will still move to to gpu for efficient operations:
+  return m.chain(x) + x + m.mask
+end
+
+model = MaskedLayer(...) # this model will not have the `mask` field trained
+```
+Note how this method permanently sets some model fields to be excluded from
+training without on-the-fly changing.
+
+## Excluding from model definition
+Sometimes some parameters aren't just "not trainable" but they shouldn't even
+transfer to the GPU (or be part of the functor). All scalar fields are like this
+by default, so things like learning rate multipliers are not trainable nor
+transferred to the GPU by default.
+```julia
+struct CustomLayer{T, F}
+    chain::Chain
+    activation_results::Vector{F}
+    lr_multiplier::Float32
+end
+Flux.@functor CustomLayer (chain, ) # Explicitly leaving out `activation_results`
+
+function (m::CustomLayer)(x)
+    result = m.chain(x) + x
+
+    # `activation_results` are not part of the GPU loop, hence we could do
+    # things like `push!`
+    push!(m.activation_results, mean(result))
+    return result
+end
+```
+See more about this in [`Flux.@functor`](@ref)


I think this is great. I also wonder if it'd belong better in advanced.md. This docs page can then have a big, prominent link to the relevant sections in that file with a one or two-sentence explanation to look there for methods of including and excluding parts of the model at definition time.

To elaborate, I still think there is utility in separating out concepts and tools which apply at definition time vs those which are dynamic and apply at "runtime". That's not to say they can't be mentioned on the same page, but they should be introduced separately. One way to bridge the gap might be to retain reduced code snippets on this page alongside the link(s) I mentioned earlier. e.g.

# struct CustomLayer chain; activation_results; end Flux.@functor CustomLayer (chain, ) # Explicitly leaving out `activation_results`

isentropic · 2024-06-28T07:36:53Z

@ToucheSir @mcabbott sorry for the time delay i got carried away. I updated the docs as you requested. Let me know your new comments. Hopefully this section could help with some of the frustrations I had for the new users

ToucheSir · 2024-06-28T22:19:16Z

Thanks for the update. I'd say my comment at #2397 (comment) still applies, but if you feel strongly about it I'd recommend leaving that section out for now and we can look at merging the rest.

isentropic · 2024-06-29T03:29:33Z

Oh I meant to incorporate your comment, maybe I couldn't understand completely what you meant. Your big "prominent links" part means like links in the subheadings directly?

I feel like I did the ”move to advanced" part of your comments

Could you just state more clearly what you suggested, I'll be happy to incorporate

ToucheSir · 2024-07-04T14:34:15Z

I think what happened is a couple of file names changed, so what should've been a clear ask became slightly more confusion. Just to put it in words though, the ask is to move the content under the sections "Static freezing per model definition" and "Excluding from model definition" to https://github.com/FluxML/Flux.jl/blob/master/docs/src/guide/models/custom_layers.md. That way, we co-locate code snippets that talk about model definition instead of making users look across multiple pages for them.

To ensure users know that this examples are also useful for excluding parameters from training, the freezing docs page can then link to the two headings under custom_layers.md. Kind of a "see also" with maybe a couple of sentences explaining why the linked sections are relevant.

isentropic · 2024-07-15T08:53:59Z

I tried to do this the way you suggested but it feels even more haphazard this way. custom_layers.md don't have any models that are defined via struct ... end all the models there are chains and other blocks provided by Flux, so I cannot reuse any definitions. I really tried to somehow merge it, but its just bad I feel.

Another reason is that now the file i'm contributed 'misc-model-tweaking' is partitioned again, and there is no single place to understand about the differences of various ways to freeze. The whole point seems gone, because I tried to summarize this already spread apart information into a single page, and seems like your suggestions are to split it again.

I suggest to merge it as is. Addition of a new page shouldn't hurt that much because it is very focused and specific to model freezing.

ToucheSir · 2024-07-15T15:19:07Z

custom_layers.md don't have any models that are defined via struct ... end all the models there are chains and other blocks provided by Flux, so I cannot reuse any definitions.

That doesn't sound like https://fluxml.ai/Flux.jl/stable/guide/models/custom_layers/. Every section on that page has an example layer defined with struct ... end.

Another reason is that now the file i'm contributed 'misc-model-tweaking' is partitioned again, and there is no single place to understand about the differences of various ways to freeze.

The fundamental tension here is about what "freezing" means. In my mind, freezing parameters of a layer or model implies the ability to "thaw" those parameters later. Dynamic freezing using freeze! and thaw! does this, for example. This is also in line with the language and expectations around freezing in PyTorch and JAX (which links to https://optax.readthedocs.io/en/latest/api/optimizer_wrappers.html#masked-update).

In contrast, excluding model fields from training is treated differently. In this case, the non-trainable-ness or non-parameter-ness (depending on the library) of an array is an inherent property of it. One can not thaw it, nor is it stopped from updating (as "frozen" might imply) because it can still be mutable—think BatchNorm running stats. It's illustrative that the APIs libraries use here are distinctly named and documented from the ones they have for freezing for training. See for example PyTorch's register_buffer or the parameter vs state dichotomy in Flax. Flux follows a similar philosophy with Flux.@layer MyLayer trainable=(fields...,).

isentropic · 2024-07-17T03:22:30Z

I honestly tried to do it your way, but I hated what it looked like at the end. I just thought having one more standalone page of docs is okay as long as it's all related + you can delete this one page later with minimal effort.

I am in the shoes of the learner of the library and to me this PR feels cleaner and better than trying to move things all over the place. I even indicate at the top of the page that the topics are a bit disconnected. If you still feel otherwise, please change it the way you deem appropriate.

You can take the commits partially or as a whole or close the PR all-together. At this point I don't wanna argue about a few written paragraphs anymore.

I thank both of you (@ToucheSir @mcabbott) for feedback given in the previous thread, and also the tips you've given me in the slack channel.

ToucheSir reviewed Mar 15, 2024

View reviewed changes

ToucheSir added the documentation label Mar 15, 2024

freezingdocs

4331dc8

isentropic force-pushed the freezingdocs2 branch from d3b800b to 4331dc8 Compare June 28, 2024 06:37

update freezing docs

693604a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

freezingdocs #2397

freezingdocs #2397

isentropic commented Mar 13, 2024 •

edited

Loading

codecov bot commented Mar 13, 2024 •

edited

Loading

ToucheSir Mar 15, 2024

isentropic commented Jun 28, 2024

ToucheSir commented Jun 28, 2024

isentropic commented Jun 29, 2024

ToucheSir commented Jul 4, 2024

isentropic commented Jul 15, 2024

ToucheSir commented Jul 15, 2024

isentropic commented Jul 17, 2024 •

edited

Loading

freezingdocs #2397

Are you sure you want to change the base?

freezingdocs #2397

Conversation

isentropic commented Mar 13, 2024 • edited Loading

codecov bot commented Mar 13, 2024 • edited Loading

Codecov Report

ToucheSir Mar 15, 2024

Choose a reason for hiding this comment

isentropic commented Jun 28, 2024

ToucheSir commented Jun 28, 2024

isentropic commented Jun 29, 2024

ToucheSir commented Jul 4, 2024

isentropic commented Jul 15, 2024

ToucheSir commented Jul 15, 2024

isentropic commented Jul 17, 2024 • edited Loading

isentropic commented Mar 13, 2024 •

edited

Loading

codecov bot commented Mar 13, 2024 •

edited

Loading

isentropic commented Jul 17, 2024 •

edited

Loading