freezingdocs #2397
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           master    #2397      +/-   ##
===========================================
+ Coverage   45.59%   73.94%   +28.34%
===========================================
  Files          32       32
  Lines        1895     1911      +16
===========================================
+ Hits          864     1413     +549
+ Misses       1031      498     -533
```

View the full report in Codecov by Sentry.
## Static freezing per model definition
Sometimes parts of the model (see [`Flux.@layer`](@ref)) need not be trained at all, but their
parameters still need to reside on the GPU, since they are still used in the forward
and/or backward pass.
```julia
struct MaskedLayer{T}
  chain::Chain
  mask::T
end
Flux.@layer MaskedLayer trainable=(chain,)
# the `mask` field will not be updated in the training loop

function (m::MaskedLayer)(x)
  # the `mask` field will still move to the GPU for efficient operations:
  return m.chain(x) + x + m.mask
end

model = MaskedLayer(...) # this model will not have the `mask` field trained
```
Note that this approach permanently excludes the listed fields from
training; it cannot be toggled on the fly.
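As a quick check, `Flux.trainable` reports which fields the optimiser will see. Below is a minimal sketch of this; the layer sizes and the mask values are illustrative assumptions, not part of the example above:

```julia
using Flux

# Hypothetical instance of the layer defined above
m = MaskedLayer(Chain(Dense(4 => 4)), rand(Float32, 4))

# Only the fields listed in `trainable` are reported, so an optimiser
# created with `Flux.setup` will never update `mask`:
Flux.trainable(m)   # (chain = Chain(Dense(4 => 4)),)
```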
## Excluding from model definition
Sometimes parameters are not merely "not trainable": they should not even be
transferred to the GPU (or be part of the functor at all). All scalar fields behave like this
by default, so things like learning-rate multipliers are neither trained nor
moved to the GPU.
```julia
using Statistics: mean   # for `mean` below

struct CustomLayer{F}
  chain::Chain
  activation_results::Vector{F}
  lr_multiplier::Float32
end
Flux.@functor CustomLayer (chain,) # explicitly leaving out `activation_results`

function (m::CustomLayer)(x)
  result = m.chain(x) + x

  # `activation_results` is not part of the functor, hence we can do
  # things like `push!` to it even inside the forward pass
  push!(m.activation_results, mean(result))
  return result
end
```
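To make the effect concrete, here is a minimal sketch of using such a layer; the sizes and initial values are made-up assumptions for illustration:

```julia
using Flux, Statistics

# Hypothetical instance of the layer defined above
layer = CustomLayer(Chain(Dense(4 => 4)), Float32[], 0.1f0)

# Only `chain` participates in the functor traversal, so `gpu` (a no-op when
# no GPU backend is loaded) moves the weights but leaves `activation_results`
# as a plain CPU Vector that the forward pass can freely `push!` into:
moved = gpu(layer)
y = moved(gpu(rand(Float32, 4)))
length(moved.activation_results)  # 1: one mean recorded per forward call
```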
See more about this in [`Flux.@functor`](@ref).
I think this is great. I also wonder if it'd belong better in advanced.md. This docs page can then have a big, prominent link to the relevant sections in that file, with a one- or two-sentence explanation to look there for methods of including and excluding parts of the model at definition time.
To elaborate, I still think there is utility in separating out concepts and tools which apply at definition time vs those which are dynamic and apply at "runtime". That's not to say they can't be mentioned on the same page, but they should be introduced separately. One way to bridge the gap might be to retain reduced code snippets on this page alongside the link(s) I mentioned earlier, e.g.:

```julia
# struct CustomLayer chain; activation_results; end
Flux.@functor CustomLayer (chain,) # explicitly leaving out `activation_results`
```
Force-pushed from d3b800b to 4331dc8.
@ToucheSir @mcabbott sorry for the delay, I got carried away. I updated the docs as you requested. Let me know your new comments. Hopefully this section can help with some of the frustrations I had as a new user.
Thanks for the update. I'd say my comment at #2397 (comment) still applies, but if you feel strongly about it I'd recommend leaving that section out for now and we can look at merging the rest.
Oh, I meant to incorporate your comment; maybe I didn't completely understand what you meant. By "big, prominent links", do you mean links in the subheadings directly? Could you state more clearly what you suggested? I'll be happy to incorporate it.
I think what happened is that a couple of file names changed, so what should've been a clear ask became slightly more confusing. Just to put it in words though: the ask is to move the content under the sections "Static freezing per model definition" and "Excluding from model definition" to https://github.com/FluxML/Flux.jl/blob/master/docs/src/guide/models/custom_layers.md. That way, we co-locate the code snippets that talk about model definition instead of making users look across multiple pages for them. To ensure users know that these examples are also useful for excluding parameters from training, the freezing docs page can then link to the two headings under
I tried to do this the way you suggested, but it feels even more haphazard that way. Another reason is that the file I contributed, 'misc-model-tweaking', is now partitioned again, and there is no single place to understand the differences between the various ways to freeze. The whole point seems lost: I tried to summarize this already spread-apart information in a single page, and the suggestion is to split it up again. I suggest merging it as is. Adding a new page shouldn't hurt much because it is very focused and specific to model freezing.
That doesn't sound like https://fluxml.ai/Flux.jl/stable/guide/models/custom_layers/. Every section on that page has an example layer defined with
The fundamental tension here is about what "freezing" means. In my mind, freezing parameters of a layer or model implies the ability to "thaw" those parameters later. Dynamic freezing using

In contrast, excluding model fields from training is treated differently. In this case, the non-trainable-ness or non-parameter-ness (depending on the library) of an array is an inherent property of it. One cannot thaw it, nor is it stopped from updating (as "frozen" might imply), because it can still be mutable; think BatchNorm running stats. It's illustrative that the APIs libraries use here are named and documented distinctly from the ones they have for freezing during training. See for example PyTorch's
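For context, here is a minimal sketch of the dynamic freeze/thaw workflow described above, using the `Optimisers.freeze!`/`Optimisers.thaw!` API together with `Flux.setup`; the model and the choice of layer to freeze are illustrative assumptions:

```julia
using Flux, Optimisers

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))
opt_state = Flux.setup(Adam(), model)

# Freeze the first layer: its parameters are skipped by `Flux.update!` ...
Optimisers.freeze!(opt_state.layers[1])

# ... train for a while ...

# ... and can be thawed later to resume training them.
Optimisers.thaw!(opt_state.layers[1])
```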
I honestly tried to do it your way, but I hated what it looked like in the end. I just thought having one more standalone page of docs is okay as long as it's all related, plus you can delete this one page later with minimal effort. I am in the shoes of a learner of the library, and to me this PR feels cleaner and better than trying to move things all over the place. I even indicate at the top of the page that the topics are a bit disconnected. If you still feel otherwise, please change it the way you deem appropriate. You can take the commits partially or as a whole, or close the PR altogether. At this point I don't want to argue about a few written paragraphs anymore. I thank both of you (@ToucheSir @mcabbott) for the feedback given in the previous thread, and also for the tips you've given me in the Slack channel.
Rewrite of the previous pull request #2385. God, I hate git.
@ToucheSir