docs: improve freezing docs #2385

isentropic · 2024-02-29T10:40:38Z

@ToucheSir as per slack discussion

(Edit: closes #2216 )

codecov · 2024-02-29T11:03:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.94%. Comparing base (48a43db) to head (5514952).
Report is 8 commits behind head on master.

❗ Current head 5514952 differs from pull request most recent head 8f3fe20. Consider uploading reports for the commit 8f3fe20 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2385      +/-   ##
==========================================
- Coverage   83.55%   73.94%   -9.62%     
==========================================
  Files          31       32       +1     
  Lines        1836     1911      +75     
==========================================
- Hits         1534     1413     -121     
- Misses        302      498     +196

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mcabbott · 2024-02-29T15:51:27Z

Link to version with the markdown rendered:
https://github.com/isentropic/Flux.jl/blob/improve-freezing-docs/docs/src/models/freezing-params.md
Edit:
https://github.com/isentropic/Flux.jl/blob/improve-freezing-docs/docs/src/tutorials/misc-model-tweaking.md

I wonder if this ought to be filed under docs/tutorials. It has some overlap with advanced.md (which itself is long overdue for a re-write / removal) which is OK in the "Tutorials" section?

ToucheSir · 2024-02-29T20:09:41Z

It also overlaps with https://fluxml.ai/Flux.jl/stable/training/training/#Freezing-and-Schedules. I do think it's a little disjointed to have docs for both layer definition tools and dedicated freezing tools on a single "freezing" page. Ideally functor and trainable would be documented alongside everything else on layer definition, but I appreciate that users may not realize the link between these and freezing.

isentropic · 2024-03-01T09:30:15Z

The reason i put this one all the way in the bottom is cause it requires the reading of functors first and seeing other ways of doing things. It's more like a meta overview of how to do things in each separate case. I tried to link where possible, but Im open to suggestions

Also please double check if the docs are factually correct. My only reference is the slack discussion

isentropic · 2024-03-01T09:31:26Z

It also overlaps with https://fluxml.ai/Flux.jl/stable/training/training/#Freezing-and-Schedules. I do think it's a little disjointed to have docs for both layer definition tools and dedicated freezing tools on a single "freezing" page. Ideally functor and trainable would be documented alongside everything else on layer definition, but I appreciate that users may not realize the link between these and freezing.

I specifically replaced that section, see the file diff

ToucheSir · 2024-03-05T20:41:46Z

I specifically replaced that section, see the file diff

Yes, I'm aware :)

To explain why I feel this approach is disjointed by way of analogy, this would be like putting Dropout and WeightDecay on the same page because both are "regularization". Yes, you can technically achieve similar ends with both. But functionally, they are designed for different things. @functor LayerType (fields...,) means "I don't want Flux to do anything with the other fields". Overriding trainable means "I don't want the optimizer/Optimisers.jl to do anything with the other fields". Using freeze! means "I want to dynamically control what parameters are trainable on the fly".

I think it could make sense to have a page talking about the last two, but having all three in one place doesn't make a lot of sense to me. Other than that, I think a "freezing params" page might not have enough content to justify its existence as a standalone page. If the scope were broadened slightly to working with parameters and optimization rules, that could work. Maybe leave some space to add docs on the various ways to do regularization and loss penalties.

isentropic · 2024-03-06T06:40:32Z

The last three pages "logistic regression" "linear regression" and "custom layers" all feature somewhat a disjointed set of functionality and act like a "putting it all together" sections or like extensions of other well-defined use-cases. To me it is fine that these pages are fine to be a mixed-bag because they are disconnected from the rest of the docs and marked as "tutorials" for like more advanced use-cases and clarifications in the docs that are otherwise hard to write in linear independent chunks.

I still insist on having an independent category/page for freezing and everything that might link to it including functors and optimisers. There are examples in the wild like:

lux features a separate page
flax explain how to achieve it using different ways.

I understand your sentiment, it is far from perfect right now. But I still feel like the users who read through the main docs kinda need a page or a section that would explain why and how each method is different in its own way.

I also agree that expanding the scope to include regularization and loss penalties is worth trying. Could you give me more directions here? I could try writing this up.
Thanks for comments and suggestions btw.

mcabbott · 2024-03-06T15:01:47Z

It's doesn't seem too crazy to make a tutorial describing concepts that Flux thinks of as orthogonal, all together... this shouldn't be the main intro to them but perhaps it's useful to for others? I quite like the examples here of why you might want each of them.

One thing this could usefully expand on is post-#1932 changes... e.g. that @layer MyLayer trainable=(w,) is recommended in place of @functor MyLayer and trainable(::MyLayer).

On freeze! & adjust!, maybe it's worth considering talking about parameter scheduling a little bit, e.g. a simple case by hand and an example of how to use https://github.com/FluxML/ParameterSchedulers.jl ?

Edit: one more thing I found here: #2216 (comment) is that if you want to freeze a lot of the model, you may not need to compute the whole gradient. That's a bit obscure, but might be nice to explain on a tutorial-like page like this.

I think the file ought to be in docs/src/tutorials. (Some old files like advanced.md are in slightly illogical places pending someone having time to move & redirect.)
~~In addition to this file, it might be worth looking for places to add really brief cross-references. E.g. the end of the training section about L2, weight decay etc links to Dropout briefly.~~ Edit: maybe done in feaa5ff

ToucheSir · 2024-03-06T17:13:58Z

The last three pages "logistic regression" "linear regression" and "custom layers" all feature somewhat a disjointed set of functionality and act like a "putting it all together" sections or like extensions of other well-defined use-cases. To me it is fine that these pages are fine to be a mixed-bag because they are disconnected from the rest of the docs and marked as "tutorials" for like more advanced use-cases and clarifications in the docs that are otherwise hard to write in linear independent chunks.

Yes, referencing a broad range of functionality works for a tutorial because there's a singular and clear end. This is less true for a "how-to guide" like this proposed freezing page and the two examples mentioned later.

I agree the custom layers page is a mess and needs to be broken up. But separating any mention of @functor from its main responsibility as a layer registration tool would be a bad idea. struct MyLayer ... end and @functor MyLayer should show up on the same page.

I still insist on having an independent category/page for freezing and everything that might link to it including functors and optimisers. There are examples in the wild like:
1. [lux](https://lux.csail.mit.edu/dev/manual/freezing_model_parameters) features a separate page

2. [flax](https://flax.readthedocs.io/en/latest/guides/training_techniques/transfer_learning.html#) explain how to achieve it using different ways.

I would not say they're equivalent. The Lux page really talks about one main piece of functionality, Lux.freeze. The equivalent Flux page would talk about freeze! and how to combine it with traversal helpers like fmap. What the Lux page does not talk about is layer definition tools, which is what @functor is. That's left to a separate page.

The Flax page is more of a tutorial. We could use a comparable tutorial in the Flux docs, but that's a separate discussion from a how-to guide on parameter freezing.

I understand your sentiment, it is far from perfect right now. But I still feel like the users who read through the main docs kinda need a page or a section that would explain why and how each method is different in its own way.

I think the core contention here is that I don't see @functor Layer (fields...) as a "method" for freezing parameters. By definition, @functor defines what fields are parameters. Anything not in fields is then clearly not a parameter, so it doesn't make sense to talk about them in the context of freezing parameters. I think it would be good to mention that one can use @functor to exclude certain struct fields from everything (including optimization), with a link to the layer definition section. But this should be a brief mention and not involve moving content from that page into this one.

I also agree that expanding the scope to include regularization and loss penalties is worth trying. Could you give me more directions here? I could try writing this up. Thanks for comments and suggestions btw.

I would just broaden the title and preamble for now and worry about adding this extra content in a follow-up.

isentropic · 2024-03-08T08:49:23Z

I tried to incorporate your suggestions in the last commit. Please take a look at the last subsection, and do let me know if it is factually correct

isentropic · 2024-03-12T06:58:59Z

any feedback on this @ToucheSir @mcabbott?

ToucheSir · 2024-03-13T01:40:22Z

Do you mind rebasing so we can see what this looks like on top of #2390?

Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3 to 4. - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](codecov/codecov-action@v3...v4) --- updated-dependencies: - dependency-name: codecov/codecov-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 3.0.1 to 3.0.2. - [Release notes](https://github.com/dorny/paths-filter/releases) - [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md) - [Commits](dorny/paths-filter@v3.0.1...v3.0.2) --- updated-dependencies: - dependency-name: dorny/paths-filter dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>

…ML#1932)

* doc changes re at-functor and at-layer * fix a doctest * more fixes * public at-layer * add a sentence comparing to freeze/thaw * Apply suggestions from code review Co-authored-by: Kyle Daruwalla <[email protected]> * two fixes re SignDecay --------- Co-authored-by: Kyle Daruwalla <[email protected]>

ToucheSir · 2024-03-13T04:36:05Z

Unfortunately I don't think the rebase was successful. Usually you'd have to force push the remote branch after rebasing, and only the changes from your four original commits on this PR should be visible on GitHub.

docs: improve freezing docs

383fb84

mcabbott mentioned this pull request Feb 29, 2024

Add a macro to opt-in to fancy printing, and to everything else #1932

Merged

3 tasks

mcabbott added the documentation label Feb 29, 2024

fix broken link

7f234d6

restructre

5514952

diegozea and others added 8 commits March 13, 2024 12:42

Fix FluxML#2380 (FluxML#2384)

10e9efd

Allow cpu(::DataLoader) (FluxML#2388)

960f573

Add a macro to opt-in to fancy printing, and to everything else (Flux…

97fdcd1

…ML#1932)

Small upgrades to training docs (FluxML#2331)

5618059

docs: improve freezing docs

8f3fe20

isentropic closed this Mar 13, 2024

isentropic mentioned this pull request Mar 13, 2024

freezingdocs #2397

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: improve freezing docs #2385

docs: improve freezing docs #2385

isentropic commented Feb 29, 2024 •

edited by mcabbott

Loading

codecov bot commented Feb 29, 2024 •

edited

Loading

mcabbott commented Feb 29, 2024 •

edited

Loading

ToucheSir commented Feb 29, 2024 •

edited

Loading

isentropic commented Mar 1, 2024 •

edited

Loading

isentropic commented Mar 1, 2024

ToucheSir commented Mar 5, 2024

isentropic commented Mar 6, 2024

mcabbott commented Mar 6, 2024 •

edited

Loading

ToucheSir commented Mar 6, 2024

isentropic commented Mar 8, 2024

isentropic commented Mar 12, 2024

ToucheSir commented Mar 13, 2024

ToucheSir commented Mar 13, 2024 •

edited

Loading

docs: improve freezing docs #2385

docs: improve freezing docs #2385

Conversation

isentropic commented Feb 29, 2024 • edited by mcabbott Loading

codecov bot commented Feb 29, 2024 • edited Loading

Codecov Report

mcabbott commented Feb 29, 2024 • edited Loading

ToucheSir commented Feb 29, 2024 • edited Loading

isentropic commented Mar 1, 2024 • edited Loading

isentropic commented Mar 1, 2024

ToucheSir commented Mar 5, 2024

isentropic commented Mar 6, 2024

mcabbott commented Mar 6, 2024 • edited Loading

ToucheSir commented Mar 6, 2024

isentropic commented Mar 8, 2024

isentropic commented Mar 12, 2024

ToucheSir commented Mar 13, 2024

ToucheSir commented Mar 13, 2024 • edited Loading

isentropic commented Feb 29, 2024 •

edited by mcabbott

Loading

codecov bot commented Feb 29, 2024 •

edited

Loading

mcabbott commented Feb 29, 2024 •

edited

Loading

ToucheSir commented Feb 29, 2024 •

edited

Loading

isentropic commented Mar 1, 2024 •

edited

Loading

mcabbott commented Mar 6, 2024 •

edited

Loading

ToucheSir commented Mar 13, 2024 •

edited

Loading