
docs: improve freezing docs #2385

Closed
isentropic wants to merge 11 commits into FluxML:master from isentropic:improve-freezing-docs

Conversation

@isentropic commented Feb 29, 2024

@ToucheSir as per slack discussion

(Edit: closes #2216 )

codecov bot commented Feb 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.94%. Comparing base (48a43db) to head (5514952).
Report is 8 commits behind head on master.

❗ Current head 5514952 differs from pull request most recent head 8f3fe20. Consider uploading reports for the commit 8f3fe20 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2385      +/-   ##
==========================================
- Coverage   83.55%   73.94%   -9.62%     
==========================================
  Files          31       32       +1     
  Lines        1836     1911      +75     
==========================================
- Hits         1534     1413     -121     
- Misses        302      498     +196     

☔ View full report in Codecov by Sentry.

@mcabbott (Member) commented Feb 29, 2024

Link to version with the markdown rendered:
https://github.com/isentropic/Flux.jl/blob/improve-freezing-docs/docs/src/models/freezing-params.md
Edit:
https://github.com/isentropic/Flux.jl/blob/improve-freezing-docs/docs/src/tutorials/misc-model-tweaking.md

I wonder if this ought to be filed under docs/tutorials. It has some overlap with advanced.md (which is itself long overdue for a rewrite or removal), but perhaps that overlap is OK in the "Tutorials" section?

@ToucheSir (Member) commented Feb 29, 2024

It also overlaps with https://fluxml.ai/Flux.jl/stable/training/training/#Freezing-and-Schedules. I do think it's a little disjointed to have docs for both layer definition tools and dedicated freezing tools on a single "freezing" page. Ideally functor and trainable would be documented alongside everything else on layer definition, but I appreciate that users may not realize the link between these and freezing.

@isentropic (Author) commented Mar 1, 2024

The reason I put this one all the way at the bottom is that it requires reading about functors first, and seeing the other ways of doing things. It's more of a meta-overview of how to do things in each separate case. I tried to link where possible, but I'm open to suggestions.

Also, please double-check that the docs are factually correct; my only reference is the Slack discussion.

@isentropic (Author)

> It also overlaps with https://fluxml.ai/Flux.jl/stable/training/training/#Freezing-and-Schedules. I do think it's a little disjointed to have docs for both layer definition tools and dedicated freezing tools on a single "freezing" page. Ideally functor and trainable would be documented alongside everything else on layer definition, but I appreciate that users may not realize the link between these and freezing.

I specifically replaced that section; see the file diff.

@ToucheSir (Member)

> I specifically replaced that section; see the file diff.

Yes, I'm aware :)

To explain why I feel this approach is disjointed by way of analogy, this would be like putting Dropout and WeightDecay on the same page because both are "regularization". Yes, you can technically achieve similar ends with both. But functionally, they are designed for different things. @functor LayerType (fields...,) means "I don't want Flux to do anything with the other fields". Overriding trainable means "I don't want the optimizer/Optimisers.jl to do anything with the other fields". Using freeze! means "I want to dynamically control what parameters are trainable on the fly".
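Concretely, the three look something like this (a rough, untested sketch; the struct and field names are just illustrative):

```julia
using Flux

struct MyLayer
  w
  b
  note::String   # plain metadata
end

# (1) `@functor` with a field subset: `note` becomes invisible to Flux
#     entirely -- excluded from gpu/cpu movement, saving, and training.
Flux.@functor MyLayer (w, b)

# (2) Overriding `trainable`: `b` stays in the model tree (still moved to
#     GPU, still saved), but the optimiser never updates it.
Flux.trainable(m::MyLayer) = (; w = m.w)

# (3) `freeze!`: dynamically stop updating part of an already-built model.
model = Chain(Dense(2 => 3), Dense(3 => 1))
opt_state = Flux.setup(Adam(), model)
Flux.freeze!(opt_state.layers[1])   # first layer no longer updates
Flux.thaw!(opt_state.layers[1])     # ... and now it does again
```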

I think it could make sense to have a page talking about the last two, but having all three in one place doesn't make a lot of sense to me. Other than that, I think a "freezing params" page might not have enough content to justify its existence as a standalone page. If the scope were broadened slightly to working with parameters and optimization rules, that could work. Maybe leave some space to add docs on the various ways to do regularization and loss penalties.

@isentropic
Copy link
Author

The last three pages, "logistic regression", "linear regression", and "custom layers", all feature a somewhat disjointed set of functionality and act like "putting it all together" sections, or like extensions of other well-defined use cases. To me it is fine for these pages to be a mixed bag, because they are disconnected from the rest of the docs and marked as "tutorials": they cover more advanced use cases and clarifications that are otherwise hard to write as linear, independent chunks.

I still insist on having an independent category/page for freezing and everything that might link to it, including functors and optimisers. There are examples in the wild:

  1. [lux](https://lux.csail.mit.edu/dev/manual/freezing_model_parameters) features a separate page
  2. [flax](https://flax.readthedocs.io/en/latest/guides/training_techniques/transfer_learning.html) explains how to achieve it in several different ways.

I understand your sentiment; it is far from perfect right now. But I still feel that users who read through the main docs need a page or a section that explains why and how each method differs in its own way.

I also agree that expanding the scope to include regularization and loss penalties is worth trying. Could you give me more direction here? I could try writing this up.
Thanks for the comments and suggestions, btw.

@mcabbott (Member) commented Mar 6, 2024

It doesn't seem too crazy to make a tutorial describing concepts that Flux thinks of as orthogonal, all together... this shouldn't be the main intro to them, but perhaps it's useful for others? I quite like the examples here of why you might want each of them.

One thing this could usefully expand on is post-#1932 changes... e.g. that @layer MyLayer trainable=(w,) is recommended in place of @functor MyLayer and trainable(::MyLayer).
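E.g. something like this (sketch, with a hypothetical `Affine` layer):

```julia
using Flux

struct Affine
  w
  b
end
(a::Affine)(x) = a.w * x .+ a.b

# New recommended style: registration plus the trainable restriction
# in a single line.
Flux.@layer Affine trainable=(w,)

# Older style this replaces:
# Flux.@functor Affine
# Flux.trainable(a::Affine) = (; w = a.w)
```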

On freeze! & adjust!, maybe it's worth considering talking about parameter scheduling a little bit, e.g. a simple case by hand and an example of how to use https://github.com/FluxML/ParameterSchedulers.jl ?
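E.g. (untested sketch; I'm writing the ParameterSchedulers.jl keyword names from memory):

```julia
using Flux, ParameterSchedulers

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))
opt_state = Flux.setup(Adam(1e-3), model)

# By hand: exponentially decay the learning rate each epoch.
for epoch in 1:100
  Flux.adjust!(opt_state, 1e-3 * 0.9^(epoch - 1))
  # ... one epoch of training ...
end

# With ParameterSchedulers.jl, schedules are iterable:
sched = Exp(start = 1e-3, decay = 0.9)
for (eta, epoch) in zip(sched, 1:100)
  Flux.adjust!(opt_state, eta)
  # ... one epoch of training ...
end
```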

Edit: one more thing I found here: #2216 (comment) is that if you want to freeze a lot of the model, you may not need to compute the whole gradient. That's a bit obscure, but might be nice to explain on a tutorial-like page like this.
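Something like (sketch):

```julia
using Flux

backbone = Chain(Dense(10 => 10, relu), Dense(10 => 10, relu))  # frozen
head = Dense(10 => 1)                                           # trainable
x, y = randn(Float32, 10, 32), randn(Float32, 1, 32)

# Run the frozen part once, outside differentiation, then take the
# gradient only with respect to `head` -- nothing is backpropagated
# through the backbone at all.
features = backbone(x)
grads = Flux.gradient(h -> Flux.mse(h(features), y), head)
```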

  • I think the file ought to be in docs/src/tutorials. (Some old files like advanced.md are in slightly illogical places pending someone having time to move & redirect.)

  • In addition to this file, it might be worth looking for places to add really brief cross-references. E.g. the end of the training section about L2, weight decay etc links to Dropout briefly. Edit: maybe done in feaa5ff

@ToucheSir (Member)

> The last three pages, "logistic regression", "linear regression", and "custom layers", all feature a somewhat disjointed set of functionality and act like "putting it all together" sections, or like extensions of other well-defined use cases. To me it is fine for these pages to be a mixed bag, because they are disconnected from the rest of the docs and marked as "tutorials": they cover more advanced use cases and clarifications that are otherwise hard to write as linear, independent chunks.

Yes, referencing a broad range of functionality works for a tutorial because there's a singular and clear end. This is less true for a "how-to guide" like this proposed freezing page and the two examples mentioned later.

I agree the custom layers page is a mess and needs to be broken up. But separating any mention of @functor from its main responsibility as a layer registration tool would be a bad idea. struct MyLayer ... end and @functor MyLayer should show up on the same page.

> I still insist on having an independent category/page for freezing and everything that might link to it, including functors and optimisers. There are examples in the wild:
>
> 1. [lux](https://lux.csail.mit.edu/dev/manual/freezing_model_parameters) features a separate page
>
> 2. [flax](https://flax.readthedocs.io/en/latest/guides/training_techniques/transfer_learning.html) explains how to achieve it in several different ways.

I would not say they're equivalent. The Lux page really talks about one main piece of functionality, Lux.freeze. The equivalent Flux page would talk about freeze! and how to combine it with traversal helpers like fmap. What the Lux page does not talk about is layer definition tools, which is what @functor is. That's left to a separate page.
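i.e. patterns along the lines of this (untested sketch; a real page would presumably also show `fmap`-style traversal):

```julia
using Flux

model = Chain(Dense(2 => 3, relu), BatchNorm(3), Dense(3 => 1))
opt_state = Flux.setup(Adam(), model)

# Freeze everything, then thaw only the last layer:
Flux.freeze!(opt_state)
Flux.thaw!(opt_state.layers[end])

# Or walk the model and optimiser state together, freezing by layer type:
for (layer, state) in zip(model.layers, opt_state.layers)
  layer isa BatchNorm && Flux.freeze!(state)
end
```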

The Flax page is more of a tutorial. We could use a comparable tutorial in the Flux docs, but that's a separate discussion from a how-to guide on parameter freezing.

> I understand your sentiment; it is far from perfect right now. But I still feel that users who read through the main docs need a page or a section that explains why and how each method differs in its own way.

I think the core contention here is that I don't see @functor Layer (fields...) as a "method" for freezing parameters. By definition, @functor defines what fields are parameters. Anything not in fields is then clearly not a parameter, so it doesn't make sense to talk about them in the context of freezing parameters. I think it would be good to mention that one can use @functor to exclude certain struct fields from everything (including optimization), with a link to the layer definition section. But this should be a brief mention and not involve moving content from that page into this one.

> I also agree that expanding the scope to include regularization and loss penalties is worth trying. Could you give me more direction here? I could try writing this up. Thanks for the comments and suggestions, btw.

I would just broaden the title and preamble for now and worry about adding this extra content in a follow-up.

@isentropic (Author)

I tried to incorporate your suggestions in the last commit. Please take a look at the last subsection, and do let me know if it is factually correct.

@isentropic (Author)

Any feedback on this, @ToucheSir @mcabbott?

@ToucheSir (Member)

Do you mind rebasing so we can see what this looks like on top of #2390?

diegozea and others added 8 commits March 13, 2024 12:42
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3 to 4.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v3...v4)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 3.0.1 to 3.0.2.
- [Release notes](https://github.com/dorny/paths-filter/releases)
- [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md)
- [Commits](dorny/paths-filter@v3.0.1...v3.0.2)

---
updated-dependencies:
- dependency-name: dorny/paths-filter
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
* doc changes re at-functor and at-layer

* fix a doctest

* more fixes

* public at-layer

* add a sentence comparing to freeze/thaw

* Apply suggestions from code review

Co-authored-by: Kyle Daruwalla <[email protected]>

* two fixes re SignDecay

---------

Co-authored-by: Kyle Daruwalla <[email protected]>
@ToucheSir (Member) commented Mar 13, 2024

Unfortunately I don't think the rebase was successful. Usually you'd have to force push the remote branch after rebasing, and only the changes from your four original commits on this PR should be visible on GitHub.

@isentropic isentropic closed this Mar 13, 2024
@isentropic isentropic mentioned this pull request Mar 13, 2024
Development

Successfully merging this pull request may close these issues.

docs on freezing layers should be ported to the explicit syntax