
Implement Lion, up to 5x faster than Adam, and more accurate #156

Closed
PallHaraldsson opened this issue Aug 18, 2023 · 7 comments · Fixed by #157
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@PallHaraldsson

Motivation and description

https://arxiv.org/abs/2302.06675

Lion (EvoLved Sign Momentum) is more memory-efficient than Adam, as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter, calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image classification, Lion boosts the accuracy of ViT by up to 2% on ImageNet and saves up to 5x the pre-training compute on JFT.

It's 11 lines of pseudo-code (shorter than AdamW).
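
For reference, here is a rough Julia sketch of that update rule from the paper. The names and defaults (`lion_step!`, `η`, `β1`, `β2`, `λ`) are illustrative only, not any package's API:

```julia
# Minimal sketch of one Lion step, following the paper's pseudo-code.
# All names and default values here are illustrative, not a package API.
function lion_step!(θ, m, g; η = 1e-4, β1 = 0.9, β2 = 0.99, λ = 0.0)
    c = sign.(β1 .* m .+ (1 - β1) .* g)   # sign of interpolated momentum/gradient
    θ .-= η .* (c .+ λ .* θ)              # parameter update with decoupled weight decay
    m .= β2 .* m .+ (1 - β2) .* g         # momentum update
    return θ, m
end
```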

Possible Implementation

No response

@ToucheSir
Member

ToucheSir commented Aug 18, 2023

Note that subsequent research has shown marginal at best improvements over Adam(W) with more rigorous experimental design. Nevertheless, this should be a straightforward addition if anyone is interested in getting their feet wet with a PR.

@ToucheSir ToucheSir transferred this issue from FluxML/Flux.jl Aug 18, 2023
@ToucheSir ToucheSir added the enhancement and good first issue labels Aug 18, 2023
@chengchingwen
Member

Isn't it already done in #129?

@ToucheSir
Member

You're right, I completely forgot about that. Thanks Peter!

@PallHaraldsson
Author

PallHaraldsson commented Aug 19, 2023

It seems Lion is neither documented nor implemented in Flux.jl, nor here?

https://fluxml.ai/Flux.jl/stable/training/optimisers/

https://github.com/FluxML/Flux.jl/blob/134882831277844cfab81f2e6ef393634b4215ec/src/optimise/Optimise.jl#L7

I recall looking for it in the code and not finding it; then, searching for Adam, I found "AdamW, RAdam", so I thought I was in the right place ("if not all are listed there, then more optimizers, such as Lion, are implemented in .."). Did the optimizers originally live in Flux.jl and then get moved out to a new package? Or are they re-exported in Flux.jl for compatibility (I can understand that)?

In general, do you think the best optimizers are implemented (somewhere)?

[I know where the activation functions are; it seems squareplus is not implemented (it looks like a good softplus alternative). I could open an issue for it, or add it to my existing NNlib.jl issue. I also think FlashAttention is missing, as well as its improved version 2.]

@chengchingwen
Member

Lion is implemented here (Optimisers.jl). I believe the optimise/Optimise.jl in Flux.jl is somewhat outdated and should be ignored.
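
A minimal usage sketch, assuming the Lion rule here follows the same setup/update interface as the other Optimisers.jl rules (the exact constructor defaults may differ):

```julia
using Optimisers

model = (W = randn(3, 3), b = zeros(3))   # any nested structure of arrays
rule  = Optimisers.Lion(1e-4)             # learning rate; check the docstring for defaults
state = Optimisers.setup(rule, model)

grad = (W = ones(3, 3), b = ones(3))      # stand-in for a real gradient
state, model = Optimisers.update(state, model, grad)
```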

@mcabbott
Member

Or well reexported in Flux.jl for compatibility

At present this is a little complicated. Flux still exports its own (optimise/Optimise.jl) optimisers, but has methods to auto-translate them to their Optimisers.jl equivalents. The hope is to delete all of that soon -- perhaps FluxML/Flux.jl#1986 is the issue.

Having Flux re-export any newly added rules (for which it has no old equivalents, like Lion) would be fine. They could be temporarily included in the docs. Or, perhaps simpler, a note to look at Optimisers.jl for more could be added somewhere.
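
In the meantime, a rule that only exists in Optimisers.jl (such as Lion) can already be used with Flux's explicit training API. A hedged sketch, assuming Flux.setup accepts any Optimisers.jl rule and forwards to Optimisers.setup:

```julia
using Flux, Optimisers

model = Dense(2 => 1)
state = Flux.setup(Optimisers.Lion(1e-4), model)   # Flux.setup wraps Optimisers.setup

x, y = rand(Float32, 2, 8), rand(Float32, 1, 8)
grads = Flux.gradient(m -> sum(abs2, m(x) .- y), model)
state, model = Optimisers.update(state, model, grads[1])
```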

@ToucheSir
Member

There is indeed such a note in https://fluxml.ai/Flux.jl/stable/training/optimisers/. We'd want to make the preceding paragraph more strongly worded, however, as I think the replacement is basically done and no longer "gradual".

Now, one thing I did notice is that Lion is not currently included in the Optimisers.jl docs build. That should be a simple enough fix.

@ToucheSir ToucheSir mentioned this issue Aug 19, 2023