
Add AdEMAMix Optimizer #20258

Closed · IMvision12 opened this issue Sep 14, 2024 · 4 comments

@IMvision12 (Contributor)

AdEMAMix extends Adam with a second, slower exponential moving average (EMA) of past gradients, aiming to address slow convergence and subpar generalization when training large language models and on noisy datasets. It uses three beta parameters (β1, β2, β3) plus a mixing coefficient α, providing flexible momentum alongside Adam's adaptive learning rates.

Paper: https://arxiv.org/abs/2409.03137

I'm interested in adding this optimizer to Keras.
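
For reference, here is a minimal NumPy sketch of the update rule as I understand it from the paper. It omits weight decay and the α/β3 warmup schedulers described there, and the hyperparameter defaults (e.g. `alpha=5.0`, `beta3=0.9999`) are illustrative choices, not fixed values:

```python
import numpy as np

def ademamix_step(theta, grad, state, t, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8):
    """One AdEMAMix update. `state` holds the fast EMA m1, the slow EMA m2,
    and the second moment v (all zero-initialized); t is the 1-based step."""
    m1 = beta1 * state["m1"] + (1 - beta1) * grad    # fast EMA, as in Adam
    m2 = beta3 * state["m2"] + (1 - beta3) * grad    # slow EMA added by AdEMAMix
    v = beta2 * state["v"] + (1 - beta2) * grad**2   # second-moment EMA
    m1_hat = m1 / (1 - beta1**t)   # bias-correct m1 and v only;
    v_hat = v / (1 - beta2**t)     # the slow EMA m2 is left uncorrected
    theta = theta - lr * (m1_hat + alpha * m2) / (np.sqrt(v_hat) + eps)
    state.update(m1=m1, m2=m2, v=v)
    return theta, state

# Toy usage: minimize f(x) = ||x - 1||^2
theta = np.zeros(4)
state = {"m1": np.zeros_like(theta),
         "m2": np.zeros_like(theta),
         "v": np.zeros_like(theta)}
for t in range(1, 501):
    grad = 2.0 * (theta - 1.0)
    theta, state = ademamix_step(theta, grad, state, t)
```

A proper Keras version would wrap this in an optimizer subclass with per-variable slots for m1, m2, and v, plus the schedulers.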

@fchollet

@fchollet (Member)

Thanks for the suggestion. I see the paper currently has 0 citations listed on arXiv. As a general rule, we wait to see >50 citations before including a technique in Keras, per the API guidelines: "We only add new objects that are already commonly used in the machine learning community."

@fchollet (Member)

Now, if you want to build this optimizer, you can do so in your own repo, and then we can share it with the community to see if people adopt it. If eventually the optimizer becomes commonly used, we will add it to the Keras API.

@IMvision12 (Contributor, Author)

@fchollet I have implemented the optimizer using Keras here: https://github.com/IMvision12/AdEMAMix-Optimizer-Keras

@IMvision12 (Contributor, Author)

If this optimizer is wanted in Keras in the future, I'd be eager to contribute the integration.
