Releases: kozistr/pytorch_optimizer
pytorch-optimizer v3.3.2
pytorch-optimizer v3.3.1
Change Log
Feature
- Support `Cautious` variant to `AdaShift` optimizer. (#310)
- Save the state of the `Lookahead` optimizer too. (#310)
- Implement `APOLLO` optimizer. (#311, #312)
- Rename the `Apollo` (An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization) optimizer to `ApolloDQN`, so it does not overlap with the new `APOLLO` optimizer name. (#312)
- Implement `MARS` optimizer. (#313, #314)
- Support `Cautious` variant to `MARS` optimizer (see the usage sketch below). (#314)
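A minimal usage sketch for the new `MARS` optimizer with the `Cautious` variant. It assumes `MARS` is exported at the package top level and that the variant is toggled with a `cautious=True` constructor keyword, following the convention of the other Cautious-enabled optimizers; treat it as a sketch rather than the exact API.

```python
import torch
from pytorch_optimizer import MARS  # assumed top-level export

model = torch.nn.Linear(16, 2)

# `cautious=True` is assumed to enable the Cautious variant, as with the
# other optimizers that gained this flag in recent releases.
optimizer = MARS(model.parameters(), lr=1e-3, cautious=True)

loss = model(torch.randn(8, 16)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```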
Bug
- Fix `bias_correction` in `AdamG` optimizer. (#305, #308)
- Fix a potential bug when loading the state for `Lookahead` optimizer. (#306, #310)
Docs
Contributions
thanks to @Vectorrent
pytorch-optimizer v3.3.0
Change Log
Feature
- Support `PaLM` variant for `ScheduleFreeAdamW` optimizer. (#286, #288)
  - you can use this feature by setting `use_palm` to `True` (see the sketch after this list).
- Implement `ADOPT` optimizer. (#289, #290)
- Implement `FTRL` optimizer. (#291)
- Implement `Cautious optimizer` feature. (#294)
  - Improving Training with One Line of Code
  - you can use it by setting `cautious=True` for `Lion`, `AdaFactor` and `AdEMAMix` optimizers.
- Improve the stability of `ADOPT` optimizer. (#294)
- Support a new projection type `random` for `GaLoreProjector`. (#294)
- Implement `DeMo` optimizer. (#300, #301)
- Implement `Muon` optimizer. (#302)
- Implement `ScheduleFreeRAdam` optimizer. (#304)
- Implement `LaProp` optimizer. (#304)
- Support `Cautious` variant to `LaProp`, `AdamP`, `Adopt` optimizers. (#304)
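A short sketch combining two of the flags above: the `PaLM` variant of `ScheduleFreeAdamW` and the `Cautious` variant of `Lion`. It assumes both classes are exported at the package top level and take the flags as plain constructor keywords, as described in the notes; the exact signatures are assumptions.

```python
import torch
from pytorch_optimizer import Lion, ScheduleFreeAdamW

model = torch.nn.Linear(16, 2)

# PaLM variant of the schedule-free AdamW (#286, #288): `use_palm=True` is
# assumed to be a constructor keyword, per the note above.
optimizer = ScheduleFreeAdamW(model.parameters(), lr=1e-3, use_palm=True)

# Cautious variant of Lion (#294): enabled with a single keyword.
cautious_lion = Lion(model.parameters(), lr=1e-4, cautious=True)
```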
Refactor
- Big refactoring, removing direct imports from `pytorch_optimizer.*`.
  - I removed some utilities from the top-level `pytorch_optimizer.*` namespace because they are rarely used and are not optimizers themselves, but helpers for specific optimizers; an import example follows this list.
    - `pytorch_optimizer.[Shampoo stuff]` -> `pytorch_optimizer.optimizers.shampoo_utils.[Shampoo stuff]`. `shampoo_utils` contains `Graft`, `BlockPartitioner`, `PreConditioner`, etc. You can check the details here.
    - `pytorch_optimizer.GaLoreProjector` -> `pytorch_optimizer.optimizers.galore.GaLoreProjector`
    - `pytorch_optimizer.gradfilter_ema` -> `pytorch_optimizer.optimizers.grokfast.gradfilter_ema`
    - `pytorch_optimizer.gradfilter_ma` -> `pytorch_optimizer.optimizers.grokfast.gradfilter_ma`
    - `pytorch_optimizer.l2_projection` -> `pytorch_optimizer.optimizers.alig.l2_projection`
    - `pytorch_optimizer.flatten_grad` -> `pytorch_optimizer.optimizers.pcgrad.flatten_grad`
    - `pytorch_optimizer.un_flatten_grad` -> `pytorch_optimizer.optimizers.pcgrad.un_flatten_grad`
    - `pytorch_optimizer.reduce_max_except_dim` -> `pytorch_optimizer.optimizers.sm3.reduce_max_except_dim`
    - `pytorch_optimizer.neuron_norm` -> `pytorch_optimizer.optimizers.nero.neuron_norm`
    - `pytorch_optimizer.neuron_mean` -> `pytorch_optimizer.optimizers.nero.neuron_mean`
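For example, code that used to import the Grokfast gradient filter or the GaLore projector from the package root now needs the module paths listed above:

```python
# Old (no longer available after the refactoring):
# from pytorch_optimizer import GaLoreProjector, gradfilter_ema

# New import paths:
from pytorch_optimizer.optimizers.galore import GaLoreProjector
from pytorch_optimizer.optimizers.grokfast import gradfilter_ema
```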
Docs
- Add more visualizations. (#297)
Bug
- Add optimizer parameter to `PolyScheduler` constructor. (#295)
Contributions
thanks to @tanganke
pytorch-optimizer v3.2.0
Change Log
Feature
- Implement `SOAP` optimizer. (#275)
- Support `AdEMAMix` variants. (#276)
  - `bnb_ademamix8bit`, `bnb_ademamix32bit`, `bnb_paged_ademamix8bit`, `bnb_paged_ademamix32bit`
- Support 8/4bit, fp8 optimizers. (#208, #281)
  - `torchao_adamw8bit`, `torchao_adamw4bit`, `torchao_adamwfp8`
- Support a module-name-level (e.g. `LayerNorm`) weight decay exclusion for `get_optimizer_parameters`. (#282, #283)
- Implement `CPUOffloadOptimizer`, which offloads the optimizer to the CPU for single-GPU training. (#284)
- Support a regex-based filter for searching the names of optimizers, lr schedulers, and loss functions (see the sketch below).
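A sketch of the last two items above, assuming `get_supported_optimizers` accepts a wildcard/regex filter string and `get_optimizer_parameters` takes a `wd_ban_list` of module/parameter names to exclude from weight decay; the exact parameter names here are assumptions, not confirmed signatures.

```python
import torch
from pytorch_optimizer import get_optimizer_parameters, get_supported_optimizers

# Filter the registered optimizer names by pattern (assumed filter syntax).
adam_like = get_supported_optimizers('adam*')

# Build parameter groups that skip weight decay for LayerNorm modules
# (module-name-level exclusion, per #282/#283).
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.LayerNorm(16))
parameters = get_optimizer_parameters(model, weight_decay=1e-2, wd_ban_list=['LayerNorm'])
```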
Bug
Contributions
thanks to @Vectorrent
pytorch-optimizer v3.1.2
pytorch-optimizer v3.1.1
pytorch-optimizer v3.1.0
Change Log
Feature
- Implement `AdaLomo` optimizer. (#258)
- Support `Q-GaLore` optimizer. (#258)
  - Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
  - you can use it via `optimizer = load_optimizer('q_galore_adamw8bit')`
- Support more bnb optimizers. (#258)
  - `bnb_paged_adam8bit`, `bnb_paged_adamw8bit`, `bnb_*_*32bit`
- Improve `power_iteration()` speed by up to 40%. (#259)
- Improve `reg_noise()` (E-MCMC) speed by up to 120%. (#260)
- Support `disable_lr_scheduler` parameter for `Ranger21` optimizer to disable the built-in learning rate scheduler (see the sketch below). (#261)
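A brief sketch of the Q-GaLore and `Ranger21` items above. It assumes `load_optimizer` returns the optimizer registered under the given name, and that `Ranger21` takes `num_iterations` plus the new `disable_lr_scheduler` flag as constructor keywords; treat the exact signatures as assumptions.

```python
import torch
from pytorch_optimizer import Ranger21, load_optimizer

model = torch.nn.Linear(16, 2)

# Look up the Q-GaLore AdamW (8-bit) optimizer by name, as described above.
q_galore_adamw8bit = load_optimizer('q_galore_adamw8bit')

# Ranger21 normally drives its own warmup/warmdown schedule; the new flag
# disables that built-in scheduler so an external one can be used instead.
optimizer = Ranger21(
    model.parameters(),
    lr=1e-3,
    num_iterations=1_000,  # assumed required argument for Ranger21
    disable_lr_scheduler=True,
)
```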
Refactor
- Refactor `AdamMini` optimizer. (#258)
- Deprecate optional dependency, `bitsandbytes`. (#258)
- Move `get_rms`, `approximate_sq_grad` functions to `BaseOptimizer` for reusability. (#258)
- Refactor `shampoo_utils.py`. (#259)
- Add `debias`, `debias_adam` methods in `BaseOptimizer`. (#261)
- Refactor to use `BaseOptimizer` only, instead of inheriting from multiple classes. (#261)
Bug
- Fix several bugs in `AdamMini` optimizer. (#257)
Contributions
thanks to @sdbds
pytorch-optimizer v3.0.2
pytorch-optimizer v3.0.1
Change Log
Feature
- Implement `FAdam` optimizer. (#241, #242)
- Tweak `AdaFactor` optimizer. (#236, #243)
  - support not using the first momentum when beta1 is not given
  - default dtype for the first momentum to `bfloat16`
  - clip the second momentum to 0.999
- Implement `GrokFast` optimizer. (#244, #245)
Bug
- Wrong typing of `reg_noise`. (#239, #240)
- `Lookahead`'s `param_groups` attribute is not loaded from checkpoint. (#237, #238)
Contributions
thanks to @michaldyczko
pytorch-optimizer v3.0.0
Change Log
The major version is updated! (`v2.12.0` -> `v3.0.0`) (#164)
Many optimizers, learning rate schedulers, and objective functions are in `pytorch-optimizer`.
Currently, `pytorch-optimizer` supports 67 optimizers (+ `bitsandbytes`), 11 lr schedulers, and 13 loss functions, and has reached about 4 ~ 50K downloads / month (the peak is 75K downloads / month)!
The reason for updating the major version from `v2` to `v3` is that I think it's a good time to ship the recent implementations (the last update was about 7 months ago), and I plan to pivot to new concepts like training utilities while maintaining the original features (e.g. optimizers).
Also, rich test cases, benchmarks, and examples are on the list!
Finally, thanks for using `pytorch-optimizer`, and feel free to make any requests :)
Feature
- Implement `REX` lr scheduler. (#217, #222)
- Implement `Aida` optimizer. (#220, #221)
- Implement `WSAM` optimizer. (#213, #216)
- Implement `GaLore` optimizer. (#224, #228)
- Implement `Adalite` optimizer. (#225, #229)
- Implement `bSAM` optimizer. (#212, #233)
- Implement `Schedule-Free` optimizer. (#230, #233)
- Implement `EMCMC`. (#231, #233)
Fix
- Fix `SRMM` to allow operation beyond `memory_length`. (#227)
Dependency
- Drop `Python 3.7` support officially. (#221)
  - Please check the README.
- Update `bitsandbytes` to `0.43.0`. (#228)
Docs
- Add missing parameters in `Ranger21` optimizer document. (#214, #215)
- Fix `WSAM` optimizer paper link. (#219)
Contributions
Diff
- from the previous major version: 2.0.0...3.0.0
- from the previous version: 2.12.0...3.0.0