Releases · kozistr/pytorch_optimizer
pytorch-optimizer v2.6.1
pytorch-optimizer v2.6.0
Change Log
Feature
- Implement SM3 optimizer, #130 (usage sketch after this list)
- Tweak Scalable Shampoo optimizer, #128, #129
  - implement a new preconditioner type, `OUTPUT`.
  - optimize speed/memory usage of coupled Newton iteration and power iteration methods.
  - use in-place operation (SQRT-N Grafting).
  - clean up `shampoo_utils` to be more readable.
  - support `skip_preconditioning_rank_lt` parameter to skip preconditioning in case of low-rank gradients.
  - set default value for `preconditioning_compute_steps` to 1000.
  - set default value for `start_preconditioning_step` to 25.
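A minimal usage sketch for the items above, assuming `SM3` and `ScalableShampoo` are importable from the package top level and accept the usual `params`/`lr` arguments; the only release-specific parameters passed are the two defaults named in these notes.

```python
import torch
from pytorch_optimizer import SM3, ScalableShampoo

model = torch.nn.Linear(128, 10)

# SM3: the memory-efficient optimizer added in this release.
sm3 = SM3(model.parameters(), lr=1e-1)

# Scalable Shampoo with the defaults described above. Low-rank gradients
# can additionally be skipped via `skip_preconditioning_rank_lt`.
shampoo = ScalableShampoo(
    model.parameters(),
    lr=1e-3,
    start_preconditioning_step=25,       # default per these notes
    preconditioning_compute_steps=1000,  # default per these notes
)
```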
pytorch-optimizer v2.5.2
pytorch-optimizer v2.5.1
pytorch-optimizer v2.5.0
pytorch-optimizer v2.4.2
Change Log
Bug
- Fix to deep-copy the inverse preconditioners
Deps
Docs
- Update Scalable Shampoo docstring (more parameter guides), #106
pytorch-optimizer v2.4.1
Change Log
Feature
- Rename the new `Shampoo` to `ScalableShampoo`. #103
- Implement the old(?) version of Shampoo optimizer. #103
- Support `SVD` method to calculate the inverse p-th root matrix. #103
  - to boost the `M^{-1/p}` calculation, performs batched SVD when available (see the sketch after this list).
- Implement `AdamS` optimizer. #102
- Support stable weight decay option for `Adai` optimizer. #102
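An illustrative sketch of the SVD route to the inverse p-th root mentioned above (not necessarily the library's exact `compute_power_svd()`): for a symmetric PSD statistic with SVD `M = U diag(s) Vᵀ`, the inverse p-th root is `U diag(s^{-1/p}) Vᵀ`, and `torch.linalg.svd` accepts batched inputs, which is where the batched speed-up comes from.

```python
import torch

def inverse_pth_root_svd(mat: torch.Tensor, p: int, eps: float = 1e-16) -> torch.Tensor:
    # mat: (..., n, n) symmetric PSD matrices; torch.linalg.svd is batched,
    # so a stack of statistics is decomposed in a single call.
    u, s, vh = torch.linalg.svd(mat)
    return u @ torch.diag_embed((s + eps).pow(-1.0 / p)) @ vh

stats = torch.randn(4, 16, 16)
stats = stats @ stats.transpose(-2, -1)      # batch of PSD matrices
precond = inverse_pth_root_svd(stats, p=4)   # batched M^{-1/4}
```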
Bug
- Fix `compute_power_svd()` to get a singular value. #104
pytorch-optimizer v2.4.0
Change Log
Feature
- Implement D-Adaptation optimizers (`DAdaptAdaGrad`, `DAdaptAdam`, `DAdaptSGD`), #101 (usage sketch after this list)
  - Learning-rate-free learning for SGD, AdaGrad, and Adam
  - original implementation: https://github.com/facebookresearch/dadaptation
- Shampoo optimizer
  - Support `no_preconditioning_for_layers_with_dim_gt` (default 8192)
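A minimal sketch of learning-rate-free training with one of the D-Adaptation optimizers above, assuming it is importable from the package top level; following the referenced D-Adaptation implementation, `lr` is normally left at 1.0 and acts only as a multiplier on the adapted step size.

```python
import torch
from pytorch_optimizer import DAdaptAdam

model = torch.nn.Linear(128, 10)
optimizer = DAdaptAdam(model.parameters(), lr=1.0)  # step size is estimated by the optimizer

x, y = torch.randn(32, 128), torch.randn(32, 10)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```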
Improvement
- refactor/improve `matrix_power()`, unroll the loop for performance, #101
- speed-up/fix `power_iter()`, not to deep-copy `mat_v`. #101
Docs
- D-Adaptation optimizers & Shampoo utils
pytorch-optimizer v2.3.1
Change Log
Feature
- more add-ons for Shampoo optimizer, #99 (usage sketch after this list)
  - implement `moving_average_for_momentum`
  - implement `decoupled_weight_decay`
  - implement `decoupled_learning_rate`
  - supports more grafting types (`RMSProp`, `SQRT_N`)
  - supports more PreConditioner types (`ALL`, `INPUT`)
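A hedged sketch of the new add-ons: the flag names come straight from the list above and are assumed here to be boolean keyword arguments of `Shampoo`; the exact signature in this version may differ, and the grafting/PreConditioner choices are selected through the optimizer's own arguments, which are not spelled out in these notes.

```python
import torch
from pytorch_optimizer import Shampoo

model = torch.nn.Linear(128, 10)

# Flags named in the notes above (assumed boolean constructor arguments).
optimizer = Shampoo(
    model.parameters(),
    lr=1e-3,
    moving_average_for_momentum=True,
    decoupled_weight_decay=True,
    decoupled_learning_rate=True,
)
```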
Docs
- apply pydocstyle linter, #91
Refactor
- deberta_v3_large_lr_scheduler, #91
ETC
- add more Ruff rules (`ICN`, `TID`, `ERA`, `RUF`, `YTT`, `PL`), #91
pytorch-optimizer v2.3.0
Change Log
Feature
- re-implement Shampoo Optimizer (#97, related to #93)
  - layer-wise grafting (none, adagrad, sgd)
  - block partitioner (conceptual sketch after this list)
  - preconditioner
- remove casting to `fp16` or `bf16` inside of `step()` not to lose consistency with the other optimizers. #96
- change some ops to in-place operations to speed up. #96
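A conceptual sketch of what the block partitioner mentioned above does (not the library's actual partitioner API): a large weight matrix is cut into fixed-size blocks so each block keeps its own, much smaller statistics and preconditioner.

```python
import torch

def partition_into_blocks(weight: torch.Tensor, block_size: int = 128):
    # Split along rows, then columns, yielding blocks of at most
    # block_size x block_size; preconditioners are maintained per block.
    rows = weight.split(block_size, dim=0)
    return [blk for row in rows for blk in row.split(block_size, dim=1)]

w = torch.randn(512, 384)
blocks = partition_into_blocks(w)
print(len(blocks), blocks[0].shape)  # 12 blocks, torch.Size([128, 128])
```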
Fix
- fix `exp_avg_var` when `amsgrad` is True. #96
Refactor
- change linter from `Pylint` to `Ruff`, #97