Releases: kozistr/pytorch_optimizer

pytorch-optimizer v2.6.1

22 Apr 12:14
be0351d

Change Log

Fix

  • fix variables not being located on the same device as the gradients, #132 (related to #131) (thanks to @Bing-su)
  • fix approximate_sq_grad() in the Adafactor optimizer, #132 (see the sketch below)
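
For context, here is a minimal sketch of the factored second-moment approximation that approximate_sq_grad() is responsible for, following the standard Adafactor formulation; the function signature and tensor names below are illustrative assumptions, not the library's exact internals.

```python
import torch


def approx_sq_grad(exp_avg_sq_row: torch.Tensor, exp_avg_sq_col: torch.Tensor) -> torch.Tensor:
    # Adafactor keeps per-row and per-column running averages of the squared
    # gradient and reconstructs 1 / sqrt(V_hat) from their outer product,
    # normalized by the mean of the row statistics.
    r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1, keepdim=True)).rsqrt().unsqueeze(-1)
    c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
    return r_factor * c_factor  # broadcasts to the full (rows, cols) shape
```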

pytorch-optimizer v2.6.0

22 Apr 07:56
19dcf2b

Change Log

Feature

  • Implement SM3 optimizer, #130
  • Tweak the Scalable Shampoo optimizer, #128, #129 (a usage sketch follows this list)
    • implement a new preconditioner type, OUTPUT.
    • optimize speed/memory usage of coupled Newton iteration and power iteration methods.
    • use in-place operations (SQRT-N grafting).
    • clean up shampoo_utils for readability.
    • support the skip_preconditioning_rank_lt parameter to skip preconditioning for low-rank gradients.
    • set the default value of preconditioning_compute_steps to 1000.
    • set the default value of start_preconditioning_step to 25.
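
A rough usage sketch of the tweaked optimizer. Only the keyword names listed above come from this entry; the concrete values below (lr, skip_preconditioning_rank_lt=1) are illustrative assumptions, so check the documentation for the full, authoritative signature and defaults.

```python
import torch
from pytorch_optimizer import ScalableShampoo

model = torch.nn.Linear(128, 10)

optimizer = ScalableShampoo(
    model.parameters(),
    lr=1e-3,
    start_preconditioning_step=25,       # new default mentioned above
    preconditioning_compute_steps=1000,  # new default mentioned above
    skip_preconditioning_rank_lt=1,      # skip preconditioning for low-rank gradients
)
```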

pytorch-optimizer v2.5.2

11 Apr 13:47
e66435a

Feature

  • add eps to the Nero optimizer to stabilize optimization, #121 (usage sketch below)
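
A minimal usage sketch of the new eps argument, assuming the Nero optimizer is imported from the package as usual; the lr and eps values are illustrative, not the documented defaults.

```python
import torch
from pytorch_optimizer import Nero

model = torch.nn.Linear(64, 2)

# eps is added for numerical stability of the normalization terms;
# the value below is illustrative, not necessarily the library's default.
optimizer = Nero(model.parameters(), lr=0.01, eps=1e-8)
```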

Fix

  • fix Ranger21 so it does not skip updates when the first parameter has no gradient, #125, #126 (thanks to @jdb78)
  • fix Lookahead optimizer, #122, #123

Dependency

  • upgrade to PyTorch 2.0, #123

pytorch-optimizer v2.5.1

12 Mar 05:48
df9e78d

Change Log

Feature

Bug

pytorch-optimizer v2.5.0

15 Feb 05:41
26b8b19

pytorch-optimizer v2.4.2

10 Feb 10:57
fff34af

Change Log

Bug

  • Fix to deep-copy inverse preconditioners

Deps

  • Support PyTorch 2.0, #106 (related to #105)

Docs

pytorch-optimizer v2.4.1

06 Feb 06:34
06dce18

Change Log

Feature

  • Rename the new Shampoo optimizer to ScalableShampoo, #103
  • Implement the old version of the Shampoo optimizer, #103
  • Support an SVD method to calculate the inverse p-th root matrix, #103
    • to speed up the M^{-1/p} calculation, batched SVD is performed when available (see the sketch after this list).
  • Implement the AdamS optimizer, #102
  • Support the stable weight decay option for the Adai optimizer, #102
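
As a rough illustration of the SVD path, the sketch below computes M^{-1/p} for a symmetric PSD statistics matrix; the real compute_power_svd() may differ in its signature and in how small singular values are handled.

```python
import torch


def compute_power_svd(matrix: torch.Tensor, p: int, eps: float = 1e-16) -> torch.Tensor:
    # compute M^(-1/p) for a (batched) symmetric PSD matrix via SVD;
    # torch.linalg.svd handles leading batch dimensions, so stacked Kronecker
    # factors can be decomposed in a single batched call.
    u, s, vh = torch.linalg.svd(matrix)
    s = s.clamp_min(eps).pow(-1.0 / p)
    return u @ torch.diag_embed(s) @ vh
```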

Bug

  • Fix compute_power_svd() to get a singular value, #104

pytorch-optimizer v2.4.0

02 Feb 10:52
75a023a

Change Log

Feature

Improvement

  • refactor/improve matrix_power(); unroll the loop for performance, #101
  • speed up/fix power_iter() so it does not deep-copy mat_v, #101 (a power-iteration sketch follows this list)
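
For reference, a minimal sketch of power iteration for the dominant eigenvalue of a symmetric PSD matrix, reusing the matrix-vector product instead of deep-copying it; the library's power_iter() signature and convergence checks are not reproduced here.

```python
import torch


def power_iter(mat: torch.Tensor, num_iters: int = 100) -> torch.Tensor:
    # estimate the largest eigenvalue of a symmetric PSD matrix; mat_v is the
    # intermediate product that only needs to be rescaled, not copied.
    v = torch.randn(mat.shape[-1], dtype=mat.dtype, device=mat.device)
    v /= v.norm()
    for _ in range(num_iters):
        mat_v = mat @ v
        v = mat_v / mat_v.norm().clamp_min(1e-16)
    return v @ (mat @ v)  # Rayleigh quotient of a unit-norm vector
```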

Docs

  • D-Adaptation optimizers & Shampoo utils

pytorch-optimizer v2.3.1

31 Jan 13:20
44c423a

Change Log

Feature

  • more add-ons for Shampoo optimizer, #99
    • implement moving_average_for_momentum
    • implement decoupled_weight_decay
    • implement decoupled_learning_rate
    • support more grafting types (RMSProp, SQRT_N); see the grafting sketch after this list
    • support more PreConditioner types (ALL, INPUT)
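
Conceptually, grafting keeps the direction of the Shampoo update and borrows the step magnitude from the grafted optimizer. A hedged sketch of that idea (function and argument names are illustrative, not the library's API):

```python
import torch


def graft(shampoo_update: torch.Tensor, grafted_update: torch.Tensor, eps: float = 1e-16) -> torch.Tensor:
    # layer-wise grafting: keep the direction of the preconditioned (Shampoo)
    # update, but rescale it to the norm of the grafted optimizer's update
    # (e.g. SGD, AdaGrad, RMSProp, SQRT_N), so the effective step size follows
    # the grafted method.
    return shampoo_update * (grafted_update.norm() / (shampoo_update.norm() + eps))
```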

Docs

  • apply pydocstyle linter, #91

Refactor

  • deberta_v3_large_lr_scheduler, #91

ETC

  • add more Ruff rules (ICN, TID, ERA, RUF, YTT, PL), #91

pytorch-optimizer v2.3.0

30 Jan 07:42
5df1281

Change Log

Feature

  • re-implement Shampoo Optimizer (#97, related to #93)
    • layer-wise grafting (none, adagrad, sgd)
    • block partitioner (a conceptual sketch follows this list)
    • preconditioner
  • remove casting to fp16 or bf16 inside step() so as not to lose consistency with the other optimizers, #96
  • change some ops to in-place operations to speed things up, #96
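
As a conceptual sketch of what a block partitioner does (not the library's actual implementation): large gradients are split into fixed-size blocks so each block's preconditioners stay small.

```python
import torch


def partition_blocks(grad: torch.Tensor, block_size: int = 128) -> list:
    # split every dimension of the gradient into chunks of at most block_size,
    # so each block gets its own small Kronecker-factored preconditioner
    # instead of one huge factor per full dimension.
    blocks = [grad]
    for dim in range(grad.dim()):
        blocks = [chunk for block in blocks for chunk in torch.split(block, block_size, dim=dim)]
    return blocks
```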

Fix

  • fix exp_avg_var when amsgrad is True, #96 (see the sketch below)
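
For context, a minimal sketch of the AMSGrad-style bookkeeping this fix concerns; apart from exp_avg_var, the variable names and the update form are assumptions based on the usual AMSGrad recipe, not the optimizer's exact code.

```python
import torch

# when amsgrad=True, the denominator must be built from the running
# element-wise maximum of exp_avg_var, not from exp_avg_var directly.
exp_avg_var = torch.zeros(4)
max_exp_avg_var = torch.zeros(4)
grad = torch.randn(4)
beta2, eps = 0.999, 1e-8

exp_avg_var.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
max_exp_avg_var = torch.maximum(max_exp_avg_var, exp_avg_var)
denom = max_exp_avg_var.sqrt().add_(eps)
```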

Refactor

  • change linter from Pylint to Ruff, #97