
Merge pull request #211 from kozistr/feature/bitsandbytes
[Feature] Support `bitsandbytes` optimizers
kozistr authored Oct 7, 2023
2 parents 1c82216 + c6fbd24 commit 14b6b58
Showing 13 changed files with 612 additions and 641 deletions.
4 changes: 2 additions & 2 deletions CITATION.cff
authors:
given-names: Hyeongchan
orcid: https://orcid.org/0000-0002-1729-0580
title: "pytorch_optimizer: optimizer & lr scheduler & loss function collections in PyTorch"
version: 2.12.0
date-released: 2021-09-21
url: "https://github.com/kozistr/pytorch_optimizer"
324 changes: 324 additions & 0 deletions README.md

Large diffs are not rendered by default.

448 changes: 0 additions & 448 deletions README.rst

This file was deleted.

17 changes: 17 additions & 0 deletions docs/changelogs/v2.12.0.md
## Change Log

### Feature

* Support `bitsandbytes` optimizers. (#211)
    * You can now install them with `pip3 install pytorch-optimizer[bitsandbytes]`.
    * Supports 8 `bnb` optimizers: `bnb_adagrad8bit`, `bnb_adam8bit`, `bnb_adamw8bit`, `bnb_lion8bit`, `bnb_lamb8bit`, `bnb_lars8bit`, `bnb_rmsprop8bit`, `bnb_sgd8bit` (see the usage sketch below).
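
A minimal usage sketch, assuming `bitsandbytes` is installed and a CUDA-capable environment that meets its requirements (the stand-in `Linear` model is only for illustration):

```python
import torch

from pytorch_optimizer import load_optimizer

model = torch.nn.Linear(4, 4)  # stand-in model for illustration

# look up the 8-bit optimizer class by name, then build it as usual
opt = load_optimizer(optimizer='bnb_adamw8bit')
optimizer = opt(model.parameters())
```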

### Docs

* Introduce `mkdocs` with `material` theme. (#204, #206)
    * Documentation: https://pytorch-optimizers.readthedocs.io/en/latest/

### Diff

[2.11.2...2.12.0](https://github.com/kozistr/pytorch_optimizer/compare/v2.11.2...v2.12.0)
176 changes: 77 additions & 99 deletions docs/index.md
# pytorch-optimizer

| | |
|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Status | [![PyPi download](https://static.pepy.tech/badge/pytorch-optimizer)](https://pepy.tech/project/pytorch-optimizer) [![PyPi month download](https://static.pepy.tech/badge/pytorch-optimizer/month)](https://pepy.tech/project/pytorch-optimizer) |
| License | [![apache](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) |

**pytorch-optimizer** is a collection of optimizers & lr schedulers for PyTorch.
I re-implemented the algorithms based on the original papers, with speed & memory tweaks and plug-ins. It also includes useful and practical optimization ideas.
Currently, **60 optimizers (+ `bitsandbytes`)**, **10 lr schedulers**, and **13 loss functions** are supported!

Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).

## Getting Started

For more, see the [documentation](https://pytorch-optimizers.readthedocs.io/en/latest/).

Most optimizers are under the MIT or Apache 2.0 license, but a few optimizers like `Fromage` and `Nero` are under the `CC BY-NC-SA 4.0` license, which is non-commercial.
So, please double-check the license before using them in your work.

### Installation

```bash
$ pip3 install pytorch-optimizer
```

From `pytorch-optimizer v2.12.0`, you can install and import the `bitsandbytes` optimizers.
Please check [the requirements](https://github.com/TimDettmers/bitsandbytes?tab=readme-ov-file#tldr) before installing.

```bash
$ pip install "pytorch-optimizer[bitsandbytes]"
```

### Simple Usage

```python
from pytorch_optimizer import AdamP

model = YourModel()
optimizer = AdamP(model.parameters())

from pytorch_optimizer import load_optimizer

optimizer = load_optimizer(optimizer='adamp')(model.parameters())

# if you install `bitsandbytes`, you can use the 8-bit optimizers from `pytorch-optimizer`.

from pytorch_optimizer import load_optimizer

opt = load_optimizer(optimizer='bnb_adamw8bit')
optimizer = opt(model.parameters())
```

Also, you can load the optimizer via `torch.hub`.

```python
import torch

model = YourModel()
opt = torch.hub.load('kozistr/pytorch_optimizer', 'adamp')
optimizer = opt(model.parameters())
```

If you want to build the optimizer with parameters & configs, there's the `create_optimizer()` API.

```python
from pytorch_optimizer import create_optimizer

optimizer = create_optimizer(
    model,
    'adamp',  # optimizer name; the remaining arguments below are illustrative
    lr=1e-3,
    weight_decay=1e-3,
    use_lookahead=True,
)
```

You can check the supported optimizers with the code below.

```python
from pytorch_optimizer import get_supported_optimizers

supported_optimizers = get_supported_optimizers()
```

You can check the supported learning rate schedulers with the code below.

```python
from pytorch_optimizer import get_supported_lr_schedulers

supported_lr_schedulers = get_supported_lr_schedulers()
```

You can check the supported loss functions with the code below.

```python
from pytorch_optimizer import get_supported_loss_functions

supported_loss_functions = get_supported_loss_functions()
```

## Useful Resources

Several optimization ideas to regularize & stabilize training. Most of these ideas are applied in the `Ranger21` optimizer.

Also, most of the figures below are taken from the `Ranger21` paper.

| [Adaptive Gradient Clipping](#adaptive-gradient-clipping) | [Gradient Centralization](#gradient-centralization) | [Softplus Transformation](#softplus-transformation) |
|---------------------------------------------------------------------------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| [Gradient Normalization](#gradient-normalization) | [Norm Loss](#norm-loss) | [Positive-Negative Momentum](#positive-negative-momentum) |
| [Linear learning rate warmup](#linear-learning-rate-warmup) | [Stable weight decay](#stable-weight-decay) | [Explore-exploit learning rate schedule](#explore-exploit-learning-rate-schedule) |
| [Lookahead](#lookahead) | [Chebyshev learning rate schedule](#chebyshev-learning-rate-schedule) | [(Adaptive) Sharpness-Aware Minimization](#adaptive-sharpness-aware-minimization) |
| [On the Convergence of Adam and Beyond](#on-the-convergence-of-adam-and-beyond) | [Improved bias-correction in Adam](#improved-bias-correction-in-adam) | [Adaptive Gradient Norm Correction](#adaptive-gradient-norm-correction) |

### Adaptive Gradient Clipping

This idea was originally proposed in the `NFNet (Normalizer-Free Network)` paper. `AGC (Adaptive Gradient Clipping)` clips gradients based on the `unit-wise ratio of gradient norms to parameter norms` (a minimal sketch follows the links below).

* code : [github](https://github.com/deepmind/deepmind-research/tree/master/nfnets)
* paper : [arXiv](https://arxiv.org/abs/2102.06171)
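
A minimal PyTorch sketch of the unit-wise clipping rule; the function name and the `clip_factor`/`eps` defaults are illustrative, not the library's exact implementation. Call it between `loss.backward()` and `optimizer.step()`:

```python
import torch

def adaptive_gradient_clipping(parameters, clip_factor: float = 0.01, eps: float = 1e-3):
    # clip each gradient when its unit-wise norm exceeds
    # `clip_factor` times the corresponding parameter norm
    for p in parameters:
        if p.grad is None:
            continue
        if p.ndim > 1:
            # per-output-unit norms for matrices / conv kernels
            dims = tuple(range(1, p.ndim))
            p_norm = p.detach().norm(2, dim=dims, keepdim=True).clamp_(min=eps)
            g_norm = p.grad.detach().norm(2, dim=dims, keepdim=True)
        else:
            # global norm for vectors / scalars
            p_norm = p.detach().norm(2).clamp_(min=eps)
            g_norm = p.grad.detach().norm(2)
        # scale down only the gradients that exceed the allowed ratio
        scale = (p_norm * clip_factor / g_norm.clamp(min=1e-6)).clamp(max=1.0)
        p.grad.detach().mul_(scale)
```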

### Gradient Centralization

| |
|---------------------------------------------------------------------------------------------------------------|
| ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/gradient_centralization.png) |

`Gradient Centralization (GC)` operates directly on gradients by centralizing each gradient to have zero mean (a minimal sketch follows the links below).

* code : [github](https://github.com/Yonghongwei/Gradient-Centralization)
* paper : [arXiv](https://arxiv.org/abs/2004.01461)
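
A minimal sketch of the centralization step, applied to each gradient tensor with more than one dimension (the function name is illustrative):

```python
import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # subtract the mean over all dimensions except the first (output) one,
    # so each output unit's gradient has zero mean
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad
```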

### Softplus Transformation

By running the final variance denominator through the softplus function, it lifts extremely tiny values to keep them viable (a minimal sketch follows the link below).

* paper : [arXiv](https://arxiv.org/abs/1908.00700)
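
A minimal sketch of the idea, replacing Adam's `sqrt(v) + eps` denominator; the `beta` value here is illustrative:

```python
import torch
import torch.nn.functional as F

def softplus_denom(exp_avg_sq: torch.Tensor, beta: float = 50.0) -> torch.Tensor:
    # softplus lifts extremely tiny values while leaving larger ones
    # nearly unchanged, so no separate `eps` is needed
    return F.softplus(exp_avg_sq.sqrt(), beta=beta)
```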

### Gradient Normalization

### Norm Loss

| |
|-------------------------------------------------------------------------------------------------|
| ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/norm_loss.png) |

* paper : [arXiv](https://arxiv.org/abs/2103.06583)

### Positive-Negative Momentum

| |
|------------------------------------------------------------------------------------------------------------------|
| ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png) |

* code : [github](https://github.com/zeke-xie/Positive-Negative-Momentum)
* paper : [arXiv](https://arxiv.org/abs/2103.17182)

### Linear learning rate warmup

| |
|--------------------------------------------------------------------------------------------------------|
| ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png) |

* paper : [arXiv](https://arxiv.org/abs/1910.04209)

### Stable weight decay

| |
|-----------------------------------------------------------------------------------------------------------|
| ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png) |

* code : [github](https://github.com/zeke-xie/stable-weight-decay-regularization)
* paper : [arXiv](https://arxiv.org/abs/2011.11152)

### Explore-exploit learning rate schedule

| |
|-------------------------------------------------------------------------------------------------------------------|
| ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png) |

* code : [github](https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis)
* paper : [arXiv](https://arxiv.org/abs/2003.03977)

### Lookahead

`k` steps forward, 1 step back. `Lookahead` keeps an exponential moving average of the weights, which is updated and substituted for the current weights every `k` lookahead steps (5 by default).
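
A minimal sketch of the slow-weight update; the names and the `alpha` default are illustrative:

```python
import torch

def lookahead_update(fast_params, slow_params, alpha: float = 0.5):
    # every k steps: move the slow (EMA) weights toward the fast weights,
    # then restart the fast weights from the slow ones
    with torch.no_grad():
        for fast, slow in zip(fast_params, slow_params):
            slow.add_(fast - slow, alpha=alpha)  # slow += alpha * (fast - slow)
            fast.copy_(slow)
```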

### Chebyshev learning rate schedule

Acceleration via Fractal Learning Rate Schedules.

### (Adaptive) Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
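
A minimal two-step sketch of a SAM update, assuming a `closure` that recomputes the loss with gradients enabled; `rho` and the names are illustrative:

```python
import torch

def sam_step(params, closure, base_optimizer, rho: float = 0.05):
    # first pass: gradient at the current weights
    closure().backward()

    params = [p for p in params if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]), 2)

    # 1) ascent: perturb weights toward the neighborhood's worst case
    eps_list = []
    with torch.no_grad():
        for p in params:
            eps = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(eps)
            eps_list.append(eps)

    # 2) gradient at the perturbed weights, then restore and step
    base_optimizer.zero_grad()
    closure().backward()
    with torch.no_grad():
        for p, eps in zip(params, eps_list):
            p.sub_(eps)
    base_optimizer.step()
    base_optimizer.zero_grad()
```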

### On the Convergence of Adam and Beyond

Convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients.
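
One such fix is `AMSGrad`, which keeps the running maximum of the second-moment estimate so the effective learning rate never increases between steps. A minimal sketch of the denominator (names are illustrative):

```python
import torch

def amsgrad_denom(exp_avg_sq: torch.Tensor, max_exp_avg_sq: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # remember the largest v_t seen so far: the 'long-term memory'
    torch.maximum(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
    return max_exp_avg_sq.sqrt().add_(eps)
```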

### Improved bias-correction in Adam

With the default bias-correction, Adam may actually make larger than requested gradient updates early in training.

### Adaptive Gradient Norm Correction

Correcting the norm of a gradient in each iteration based on the adaptive training history of gradient norm.

## Frequently asked questions

See [here](./qa.md).

## Citation

Please cite the original authors of the optimization algorithms. You can easily find them in the table above!
If you use this software, please cite it as below, or get the citation from the "cite this repository" button.

    @software{Kim_pytorch_optimizer_optimizer_2021,
        author = {Kim, Hyeongchan},
        month = jan,
        title = {{pytorch_optimizer: optimizer & lr scheduler & loss function collections in PyTorch}},
        url = {https://github.com/kozistr/pytorch_optimizer},
        version = {2.12.0},
        year = {2021}
    }

## Maintainer