This is a learning rate scheduler inspired by the warm-up schedule in the Transformer paper, "Attention Is All You Need" (Ashish Vaswani et al., 2017). It uses a warm-up phase so training can adapt to the problem quickly with large learning rates, and then converges to a desired (target) learning rate through a differentiable exponential function. The speed of convergence from the maximum learning rate to the target learning rate is controlled by the exponent variable a (see the sketch below the table).
Variable | Description |
---|---|
max_lr | maximum learning rate, reached at the end of warm-up |
min_lr | target (minimum) learning rate |
num_warmup | number of warm-up steps |
a (alpha) | rate of convergence (curvature of the decay): the larger the value, the faster the convergence (a > 0) |

Example setting: max_lr: 0.01, min_lr: 0.001, num_warmup: 50, a: 0.1
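As a rough sketch of how these variables could fit together (the exact formula lives in TF_warmup_exponential.py and may differ), the schedule can be read as a linear warm-up to max_lr followed by an exponential decay of the gap toward min_lr, with alpha controlling the curvature. The class name `WarmupExponential` below is illustrative, not the repository's actual class:

```python
import tensorflow as tf


class WarmupExponential(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Sketch: linear warm-up to max_lr, then exponential convergence toward min_lr."""

    def __init__(self, max_lr=0.01, min_lr=0.001, num_warmup=50, alpha=0.1):
        self.max_lr = max_lr
        self.min_lr = min_lr
        self.num_warmup = num_warmup
        self.alpha = alpha

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # Warm-up phase: ramp linearly from 0 to max_lr over num_warmup steps.
        warmup_lr = self.max_lr * step / self.num_warmup
        # Decay phase: the gap (max_lr - min_lr) shrinks exponentially;
        # a larger alpha gives a steeper, faster convergence to min_lr.
        decay_lr = self.min_lr + (self.max_lr - self.min_lr) * tf.exp(
            -self.alpha * (step - self.num_warmup)
        )
        return tf.where(step < self.num_warmup, warmup_lr, decay_lr)
```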
TF_warmup_exponential.py is the TensorFlow implementation.
A PyTorch version is not available yet; I plan to add one, and pull requests are welcome.
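Assuming the scheduler follows the standard Keras `LearningRateSchedule` interface (an assumption, not confirmed by the repository), it could be passed straight to an optimizer:

```python
# Hypothetical usage with the sketch class above and the example setting.
schedule = WarmupExponential(max_lr=0.01, min_lr=0.001, num_warmup=50, alpha=0.1)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```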
Contributor: Thanks to Gyeonghun Kim