This is a learning rate scheduler inspired by the warm-up schedule in the Transformer paper, "Attention Is All You Need" (Ashish Vaswani et al., 2017). It uses a warm-up phase so training can adapt to the problem quickly with large learning rates, and then converges to a desired (target) learning rate through a differentiable exponential function. The speed of convergence from the maximum learning rate to the target learning rate is controlled by the exponent variable a (see the sketch below the table).
Variable | Description |
---|---|
max_lr | maximum learning rate, reached at the end of warm-up |
min_lr | target (minimum) learning rate |
num_warmup | number of warm-up steps |
a (alpha) | rate of convergence (curvature of the decay): the larger the value, the faster the convergence (a > 0) |

Example setting: max_lr: 0.01, min_lr: 0.001, num_warmup: 50, a: 0.1
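As a rough sketch of how these variables could fit together (the exact formula lives in TF_warmup_exponential.py and may differ), the schedule can be read as a linear warm-up to max_lr followed by an exponential decay of the gap toward min_lr, with alpha controlling the curvature. The class name `WarmupExponential` below is illustrative, not the repository's actual class:

```python
import tensorflow as tf


class WarmupExponential(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Sketch: linear warm-up to max_lr, then exponential convergence toward min_lr."""

    def __init__(self, max_lr=0.01, min_lr=0.001, num_warmup=50, alpha=0.1):
        self.max_lr = max_lr
        self.min_lr = min_lr
        self.num_warmup = num_warmup
        self.alpha = alpha

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # Warm-up phase: ramp linearly from 0 to max_lr over num_warmup steps.
        warmup_lr = self.max_lr * step / self.num_warmup
        # Decay phase: the gap (max_lr - min_lr) shrinks exponentially;
        # a larger alpha gives a steeper, faster convergence to min_lr.
        decay_lr = self.min_lr + (self.max_lr - self.min_lr) * tf.exp(
            -self.alpha * (step - self.num_warmup)
        )
        return tf.where(step < self.num_warmup, warmup_lr, decay_lr)
```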
TF_warmup_exponential.py is the TensorFlow implementation.
A PyTorch version is not available yet; I plan to add one, and pull requests are welcome.
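Assuming the scheduler follows the standard Keras `LearningRateSchedule` interface (an assumption, not confirmed by the repository), it could be passed straight to an optimizer:

```python
# Hypothetical usage with the sketch class above and the example setting.
schedule = WarmupExponential(max_lr=0.01, min_lr=0.001, num_warmup=50, alpha=0.1)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```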
Contributor: Thanks to Gyeonghun Kim