
PAL - Parabolic Approximation Line Search for DNNs:

This repository provides TensorFlow and PyTorch reference implementations of PAL. PAL is an efficient and effective line search approach for DNNs which exploits the almost parabolic shape of the loss in the negative gradient direction to automatically estimate good step sizes.

If you have any questions or suggestions, please do not hesitate to contact me: maximus.mutschler(at)uni-tuebingen.de

Short introduction to PAL:

Fig1: PAL's basic idea

PAL is based on the empirical observation that the loss function can be approximated by a one-dimensional parabola in the negative gradient (line) direction. To fit this parabola, one additional loss value has to be measured on the line.
PAL then performs a parameter update step by jumping into the minimum of the approximated parabola.
PAL surpasses SLS, ALIG, SGD-HD and COCOB and competes against ADAM, SGD and RMSProp on ResNet-32, MobileNetV2, DenseNet-40 and EfficientNet architectures trained on CIFAR-10 and CIFAR-100.
However, the latter are tuned with piecewise constant step sizes, whereas PAL derives its own step size schedule.
PAL surpasses all of those optimizers when they are trained without a schedule.
Therefore, PAL could be used in scenarios where default schedules fail.
For a detailed explanation, please refer to our paper: https://arxiv.org/abs/1903.11991
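
The following is a minimal, self-contained Python sketch of this core idea (not the reference implementation): it fits a parabola l(s) ≈ a·s² + b·s + c along the line direction from the current loss l(0), its directional derivative l'(0), and one additionally measured loss l(μ), and returns the step to the parabola's minimum. The case handling and the clamping to s_max are simplified for illustration.

    def parabolic_step(loss_0, loss_mu, dir_deriv_0, mu, alpha=1.0, s_max=3.6):
        """Estimate the step size to the minimum of a 1-D parabola fitted
        along the line direction (illustrative sketch, not the reference code).

        loss_0      -- training loss at the current parameters, l(0)
        loss_mu     -- training loss measured one measuring step further, l(mu)
        dir_deriv_0 -- directional derivative l'(0) along the line
                       (negative when moving in the negative gradient direction)
        mu          -- measuring step size
        alpha       -- update step adaptation (multiplier on the estimated step)
        s_max       -- maximum step size on the line
        """
        # Fit l(s) ~= a*s^2 + b*s + c with c = l(0), b = l'(0), and a chosen
        # so that the parabola passes through the second measured point l(mu).
        b = dir_deriv_0
        a = (loss_mu - loss_0 - b * mu) / (mu ** 2)

        if a > 0.0 and b < 0.0:
            s_upd = -b / (2.0 * a)   # jump into the minimum of the parabola
        elif a <= 0.0 and b < 0.0:
            s_upd = mu               # no useful curvature: fall back to the measuring step
        else:
            s_upd = 0.0              # loss does not decrease along the line: stay put

        return max(0.0, min(alpha * s_upd, s_max))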

Fig2: Exemplary performance of PAL with data augmentation

Fig3: Exemplary performance of PAL without data augmentation; however, this leads to severe overfitting

The hyperparameters:

For a detailed explanation, please refer to our paper. The hyperparameters introduced below lead to good training and test errors.
Usually only the measuring step size has to be adapted slightly. Its sensitivity is lower than that of SGD's learning rate.

Abbreviation | Name                         | Description                                                      | Default parameter interval | Sensitivity compared to SGD learning rate
μ            | measuring step size          | distance to the second sampled training loss value               | [0.1, 1]                   | medium
α            | update step adaptation       | multiplier to the update step                                    | [1.0, 1.2, 1.7]            | low
β            | direction adaptation factor  | adapts the line direction depending on previous line directions  | [0, 0.4]                   | low
s_max        | maximum step size            | maximum step size on the line                                    | [3.6]                      | low

PyTorch Implementation:

  • No limitations. Can be used in the same way as any other PyTorch optimizer (see the usage sketch after this list).
  • Runs with PyTorch 1.4
  • Uses tensorboardX for plotting
  • Parabola approximations and loss lines can be plotted
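
Below is a minimal usage sketch with flagged assumptions: the import path, the class name PalOptimizer, the keyword-argument names for the hyperparameters from the table above, and the closure-based step() call are illustrative (a closure is a natural fit because PAL evaluates the loss twice per batch). Please check the repository's example code for the actual interface.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical import path and class name -- see the repository's example
    # code for the actual module and constructor signature.
    from pal_optimizer import PalOptimizer

    model = torch.nn.Linear(10, 2)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Illustrative keyword names mirroring the hyperparameter table (mu, alpha, beta, s_max).
    optimizer = PalOptimizer(model.parameters(),
                             measuring_step_size=0.1,
                             update_step_adaptation=1.0,
                             direction_adaptation_factor=0.4,
                             max_step_size=3.6)

    # Small synthetic dataset so the sketch is self-contained.
    data = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
    for inputs, targets in DataLoader(data, batch_size=16):
        # PAL measures the loss twice on the same batch, so the forward/backward
        # pass is wrapped in a closure the optimizer can re-evaluate.
        def closure():
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            return loss

        optimizer.step(closure)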

TensorFlow Implementation:

  • Limitations:
    • The DNN must not contain any random components such as Dropout or ShakeDrop. This is because PAL requires two loss values of the same deterministic function (= two network inferences) to determine an update step. Otherwise the sampled function would not be continuous and a parabolic approximation would not be possible. However, if these random component implementations were changed so that the drawn random numbers can be reused for at least two inferences, PAL would also support these operations.
    • If Dropout is used, it has to be replaced with the adapted implementation we provide, which works with PAL.
    • With TensorFlow 1.15 and 2.0 it was not possible for us to write a completely graph-based optimizer. Therefore it has to be used slightly differently from other optimizers. Have a look at the example code! This is not the case with PyTorch.
    • The TensorFlow implementation does not support the Keras and Estimator APIs.
  • Runs with TensorFlow 1.15
  • Uses tensorboard for plotting
  • Parabola approximations and loss lines can be plotted

Virtual Environment

A virtual environment capable of executing the provided code can be created with the provided python_virtual_env_requirements.txt.
