1. SGDR

This is a PyTorch implementation of training a model (ResNet-50) using a differential learning rate. The optimizer is Stochastic Gradient Descent with Warm Restarts (SGDR), which uses cosine annealing to decrease the learning rate along half a cosine curve, then restarts it at the top of the curve. Cycling the learning rate in this way helps the network escape sharp minima and settle into flatter, more robust ones.
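The restart schedule described above can be sketched as a plain function of the training step. This is a minimal illustration of the SGDR formula, not code from this repo; the hyperparameter values (`eta_max`, `eta_min`, `t_0`, `t_mult`) are arbitrary examples.

```python
import math

def sgdr_lr(step, eta_max=0.1, eta_min=0.0, t_0=10, t_mult=2):
    """Learning rate at `step` under SGDR cosine annealing with warm restarts.

    eta_max/eta_min bound the cosine curve; t_0 is the length of the first
    cycle and t_mult grows each subsequent cycle. All values here are
    illustrative, not the hyperparameters used in this repo.
    """
    t_i = t_0
    t_cur = step
    # Skip past completed cycles until `step` falls inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Half-cosine decay from eta_max down to eta_min within the cycle.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

At `step=0` the rate starts at `eta_max`, decays toward `eta_min` over the cycle, then jumps back to `eta_max` at each restart. In practice, PyTorch provides this schedule built in as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.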