Learning Rate Schedules in Deep Learning

Learning rate schedules are techniques for adjusting the learning rate during training according to a predefined schedule.

See what the learning rate is here: https://marko-kovacevic.com/blog/learning-rate-in-deep-learning/

Popular learning rate schedule techniques are:

  • Step decay
  • Exponential decay
  • Time decay

1. Step Decay

Step decay reduces the learning rate by a predefined factor after a predefined number of epochs.

Usually the learning rate is halved every 5 epochs or multiplied by 0.1 every 20 epochs. The exact numbers depend on the type of problem.

[Figure: Step decay schedule]

1.1. Mathematical Implementation

lr = lr0 * d^floor((1 + t) / r)

lr - learning rate
lr0 - initial learning rate
d - decay parameter (the factor the learning rate is multiplied by at each drop)
t - iteration number
r - how often the rate should be dropped (10 corresponds to a drop every 10 iterations)

1.2. Programming Implementation

Keras:

import math

from tensorflow.keras.callbacks import LearningRateScheduler

def step_decay(epoch):
    initial_lrate = 0.1   # lr0
    drop = 0.5            # d: halve the learning rate at each drop
    epochs_drop = 10.0    # r: drop every 10 epochs
    lrate = initial_lrate * math.pow(drop,
            math.floor((1 + epoch) / epochs_drop))
    return lrate

lrate = LearningRateScheduler(step_decay)

https://keras.io/callbacks/#learningratescheduler

LearningRateScheduler is a callback that gives you the freedom to define a custom learning rate schedule.

With callbacks you can customize the behavior of a Keras model during training.
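To show how this fits together, here is a minimal sketch of passing the callback to training; the model architecture and the toy data below are made up purely for illustration:

import numpy as np
import tensorflow as tf

# Toy data, only so the example runs end to end.
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=(100,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(), loss="binary_crossentropy")

# At the start of each epoch the callback calls step_decay(epoch)
# and sets the optimizer's learning rate to the returned value.
model.fit(X, y, epochs=30, callbacks=[lrate])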

2. Exponential Decay

Exponential decay reduces the learning rate by following an exponential curve.

[Figure: Exponential decay schedule]

2.1. Mathematical Implementation

lr = lr0 * e^(-d * t)

lr - learning rate
lr0 - initial learning rate
e - Euler's number (approximately 2.71828)
d - decay parameter (controls how quickly the learning rate decays)
t - iteration number
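As a quick sanity check, a few lines of plain Python trace the curve this formula produces (the lr0 and d values are arbitrary choices for illustration):

import math

lr0 = 0.1  # initial learning rate
d = 0.05   # decay parameter

for t in range(0, 50, 10):
    lr = lr0 * math.exp(-d * t)
    print(f"iteration {t}: lr = {lr:.5f}")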

2.2. Programming Implementation

Tensorflow:

tf.compat.v1.train.exponential_decay(
    learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None
)

https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/exponential_decay

Keras:

tf.keras.optimizers.schedules.ExponentialDecay(
  initial_learning_rate, decay_steps, decay_rate, staircase=False, name=None
)

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/ExponentialDecay
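A minimal usage sketch (the parameter values are arbitrary): in tf.keras the schedule object is passed to an optimizer as its learning rate, and the optimizer queries it at every training step.

import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,
    decay_rate=0.96,  # the lr is multiplied by 0.96 every 10000 steps
    staircase=True)   # True gives discrete drops instead of a smooth curve

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)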

3. Time Decay

Time decay reduces the learning rate as time elapses, where time is the number of elapsed iterations.

Time decay is also called 1/t decay (inverse time decay).

[Figure: Time decay schedule]

3.1. Mathematical Implementation

lr = lr0 / (1 + d * t)

lr - learning rate
lr0 - initial learning rate
d - decay parameter (controls how quickly the learning rate decays)
t - iteration number

3.2. Programming Implementation

Tensorflow:

tf.compat.v1.train.inverse_time_decay(
  learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None
)

https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/inverse_time_decay

Keras:

tf.keras.optimizers.schedules.InverseTimeDecay(
  initial_learning_rate, decay_steps, decay_rate, staircase=False, name=None
)

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/InverseTimeDecay
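A minimal usage sketch (the values are arbitrary): InverseTimeDecay computes initial_learning_rate / (1 + decay_rate * step / decay_steps), so with decay_steps=1 it reproduces exactly the lr = lr0 / (1 + d * t) formula from section 3.1.

import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.1,  # lr0
    decay_steps=1,
    decay_rate=0.05)            # d

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)

# Schedule objects are callable: lr_schedule(t) returns the lr at step t.
print(float(lr_schedule(0)))    # 0.1
print(float(lr_schedule(100)))  # 0.1 / (1 + 0.05 * 100) ≈ 0.01667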

Conclusion

Learning rate schedules are good techniques for updating the learning rate during training. Using them is better than using a constant learning rate (i.e., never updating it during training), because training reaches a low loss faster and is less likely to overshoot the loss minimum.

Of all these learning rate schedule techniques, step decay is the most preferable because its parameters are easy to understand during training.

In addition to learning rate schedule techniques, there are adaptive learning rate techniques, which work even better; learn about them here: https://marko-kovacevic.com/blog/adaptive-learning-rate-in-deep-learning/

Thanks for reading this post.

