Momentum in Deep Learning

Momentum is a deep learning technique used together with stochastic gradient descent (SGD). Instead of using only the gradient of the current step to guide the search, momentum also accumulates the gradients of past steps to determine the direction to go.

It helps avoid the situation where stochastic gradient descent appears to have reached the global minimum but is actually stuck in a local minimum.

Figure: the algorithm stuck in a local minimum

With momentum, the model is more accurate and converges faster because it keeps moving in the right direction.

Figure: gradient descent with and without momentum when reaching the minimum
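As a toy illustration of that speed-up, we can compare plain gradient descent with its momentum variant on the one-dimensional loss J(θ) = θ². The function, learning rate, and step counts below are illustrative choices, not from the original post:

```python
# Toy comparison on J(theta) = theta^2, whose gradient is 2*theta.
def descend(steps, lr=0.01, momentum=0.0):
    theta, v = 1.0, 0.0
    for _ in range(steps):
        v = momentum * v + lr * (2 * theta)  # accumulate past gradients
        theta -= v
    return theta

plain = descend(100)                # momentum = 0: plain gradient descent
accel = descend(100, momentum=0.9)  # with momentum
# After the same number of steps, |accel| ends up closer to the
# minimum at theta = 0 than |plain| does.
```

With the same learning rate, the momentum run covers far more ground per step once the accumulated velocity points consistently toward the minimum.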

Momentum appears as a coefficient in the weight update formula, and it is always less than 1.

Mathematical Implementation

v = γ·v + α·∇θJ(θ)
θ = θ - v
v - current velocity vector
γ - momentum coefficient (γ < 1)
α - learning rate
∇θ - gradient w.r.t. (with respect to) the weights
J - loss function
θ - weights
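Putting those symbols together, the classical momentum update is v ← γv + α∇θJ(θ) followed by θ ← θ − v. Below is a minimal sketch on the toy loss J(θ) = θ²; the values of γ, α, and the step count are illustrative:

```python
gamma, alpha = 0.9, 0.01  # momentum coefficient and learning rate

def grad_J(theta):
    # Example loss J(theta) = theta^2, so the gradient is 2*theta.
    return 2 * theta

def momentum_step(theta, v):
    v = gamma * v + alpha * grad_J(theta)  # v <- gamma*v + alpha*grad
    theta = theta - v                      # theta <- theta - v
    return theta, v

theta, v = 1.0, 0.0
for _ in range(200):
    theta, v = momentum_step(theta, v)
# theta ends up very close to the minimum at 0.
```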

Programming Implementation

In Keras:

keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=False)

In TensorFlow 1.x:

tf.train.MomentumOptimizer(
    learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False
)

Read about Nesterov momentum here.
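For context, Nesterov momentum evaluates the gradient at the look-ahead point θ − γv instead of at the current θ. A hedged sketch on a toy quadratic loss J(θ) = θ², with all values illustrative:

```python
gamma, alpha = 0.9, 0.01  # illustrative momentum coefficient and learning rate

def grad(theta):
    return 2 * theta  # gradient of the toy loss J(theta) = theta^2

theta, v = 1.0, 0.0
for _ in range(300):
    lookahead = theta - gamma * v            # peek where momentum is heading
    v = gamma * v + alpha * grad(lookahead)  # gradient at the look-ahead point
    theta -= v
```

The only change from classical momentum is where the gradient is taken; this "correction" lets the method slow down before it overshoots the minimum.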

Thanks for reading this post.


