Momentum in Deep Learning

Momentum is a deep learning technique used together with stochastic gradient descent (SGD). Instead of using only the gradient of the current step to guide the search, momentum also accumulates the gradients of past steps to determine the direction to go.

It helps avoid the situation where stochastic gradient descent appears to have reached the global minimum but is actually stuck in a local minimum.

Figure: the algorithm stuck in a local minimum

With momentum, the model is more accurate and converges faster because it keeps moving in the right direction.

Figure: gradient descent with and without momentum when reaching the minimum
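As a toy illustration of that speed-up, we can compare plain gradient descent with its momentum variant on the one-dimensional loss J(θ) = θ². The function, learning rate, and step counts below are illustrative choices, not from the original post:

```python
# Toy comparison on J(theta) = theta^2, whose gradient is 2*theta.
def descend(steps, lr=0.01, momentum=0.0):
    theta, v = 1.0, 0.0
    for _ in range(steps):
        v = momentum * v + lr * (2 * theta)  # accumulate past gradients
        theta -= v
    return theta

plain = descend(100)                # momentum = 0: plain gradient descent
accel = descend(100, momentum=0.9)  # with momentum
# After the same number of steps, |accel| ends up closer to the
# minimum at theta = 0 than |plain| does.
```

With the same learning rate, the momentum run covers far more ground per step once the accumulated velocity points consistently toward the minimum.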

Momentum appears as a coefficient in the weight update formula, and it is always less than 1.

Mathematical Implementation

v = γ·v + α·∇θJ(θ)
θ = θ - v
v - current velocity vector
γ - momentum coefficient (γ < 1)
α - learning rate
∇θ - gradient w.r.t. (with respect to) the weights
J - loss function
θ - weights
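Putting those symbols together, the classical momentum update is v ← γv + α∇θJ(θ) followed by θ ← θ − v. Below is a minimal sketch on the toy loss J(θ) = θ²; the values of γ, α, and the step count are illustrative:

```python
gamma, alpha = 0.9, 0.01  # momentum coefficient and learning rate

def grad_J(theta):
    # Example loss J(theta) = theta^2, so the gradient is 2*theta.
    return 2 * theta

def momentum_step(theta, v):
    v = gamma * v + alpha * grad_J(theta)  # v <- gamma*v + alpha*grad
    theta = theta - v                      # theta <- theta - v
    return theta, v

theta, v = 1.0, 0.0
for _ in range(200):
    theta, v = momentum_step(theta, v)
# theta ends up very close to the minimum at 0.
```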

Programming Implementation

In Keras:

keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=False)

In TensorFlow 1.x:

tf.train.MomentumOptimizer(
    learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False
)

Read about Nesterov momentum here.
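For context, Nesterov momentum evaluates the gradient at the look-ahead point θ − γv instead of at the current θ. A hedged sketch on a toy quadratic loss J(θ) = θ², with all values illustrative:

```python
gamma, alpha = 0.9, 0.01  # illustrative momentum coefficient and learning rate

def grad(theta):
    return 2 * theta  # gradient of the toy loss J(theta) = theta^2

theta, v = 1.0, 0.0
for _ in range(300):
    lookahead = theta - gamma * v            # peek where momentum is heading
    v = gamma * v + alpha * grad(lookahead)  # gradient at the look-ahead point
    theta -= v
```

The only change from classical momentum is where the gradient is taken; this "correction" lets the method slow down before it overshoots the minimum.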

Thanks for reading this post.


