What is Regularization?
Regularization is a technique that reduces overfitting (high variance) in a neural network by penalizing complexity; in practice, it penalizes relatively large weights in the model.
Learn more about overfitting and variance here: https://marko-kovacevic.com/blog/bias-and-variance-in-machine-learning/.
L2 Regularization
L2 regularization is the most common regularization technique. It is implemented by adding a term to the loss function that penalizes large weights.
The loss with L2 regularization is calculated by this formula:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{j=1}^{n} \left\lVert w^{[j]} \right\rVert_F^2$$
| Variable | Definition |
|---|---|
| $n$ | Number of layers |
| $w^{[j]}$ | Weight matrix for the $j$-th layer |
| $m$ | Number of training examples |
| $\lambda$ | Regularization parameter |
The regularization parameter ($\lambda$) is another hyperparameter to tune. If $\lambda$ is large, the optimization is incentivized to make the weights small, and a model with smaller weights is simpler and less prone to overfitting.
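To make the penalty concrete, here is a minimal NumPy sketch of the L2 term from the formula above (the weight matrices, $\lambda$, and the base loss value are made-up placeholders):

```python
import numpy as np

# Hypothetical weight matrices for a two-layer network (placeholder values).
weights = [np.random.randn(4, 3), np.random.randn(3, 1)]
lam = 0.1         # regularization parameter lambda
m = 100           # number of training examples
base_loss = 0.42  # placeholder for the unregularized loss

# L2 penalty: (lambda / 2m) * sum of squared Frobenius norms of the weight matrices
l2_penalty = (lam / (2 * m)) * sum(np.sum(w ** 2) for w in weights)
total_loss = base_loss + l2_penalty
print(total_loss)
```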
Programming implementation
Keras:
```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

l2_model = tf.keras.Sequential([
    layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
    # kernel_regularizer adds this layer's L2 weight penalty to the loss
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.3)),
    layers.Dense(1)
])
```
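A minimal usage sketch (assuming FEATURES is defined as above; the training data here is random placeholder data, not from the original post):

```python
import numpy as np

x_train = np.random.randn(256, FEATURES).astype('float32')  # placeholder inputs
y_train = np.random.randn(256, 1).astype('float32')         # placeholder targets

l2_model.compile(optimizer='adam', loss='mse')
history = l2_model.fit(x_train, y_train, epochs=5, validation_split=0.2)
```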
Dropout Regularization
Dropout regularization randomly ignores a subset of nodes in a given layer during training, i.e. it temporarily drops those nodes from the layer.

Dropout, applied to a layer, consists of randomly “dropping out” (i.e. setting to zero) a number of output features of the layer during training. Let’s say a given layer would normally have returned the vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample during training; after applying dropout, this vector will have a few zero entries distributed at random, e.g. [0, 0.5, 1.3, 0, 1.1].
The dropout rate is the parameter that controls how many nodes are dropped; it is a number between 0 and 1. The higher the dropout rate, the more nodes are dropped (see the sketch after this list):
- 0.0 – No dropout regularization
- 1.0 – Drop out everything, the model learns nothing
- Values between 0.0 and 1.0 – the useful range in practice
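As a rough sketch of what a dropout layer does internally (this is the common “inverted dropout” formulation; note that it also rescales the surviving values, a detail the example vector above omits):

```python
import numpy as np

def apply_dropout(activations, rate, rng=np.random.default_rng(0)):
    """Inverted dropout: zero out roughly a fraction `rate` of the activations
    and scale the survivors by 1 / (1 - rate), so the expected output stays
    the same as without dropout."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

layer_output = np.array([0.2, 0.5, 1.3, 0.8, 1.1])
print(apply_dropout(layer_output, rate=0.5))
# e.g. [0.4, 0.0, 2.6, 0.0, 2.2] -- some entries zeroed, the rest scaled up
```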
Programming implementation
Keras:
```python
dropout_model = tf.keras.Sequential([
    layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
    layers.Dropout(0.5),  # drop 50% of the previous layer's outputs during training
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(1)
])
```
https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#add_dropout
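One detail worth noting: Keras applies dropout only during training; at inference time the Dropout layer passes values through unchanged. A quick sketch to see this (calling a layer with an explicit `training` flag is standard Keras behavior):

```python
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 4), dtype='float32')
print(drop(x, training=True))   # some entries zeroed, survivors scaled by 2
print(drop(x, training=False))  # unchanged: all ones
```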
L2 + Dropout regularization
L2 and Dropout regularizations can be combined and often give very good results.
Programming implementation
Keras:
```python
combined_model = tf.keras.Sequential([
    layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
                 activation='elu', input_shape=(FEATURES,)),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
                 activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
                 activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
                 activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(1)
])
```
https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#combined_l2_dropout
Data augmentation
Getting more data can reduce overfitting. Data augmentation is a regularization method that reduces overfitting by enlarging the dataset: when the dataset is limited, you can derive new training examples from the existing ones.
Like wedding photography artists, you could derive a new image by mirroring it, or simply by zooming and rotating it, as sketched below.
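In Keras, one way to do this is with the built-in image preprocessing layers (a sketch; the parameter values are arbitrary, and in older TensorFlow versions these layers live under layers.experimental.preprocessing):

```python
import tensorflow as tf
from tensorflow.keras import layers

# The random transformations are active only during training, so each
# epoch sees slightly different versions of every image.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),  # mirror the image
    layers.RandomRotation(0.1),       # rotate by up to +/-10% of a full turn
    layers.RandomZoom(0.1),           # zoom in/out by up to 10%
])
```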

Early stopping
Early stopping is a regularization method that stops training at the iteration where training error and dev/test error are both at their smallest while still close to each other, and keeps the weights from that iteration.
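In Keras, early stopping is available as a callback (a sketch; the monitored metric and patience value here are typical choices, not from the original post):

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',         # watch the validation loss
    patience=10,                # stop after 10 epochs without improvement
    restore_best_weights=True,  # keep the weights from the best epoch
)
# Passed to training via:
# model.fit(x_train, y_train, validation_split=0.2, callbacks=[early_stop])
```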

Thanks for reading this post.
References
- Coursera. 2020. Regularization – Practical Aspects Of Deep Learning | Coursera. [online] Available at: <https://www.coursera.org/learn/deep-neural-network/lecture/Srsrc/regularization> [Accessed 29 May 2020].
- Coursera. 2020. Dropout Regularization – Practical Aspects Of Deep Learning | Coursera. [online] Available at: <https://www.coursera.org/learn/deep-neural-network/lecture/eM33A/dropout-regularization> [Accessed 30 May 2020].
- Coursera. 2020. Other Regularization Methods – Practical Aspects Of Deep Learning | Coursera. [online] Available at: <https://www.coursera.org/learn/deep-neural-network/lecture/Pa53F/other-regularization-methods> [Accessed 30 May 2020].
- Deeplizard.com. 2020. Regularization In A Neural Network Explained. [online] Available at: <https://deeplizard.com/learn/video/iuJgyiS7BKM> [Accessed 29 May 2020].
- TensorFlow. 2020. Overfit And Underfit | Tensorflow Core. [online] Available at: <https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#combined_l2_dropout> [Accessed 30 May 2020].
- mc.ai. 2020. Why “Early-Stopping” Works As Regularization?. [online] Available at: <https://mc.ai/why-early-stopping-works-as-regularization/> [Accessed 30 May 2020].