Bias and Variance in Machine Learning

Bias

Bias is the error that comes from a machine learning model's simplifying assumptions: the inability of a machine learning algorithm to capture the true relationship between the inputs and the output.

An overly simple model (for example, a straight line fitted to curved data) has high bias.

A complex model (for example, a squiggly line that passes through every training point) has low bias.
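
To make the straight-line versus squiggly-line contrast concrete, here is a minimal Python sketch on hypothetical toy data (y = x² plus a little noise). The straight line is an ordinary least-squares fit; as a stand-in for the squiggly line it uses a 1-nearest-neighbor model, which simply returns the label of the closest training point. The data, seed, and helper names are illustrative assumptions, not part of any library.

```python
import random

def make_data(n, seed):
    """Hypothetical toy data: a curved relationship y = x**2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least-squares straight line: the overly simple model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_nearest_neighbor(xs, ys):
    """1-nearest-neighbor: returns the label of the closest training point,
    so it passes through every training point like the squiggly line."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs, ys = make_data(50, seed=0)
line_train_error = mse(fit_line(xs, ys), xs, ys)
squiggly_train_error = mse(fit_nearest_neighbor(xs, ys), xs, ys)
print(f"straight line training MSE: {line_train_error:.4f}")
print(f"1-NN training MSE:          {squiggly_train_error:.4f}")
```

The straight line cannot bend to follow the curve, so its training error stays high (high bias); the nearest-neighbor model reproduces every training point exactly, so its training error is zero (low bias).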

Variance

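Variance is the difference in a machine learning model's error between the training data and the test data, that is, how much the model's performance changes when it moves from data it has seen to data it has not seen.

High variance means the difference between the training set error and the test set error is large.

Low variance means the difference between the training set error and the test set error is small.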
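
The definition above can be turned into a quick measurement: compute the error on the training data and on held-out test data, then look at the difference. The sketch below, again on hypothetical toy data, does this for a memorizing 1-nearest-neighbor model; the data generator, seeds, and split sizes are assumptions chosen for illustration.

```python
import random

def make_data(n, seed):
    """Hypothetical toy data: y = x**2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def fit_nearest_neighbor(xs, ys):
    """Memorizes the training set: each prediction copies the closest training label."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50, seed=1)   # data the model sees
test_x, test_y = make_data(200, seed=2)    # held-out data it never sees

model = fit_nearest_neighbor(train_x, train_y)
train_error = mse(model, train_x, train_y)
test_error = mse(model, test_x, test_y)
gap = test_error - train_error  # the train/test difference this post calls variance
print(f"train MSE {train_error:.4f}, test MSE {test_error:.4f}, gap {gap:.4f}")
```

Because the model reproduces its training labels exactly, its training error is zero and the whole test error shows up as the gap, which is the high variance pattern described above.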

Underfitting and Overfitting

When a model has high bias, we say it is underfitting the training data: it performs poorly even on the training data, because it cannot capture the relationship between the input examples (often called X) and the output values (often called Y).

When a model has low bias and high variance, we say it is overfitting the training data: it performs well on the training data but poorly on the evaluation data, because it memorizes the examples it has seen instead of learning a relationship that generalizes to unseen examples.
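
One way to see underfitting and overfitting side by side is to vary a single complexity knob. This sketch assumes k-nearest-neighbor regression on hypothetical toy data: with k = 1 the model memorizes the training set (overfitting), and with k equal to the training set size it predicts the same training mean everywhere (underfitting). The data and the particular k values are illustrative assumptions.

```python
import random

def make_data(n, seed):
    """Hypothetical toy data: y = x**2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def fit_knn(xs, ys, k):
    """k-nearest-neighbor regression: average the labels of the k closest training points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50, seed=1)
test_x, test_y = make_data(200, seed=2)

results = {}
for k in (1, 5, 50):  # k=1: overfits, k=50 (all points): underfits
    model = fit_knn(train_x, train_y, k)
    results[k] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"k={k:2d}  train MSE {results[k][0]:.4f}  test MSE {results[k][1]:.4f}")
```

Expect the k = 1 row to show zero training error with a larger test error, and the k = 50 row to show high error on both sets; an intermediate k usually sits in between.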

Bias and Variance cases

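By comparing the algorithm's error on the training set with its error on the test set, you can diagnose whether it suffers from high bias, high variance, both, or neither. The cases below assume that the best achievable (for example, human-level) error is close to 0%, so the training error indicates bias and the gap between training and test error indicates variance. Once you know which problem you have, you can choose the right steps to improve the model.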
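
The diagnosis can be written as two comparisons. This is a minimal sketch, assuming errors are given in percent, that the best achievable error (`base_error`) is 0%, and that anything above a hypothetical `tolerance` of 2 percentage points counts as "high"; real projects choose these thresholds from context. It is checked against the four cases below.

```python
def diagnose(train_error, test_error, base_error=0.0, tolerance=2.0):
    """Return (high_bias, high_variance) for error rates given in percent.

    base_error: assumed best achievable (e.g. human-level) error; 0.0 is an assumption.
    tolerance: hypothetical threshold in percentage points for calling a gap "high".
    """
    avoidable_bias = train_error - base_error  # how far training error is from the best achievable
    variance = test_error - train_error        # train/test gap, as defined in this post
    return avoidable_bias > tolerance, variance > tolerance

cases = {
    "high bias": (15.0, 16.0),
    "high variance": (1.0, 11.0),
    "high bias and high variance": (15.0, 30.0),
    "low bias and low variance": (0.5, 1.0),
}
for name, (train_err, test_err) in cases.items():
    high_bias, high_variance = diagnose(train_err, test_err)
    print(f"{name}: high_bias={high_bias}, high_variance={high_variance}")
```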

Case High Bias

Train set error – 15%

Test set error – 16%

This case has a high bias problem: the model does poorly even on the training set, while the test error is only slightly higher. To fix it, you could:

make a bigger network (more hidden layers and more neurons)
train longer
choose a different neural network architecture
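
"Train longer" assumes the model is trained iteratively, as neural networks are. The sketch below illustrates the idea with batch gradient descent on a straight-line model, used here as a stand-in for a network rather than as the method from the course: the training error keeps falling as the number of epochs grows. The data, learning rate, and epoch counts are hypothetical.

```python
def train_linear_regression(xs, ys, epochs, learning_rate=0.1):
    """Batch gradient descent on mean squared error for y ~ w * x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of mean((w*x + b - y)**2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mse(w, b, xs, ys):
    """Training error of the line w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [i / 10 for i in range(-10, 11)]  # hypothetical inputs in [-1, 1]
ys = [3.0 * x + 0.5 for x in xs]       # hypothetical targets from a known line

losses = {}
for epochs in (1, 10, 100):
    w, b = train_linear_regression(xs, ys, epochs)
    losses[epochs] = mse(w, b, xs, ys)
    print(f"{epochs:3d} epochs: training MSE {losses[epochs]:.6f}")
```

Training longer only helps while the model is still improving on the training set; if a model is too simple to represent the relationship, a bigger network or a different architecture is the more useful fix.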

Case High Variance

Train set error – 1%

Test set error – 11%

This case has a high variance problem: the model does well on the training set but much worse on the test set. To fix it, you could:

get more data
use regularization
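
Regularization adds a penalty on large weights to the training loss, which makes the model less sensitive to the particular training examples it sees. Here is a minimal sketch, assuming L2 (ridge) regularization on a one-feature model without an intercept and a hypothetical four-point training set: the closed-form weight shrinks toward zero as the regularization strength `lam` grows.

```python
def fit_ridge_1d(xs, ys, lam):
    """Fit y ~ w * x by minimizing sum((w*x - y)**2) + lam * w**2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [0.1, 0.4, 0.5, 0.9]  # hypothetical small training set
ys = [0.3, 0.9, 1.6, 2.4]

weights = {}
for lam in (0.0, 0.5, 5.0):  # lam = 0.0 is plain least squares
    weights[lam] = fit_ridge_1d(xs, ys, lam)
    print(f"lambda={lam}: w={weights[lam]:.3f}")
```

A larger `lam` trades a little extra bias for lower variance; in practice its value is tuned on held-out data.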

Case High Bias and High Variance

Train set error – 15%

Test set error – 30%

This case has both high bias and high variance: the model fits the training set poorly and generalizes even worse. To fix it, you can combine the techniques mentioned above: first reduce bias until the training error is acceptable, then reduce variance, keeping an eye on the trade-off between the two.

Case Low Bias and Low Variance

Train set error – 0.5%

Test set error – 1%

This case has low bias and low variance. This is what a good model should look like, and no action is required.

Thanks for reading this post.
