On the whole, overfitting is a modelling error. When a model starts to learn the training data too closely (including its noise), or tries to “adapt” itself to the training data, it reaches high training accuracy but performs poorly on data it has not seen.

Fig. 1 (Source: Wikipedia)


Let's assume for now that regularization means adding a penalty to the loss function; we'll justify this in a moment.


How does regularization work?

As we noticed, when a model overfits, its weights tend to take on large magnitudes. Can we find a way to control this? Let's see. The weights are increased or decreased by the gradient descent algorithm, which is in turn driven by the loss function. So you may already have the intuition: we can control the weights through the loss function by penalizing it whenever the weights grow. By penalizing, we mean increasing the loss. Since gradient descent works to decrease the loss, the added penalty changes the updates it makes to the weights and pushes it towards smaller values. There we go, that's the solution.
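A toy example makes the mechanism visible. The sketch below is not from the original post: it assumes a made-up one-weight problem whose data loss is (w - 5)², so unregularized gradient descent ends at w = 5. For illustration it assumes a squared penalty λ·w², which adds 2λw to every gradient; the function name `gradient_descent` and all constants are hypothetical. With the penalty, the weight settles at the smaller value 5 / (1 + λ).

```python
def gradient_descent(lam, lr=0.1, steps=500, w=0.0):
    # Minimize the toy loss (w - 5)**2 + lam * w**2 with plain gradient descent
    for _ in range(steps):
        data_grad = 2 * (w - 5)      # derivative of the data loss (w - 5)^2
        penalty_grad = 2 * lam * w   # derivative of the assumed penalty lam * w^2
        w -= lr * (data_grad + penalty_grad)
    return w


print(round(gradient_descent(lam=0.0), 6))  # 5.0: no penalty
print(round(gradient_descent(lam=1.0), 6))  # 2.5: the penalty pulls w toward 0
```

With λ = 1 the penalty halves the learned weight; a larger λ shrinks it further toward zero.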


L1 Regularization

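L1 regularization says that if we add the L1 norm of the weights (the sum of their absolute values) to the loss function, we can control overfitting. Simple, but let's take a look at how it's done mathematically: the regularized loss is J(w) = L(w) + λ · Σᵢ |wᵢ|, where L(w) is the original loss, wᵢ are the weights and λ ≥ 0 is a constant that controls how strongly large weights are penalized.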
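As a rough sketch of this formula, assume a plain linear model trained with mean squared error. The helper names `predict`, `mse`, `l1_penalty` and `l1_regularized_loss` and the toy data are hypothetical, and the bias is left out of the penalty, which is a common but not universal choice.

```python
def predict(weights, bias, x):
    # Linear model: y_hat = sum_i w_i * x_i + b
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mse(weights, bias, xs, ys):
    # Original (unregularized) loss L(w): mean squared error over the training set
    errors = [(predict(weights, bias, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def l1_penalty(weights, lam):
    # lambda times the sum of absolute weight values (bias not included)
    return lam * sum(abs(w) for w in weights)


def l1_regularized_loss(weights, bias, xs, ys, lam):
    # J(w) = L(w) + lambda * sum_i |w_i|
    return mse(weights, bias, xs, ys) + l1_penalty(weights, lam)


# Toy data, made up for illustration
xs = [[1.0, 2.0], [2.0, 0.0]]
ys = [5.0, 2.0]
weights = [1.0, 2.0]

print(mse(weights, 0.0, xs, ys))                           # 0.0: a perfect fit
print(l1_regularized_loss(weights, 0.0, xs, ys, lam=0.5))  # 1.5: only the penalty remains
```

Even though these weights fit the toy data exactly, the regularized loss is not zero: the penalty 0.5 · (|1| + |2|) still charges the model for the size of its weights.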


Problems with L1 regularization

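The main problem with L1 regularization comes from the absolute-value (mod) function applied to the weights. Gradient descent works by differentiating the loss function, and the absolute-value function is not differentiable at zero: its slope is -1 just left of zero and +1 just right of it. So once the penalty is added, the gradient of the loss is undefined whenever a weight is exactly zero, and the optimizer has to work around this (for example with a subgradient or a specialized solver), which makes L1 harder to optimize with plain gradient descent. L2 regularization, whose penalty is smooth everywhere, does not have this problem.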
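The kink at zero is easy to check numerically. In this minimal sketch (function names are illustrative), the difference quotient of |w| at w = 0 is +1 from the right and -1 from the left, so there is no single slope there. A common workaround, assumed here rather than taken from the original post, is a subgradient: sign(w) away from zero, with the value 0 chosen at w = 0.

```python
def abs_derivative_one_sided(w, h):
    # Difference quotient of |w| with step h (h > 0: from the right, h < 0: from the left)
    return (abs(w + h) - abs(w)) / h


def l1_subgradient(w):
    # One valid subgradient of |w|: sign(w) away from zero, 0 at exactly w == 0
    if w > 0:
        return 1.0
    if w < 0:
        return -1.0
    return 0.0


print(abs_derivative_one_sided(0.0, 1e-6))   # 1.0: slope from the right
print(abs_derivative_one_sided(0.0, -1e-6))  # -1.0: slope from the left
print(l1_subgradient(0.0))                   # 0.0: the value chosen at the kink
```

Any value between -1 and +1 is a valid subgradient at zero; picking 0 simply leaves a weight that is already zero untouched by the penalty.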

L2 Regularization

The process and concept of L2 regularization are similar to those of L1; the major difference is the penalty. Here the penalty is the sum of the squares of the weights multiplied by λ (lambda). The constant λ plays the same role in both cases, so I won't say more about it here. The loss function with an L2 regularizer is J(w) = L(w) + λ · Σᵢ wᵢ², where L(w) is the original loss and wᵢ are the weights.
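A minimal sketch of the L2 penalty and its gradient, assuming plain gradient descent; the helper names are hypothetical and `data_grad` stands in for the gradient of the original loss. Because the derivative of λ·wᵢ² is 2λ·wᵢ, which exists everywhere including zero, each update multiplies the weights by (1 - 2·lr·λ) before applying the data gradient. This is why L2 regularization under plain gradient descent is often called weight decay.

```python
def l2_penalty(weights, lam):
    # lambda times the sum of squared weights
    return lam * sum(w * w for w in weights)


def l2_penalty_grad(weights, lam):
    # d/dw_i of lam * sum(w^2) is 2 * lam * w_i: smooth everywhere, including 0
    return [2 * lam * w for w in weights]


def gd_step(weights, data_grad, lam, lr):
    # w <- w - lr * (dL/dw + 2*lam*w), i.e. w <- (1 - 2*lr*lam) * w - lr * dL/dw
    penalty_grad = l2_penalty_grad(weights, lam)
    return [w - lr * (g + p) for w, g, p in zip(weights, data_grad, penalty_grad)]


weights = [2.0, -4.0]
print(l2_penalty(weights, lam=0.5))  # 10.0, i.e. 0.5 * (4 + 16)
# With a zero data gradient, the penalty alone shrinks every weight by (1 - 2*0.25*0.5) = 0.75
print(gd_step(weights, data_grad=[0.0, 0.0], lam=0.5, lr=0.25))  # [1.5, -3.0]
```

Unlike the L1 case, no special handling is needed at zero: the penalty's gradient there is simply 0.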

Fig. 5: Plots of X² (blue) and |X| (red). (Source: Towards Data Science)


