The Problem of Overfitting
Regularization allows us to ameliorate or reduce this overfitting problem and get our learning algorithms to work much better.
If you were to fit a very high-order polynomial, generating lots of high-order polynomial terms as features, then logistic regression may contort itself: it may try really hard to find a decision boundary that fits your training data, going to great lengths to fit every single training example well.
But this really doesn’t look like a very good hypothesis for making predictions.
The term generalization refers to how well a hypothesis applies even to new examples.
In order to address overfitting, there are two main options:
1. Reduce the number of features: manually select which features to keep, or use a model selection algorithm.
2. Regularization: keep all the features, but reduce the magnitude of the parameters \(\theta_j\).
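To make the overfitting picture concrete, here is a minimal NumPy sketch on synthetic data of my own (not from the notes): a degree-5 polynomial fit to six noisy points drives the training error to essentially zero, yet its error on the underlying trend is typically far larger than that of a plain linear fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a noisy linear trend sampled at only 6 training points
x_train = np.linspace(0, 1, 6)
y_train = 2 * x_train + rng.normal(scale=0.3, size=6)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test  # the true underlying trend

# A degree-5 polynomial has 6 coefficients, so it interpolates
# all 6 training points (near-zero training error) ...
overfit = np.polynomial.Polynomial.fit(x_train, y_train, deg=5)
# ... while a degree-1 fit only captures the overall trend
simple = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)

train_err_overfit = np.mean((overfit(x_train) - y_train) ** 2)
test_err_overfit = np.mean((overfit(x_test) - y_test) ** 2)
test_err_simple = np.mean((simple(x_test) - y_test) ** 2)
print(train_err_overfit, test_err_overfit, test_err_simple)
```

The overfit model "wins" on the training set but that tells us nothing about how well it generalizes to the held-out curve.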
Cost Function
Original Model :
\(h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2^2 + \theta_3x_3^3 + \theta_4x_4^4\)
Modified Model :
\(\min_{\theta}\ \frac{1}{2m}[\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + 1000\,\theta_3^2 + 10000\,\theta_4^2]\)
Suppose :
\(J(\theta) = \frac{1}{2m}[\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n}\theta_j^2]\)
regularization parameter : \(\lambda\)
Notice :
\(\lambda \sum_{j=1}^{n} \theta_j^2\) is the extra regularization term at the end; it shrinks every single parameter \(\theta_1, \dots, \theta_n\) (by convention the sum starts at \(j = 1\), so \(\theta_0\) is not penalized).
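The regularized cost above translates directly into NumPy. This is a sketch under my own naming (`linear_cost_reg` is not from the notes); it assumes `X` already carries a leading column of ones for \(\theta_0\).

```python
import numpy as np

def linear_cost_reg(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    theta: (n+1,) parameters; X: (m, n+1) with a leading column of
    ones; y: (m,) targets; lam: regularization parameter lambda.
    """
    m = len(X)
    error = X @ theta - y
    # theta_0 (the bias) is excluded from the penalty: theta[1:]
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return np.sum(error ** 2) / (2 * m) + reg
```

With `lam = 0` this reduces to the ordinary squared-error cost; a larger `lam` adds a growing penalty on \(\theta_1, \dots, \theta_n\).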
Regularized Linear Regression
\(J(\theta) = \frac{1}{2m}[\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n}\theta_j^2]\)
repeat until convergence {

\(\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)}\)
\(\theta_j := \theta_j - \alpha [\frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j]\)
}
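One iteration of the update rules above can be sketched as follows (`gradient_step` is my own name, not from the notes; `X` is assumed to carry a leading column of ones):

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for linear regression.

    Implements the two update rules above: theta_0 gets no penalty,
    while theta_1..theta_n get the extra (lambda / m) * theta_j term.
    """
    m = len(X)
    error = X @ theta - y            # shape (m,)
    grad = X.T @ error / m           # shape (n+1,)
    reg = (lam / m) * theta
    reg[0] = 0.0                     # do not regularize the bias term
    return theta - alpha * (grad + reg)
```

Repeating this step until convergence is exactly the loop written above; with `lam = 0` it degenerates to ordinary gradient descent.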
Modified :
\(\theta_j := \theta_j(1 - \alpha \frac{\lambda}{m}) - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}\)
Since \(1 - \alpha \frac{\lambda}{m}\) is slightly less than 1, each iteration shrinks \(\theta_j\) a little before applying the usual gradient update.
Regularized Logistic Regression
\(J(\theta) = \frac{1}{m}\sum_{i=1}^{m} [-y^{(i)} \log(h_\theta(x^{(i)})) - (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^{2}\)
Python Code :
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costReg(theta, X, y, learningRate):
    # note: despite its name, 'learningRate' here is the regularization parameter lambda
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))
    # the penalty skips theta_0 (the first column of theta)
    reg = learningRate / (2 * len(X)) * np.sum(np.power(theta[:, 1:theta.shape[1]], 2))
    return np.sum(first - second) / len(X) + reg
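The notes give only the cost function. A matching gradient under the same conventions might look like the following sketch (`gradientReg` is a hypothetical name, not from the notes; like `costReg` it treats `learningRate` as \(\lambda\) and leaves \(\theta_0\) unpenalized):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradientReg(theta, X, y, learningRate):
    # Gradient of the regularized logistic cost; 'learningRate' is
    # the regularization parameter lambda, and theta_0 (the first
    # column) receives no penalty term.
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    error = sigmoid(X * theta.T) - y          # (m, 1)
    grad = (X.T * error).T / len(X)           # (1, n+1)
    grad[0, 1:] = grad[0, 1:] + (learningRate / len(X)) * theta[0, 1:]
    return np.array(grad).ravel()
```

This pair (cost plus gradient) is the shape most SciPy-style optimizers expect for minimizing \(J(\theta)\).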