## Linear Regression with Multiple Variables

\(x^{(i)}_j\) : the value of feature **j** in the **i**-th training example.

## Gradient Descent for Multiple Variables

```python
import numpy as np

def computeCost(X, y, theta):
    """Squared-error cost J(theta) for linear regression.

    X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters.
    """
    inner = np.power(X @ theta - y, 2)   # per-example squared errors
    return np.sum(inner) / (2 * len(X))  # averaged over m, halved
```
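To go with the cost function, here is a minimal batch gradient-descent sketch; the function name `gradient_descent` and the use of plain NumPy arrays are my assumptions, not from the course code:

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, iters):
    """Batch gradient descent for multivariate linear regression.

    X: (m, n) design matrix (first column all ones for the intercept)
    y: (m,) targets; theta: (n,) initial parameters; alpha: learning rate.
    """
    m = len(X)
    cost_history = []
    for _ in range(iters):
        error = X @ theta - y                        # (m,) prediction errors
        cost_history.append(np.sum(error ** 2) / (2 * m))
        theta = theta - (alpha / m) * (X.T @ error)  # simultaneous update of all theta_j
    return theta, cost_history
```

Recording the cost each iteration makes it easy to plot \(J(\theta)\) later to check convergence.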

## Gradient Descent in Practice I – Feature Scaling

If the different features take on similar ranges of values, gradient descent can converge more quickly.

\(x_n := \frac{x_n - \mu_n}{s_n}\)

\(\mu_n\) : the average value of feature \(n\) in the training set

\(s_n\) : the range of values (max − min), or the standard deviation
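The mean-normalization formula above can be sketched in a few lines (the helper name `scale_features` is my own; the course does not name it):

```python
import numpy as np

def scale_features(X):
    """Mean-normalize each column: (x - mean) / range.

    Returns the scaled matrix plus (mu, s) so the same scaling
    can be applied to new examples at prediction time.
    """
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)  # range; std dev also works
    return (X - mu) / s, mu, s
```

After scaling, every feature lies roughly in \([-1, 1]\) with mean zero.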

## Gradient Descent in Practice II – Learning Rate

To debug, plot the cost function \(J(\theta)\) as gradient descent runs, with the number of iterations on the x-axis. If gradient descent is working correctly, \(J(\theta)\) should decrease after every iteration; if it increases or oscillates, the learning rate \(\alpha\) is too large.

To choose \(\alpha\), try a range of values, e.g. \(\alpha = 0.01, 0.03, 0.1, 0.3, 1, 3, 10\), and pick the largest one for which \(J(\theta)\) still decreases on every iteration.
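The convergence check can be automated by recording \(J(\theta)\) at each iteration; this is a self-contained sketch (the helper name `cost_history` is my assumption):

```python
import numpy as np

def cost_history(X, y, alpha, iters=100):
    """Run batch gradient descent, recording J(theta) every iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    hist = []
    for _ in range(iters):
        error = X @ theta - y
        hist.append(np.sum(error ** 2) / (2 * m))  # cost at current theta
        theta -= (alpha / m) * (X.T @ error)
    return hist

# A well-chosen alpha gives a monotonically decreasing history;
# a too-large alpha makes the history blow up instead.
```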

## Features and Polynomial Regression

Look at the data and choose features.

You can also define new features, such as polynomial terms of an existing feature; with appropriate insight into the features, you can sometimes get a much better model for your data.

Feature scaling becomes especially important with polynomial features, because \(x\), \(x^2\), and \(x^3\) take on very different ranges of values.
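Building polynomial features and scaling them can be combined in one step; a minimal sketch (the helper name `poly_features` is my own):

```python
import numpy as np

def poly_features(x, degree):
    """Columns x, x^2, ..., x^degree, then mean-normalized.

    Scaling matters here: if x ranges over 1..10, x^3 already
    ranges over 1..1000 and would dominate gradient descent.
    """
    P = np.column_stack([x ** d for d in range(1, degree + 1)])
    mu = P.mean(axis=0)
    s = P.max(axis=0) - P.min(axis=0)
    return (P - mu) / s
```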

## Normal Equation

The normal equation gives us, for some linear regression problems, a better way to solve for the optimal value of the parameters \(\theta\): analytically, in one step, rather than iteratively.

\(\theta = (X^{T}X)^{-1}X^{T}y\)

#### Disadvantages of gradient descent

- You need to choose the learning rate \(\alpha\).
- It needs many iterations.

#### Advantages of gradient descent

- It works well even when the number of features \(n\) is large, whereas the normal equation must compute \((X^{T}X)^{-1}\), which is slow when \(n\) is large.
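The normal equation is a one-liner in NumPy; using `pinv` rather than `inv` is a common choice because it also handles the non-invertible case discussed below (the helper name is my own):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^(-1) X^T y, solved analytically in one step."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```

Unlike gradient descent, this needs no learning rate and no iteration, but it costs roughly \(O(n^3)\) to invert the \(n \times n\) matrix.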

## Normal Equation Noninvertibility

Some matrices are invertible and some are not; matrices that do not have an inverse are called **non-invertible matrices** (also singular or degenerate).

If \(X^{T}X\) is non-invertible, look at your features and check for redundant ones, i.e. one feature being a linear function of another. If two features are redundant, you don't really need both of them; deleting one will solve your **non-invertibility problem**.
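A small demonstration of how a redundant feature makes \(X^{T}X\) singular (the data here is made up for illustration):

```python
import numpy as np

# Two redundant features: x2 = 3 * x1, so the columns of X are
# linearly dependent and X^T X is singular (non-invertible).
x1 = np.arange(1.0, 6.0)
X = np.c_[np.ones(5), x1, 3 * x1]
XtX = X.T @ X

print(np.linalg.matrix_rank(XtX))  # 2, not 3: rank-deficient
# np.linalg.inv(XtX) would raise LinAlgError here; dropping the
# redundant column (or using np.linalg.pinv) fixes the problem.
```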