4 Linear Regression with Multiple Variables

\(x^{(i)}_j\) : the value of feature \(j\) in the \(i\)-th training example.

Gradient Descent for Multiple Variables

  import numpy as np

  def computeCost(X, y, theta):
      # Vectorized cost: J(theta) = 1/(2m) * sum((X @ theta - y)^2)
      # X includes a leading column of ones; theta is a 1-D array
      inner = np.power(X @ theta - y, 2)
      return np.sum(inner) / (2 * len(X))
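The simultaneous update rule \(\theta := \theta - \frac{\alpha}{m}\,X^{T}(X\theta - y)\) can be sketched in NumPy as follows; the function name and the toy data are illustrative, and X is assumed to include a leading column of ones:

```python
import numpy as np

def gradientDescent(X, y, theta, alpha, iters):
    """Batch gradient descent; X carries a leading column of ones."""
    m = len(X)
    for _ in range(iters):
        error = X @ theta - y                    # shape (m,)
        theta = theta - (alpha / m) * (X.T @ error)  # update all theta_j at once
    return theta

# Toy example: data generated from y = 1 + 2*x1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradientDescent(X, y, np.zeros(2), alpha=0.1, iters=2000)
```

With a suitable \(\alpha\) and enough iterations, theta approaches the intercept 1 and slope 2 that generated the data.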

Gradient Descent in Practice I – Feature Scaling

If the different features take on similar ranges of values, gradient descent can converge more quickly.

\(x_n := \frac{x_n - \mu_n}{s_n}\)

\(\mu_n\) : the average value of feature \(n\) over the training set
\(s_n\) : the range of values (max minus min); the standard deviation is also commonly used
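A minimal mean-normalization sketch along these lines (the helper name and example values are made up; s here is the max–min range):

```python
import numpy as np

def feature_normalize(X):
    """Mean-normalize each column: (x - mean) / range."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)   # range of each feature
    return (X - mu) / s, mu, s

# Illustrative data: house size (sq ft) and number of bedrooms
X = np.array([[2104.0, 3.0], [1416.0, 2.0], [1534.0, 3.0], [852.0, 2.0]])
X_norm, mu, s = feature_normalize(X)
```

After scaling, every column has mean 0 and range 1, so both features contribute comparably to the gradient steps. The same mu and s must be reused to scale any new input before prediction.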

Gradient Descent in Practice II – Learning Rate

To debug gradient descent, plot the cost function \(J(\theta)\) as gradient descent runs, with the number of iterations on the x-axis. If gradient descent is working correctly, \(J(\theta)\) should decrease after every iteration.

Try a range of values, increasing by roughly 3× each step, e.g. \(\alpha = 0.01, 0.03, 0.1, 0.3, 1, 3, 10\). If \(J(\theta)\) increases or oscillates, use a smaller \(\alpha\).
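One way to compare learning rates is to record \(J(\theta)\) per iteration for each candidate \(\alpha\) and keep the largest one for which the cost still decreases steadily; a small illustrative sketch:

```python
import numpy as np

def cost(X, y, theta):
    # J(theta) = 1/(2m) * sum((X @ theta - y)^2)
    return np.sum((X @ theta - y) ** 2) / (2 * len(X))

def descend(X, y, alpha, iters=50):
    """Run gradient descent, recording the cost after every iteration."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        theta = theta - (alpha / len(X)) * (X.T @ (X @ theta - y))
        history.append(cost(X, y, theta))
    return history

# Toy data: y = 2*x1 exactly
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 2.0, 4.0])
histories = {a: descend(X, y, a) for a in (0.01, 0.1, 0.3)}
# For each alpha that is small enough, the recorded cost decreases
# on every iteration; plotting these curves shows which alpha wins.
```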

Features and Polynomial Regression

Look at the data and choose features.

You can also define polynomial features; sometimes, with appropriate insight into the features, you get a much better model for your data.

Feature scaling becomes especially important when you use polynomial features, because powers of a feature take on very different ranges.
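For example, a single feature x can be expanded into the columns \(x, x^2, x^3\); the helper below is a sketch with made-up values:

```python
import numpy as np

def poly_features(x, degree):
    """Build columns [x, x^2, ..., x^degree] from a 1-D input."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.array([1.0, 2.0, 3.0])      # illustrative single feature
X_poly = poly_features(x, 3)       # columns: x, x^2, x^3
# Note the ranges already diverge: x spans 1-3 while x^3 spans 1-27,
# which is why these columns should be scaled before gradient descent.
```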

Normal Equation

The normal equation gives, for some linear regression problems, a better way to solve for the optimal value of the parameters \(\theta\): it computes the solution analytically in one step, with no iterations and no learning rate.

\(\theta = (X^{T}X)^{-1}X^{T}y\)
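In NumPy this is nearly a one-liner; using `np.linalg.pinv` rather than `inv` is a common precaution, since the pseudo-inverse also handles a singular \(X^{T}X\) (the data here is illustrative):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed with the pseudo-inverse."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Toy data generated from y = 1 + 2*x1; X has a leading column of ones
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = normal_equation(X, y)   # recovers intercept 1 and slope 2
```

No feature scaling is needed here, since there is no iterative step whose convergence depends on the feature ranges.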

Disadvantages of gradient descent (compared to the normal equation)

  • you need to choose the learning rate \(\alpha\)
  • it needs many iterations

Advantages of gradient descent

  • it works well even when there are millions of features (the normal equation must invert the \(n \times n\) matrix \(X^{T}X\), which is expensive for large \(n\))
  • it generalizes to many other kinds of models

Normal Equation Noninvertibility

Some matrices are invertible and some are not; matrices that do not have an inverse are called non-invertible (singular or degenerate) matrices.

Look at your features and check whether any are redundant, i.e. one feature is a linear function of another. If so, you really don't need both: deleting one of the redundant features will usually solve the non-invertibility problem.
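A small illustration of how a redundant (linearly dependent) feature makes \(X^{T}X\) singular, and how dropping it restores full rank (values made up):

```python
import numpy as np

# Third column is 2 * second column, so X^T X is singular
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0]])
rank = np.linalg.matrix_rank(X.T @ X)   # rank 2 < 3 columns: non-invertible
# np.linalg.inv(X.T @ X) would typically raise LinAlgError here;
# dropping the redundant column (or using pinv) avoids the problem.
X_fixed = X[:, :2]
rank_fixed = np.linalg.matrix_rank(X_fixed.T @ X_fixed)  # full rank again
```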