2 Linear Regression with One Variable

Model Representation

\(m\) : the number of training examples

\(x\) : the input variable (feature)

\(y\) : the output variable (target)

\((x, y)\) : a single training example

\((x^{(i)}, y^{(i)})\) : the \(i\)-th training example

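\(h\) : the hypothesis, a function that maps an input \(x\) to a predicted output \(y\); with one variable, \(h_{\theta }(x) = {\theta }_0 + {\theta }_1 x\)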
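
The notation above can be sketched in Python. This is only an illustration: it assumes the usual single-variable form \(h_{\theta }(x) = {\theta }_0 + {\theta }_1 x\), and the data values and the names `xs`, `ys`, and `hypothesis` are made up for the example.

```python
# Hypothetical toy training set; the values are invented for illustration.
xs = [1.0, 2.0, 3.0]   # inputs  x^(1), x^(2), x^(3)
ys = [2.0, 4.0, 6.0]   # outputs y^(1), y^(2), y^(3)
m = len(xs)            # m: the number of training examples


def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x (assumed single-variable linear form)."""
    return theta0 + theta1 * x


# The i-th training example (x^(i), y^(i)).
# The notes index examples from 1, while Python lists index from 0.
i = 2
example = (xs[i - 1], ys[i - 1])

# Prediction of the hypothesis for that example's input, with theta0=0, theta1=2.
prediction = hypothesis(0.0, 2.0, example[0])
```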

Cost Function

\(J({\theta }_0, {\theta }_1) = \frac {1}{2m}\sum ^{m}_{i=1}( h_{\theta }(x^{(i)}) - y^{(i)})^2\)

This cost function is also called the squared error function, or sometimes the squared error cost function.
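
A minimal sketch of the cost function in Python, assuming the single-variable hypothesis \(h_{\theta }(x) = {\theta }_0 + {\theta }_1 x\). The function name `compute_cost` and the toy data are hypothetical, chosen only for illustration.

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost: J = 1/(2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (theta0 + theta1 * x) - y   # h_theta(x^(i)) - y^(i)
        total += error ** 2
    return total / (2 * m)


# Toy data lying exactly on the line y = x (invented for illustration).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

perfect_fit = compute_cost(0.0, 1.0, xs, ys)   # the line y = x matches every point
poor_fit = compute_cost(0.0, 0.0, xs, ys)      # the flat line y = 0 misses every point
```

With this data the perfect fit has cost 0, while the flat line has cost (1 + 4 + 9) / 6, so a lower \(J\) corresponds to a better fit.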

Gradient Descent

repeat until convergence {

    \({\theta }_j := {\theta }_j - {\alpha }\frac {\partial }{\partial {\theta }_j}J({\theta }_0, {\theta }_1)\) (for j = 0 and j = 1)

}


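When people talk about gradient descent, they mean the simultaneous update: both new values are computed from the current \({\theta }_0\) and \({\theta }_1\) before either parameter is overwritten.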
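
A simultaneous update can be sketched as follows. The helper `gradient_step` and the example cost \(J = {\theta }_0^2 + {\theta }_0{\theta }_1 + {\theta }_1^2\) are hypothetical; that cost is chosen only because each partial derivative depends on both parameters, so the order of assignment matters.

```python
def gradient_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One simultaneous gradient descent update of (theta0, theta1)."""
    # Evaluate both partial derivatives at the current point first...
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    # ...and only then hand back the new values together.
    return temp0, temp1


# Partial derivatives of the hypothetical cost J = t0^2 + t0*t1 + t1^2.
def d_theta0(t0, t1):
    return 2 * t0 + t1


def d_theta1(t0, t1):
    return t0 + 2 * t1


theta0, theta1 = gradient_step(1.0, 1.0, 0.1, d_theta0, d_theta1)
```

Starting from (1, 1) with \({\alpha } = 0.1\), both parameters move to 0.7. Overwriting \({\theta }_0\) before computing the step for \({\theta }_1\) would give 0.73 for \({\theta }_1\) instead, which is a different procedure.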

Gradient Descent For Linear Regression

    The term batch gradient descent refers to the fact that every step of gradient descent looks at all of the training examples.
    It turns out that gradient descent scales better to larger data sets than the normal equation method.
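
Putting the pieces together, batch gradient descent for single-variable linear regression might look like the sketch below. It uses the partial derivatives of the squared error cost, \(\frac {1}{m}\sum (h_{\theta }(x^{(i)}) - y^{(i)})\) for \({\theta }_0\) and \(\frac {1}{m}\sum (h_{\theta }(x^{(i)}) - y^{(i)})x^{(i)}\) for \({\theta }_1\). The function name, the learning rate, the fixed iteration count (used here in place of a convergence test), and the toy data are all assumptions made for illustration.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # "Batch": every step uses all m training examples.
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                                 # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m      # dJ/dtheta1
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x (invented for illustration).
theta0, theta1 = batch_gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this toy data the parameters approach \({\theta }_0 = 1\) and \({\theta }_1 = 2\). A learning rate that is too large for a given data set can make the iterates diverge, so `alpha` usually needs tuning.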
