## Model Representation

- [latex]m[/latex] : the number of training examples
- [latex]x[/latex] : input variables
- [latex]y[/latex] : output variables
- [latex](x, y)[/latex] : a single training example
- [latex](x^{(i)}, y^{(i)})[/latex] : the i-th training example
- [latex]h[/latex] : the hypothesis function
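
A minimal sketch of this notation in Python; the data values and the linear form of the hypothesis are assumptions made only for illustration:

```python
# Hypothetical toy training set (house sizes in square feet -> prices in $1000s);
# the numbers are made up, purely to illustrate the notation.
x = [2104, 1416, 1534, 852]      # input variables x^(i)
y = [460, 232, 315, 178]         # output variables y^(i)

m = len(x)                       # number of training examples, here m = 4
i = 2
example = (x[i - 1], y[i - 1])   # the i-th training example (x^(i), y^(i)); the notes index from 1

def h(theta0, theta1, x_val):
    """Hypothesis h_theta(x) = theta0 + theta1 * x for univariate linear regression."""
    return theta0 + theta1 * x_val
```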

## Cost Function

[latex]J({\theta }_0, {\theta }_1) = \frac {1}{2m}\sum ^{m}_{i=1}( h_{\theta }(x^{(i)}) - y^{(i)})^2[/latex]

**The cost function is also called the squared error function, or sometimes the squared error cost function.**
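
A minimal sketch of that cost function in Python, assuming the univariate hypothesis [latex]h_{\theta }(x) = {\theta }_0 + {\theta }_1 x[/latex]; the function and argument names are my own:

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = (1 / (2m)) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(xs)
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        prediction = theta0 + theta1 * x_i    # h_theta(x^(i))
        total += (prediction - y_i) ** 2
    return total / (2 * m)
```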

## Gradient Descent

repeat until convergence {

[latex]{\theta }_j := {\theta }_j - {\alpha }\frac {\partial }{\partial {\theta }_j}J({\theta }_0, {\theta }_1) [/latex] (for j = 0 and j = 1)

}

*When people talk about gradient descent, they always mean a simultaneous update of the parameters.*
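
A sketch of what "simultaneous update" means in code, under the assumption that `d_theta0` and `d_theta1` are callables returning the partial derivatives of J at the current parameters (both names are hypothetical):

```python
def gradient_descent_step(theta0, theta1, alpha, d_theta0, d_theta1):
    """One simultaneous update of (theta0, theta1): both partial derivatives
    are evaluated at the current parameters before either one is overwritten."""
    temp0 = theta0 - alpha * d_theta0(theta0, theta1)
    temp1 = theta1 - alpha * d_theta1(theta0, theta1)
    return temp0, temp1   # assign both at once
```

Updating [latex]{\theta }_0[/latex] first and then reusing the new value when computing the derivative for [latex]{\theta }_1[/latex] would not be the usual gradient descent.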

## Gradient Descent For Linear Regression

The term batch gradient descent refers to the fact that, in every step of gradient descent, we look at **all of the training examples**.

It turns out that gradient descent scales better to **larger data sets** than the normal equation method.
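
A minimal batch gradient descent sketch for univariate linear regression, using the partial derivatives [latex]\frac {\partial }{\partial {\theta }_0}J = \frac {1}{m}\sum ^{m}_{i=1}(h_{\theta }(x^{(i)}) - y^{(i)})[/latex] and [latex]\frac {\partial }{\partial {\theta }_1}J = \frac {1}{m}\sum ^{m}_{i=1}(h_{\theta }(x^{(i)}) - y^{(i)})x^{(i)}[/latex]; the learning rate, iteration count, and function name are assumptions:

```python
def batch_gradient_descent(xs, ys, alpha=0.01, num_iters=1000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x:
    every iteration sums over all m training examples."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(num_iters):
        errors = [(theta0 + theta1 * x_i) - y_i for x_i, y_i in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x_i for e, x_i in zip(errors, xs)) / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1  # simultaneous update
    return theta0, theta1
```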