Model Representation
\(m\) : the number of training examples
\(x\) : input variables
\(y\) : output variables
\((x, y)\) : a single training example
\((x^{(i)}, y^{(i)})\) : refer to the ith training example
\(h\) : representing the hypothesis
Cost Function
\(J({\theta }_0, {\theta }_1) = \frac {1}{2m}\sum ^{m}_{i=1}( h_{\theta }(x^{(i)}) – y^{(i)})^2\)Cost function is also called the squared error function or sometimes called the square error cost function.
Gradient Descent
repeat until convergence {

\({\theta }_j := {\theta }_j – {\alpha }\frac {\partial }{\partial {\theta }_j}J({\theta }_0 – {\theta }_1)
\) (for j = 0 and j = 1)
}
when people talk about gradient descent, they always mean simultaneous update.
Gradient Descent For LinearRegression
 The term batch gradient descent means that refers to the fact that, in every step of gradient descent we’re looking at all of the training examples.
 In case it turns out gradient descent will scale better to larger data sets than that normal equals method.