## Classification

Classification: [latex]y \in \{0, 1\}[/latex]. With linear regression, [latex]h_\theta (x)[/latex] can be > 1 or < 0, which makes it a poor fit for classification.

Logistic Regression: [latex]0 \leq h_\theta (x) \leq 1 [/latex]. Logistic regression has the property that its predictions are always between zero and one. Despite the name, logistic regression is actually a classification algorithm.

## Hypothesis Representation

**Sigmoid function** : [latex]g(z) = \frac {1}{1+e^{-z}}[/latex]

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```

## Decision Boundary

With higher-order polynomial features, it is possible to get even more complex decision boundaries, and logistic regression can learn them; the decision boundary is the set of points where [latex]\theta^T x = 0[/latex].
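As a small sketch (the parameter values here are hypothetical, chosen only for illustration), a hypothesis with features [latex][1, x_1, x_2, x_1^2, x_2^2][/latex] and [latex]\theta = [-1, 0, 0, 1, 1][/latex] gives the circular decision boundary [latex]x_1^2 + x_2^2 = 1[/latex]:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical parameters: with features [1, x1, x2, x1^2, x2^2],
# theta = [-1, 0, 0, 1, 1] makes theta^T x = 0 the unit circle,
# so we predict y = 1 whenever x1^2 + x2^2 >= 1.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

def predict(x1, x2):
    features = np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])
    return 1 if sigmoid(features @ theta) >= 0.5 else 0

print(predict(0.0, 0.0))  # inside the circle  -> 0
print(predict(2.0, 0.0))  # outside the circle -> 1
```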

## Cost Function

This section covers how to fit the parameters [latex]\theta[/latex] for logistic regression. In particular, it defines the optimization objective, or cost function, used to fit the parameters in the supervised learning problem of fitting a logistic regression model.

#### Linear regression cost function

[latex] J(\theta ) = \frac {1}{m} \sum_{i = 1}^{m} \frac {1}{2}(h_\theta(x^{(i)}) - y^{(i)})^{2} [/latex]

#### Logistic regression cost function

[latex] J(\theta) = \frac {1}{m} \sum_{i = 1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) [/latex]

[latex] \mathrm{Cost}(h_\theta (x), y) = \begin{cases} -\log(h_\theta(x)) & \text{ if } y=1 \\ -\log(1 - h_\theta(x)) & \text{ if } y=0 \end{cases} [/latex]
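The piecewise cost above can be sketched directly in NumPy (this is an illustrative implementation, assuming `X` carries a leading column of ones and `y` is a 0/1 vector):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    """J(theta): average of -log(h) where y = 1 and -log(1 - h) where y = 0."""
    h = sigmoid(X @ theta)
    return np.mean(np.where(y == 1, -np.log(h), -np.log(1 - h)))
```

With [latex]\theta = 0[/latex], every prediction is 0.5, so the cost is [latex]-\log(0.5) = \log 2 \approx 0.693[/latex] regardless of the labels.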

## Simplified Cost Function and Gradient Descent

This section covers how to implement a fully working version of logistic regression. (The full details are beyond these notes; see coursera.org if you are interested.) A vectorized implementation can update all [latex]n+1[/latex] parameters in one step. Feature scaling helps gradient descent converge faster for linear regression, and the same idea applies to gradient descent for logistic regression.
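For reference, the two cases of the cost combine into a single expression, [latex] J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] [/latex]. A vectorized gradient descent loop might look like the sketch below (the function name and hyperparameter defaults are assumptions, not from the course code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Vectorized gradient descent; X is (m, n+1) with a leading bias column."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        # One matrix product updates all n+1 parameters at once.
        theta -= (alpha / m) * (X.T @ (h - y))
    return theta
```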

## Advanced Optimization

For gradient descent, technically you don't actually need code to compute the cost function [latex]J(\theta)[/latex]; you only need code to compute the derivative terms. Conjugate gradient, BFGS, and L-BFGS are examples of more sophisticated optimization algorithms. **These algorithms have a number of advantages:** there is no need to manually pick the learning rate [latex]\alpha[/latex].

*It is entirely possible to use these algorithms successfully on many different learning problems without understanding the inner loop of what they do.* For these algorithms, the recommendation is to **use a software library**. A sophisticated optimization library makes the implementation a little more opaque, and so perhaps a little harder to debug, but these algorithms **often run much faster** than gradient descent. If you have **a large machine learning problem**, you can use these algorithms instead of gradient descent.
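As a sketch of what calling such a library looks like in Python (the toy data here is made up; `scipy.optimize.minimize` with `method='BFGS'` is one such routine): you supply code for the cost and its gradient, and the library picks step sizes itself, with no learning rate to tune.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    # Vector of partial derivatives dJ/dtheta_j.
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Hypothetical toy data (bias column plus one feature).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

# No alpha anywhere: BFGS chooses its own steps.
res = minimize(cost, np.zeros(2), args=(X, y), jac=grad, method='BFGS')
```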

## Multiclass Classification: One-vs-all

In one-versus-all classification, fit one classifier per class: for the third class, for example, fit a third classifier [latex]h_\theta^{(3)}(x)[/latex] that separates that class's positive examples from everything else. To make a prediction, pick whichever classifier is most confident, i.e., the one that most enthusiastically says it has the right class. With this little method, the logistic regression classifier works on multi-class classification problems as well.
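The procedure above can be sketched as follows, reusing plain gradient descent for each binary subproblem (the helper names, toy hyperparameters, and cluster data are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_binary(X, y, alpha=0.3, iters=2000):
    """Fit one logistic regression classifier by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= (alpha / len(y)) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

def one_vs_all(X, y, num_classes):
    # One classifier h^(i) per class: class i vs. everything else.
    return np.array([train_binary(X, (y == i).astype(float))
                     for i in range(num_classes)])

def predict(all_theta, X):
    # Pick the class whose classifier is most confident.
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)
```

Each row of `all_theta` is the parameter vector of one binary classifier; `predict` runs all of them and takes the most confident one.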