## Problem Motivation

Anomaly detection is mainly used for unsupervised problems, although some aspects of it closely resemble supervised learning.

**Some examples:**

- detect strange or fraudulent behavior
- manufacturing
- monitoring computers in a data center

## Gaussian Distribution

#### Gaussian distribution

\(x\sim N(\mu , \sigma ^2)\)

#### Gaussian probability density

\(p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)\)

#### The location of the center of this bell-shaped curve

\(\mu = \frac {1}{m} \sum _{i=1}^{m}x^{(i)}\)

#### The width of this bell-shaped curve

\(\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)^2\)

**Note:** This formula divides by \(m\) rather than the \(m - 1\) commonly used in statistics. For large \(m\) the difference is negligible.
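
The estimates above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `fit_gaussian` and `gaussian_pdf` and the sample values are made up for the example.

```python
import math

def fit_gaussian(xs):
    """Estimate mu and sigma^2 with the 1/m formulas."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2

def gaussian_pdf(x, mu, sigma2):
    """Density p(x; mu, sigma^2) of a univariate Gaussian."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Made-up sample values, chosen so the arithmetic is easy to check by hand.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma2 = fit_gaussian(samples)
print(mu, sigma2)                    # mean 5.0, variance 4.0
print(gaussian_pdf(5.0, mu, sigma2)) # density at the center of the bell curve
```

The density is highest at \(x = \mu\) and falls off faster when \(\sigma^2\) is small, which is why \(\sigma\) controls the width of the curve.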

## Algorithm

To apply this to anomaly detection, fit the parameters of each feature \(j\):

\(\mu_j = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}_j\)

\(\sigma^2_j = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)}_j - \mu_j)^2\)

\(p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma^2_j) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma^2_j}\right)\)

**If \(p(x) < \varepsilon\), flag \(x\) as an anomaly.**
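
A minimal sketch of this algorithm in plain Python follows. The helper names (`fit`, `p`, `is_anomaly`), the training rows, and the threshold value are all assumptions made for illustration. Note that a feature with zero variance would cause a division by zero here.

```python
import math

def fit(X):
    """Per-feature mean and variance (1/m formulas) for a list of rows."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2

def p(x, mu, sigma2):
    """Product over features of the univariate Gaussian densities."""
    density = 1.0
    for xj, mj, s2j in zip(x, mu, sigma2):
        density *= math.exp(-((xj - mj) ** 2) / (2 * s2j)) / math.sqrt(2 * math.pi * s2j)
    return density

def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x when its density falls below the threshold epsilon."""
    return p(x, mu, sigma2) < epsilon

# Made-up training data: 5 examples, 2 features.
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 9.0], [2.0, 13.0]]
mu, sigma2 = fit(X_train)
epsilon = 1e-3  # hypothetical threshold; in practice it is tuned on labeled data

print(is_anomaly([2.0, 11.0], mu, sigma2, epsilon))  # typical point
print(is_anomaly([6.0, 20.0], mu, sigma2, epsilon))  # far from the training data
```

With many features the product of densities can underflow, so summing log-densities and comparing against \(\log \varepsilon\) is a common alternative.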

## Developing and Evaluating an Anomaly Detection System

**How to develop and evaluate an algorithm?**

- Fit the model \(p(x)\) on the training set
- On the cross-validation set, try different values of \(\varepsilon\) and compute the F1 score for each
- After choosing \(\varepsilon\), evaluate the algorithm on the test set
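
The threshold search in the second step could look roughly like the sketch below. It assumes the densities \(p(x)\) of the cross-validation examples have already been computed and that label 1 marks an anomaly. The names `f1_score` and `select_epsilon`, the candidate list, and all numbers are illustrative.

```python
def f1_score(y_true, y_pred):
    """F1 score with anomalies (label 1) as the positive class."""
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, candidates):
    """Return the candidate threshold with the best F1 on the CV set."""
    best_eps, best_f1 = None, -1.0
    for eps in candidates:
        y_pred = [1 if density < eps else 0 for density in p_cv]
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Made-up densities and labels for a tiny cross-validation set.
p_cv = [0.20, 0.15, 0.18, 0.002, 0.12, 0.0005, 0.03]
y_cv = [0, 0, 0, 1, 0, 1, 0]
candidates = [0.0001, 0.001, 0.01, 0.05, 0.2]

best_eps, best_f1 = select_epsilon(p_cv, y_cv, candidates)
print(best_eps, best_f1)
```

F1 is used instead of accuracy because the classes are heavily skewed: a model that never flags anything would still score high accuracy.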

## Anomaly Detection vs. Supervised Learning

| Anomaly Detection | Supervised Learning |
|---|---|
| Very small number of positive examples and a relatively large number of negative examples | A reasonably large number of both positive and negative examples |
| Many different types of anomalies | Enough positive examples for an algorithm to get a sense of what the positive examples are like |
| Future anomalies may look nothing like the ones you've seen so far | |
| Fraud detection, manufacturing, data center monitoring | Spam email, weather prediction, classifying cancers |

## Choosing What Features to Use

- model each feature with a Gaussian distribution (if a feature does not look Gaussian, try different transformations of the data to make it look more Gaussian)
- run an error analysis procedure to come up with new features for the anomaly detection algorithm
- create new features by combining existing features
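
As a rough illustration of the first and last points, the sketch below applies a log transform to a right-skewed feature and builds a ratio feature. The data, the `skewness` helper, and the feature names `cpu_load` and `network_traffic` are hypothetical; a log transform is only one of several transformations worth trying.

```python
import math

def skewness(xs):
    """Third central moment divided by sigma^3 (0 for symmetric data)."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    third = sum((x - mu) ** 3 for x in xs) / m
    return third / sigma2 ** 1.5

# Made-up, right-skewed feature values (strictly positive, so log is defined).
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 8.0, 20.0, 100.0]
logged = [math.log(x) for x in raw]

# The transformed feature is closer to symmetric, hence closer to Gaussian.
print(skewness(raw), skewness(logged))

def ratio_feature(cpu_load, network_traffic):
    """Hypothetical combined feature: large when load is high but traffic is low."""
    return cpu_load / network_traffic

print(ratio_feature(8.0, 2.0))
```

Plotting a histogram of each feature before and after the transformation is the usual way to judge whether it looks more Gaussian.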

## Multivariate Gaussian Distribution

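The per-feature model treats the features as independent:

\(p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma^2_j) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma^2_j}\right)\)

The multivariate Gaussian instead fits a mean vector \(\mu\) and a covariance matrix \(\Sigma\):

\(\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}\)

\(\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^T = \frac{1}{m} (X - \mu)^T (X - \mu)\)

Here \(X\) is the \(m \times n\) matrix whose rows are the examples, with \(\mu\) subtracted from every row.

\(p(x) = \frac{1}{(2\pi)^{\frac{n}{2}} \left|\Sigma\right|^{\frac{1}{2}}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)\)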
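
The multivariate model can be sketched with NumPy as follows. This assumes NumPy is available. The function names and the data are made up for illustration, and the code solves a linear system rather than forming the inverse of the covariance matrix explicitly.

```python
import numpy as np

def fit_multivariate(X):
    """Mean vector and covariance matrix (1/m formula) of the rows of X."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu                 # subtract mu from every row
    cov = centered.T @ centered / m   # n x n covariance matrix
    return mu, cov

def multivariate_pdf(x, mu, cov):
    """Density of the multivariate Gaussian at a single point x."""
    n = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(cov, diff)  # (x - mu)^T cov^{-1} (x - mu)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)

# Made-up data in which the second feature is roughly twice the first.
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]])
mu, cov = fit_multivariate(X)

along = mu + np.array([1.0, 2.0])     # moves with the correlation
against = mu + np.array([1.0, -2.0])  # same per-feature offsets, breaks it
print(multivariate_pdf(along, mu, cov), multivariate_pdf(against, mu, cov))
```

Both test points sit at the same distance from the mean in each feature taken separately, so a per-feature model would score them identically. Only the multivariate model gives the second point a much lower density.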

| Gaussian Distribution | Multivariate Gaussian Distribution |
|---|---|
| Manually create features to capture anomalies | Automatically captures correlations between features |
| Computationally cheaper | |
| | Must have \(m > n\) (in practice \(m \ge 10n\) is a safer rule of thumb), or else \(\Sigma\) is non-invertible |