15 Anomaly Detection

Problem Motivation

Anomaly detection is mainly used for unsupervised problems, although some aspects of it are very similar to supervised learning. Some examples:

detect strange behavior or fraudulent behavior
manufacturing
monitoring computers in a data center

Gaussian Distribution

Gaussian distribution

[latex]x\sim N(\mu , \sigma ^2)[/latex]

Gaussian probability density

[latex]p(x; \mu , \sigma ^2) = \frac {1}{\sqrt{2\pi }\sigma } \exp(-\frac {(x - \mu)^2}{2 \sigma ^2})[/latex]

The location of the center of this bell-shaped curve

[latex]\mu = \frac {1}{m} \sum _{i=1}^{m}x^{(i)}[/latex]

The width of this bell-shaped curve

[latex]\sigma ^ 2 = \frac {1}{m} \sum _{i=1}^{m} (x^{(i)} - \mu) ^2[/latex]

Note: this formula uses [latex]m[/latex] in the denominator instead of the [latex]m - 1[/latex] that is commonly used in statistics.
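The two estimates and the density formula can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `fit_gaussian` and `gaussian_pdf` and the sample values are made up for the example.

```python
import math

def fit_gaussian(xs):
    """Estimate mu and sigma^2 using the 1/m convention from the notes."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2

def gaussian_pdf(x, mu, sigma2):
    """Evaluate the Gaussian density p(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Made-up sample values, chosen so the estimates come out round.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma2 = fit_gaussian(samples)
print(mu, sigma2)                      # center and width of the fitted curve
print(gaussian_pdf(mu, mu, sigma2))    # the density is highest at the mean
```

Dividing by `m - 1` instead of `m` would give the unbiased sample variance; for large `m` the two estimates are nearly identical.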

Algorithm

To address anomaly detection, fit the parameters of each feature [latex]j[/latex]:

[latex]\mu _j = \frac {1}{m} \sum _{i=1}^{m}x^{(i)} _j[/latex]

[latex]\sigma ^ 2 _j = \frac {1}{m} \sum _{i=1}^{m} (x^{(i)}_j - \mu _j) ^2[/latex]

Then, for a new example [latex]x[/latex], compute

[latex]p(x) = \prod _{j=1}^{n}p(x_j; \mu _j, \sigma ^2_j) = \prod _{j=1}^{n}\frac {1}{\sqrt{2\pi } \sigma _j} \exp(-\frac {(x_j - \mu_j)^2}{2 \sigma ^2_j})[/latex]

If [latex]p(x) < \varepsilon [/latex], flag [latex]x[/latex] as an anomaly.
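As a rough sketch of this algorithm, the following Python fits one Gaussian per feature, multiplies the per-feature densities, and compares the result with a threshold. The helper names, the tiny training set, and the value of `epsilon` are all assumptions made for illustration.

```python
import math

def fit_features(X):
    """Return per-feature means and variances (1/m convention)."""
    m, n = len(X), len(X[0])
    mus = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2s = [sum((row[j] - mus[j]) ** 2 for row in X) / m for j in range(n)]
    return mus, sigma2s

def density(x, mus, sigma2s):
    """p(x) as the product of the per-feature Gaussian densities."""
    p = 1.0
    for xj, mu, s2 in zip(x, mus, sigma2s):
        p *= math.exp(-(xj - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p

def is_anomaly(x, mus, sigma2s, epsilon):
    """Flag x when its density falls below the threshold epsilon."""
    return density(x, mus, sigma2s) < epsilon

# Hypothetical training data: 5 examples, 2 features.
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 9.0], [2.0, 13.0]]
mus, sigma2s = fit_features(X_train)
epsilon = 1e-3  # assumed threshold; in practice it is tuned on labeled data

print(is_anomaly([2.0, 11.0], mus, sigma2s, epsilon))  # a typical point
print(is_anomaly([6.0, 20.0], mus, sigma2s, epsilon))  # a point far from the data
```

With many features the product can underflow, so summing log-densities and comparing against [latex]\log \varepsilon[/latex] is a common alternative.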

Developing and Evaluating an Anomaly Detection System

How to develop and evaluate an algorithm?

Take the training set and fit the model [latex]p(x)[/latex]
On the cross-validation set, try different values of [latex]\varepsilon[/latex] and compute the F1 score for each
After choosing [latex]\varepsilon[/latex], evaluate the algorithm on the test set
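The threshold search in the second step might look like the sketch below. It assumes the densities [latex]p(x)[/latex] of the cross-validation examples have already been computed and that label 1 marks an anomaly; `select_epsilon`, the density values, and the candidate thresholds are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 score for the positive (anomalous) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, candidates):
    """Return the candidate threshold with the highest F1 on the CV set."""
    best_eps, best_f1 = None, -1.0
    for eps in candidates:
        y_pred = [1 if p < eps else 0 for p in p_cv]
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Made-up cross-validation densities and labels (1 = anomaly).
p_cv = [0.20, 0.15, 0.18, 0.002, 0.12, 0.0005, 0.09]
y_cv = [0, 0, 0, 1, 0, 1, 0]
candidates = [0.0001, 0.001, 0.01, 0.1, 0.5]

best_eps, best_f1 = select_epsilon(p_cv, y_cv, candidates)
print(best_eps, best_f1)
```

F1 is used rather than plain accuracy because anomalies are rare, so a model that never flags anything would still score a high accuracy.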

Anomaly Detection vs. Supervised Learning

[table id=3 /]

Choosing What Features to Use

Model the features with a Gaussian distribution (try different transformations of the data to make it look more Gaussian)
Run an error analysis to come up with new features for the anomaly detection algorithm
Create new features by combining existing features
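To illustrate the first point, the sketch below applies a log transform to a right-skewed feature and compares the sample skewness before and after. The transform `log(x + 1)` is only one assumed choice among many (powers such as `x ** 0.5` are another), and the data are made up.

```python
import math

def skewness(xs):
    """Sample skewness; values near 0 suggest a roughly symmetric shape."""
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    third = sum((x - mu) ** 3 for x in xs) / m
    return third / var ** 1.5

# Hypothetical right-skewed feature with a long tail of large values.
raw = [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 6.0, 10.0, 20.0, 50.0]
logged = [math.log(x + 1.0) for x in raw]

print(skewness(raw))     # strongly positive: long right tail
print(skewness(logged))  # closer to zero after the transform
```

In practice a histogram of each candidate transform is usually the quickest way to judge which one looks most bell-shaped.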

Multivariate Gaussian Distribution

The original model multiplies independent per-feature densities:

[latex]p(x) = \prod _{j=1}^{n}p(x_j; \mu _j, \sigma ^2_j) = \prod _{j=1}^{n}\frac {1}{\sqrt{2\pi } \sigma _j} \exp(-\frac {(x_j - \mu_j)^2}{2 \sigma ^2_j})[/latex]

The multivariate Gaussian model instead fits a mean vector [latex]\mu[/latex] and a covariance matrix [latex]\Sigma[/latex]:

[latex]\mu = \frac {1}{m} \sum _{i=1}^{m}x^{(i)}[/latex]

[latex]\Sigma = \frac {1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu )(x^{(i)} - \mu )^T = \frac {1}{m} (X - \mu)^T(X - \mu)[/latex]

[latex]p(x) = \frac {1}{(2 \pi)^{\frac {n}{2}} |\Sigma| ^{\frac {1}{2}}} \exp(-\frac {1}{2} (x-\mu)^T\Sigma ^{-1}(x-\mu))[/latex]

[table id=4 /]
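A small sketch of the multivariate model follows, restricted to two features so that the determinant and inverse of the covariance matrix can be written in closed form without a linear algebra library. The function names and the training data are assumptions for illustration only.

```python
import math

def fit_multivariate(X):
    """Estimate the mean vector and covariance matrix (1/m convention)."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    cov = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in X) / m
            for b in range(n)] for a in range(n)]
    return mu, cov

def density_2d(x, mu, cov):
    """Multivariate Gaussian density for n = 2 using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    # (2*pi)^(n/2) equals 2*pi when n = 2.
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# Hypothetical training data with two positively correlated features.
X_train = [[1.0, 1.0], [2.0, 2.5], [3.0, 2.5], [4.0, 4.5], [5.0, 4.5]]
mu, cov = fit_multivariate(X_train)

# Both points are equally far from the mean, but only the first
# follows the correlation seen in the training data.
print(density_2d([4.0, 4.0], mu, cov))
print(density_2d([4.0, 2.0], mu, cov))
```

The second point receives a much lower density even though each of its coordinates is unremarkable on its own, which is the kind of anomaly the per-feature model tends to miss. When the off-diagonal entries of [latex]\Sigma[/latex] are zero, this density reduces to the product of per-feature Gaussians.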