Problem Motivation
Anomaly detection is mainly used for unsupervised problems, although some aspects of it closely resemble supervised learning. Some examples:
- detecting strange or fraudulent user behavior
- detecting defective units in manufacturing
- monitoring computers in a data center
Gaussian Distribution
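A variable [latex]x[/latex] that follows a Gaussian (normal) distribution with mean [latex]\mu[/latex] and variance [latex]\sigma ^2[/latex] is written as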
[latex]x\sim N(\mu , \sigma ^2)[/latex]
Gaussian probability density:
[latex]p(x; \mu , \sigma ^2) = \frac {1}{\sqrt{2\pi }\sigma } \exp\left(-\frac {(x - \mu)^2}{2 \sigma ^2}\right)[/latex]
The mean [latex]\mu[/latex] sets the location of the center of the bell-shaped curve:
[latex]\mu = \frac {1}{m} \sum _{i=1}^{m}x^{(i)}[/latex]
The variance [latex]\sigma ^2[/latex] sets the width of the bell-shaped curve:
[latex]\sigma ^ 2 = \frac {1}{m} \sum _{i=1}^{m} (x^{(i)} - \mu) ^2[/latex]
Note: this formula divides by [latex]m[/latex] instead of the [latex]m - 1[/latex] commonly used in statistics. For a large training set the difference is negligible.
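The parameter estimates and the density above can be checked numerically. The following is a minimal sketch in plain Python; the function names `fit_gaussian` and `gaussian_pdf` and the sample values are made up for illustration and are not part of the original notes.

```python
import math

def fit_gaussian(xs):
    """Estimate mu and sigma^2 using the 1/m formulas from the notes."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

samples = [4.0, 5.0, 6.0, 5.0]  # made-up data
mu, sigma2 = fit_gaussian(samples)

# The density is highest at the mean and falls off symmetrically around it.
peak = gaussian_pdf(mu, mu, sigma2)
left = gaussian_pdf(mu - 1.0, mu, sigma2)
right = gaussian_pdf(mu + 1.0, mu, sigma2)
```

A smaller [latex]\sigma ^2[/latex] gives a taller, narrower curve, because the area under the density always equals 1.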
Algorithm
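Anomaly detection algorithm: given a training set of [latex]m[/latex] examples with [latex]n[/latex] features, fit the parameters of each feature [latex]j[/latex]:
[latex]\mu _j = \frac {1}{m} \sum _{i=1}^{m}x^{(i)} _j[/latex]
[latex]\sigma ^ 2 _j = \frac {1}{m} \sum _{i=1}^{m} (x^{(i)}_j - \mu _j) ^2[/latex]
Then, for a new example [latex]x[/latex], compute
[latex]p(x) = \prod _{j=1}^{n}p(x_j; \mu _j, \sigma ^2_j) = \prod _{j=1}^{n}\frac {1}{\sqrt{2\pi } \sigma _j} \exp\left(-\frac {(x_j - \mu_j)^2}{2 \sigma ^2_j}\right)[/latex]
If [latex]p(x) < \varepsilon [/latex], flag [latex]x[/latex] as an anomaly.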
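As a rough illustration of the algorithm above, here is a NumPy sketch. The helper names `fit_params` and `density`, the training data, and the threshold value are all assumptions made for the example. In practice [latex]\varepsilon[/latex] would be chosen on a cross-validation set rather than fixed by hand.

```python
import numpy as np

def fit_params(X):
    """Per-feature mean and variance (1/m normalisation) of an (m, n) array."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # NumPy's default ddof=0 divides by m
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) for each row: product of n univariate Gaussian densities."""
    per_feature = np.exp(-((X - mu) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return per_feature.prod(axis=1)

# Made-up training data: two features, all "normal" examples.
X_train = np.array([[1.0, 10.0], [1.2, 9.5], [0.8, 10.5], [1.1, 10.2], [0.9, 9.8]])
mu, sigma2 = fit_params(X_train)

# One point near the training data and one far away from it.
X_new = np.array([[1.0, 10.0], [3.0, 14.0]])
epsilon = 1e-3  # assumed threshold, for illustration only
is_anomaly = density(X_new, mu, sigma2) < epsilon
```

With many features the product can underflow to zero. Summing log-densities and comparing against [latex]\log \varepsilon[/latex] is a common workaround.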
Developing and Evaluating an Anomaly Detection System
How to develop and evaluate an anomaly detection algorithm:
- Fit the model [latex]p(x)[/latex] on the training set
- On the cross-validation set, try different values of [latex]\varepsilon[/latex] and compute the F1 score for each
- After choosing the [latex]\varepsilon[/latex] with the best F1 score, evaluate the algorithm on the test set
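The threshold search in the steps above could look roughly like the sketch below. It assumes the densities [latex]p(x)[/latex] on the cross-validation set have already been computed and that labels use 1 for an anomaly. The names `f1_score` and `select_epsilon`, the step count, and the data are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive (anomalous, label 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, n_steps=1000):
    """Scan thresholds between min(p_cv) and max(p_cv); keep the best F1."""
    lo, hi = min(p_cv), max(p_cv)
    best_eps, best_f1 = lo, 0.0
    for k in range(1, n_steps + 1):
        eps = lo + (hi - lo) * k / n_steps
        y_pred = [1 if p < eps else 0 for p in p_cv]
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Made-up densities on a cross-validation set, with labels (1 = anomaly).
p_cv = [0.30, 0.25, 0.28, 0.001, 0.27, 0.002, 0.31]
y_cv = [0, 0, 0, 1, 0, 1, 0]
epsilon, f1 = select_epsilon(p_cv, y_cv)
```

F1 is used instead of plain accuracy because anomalies are rare: a classifier that never flags anything would still score high accuracy.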
Anomaly Detection vs. Supervised Learning
[table id=3 /]
Choosing What Features to Use
- Check that each feature looks roughly Gaussian; if it does not, try different transformations of the data (for example a logarithm or a power) to make it look more Gaussian
- Run an error analysis: look at the anomalies the algorithm fails to flag and come up with features that would set them apart
- Create new features by combining existing features
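To make the first and last points concrete, the sketch below uses sample skewness as a crude check of how Gaussian a feature looks, and builds a ratio feature from two others. All values and the names `cpu_load` and `network_traffic` are invented for illustration; the notes themselves do not prescribe a specific transformation or feature.

```python
import math

def skewness(xs):
    """Sample skewness: roughly 0 for a symmetric, Gaussian-looking feature."""
    m = len(xs)
    mu = sum(xs) / m
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / m)
    return sum(((x - mu) / sigma) ** 3 for x in xs) / m

# Made-up, right-skewed feature with a long tail.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
transformed = [math.log(x) for x in raw]

skew_before = skewness(raw)         # clearly positive: long right tail
skew_after = skewness(transformed)  # close to zero: symmetric after the log

# A combined feature: the ratio becomes large when CPU load is high
# while network traffic is low, which neither feature shows on its own.
cpu_load = [0.20, 0.25, 0.90]
network_traffic = [200.0, 240.0, 30.0]
ratio = [c / t for c, t in zip(cpu_load, network_traffic)]
```

Plotting a histogram before and after the transformation is usually a more informative check than a single skewness number.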
Multivariate Gaussian Distribution
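The original model treats the features as independent:
[latex]p(x) = \prod _{j=1}^{n}p(x_j; \mu _j, \sigma ^2_j) = \prod _{j=1}^{n}\frac {1}{\sqrt{2\pi } \sigma _j} \exp\left(-\frac {(x_j - \mu_j)^2}{2 \sigma ^2_j}\right)[/latex]
The multivariate Gaussian model instead fits a mean vector [latex]\mu[/latex] and a covariance matrix [latex]\Sigma[/latex]:
[latex]\mu = \frac {1}{m} \sum _{i=1}^{m}x^{(i)}[/latex]
[latex]\Sigma = \frac {1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu )(x^{(i)} - \mu )^T = \frac {1}{m} (X - \mu)^T(X - \mu)[/latex]
where [latex]X[/latex] is the [latex]m \times n[/latex] matrix whose rows are the training examples and [latex]\mu[/latex] is subtracted from every row. The density is
[latex]p(x) = \frac {1}{(2 \pi)^{\frac {n}{2}}\left| \Sigma \right| ^{\frac {1}{2}}} \exp\left(-\frac {1}{2} (x-\mu)^T\Sigma ^{-1}(x-\mu)\right)[/latex]
[table id=4 /]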
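The multivariate density can be sketched with NumPy as follows. The function names `fit_multivariate` and `multivariate_density` and the data are assumptions for the example. The sketch solves a linear system instead of forming [latex]\Sigma ^{-1}[/latex] explicitly, which is a numerical choice made here and not something stated in the notes.

```python
import numpy as np

def fit_multivariate(X):
    """Mean vector and covariance matrix (1/m normalisation) of an (m, n) array."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = centered.T @ centered / m
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Multivariate Gaussian density for each row of X."""
    n = mu.shape[0]
    centered = X - mu
    # Solve Sigma z = (x - mu) for every row instead of inverting Sigma.
    solved = np.linalg.solve(Sigma, centered.T).T
    mahalanobis_sq = np.sum(centered * solved, axis=1)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * mahalanobis_sq) / norm

# Made-up data with two positively correlated features.
X_train = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]])
mu, Sigma = fit_multivariate(X_train)

# Both points are equally far from the mean in Euclidean distance,
# but only the first one follows the correlation seen in training.
X_new = np.array([[4.0, 8.0], [4.0, 4.0]])
p = multivariate_density(X_new, mu, Sigma)
```

Here `p[0]` is far larger than `p[1]`, which is the kind of case the per-feature product model cannot separate. [latex]\Sigma[/latex] must be invertible, which in practice requires [latex]m > n[/latex] and no redundant (linearly dependent) features.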