## Unsupervised Learning Introduction

In unsupervised learning, we give an unlabeled training set to an algorithm and ask it to find some structure in the data for us. One type of structure an algorithm might find is that the points in the data set are grouped into separate clusters, for example two. An algorithm that finds clusters like these is called a **clustering algorithm**.

#### So what is clustering good for?

- Market segmentation
- Social network analysis
- Organizing computing clusters and data centers
- Astronomical data analysis, such as understanding galaxy formation

## K-Means Algorithm

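K-means is by far the most popular and most widely used clustering algorithm. It is iterative: after randomly initializing K points, called the **cluster centroids** (two, if we want two clusters), it repeats two steps until the assignments stop changing.

- Cluster assignment step: assign each example to the closest cluster centroid.
- Move centroid step: move each centroid to the mean of the points assigned to it.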

Repeat {

for i = 1 to m

c(i) := index (from 1 to K) of the cluster centroid closest to x(i)

for k = 1 to K

µk := average (mean) of points assigned to cluster k

}
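The loop above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the choice to start the centroids at K randomly picked training examples, the rule that an empty cluster keeps its old centroid, and the sample points are all assumptions made for the example. Cluster indices run from 0 to K-1 here rather than 1 to K.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (labels, centroids); labels run from 0 to k-1, not 1 to K."""
    rng = random.Random(seed)
    # Assumption: start the centroids at k distinct training examples.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Cluster assignment step: c(i) := index of the centroid closest to x(i).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids would not move
        labels = new_labels
        # Move centroid step: mu_k := mean of the points assigned to cluster k.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Which group ends up with index 0 depends on the random start, so only the grouping itself is meaningful, not the label values.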

## Optimization Objective

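**Distortion function:** [latex]J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K) = \frac{1}{m} \sum_{i=1}^{m}\left\| x^{(i)}-\mu_{c^{(i)}} \right\|^2[/latex]

Here [latex]c^{(i)}[/latex] is the index of the cluster to which example [latex]x^{(i)}[/latex] is currently assigned, and [latex]\mu_{c^{(i)}}[/latex] is the centroid of that cluster, so each term in the sum is the squared distance between [latex]x^{(i)}[/latex] and its cluster centroid. The cluster assignment step minimizes J with respect to the assignments while holding the centroids fixed, and the move centroid step minimizes J with respect to the centroids while holding the assignments fixed.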
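As a small worked example of the distortion function, the sketch below evaluates J for a fixed set of assignments and centroids. The function name, the 0-based cluster indices, and the four sample points are assumptions chosen so the arithmetic is easy to check by hand.

```python
def distortion(points, assignments, centroids):
    """J = (1/m) * sum over i of ||x(i) - mu_c(i)||^2."""
    total = 0.0
    for x, c in zip(points, assignments):
        mu = centroids[c]  # centroid of the cluster that x is assigned to
        total += sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
    return total / len(points)


# Hypothetical data: two pairs of points, each pair one unit from its centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

good = distortion(points, [0, 0, 1, 1], centroids)  # (1 + 1 + 1 + 1) / 4 = 1.0
bad = distortion(points, [0, 1, 0, 1], centroids)   # (1 + 101 + 101 + 1) / 4 = 51.0
print(good, bad)
```

Assigning each point to its closest centroid gives the lower value, which is what the cluster assignment step does.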

## Random Initialization

K-means can converge to different solutions depending on exactly how the cluster centroids were randomly initialized. In particular, it can end up at a local optimum of the distortion function rather than the global one. To reduce that risk, **initialize K-means lots of times and run K-means lots of times**, then keep the solution with the lowest distortion, so that we get as good a local or global optimum as possible. If **the number of clusters** is small, anywhere from **two up to maybe 10**, multiple random initializations can often find a better local optimum. **But if K is very large, multiple initializations are less likely to make a huge difference.**
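Running K-means from several random starts and keeping the lowest-distortion result can be sketched as follows. The function names, the default of 50 runs, the initialization at K randomly chosen training examples, and the sample data are assumptions for illustration, not values taken from these notes.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(points, k, rng, max_iters=100):
    """One K-means run from a random start: (distortion, labels, centroids)."""
    centroids = rng.sample(points, k)  # assumption: start at k training examples
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    cost = sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels)) / len(points)
    return cost, labels, centroids


def best_of_n(points, k, n_init=50, seed=0):
    """Run K-means n_init times and keep the run with the lowest distortion."""
    rng = random.Random(seed)
    runs = [run_kmeans(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[0])


# Hypothetical toy data with three loose groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 9)]
best_cost, best_labels, best_centroids = best_of_n(points, k=3)
print(best_cost, best_labels)
```

Each run is cheap on small data, so the extra runs cost little compared with the risk of keeping a poor local optimum.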

## Choosing the Number of Clusters

There **actually isn’t a great way** of choosing the number of clusters automatically. By far the most common approach is still **choosing it manually**, by **looking at visualizations**, by **looking at the output of the clustering algorithm**, or by some other means. One method is the **Elbow Method**: plot the distortion J against the number of clusters K and pick the K at the "elbow", where the curve stops dropping quickly. Don’t always expect it to work well, though, because the curve often has no clear elbow.
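To get the numbers one would inspect for the Elbow Method, a rough sketch is to compute the distortion for a range of K values and print them. The helper names, the 10 restarts per K, and the sample data are assumptions; the code only produces the curve and deliberately does not try to pick the elbow automatically.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(points, k, rng, max_iters=100):
    """Run K-means once from a random start and return its distortion J."""
    centroids = rng.sample(points, k)  # assumption: start at k training examples
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels)) / len(points)


def elbow_curve(points, k_values, n_init=10, seed=0):
    """Map each K to the lowest distortion found over n_init random starts."""
    rng = random.Random(seed)
    return {
        k: min(kmeans_cost(points, k, rng) for _ in range(n_init)) for k in k_values
    }


# Hypothetical toy data with three loose groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 9)]
curve = elbow_curve(points, range(1, 6))
for k, cost in curve.items():
    print(f"K={k}: J={cost:.3f}")
```

Reading the printed values (or a plot of them) is still a manual judgment, which matches the caution above about the method.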