
### Machine Learning and Data Mining: Clustering

(adapted from) Prof. Alexander Ihler

Unsupervised learning

- Supervised learning
- Predict target value (“y”) given features (“x”)

- Unsupervised learning
- Understand patterns of data (just “x”)
- Useful for many reasons
- Data mining (“explain”)
- Missing data values (“impute”)
- Representation (feature generation or selection)

- One example: clustering

Clustering and Data Compression

- Clustering is related to vector quantization
- Dictionary of vectors (the cluster centers)
- Each original value represented using a dictionary index
- Each center “claims” a nearby region (a Voronoi region); a short encode/decode sketch follows this list
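A minimal NumPy sketch of this vector-quantization view, with a hand-picked dictionary of centers (both the centers and the data points are illustrative assumptions):

```python
# Vector quantization: encode each point as the index of its nearest
# dictionary vector (cluster center), then decode by looking the index back up.
import numpy as np

centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])         # dictionary of vectors
X = np.array([[0.2, -0.1], [4.8, 5.3], [0.4, 4.7], [2.4, 2.6]])  # original data

dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)      # distance to each center
codes = dist.argmin(axis=1)        # encode: index of the claiming (Voronoi) center
X_hat = centers[codes]             # decode: lossy reconstruction from the dictionary
```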

Another simple clustering algorithm

- Define a distance between clusters (we return to this)
- Initialize: every example is its own cluster
- Iterate:
  - Compute distances between all clusters (store for efficiency)
  - Merge the two closest clusters
- Save both the clustering and the sequence of cluster operations (the “dendrogram”)

Hierarchical Agglomerative Clustering

- Initially, every datum is its own cluster
- Builds up a sequence of clusters (“hierarchical”)
- Algorithm complexity O(N²) (why?)

In MATLAB: the “linkage” function (stats toolbox). A Python sketch using SciPy is shown below.
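Here is a minimal Python sketch of the same procedure using SciPy's `linkage` and `fcluster` (the counterpart of MATLAB's `linkage`); the two-blob data and the single-linkage cluster distance are illustrative assumptions, not from the slides.

```python
# Hierarchical agglomerative clustering via SciPy (counterpart of MATLAB's "linkage").
# The synthetic data and the "single" linkage choice are assumptions for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)),    # one synthetic blob
               rng.normal(5, 1, (20, 2))])   # another synthetic blob

Z = linkage(X, method="single")                   # records the full merge sequence
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
# scipy.cluster.hierarchy.dendrogram(Z) would plot the merge tree (needs matplotlib)
```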

Example: microarray expression

- Gene expression measured under various experimental conditions (cancer vs. normal, over time, across subjects)
- Explore similarities
  - What genes change together?
  - What conditions are similar?
- Cluster on both genes and conditions

K-Means Clustering

- A simple clustering algorithm
- Iterate between
  - Updating the assignment of data to clusters
  - Updating each cluster’s summarization

- Suppose we have K clusters, c = 1..K
  - Represent clusters by their locations μ_c
  - Example i has features x_i
  - Represent the assignment of the ith example as z_i in 1..K

- Iterate until convergence (a short sketch follows this list):
  - For each datum, find the closest cluster: z_i = arg min_c ‖x_i − μ_c‖²
  - Set each cluster to the mean of all data assigned to it: μ_c = (1/n_c) Σ_{i : z_i = c} x_i
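A minimal NumPy sketch of these two alternating updates (Lloyd's algorithm); the data, K, and the random initialization are illustrative assumptions.

```python
# K-means: alternate between assigning points to the closest center
# and moving each center to the mean of its assigned points.
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]              # init centers at random data points
    z = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # squared distance to each center
        z = d.argmin(axis=1)                                   # assignment update
        new_mu = np.array([X[z == c].mean(axis=0) if np.any(z == c) else mu[c]
                           for c in range(K)])                 # mean update
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return z, mu

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
z, mu = kmeans(X, K=2)
```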

Choosing the number of clusters

- With the K-means cost function J(z, μ) = Σ_i ‖x_i − μ_{z_i}‖², what is the optimal value of k?
  (Can increasing k ever increase the cost?)

- This is a model complexity issue
- Much like choosing lots of features – they only (seem to) help
- But we want our clustering to generalize to new data

- One solution is to penalize for complexity
  - Bayesian information criterion (BIC)
  - Add (# parameters) * log(N) to the cost
  - Now more clusters can increase the cost, if they don’t help “enough” (a small sketch follows this list)
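A hedged sketch of the BIC-style penalty; counting the k·d center coordinates as the parameters is an assumption made here for illustration, and the synthetic data and use of scikit-learn's `KMeans` are likewise illustrative.

```python
# BIC-style model selection for k: clustering cost + (#parameters) * log(N).
# Counting the k*d center coordinates as the parameters is an assumption.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
N, d = X.shape

scores = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    scores[k] = km.inertia_ + k * d * np.log(N)   # inertia_ is the K-means cost
best_k = min(scores, key=scores.get)              # smallest penalized cost wins
```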

Choosing the number of clusters (2)

- The Cattell scree test:

(Figure: dissimilarity plotted against the number of clusters, 1–7; the scree test picks k at the “elbow” where the curve levels off.)

Scree is a loose accumulation of broken rock at the base of a cliff or mountain.

Mixtures of Gaussians

- K-means algorithm
- Assigned each example to exactly one cluster
- What if clusters are overlapping?
- Hard to tell which cluster is right
- Maybe we should try to remain uncertain

- Used Euclidean distance
- What if cluster has a non-circular shape?

- Gaussian mixture models
- Clusters modeled as Gaussians
- Not just by their mean

- EM algorithm: assign data to cluster with some probability


Multivariate Gaussian models

- Maximum Likelihood estimates
- We’ll model each cluster using one of these Gaussian “bells”…

(Figure: contour plot of a 2-D Gaussian density, with both axes running from about −2 to 5. The standard formulas are sketched below.)
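The slide's equations were images; the standard multivariate Gaussian density and its maximum-likelihood estimates, which the bullets above refer to, are:

```latex
% Multivariate Gaussian density and maximum-likelihood estimates
% (standard forms, reconstructed because the slide's equations were images)
\mathcal{N}(x;\,\mu,\Sigma)
  = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
    \exp\!\Big(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\Big),
\qquad
\hat\mu = \frac{1}{N}\sum_i x_i,
\qquad
\hat\Sigma = \frac{1}{N}\sum_i (x_i-\hat\mu)(x_i-\hat\mu)^\top
```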

EM Algorithm: E-step

- Start with parameters describing each cluster
  - Mean μ_c, covariance Σ_c, “size” π_c
- E-step (“Expectation”)
  - For each datum (example) x_i, compute “r_ic”, the probability that it belongs to cluster c
    - Compute its probability under model c
    - Normalize to sum to one (over clusters c)

- If x_i is very likely under the c-th Gaussian, it gets high weight
- The denominator just makes the r’s sum to one (the formula is written out below)
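Written out, this is the standard E-step responsibility for a Gaussian mixture (reconstructed here since the slide's formula was an image):

```latex
% E-step: probability under component c, normalized over the clusters
r_{ic} = \frac{\pi_c\,\mathcal{N}(x_i;\,\mu_c,\Sigma_c)}
              {\sum_{c'} \pi_{c'}\,\mathcal{N}(x_i;\,\mu_{c'},\Sigma_{c'})}
```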

EM Algorithm: M-step

- Start with the assignment probabilities r_ic
- Update the parameters: mean μ_c, covariance Σ_c, “size” π_c
- M-step (“Maximization”)
  - For each cluster (Gaussian) c,
  - Update its parameters using the (weighted) data points
R_c = Σ_i r_ic (total responsibility allocated to cluster c)

π_c = R_c / N (fraction of the total assigned to cluster c)

μ_c = (1/R_c) Σ_i r_ic x_i (weighted mean of the assigned data)

Σ_c = (1/R_c) Σ_i r_ic (x_i − μ_c)(x_i − μ_c)^T (weighted covariance of the assigned data; use the new weighted means here)

A NumPy sketch of one full E-step/M-step iteration follows.
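A minimal NumPy sketch of one EM iteration matching the equations above; the initial parameters (π, μ, Σ) are assumed to come from elsewhere, e.g. a K-means run, and the helper name `em_step` is just for illustration.

```python
# One EM iteration for a Gaussian mixture: responsibilities (E-step),
# then weighted size / mean / covariance updates (M-step).
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    N, d = X.shape
    K = len(pi)
    # E-step: r[i, c] proportional to pi_c * N(x_i; mu_c, Sigma_c), normalized over c
    r = np.column_stack([pi[c] * multivariate_normal.pdf(X, mu[c], Sigma[c])
                         for c in range(K)])
    r /= r.sum(axis=1, keepdims=True)
    # M-step: total responsibility, mixing fraction, weighted mean, weighted covariance
    R = r.sum(axis=0)
    pi_new = R / N
    mu_new = (r.T @ X) / R[:, None]
    Sigma_new = np.stack([((X - mu_new[c]).T * r[:, c]) @ (X - mu_new[c]) / R[c]
                          for c in range(K)])
    return pi_new, mu_new, Sigma_new, r
```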

Expectation-Maximization

- Each step increases the log-likelihood of our model (we won’t derive this, though)
- Iterate until convergence
- Convergence guaranteed – another ascent method

- What should we do (a usage sketch follows)
  - If we want to choose a single cluster for an “answer”?
  - With new data we didn’t see during training?
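As a hedged illustration (not from the slides), scikit-learn's `GaussianMixture` addresses both questions: `predict` returns the single most responsible cluster for each point, and `predict_proba` gives responsibilities for data not seen during training. The synthetic data here is an assumption.

```python
# Fitting a Gaussian mixture, then querying it with held-out data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
X_new = rng.normal(2, 1, (5, 2))      # data not seen during training

gmm = GaussianMixture(n_components=2, covariance_type="full").fit(X_train)
soft = gmm.predict_proba(X_new)       # responsibilities r_ic for the new data
hard = gmm.predict(X_new)             # single-cluster "answer": argmax_c r_ic
```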

Summary

- Clustering algorithms
- Agglomerative clustering
- K-means
- Expectation-Maximization

- Open questions for each application
- What does it mean to be “close” or “similar”?
- Depends on your particular problem…

- “Local” versus “global” notions of similarity
- Former is easy, but we usually want the latter…

- Is it better to “understand” the data itself (unsupervised learning), to focus just on the final task (supervised learning), or both?
