Chap. 17 Clustering

Objectives of Data Analysis: Classification, Pattern recognition, Diagnostics, Trends, Outliers, Quality control, Discrimination, Regression, Data comparisons, Clustering
Objectives of Data Analysis
Clustering
Example: Amino Acid (AA) - Basic
Clustering of AAs
Physico-Chemical Properties
Red: acidic
Orange: basic
Green: polar (hydrophilic)
Yellow: non-polar (hydrophobic)
Hierarchical Clustering
Hierarchical_Clustering(d, n)
  form n clusters, each containing a single element
  construct a graph T by assigning an isolated vertex to each cluster
  while there is more than 1 cluster
    find the two closest clusters C1 and C2
    merge C1 and C2 into a new cluster C with |C1| + |C2| elements
    compute the distance from C to all other clusters
    add a new vertex C to T and connect it to the vertices C1 and C2
    remove the rows and columns of d corresponding to C1 and C2, and add a row and column for C
  return T
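The pseudocode above can be sketched as a short Python function. This is a minimal illustration, assuming single linkage (the distance between two clusters is the minimum distance between any pair of their elements); other linkage rules would change only the inner `min`.

```python
def hierarchical_clustering(d, n):
    """Agglomerative clustering over an n x n distance matrix d.

    Returns the merge tree T as a list of (C1, C2, C) tuples: each
    merge creates a new cluster id C connected to children C1 and C2.
    """
    clusters = {i: [i] for i in range(n)}  # cluster id -> member elements
    tree = []                              # edges of the merge tree T
    next_id = n
    while len(clusters) > 1:
        # find the two closest clusters under single linkage
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    dist = min(d[i][j] for i in clusters[a]
                                       for j in clusters[b])
                    if best is None or dist < best[0]:
                        best = (dist, a, b)
        _, a, b = best
        # merge C1 and C2 into a new cluster C
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        tree.append((a, b, next_id))  # connect new vertex C to C1 and C2
        next_id += 1
    return tree
```

For example, with three points where the first two are close together, the first merge joins them and the second merge absorbs the remaining point.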
Hierarchical Clustering
Dayhoff Clustering - 1978
Murphy, Wallqvist, Levy, 2000
k-means Clustering
k-means Clustering Problem
Given n data points, find k center points minimizing the squared error distortion
  d(V, X) = ∑i d(vi, X)² / n,
where d(vi, X) is the distance from point vi to the closest center in X.
Input: a set V of n data points and a parameter k
Output: a set X of k center points minimizing d(V, X) over all possible choices of X
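The distortion measure above translates directly into code. A small sketch, assuming Euclidean distance and points given as coordinate tuples:

```python
import math

def distortion(V, X):
    """Squared error distortion d(V, X): the mean squared distance
    from each point in V to its nearest center in X."""
    total = 0.0
    for v in V:
        # d(v_i, X): distance from v_i to the closest center
        nearest = min(math.dist(v, x) for x in X)
        total += nearest ** 2
    return total / len(V)
```

For instance, with V = {(0,0), (2,0)} and a single center at (0,0), the squared distances are 0 and 4, so the distortion is 2.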
k-means Clustering - 2
Progressive_Greedy_k-means(k)
  select an arbitrary partition P into k clusters
  while forever
    bestChange ← 0
    for every cluster C
      for every element i not in C
        if moving i to cluster C reduces Cost(P)
          if Δ(i → C) > bestChange
            bestChange ← Δ(i → C)
            i* ← i
            C* ← C
    if bestChange > 0
      change partition P by moving i* to C*
    else
      return P
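The greedy loop above can be made concrete with a small Python sketch. This illustration assumes 1-D points, a squared-distance cost with cluster centers at centroids, and evaluates Δ(i → C) by trying each move on a copy of the partition; it is a sketch of the idea, not an efficient implementation.

```python
def cost(P):
    """Total squared distance of each point to its cluster centroid."""
    total = 0.0
    for cluster in P:
        if cluster:
            c = sum(cluster) / len(cluster)
            total += sum((v - c) ** 2 for v in cluster)
    return total

def progressive_greedy_k_means(P):
    """P: initial partition, a list of k lists of points.
    Repeatedly applies the single best point move until no move helps."""
    while True:
        best_change, best_move = 0.0, None
        for ci in range(len(P)):            # every cluster C
            for cj in range(len(P)):        # every element i not in C
                if ci == cj:
                    continue
                for v in P[cj]:
                    # Delta(i -> C): cost reduction from moving v into cluster ci
                    trial = [list(c) for c in P]
                    trial[cj].remove(v)
                    trial[ci].append(v)
                    delta = cost(P) - cost(trial)
                    if delta > best_change:
                        best_change, best_move = delta, (v, cj, ci)
        if best_move is None:
            return P                        # no improving move: locally optimal
        v, cj, ci = best_move               # change P by moving i* to C*
        P[cj].remove(v)
        P[ci].append(v)
```

Starting from the partition {1, 2, 10}, {11, 12}, the first iteration moves 10 into the second cluster, after which no single move reduces the cost.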