Chapter 5: Clustering. Searching for groups. Clustering is unsupervised (undirected) learning. Unlike classification, clustering uses no pre-classified data. The goal is to find groups, or clusters, of data points (records) that are similar to one another.
where (x_i1, x_i2, …, x_ip) and (x_j1, x_j2, …, x_jp) are two p-dimensional data objects, and q is a positive integer
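The formula this line annotates is not shown here. Assuming it is the Minkowski distance, d(i, j) = (|x_i1 - x_j1|^q + … + |x_ip - x_jp|^q)^(1/q), a minimal sketch could look like the following. The function name `minkowski` is illustrative only.

```python
def minkowski(x, y, q):
    """Minkowski distance between two p-dimensional objects.

    Assumed form: (sum over f of |x_f - y_f|^q)^(1/q), q a positive integer.
    q = 1 gives the Manhattan distance, q = 2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of dimensions")
    if q < 1:
        raise ValueError("q must be a positive integer")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


manhattan = minkowski((0, 0), (3, 4), 1)
euclidean = minkowski((0, 0), (3, 4), 2)
```

With q = 1 the two sample objects are 7 units apart, and with q = 2 they are 5 units apart.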
y_if = log(x_if)
d_ij(f) = 0 if x_if = x_jf, and d_ij(f) = 1 otherwise.
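The two per-variable rules above can be sketched as small helpers. The function names are hypothetical, and averaging the per-variable values into one dissimilarity is an assumption made here for illustration, not something stated in the text.

```python
import math


def log_transform(x):
    """y_if = log(x_if): logarithmic transform of a positive value."""
    if x <= 0:
        raise ValueError("log transform needs a positive value")
    return math.log(x)


def match_dissimilarity(x_if, x_jf):
    """d_ij(f): 0 if the two values of variable f are equal, 1 otherwise."""
    return 0 if x_if == x_jf else 1


def mean_mismatch(obj_i, obj_j):
    """Assumption: average d_ij(f) over all variables f."""
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must have the same number of variables")
    total = sum(match_dissimilarity(a, b) for a, b in zip(obj_i, obj_j))
    return total / len(obj_i)


example = mean_mismatch(("red", "small", "round"), ("red", "large", "round"))
```

In the example, the two objects differ on one of three variables, so the averaged dissimilarity is one third.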
Typical convergence criteria are: no (or minimal) reassignment of data points to new cluster centers, or minimal decrease in squared error.
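A minimal sketch of how these stopping rules might appear in a k-means style loop. Assumptions made for illustration: the first k points serve as initial centers, the distance is squared Euclidean, and `tol` is a hypothetical threshold for a "minimal" decrease in squared error.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between a point and a center."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans(points, k, max_iter=100, tol=1e-9):
    centers = [tuple(p) for p in points[:k]]  # assumption: first k points
    assignment = [None] * len(points)
    prev_sse = float("inf")
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
        sse = sum(sq_dist(p, centers[a]) for p, a in zip(points, new_assignment))
        # Stop on no reassignment or on a minimal decrease in squared error.
        no_reassignment = new_assignment == assignment
        small_decrease = prev_sse - sse <= tol
        assignment, prev_sse = new_assignment, sse
        if no_reassignment or small_decrease:
            break
    return centers, assignment


centers, assignment = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

On this toy input the loop stops once an iteration leaves every point in the cluster it already belonged to.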
where p is a point and m_i is the mean of cluster C_i
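The criterion these symbols belong to is not shown here. Assuming it is the usual squared-error criterion, E = sum over clusters C_i of the sum over points p in C_i of |p - m_i|^2, it could be computed as in this sketch. The function names are illustrative.

```python
def cluster_mean(cluster):
    """m_i: coordinate-wise mean of the points in one cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def squared_error(clusters):
    """Assumed criterion: total squared distance of each point p to its cluster mean m_i."""
    total = 0.0
    for cluster in clusters:
        m = cluster_mean(cluster)
        total += sum(sum((a - b) ** 2 for a, b in zip(p, m)) for p in cluster)
    return total


error = squared_error([[(0, 0), (0, 2)], [(5, 5)]])
```

The first cluster has mean (0, 1) and contributes 1 + 1; the single-point cluster contributes 0.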
Outlier (100 units away)
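A small sketch of why such an outlier matters for a mean-based cluster center. The one-dimensional values are made up for illustration; only the 100-unit offset comes from the label above.

```python
def mean(values):
    return sum(values) / len(values)


cluster = [1.0, 2.0, 3.0, 4.0, 5.0]             # hypothetical cluster members
with_outlier = cluster + [max(cluster) + 100.0]  # one point 100 units away

center_before = mean(cluster)
center_after = mean(with_outlier)
```

A single distant point drags the mean from 3 to 20, well outside the range of the original five values.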
where n is the number of data points and k is the number of clusters
[Figure: hierarchical clustering of objects a, b, c, d, e; c, d, e form one merged group]
A Dendrogram Shows How the Clusters are Merged Hierarchically
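The merge order that a dendrogram records can be sketched with a simple bottom-up procedure. Assumptions made for illustration: single linkage (distance between the closest members), Euclidean distance, and made-up one-dimensional coordinates for the objects a to e.

```python
import math


def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def agglomerate(points):
    """points: dict of label -> coordinates. Returns merges as (left, right, distance)."""
    clusters = [frozenset([label]) for label in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclid(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


toy = {"a": (0.0,), "b": (1.0,), "c": (5.0,), "d": (7.0,), "e": (10.0,)}
merges = agglomerate(toy)
```

Each entry in `merges` corresponds to one junction of the dendrogram, and the recorded distance is the height at which the two branches join.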
Other Data Mining Methods
E.g., if a customer buys a bed, he or she is likely to return later to buy a mattress.
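One way to quantify a rule like this is the fraction of customers who bought the first item and later bought the second. The sketch below uses hypothetical purchase histories and a hypothetical function name; it is not taken from the text.

```python
def sequence_confidence(histories, first, later):
    """Share of customers who bought `first` and then bought `later` afterwards."""
    bought_first = 0
    followed = 0
    for history in histories:
        if first in history:
            bought_first += 1
            idx = history.index(first)
            if later in history[idx + 1:]:
                followed += 1
    return followed / bought_first if bought_first else 0.0


# Hypothetical, time-ordered purchase histories, one list per customer.
histories = [
    ["bed", "mattress"],
    ["bed", "lamp"],
    ["mattress", "bed"],
    ["bed", "pillow", "mattress"],
    ["lamp"],
]
confidence = sequence_confidence(histories, "bed", "mattress")
```

Order matters here: the customer who bought the mattress before the bed does not count as following the rule.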
2-D, 3-D scatter plots, bar charts, pie charts, line plots, animation, etc.
Rule visualizer, cluster visualizer, etc.