Chapter 5: Clustering. Searching for groups. Clustering is unsupervised (undirected) learning. Unlike classification, clustering uses no pre-classified data; it searches for groups, or clusters, of data points (records) that are similar to one another.
Chapter 5: Clustering
where (xi1, xi2, …, xip) and (xj1, xj2, …, xjp) are two p-dimensional data objects, and q is a positive integer
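The formula these definitions belong to did not survive extraction. The symbols match the standard Minkowski distance, so the following is a reconstruction under that assumption, not a copy of the original slide:

```latex
d(i,j) = \left( |x_{i1} - x_{j1}|^{q} + |x_{i2} - x_{j2}|^{q} + \cdots + |x_{ip} - x_{jp}|^{q} \right)^{1/q}
```

Under this reading, q = 1 gives the Manhattan distance and q = 2 gives the Euclidean distance.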
yif = log(xif)
dij(f) = 0 if xif = xjf, and dij(f) = 1 otherwise.
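The indicator above appears to be the per-variable contribution of a binary or nominal variable f to the dissimilarity between objects i and j. A minimal Python sketch of that rule follows; the function names and the sample records are hypothetical and chosen only for illustration.

```python
def nominal_mismatch(x_if, x_jf):
    """Return 0 if objects i and j agree on variable f, 1 otherwise."""
    return 0 if x_if == x_jf else 1


def mismatch_vector(obj_i, obj_j):
    """Apply the indicator to every variable of two equal-length records."""
    return [nominal_mismatch(a, b) for a, b in zip(obj_i, obj_j)]


# Hypothetical records with three nominal variables each.
obj_i = ("red", "small", "round")
obj_j = ("red", "large", "round")
print(mismatch_vector(obj_i, obj_j))  # [0, 1, 0]
```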
Typical convergence criteria are: no (or minimal) reassignment of data points to new cluster centers, or minimal decrease in squared error.
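Both stopping rules can be sketched in a small k-means loop. This is an illustrative sketch, not the slides' own implementation: the function names, the `tol` and `max_iter` parameters, the random seeding, and the sample points are all assumptions.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, tol=1e-9, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # initial centers: k random data points
    assignment = [None] * len(points)
    prev_error = float("inf")
    error = prev_error
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Recompute each center as the mean of its members
        # (an empty cluster keeps its previous center).
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centers[c] = mean(members)
        error = sum(squared_distance(p, centers[a])
                    for p, a in zip(points, new_assignment))
        # Criterion 1: no point changed cluster.
        no_reassignment = new_assignment == assignment
        # Criterion 2: squared error decreased by at most tol.
        small_decrease = prev_error - error <= tol
        assignment, prev_error = new_assignment, error
        if no_reassignment or small_decrease:
            break
    return centers, assignment, error


# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignment, error = k_means(points, k=2)
print(assignment, error)
```

Checking both criteria in the same loop makes it easy to see which one fires first; on small, well-separated data the "no reassignment" rule usually stops the loop.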
p is a point and mi is the mean of cluster Ci
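The formula these symbols refer to is missing from the extracted text. They are consistent with the usual squared-error criterion, so the following is a reconstruction under that assumption, with k taken to be the number of clusters:

```latex
E = \sum_{i=1}^{k} \sum_{p \in C_i} \lvert p - m_i \rvert^{2}
```

Under this reading, E is the sum over all clusters of the squared distances from each point to its cluster mean, which is the quantity k-means tries to reduce at each iteration.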
Outlier (100 units away)
where n is the number of data points and k is the number of clusters
[Figure: hierarchical clustering example over objects a, b, c, d, e, including the intermediate cluster {c, d, e}]
A Dendrogram Shows How the Clusters are Merged Hierarchically
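The merge order that a dendrogram records can be sketched with a small bottom-up (agglomerative) procedure. The example below is an illustration under stated assumptions: it uses single linkage, which the slides do not specify, and the 1-D coordinates for objects a to e are hypothetical.

```python
def single_link(c1, c2, points):
    """Single-linkage distance: the closest pair across the two clusters."""
    return min(abs(points[a] - points[b]) for a in c1 for b in c2)


def agglomerate(points):
    """points: dict mapping label -> 1-D coordinate.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    which is the information a dendrogram draws.
    """
    clusters = [(label,) for label in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j], points)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return merges


# Hypothetical coordinates for five objects.
points = {"a": 1.0, "b": 2.0, "c": 6.0, "d": 8.0, "e": 8.5}
for left, right, dist in agglomerate(points):
    print(left, "+", right, "at distance", dist)
```

With these coordinates, d and e merge first, then a and b, then c joins {d, e}, and the two remaining clusters merge last. Cutting the dendrogram at a given distance yields the clusters that exist at that level.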
Other Data Mining Methods
E.g., if a customer buys a bed, he or she is likely to come back later to buy a mattress.
2-D, 3-D scatter plots, bar charts, pie charts, line plots, animation, etc.
Rule visualizer, cluster visualizer, etc.