Loading in 2 Seconds...
Loading in 2 Seconds...
Data Warehousing/Mining Comp 150 DW Chapter 8. Cluster Analysis. Instructor: Dan Hebert. Chapter 8. Cluster Analysis. What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods DensityBased Methods
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Instructor: Dan Hebert
where
where i = (xi1, xi2, …, xip) and j = (xj1, xj2, …, xjp) are two pdimensional data objects, and q is a positive integer
1/q
q
q
q
=

+

+
+

d
(
i
,
j
)
(
x
x


x
x

...

x
x

)
i
j
i
j
i
j
1
1
2
2
p
p
Object j
a  i & j both equal 1
b  i=1, j=0
…
Object i
Most likely to have same disease
Unlikely to have same disease
yif = log(xif)
dij(f) = 0 if xif = xjf , or dij(f) = 1 o.w.
Step 1
Step 2
Step 3
Step 4
agglomerative
(AGNES)
a
a b
b
a b c d e
c
c d e
d
d e
e
divisive
(DIANA)
Step 3
Step 2
Step 1
Step 0
Step 4
Hierarchical ClusteringBottomup
Places each object in its own
cluster & then merges
Topdown
All objects in a single
cluster & then subdivides
A Dendrogram Shows How the Clusters are Merged Hierarchically
Construct
Sparse Graph
Partition the Graph
Data Set
Merge Partition
Final Clusters
MinPts = 5
Eps = 1 cm
q
DensityBased Clustering: BackgroundNEps (q) >= MinPts
q
o
DensityBased Clustering: Background (II)p
p1
q
Border
Eps = 1cm
MinPts = 5
Core
DBSCAN: Density Based Spatial Clustering of Applications with NoiseD
p1
o
p2
o
Max (coredistance (o), d (o, p))
r(p1, o) = 2.8cm. r(p2,o) = 4cm
MinPts = 5
e = 3 cm
It uses hatshape filters to emphasize region where points cluster, but simultaneously to suppress weaker information in their boundary