
RIC - Refine Initial Cluster Centers in Partitioning Clustering for Clustering Transaction Data



Presentation Transcript


  1. RIC - Refine Initial Cluster Centers in Partitioning Clustering for Clustering Transaction Data
  Department of Computer Science and Information Engineering, 張蕙珠, 01/04/2005

  2. Clustering
  • Clustering is the unsupervised classification of patterns into groups (clusters)
  • Data objects in the same cluster are similar to one another, while objects in different clusters are dissimilar
  • Clustering is useful in pattern analysis, market or customer segmentation, machine learning, information retrieval, and data mining

  3. Classes of clustering
  • Partitioning clustering
  • Hierarchical clustering

  4. Partitioning clustering
  • Given k, the number of partitions to construct
  • Create an initial partitioning
  • Use an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another

  5. Hierarchical clustering
  • The bottom-up approach starts with each object forming a separate group and successively merges the objects or groups closest to one another, until a termination condition holds.
  • The top-down approach starts with all the objects in a single cluster. In each successive iteration, a cluster is split into smaller clusters, until each object is in its own cluster or a termination condition holds.

  6. Motivation
  • Partitioning clustering needs refined initial cluster centers (see example)
  • Partitioning clustering needs an accurate number of clusters (see example)

  7. RIC clustering algorithm
  Definitions:
  (1) Width (W): the number of distinct data items in the dataset
  (2) Height (H): the number of times each data item appears
  (3) Average height (Avg-H): the average of the heights of all data items

  8. RIC clustering algorithm
  Step 1: Count the items of the market-basket data (W, H, Avg-H)
  Step 2: Select the items with H > Avg-H
  Step 3: Combine the selected items
  Step 4: Repeat: count the occurrences of every distinct combined item set, compute its H, W, and Avg-H, and select the combined item sets with H > Avg-H
  Step 5: Until k cluster centers are obtained, or all H(i) < Avg-H, or the last result does not change
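Steps 1 and 2 can be sketched in a few lines; this is a minimal illustration, not the deck's implementation, and the tiny basket dataset below is made up for demonstration:

```python
from collections import Counter

def item_heights(transactions):
    """Steps 1-2 of RIC, assuming each transaction is a set of items:
    count the height H of every item, derive W and Avg-H, and keep the
    items whose height reaches Avg-H."""
    H = Counter(item for t in transactions for item in t)  # Step 1: heights
    W = len(H)                                             # number of distinct items
    avg_h = sum(H.values()) // W                           # Avg-H (integer average)
    selected = {i for i, h in H.items() if h >= avg_h}     # Step 2
    return H, W, avg_h, selected

# Made-up market-basket data (not from the slides)
baskets = [{'B', 'D'}, {'B', 'D', 'H'}, {'B', 'C'}, {'A', 'C'}, {'B'}]
H, W, avg_h, selected = item_heights(baskets)
# Heights: B=4, C=2, D=2, A=1, H=1; W=5, Avg-H = 10 // 5 = 2
# selected -> {'B', 'C', 'D'}
```

The selected items then feed Step 3's pairwise combination, repeated at larger set sizes until one of Step 5's stopping conditions holds.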

  9. Example
  1. H(A)=2, H(B)=9, H(C)=4, H(D)=8, H(E)=2, H(F)=3, H(G)=2, H(H)=5, H(I)=6, W=9, Avg-H = (2+9+4+8+2+3+2+5+6)/9 ≈ 4
  2. Select items with H(item) >= Avg-H: H(B)=9, H(C)=4, H(D)=8, H(H)=5, H(I)=6
  3. Combine the selected items: {BD, BC, CD, BH, CH, DH, HI, BI, CI, DI}
  4. H(B,D)=4, H(B,C)=3, H(C,D)=2, H(D,H)=4, H(B,I)=4, H(C,I)=2, otherwise H=0; W=10, Avg-H = (4+3+2+4+4+2)/10 ≈ 2
  5. Select the combined sets: H(B,D)=4, H(B,C)=3, H(C,D)=2, H(D,H)=4, H(B,I)=4, H(C,I)=2
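The example's arithmetic can be checked directly from the heights given on the slide (the underlying transactions themselves are not shown in the deck):

```python
from itertools import combinations

# Item heights from the slide's example
H = {'A': 2, 'B': 9, 'C': 4, 'D': 8, 'E': 2,
     'F': 3, 'G': 2, 'H': 5, 'I': 6}
W = len(H)                                # 9 distinct items
avg_h = sum(H.values()) // W              # 41 // 9 = 4
selected = sorted(i for i, h in H.items() if h >= avg_h)
# selected -> ['B', 'C', 'D', 'H', 'I']

# Step 3: all pairs of the selected items (10 candidates)
pairs = list(combinations(selected, 2))

# Step 4: pair heights given on the slide; unlisted pairs have H = 0
H2 = {('B', 'D'): 4, ('B', 'C'): 3, ('C', 'D'): 2,
      ('D', 'H'): 4, ('B', 'I'): 4, ('C', 'I'): 2}
avg_h2 = round(sum(H2.values()) / len(pairs))   # 19/10, rounded to 2 as on the slide
survivors = [p for p in H2 if H2[p] >= avg_h2]  # all six listed pairs survive

# Slide 10 then merges pairs that share items into the final centers
centers = [{'B', 'C', 'D'}, {'D', 'H'}, {'B', 'C', 'I'}]
```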

  10. Example (cont'd)
  6. Combine the selected pairs:
  {B,C,D} is combined from (B,D), (B,C), and (C,D)
  {B,C,I} is combined from (B,I), (B,C), and (C,I)
  {D,H} stands alone
  So center1={B,C,D}, center2={D,H}, center3={B,C,I}

  11. Result
  The clustering process then continues from these centers; the result is shown below.

  12. Improvement
  • RIC lets a one-pass clustering process converge to a good solution, while partitioning clustering methods need an unknown number of iterations to converge
  • RIC leads toward an optimal solution, while partitioning clustering methods may settle for a result that is acceptable but far from optimal
  • RIC prevents the inaccurate clustering results caused by the user specifying an unsuitable number of clusters, which partitioning clustering methods cannot prevent

  13. • Partitioning clustering methods use an iterative procedure that converges to one of numerous local minima
  • These iterative techniques are especially sensitive to the initial starting conditions
  • Refined initial cluster centers allow the clustering process to converge to a better local minimum

  14. Arbitrarily selected cluster centers: C1={H} (id=230), C2={D,H} (id=320), C3={A,B,I} (id=250)
  Distance formula: d(transi, centerj) = #(transi ∩ centerj) / #(transi)
  Threshold = 1/2
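The formula above can be sketched as follows; note the sample transaction `t` is hypothetical, since the slide does not list the transactions themselves:

```python
def overlap(trans, center):
    """Slide 14's formula: d(trans_i, center_j) = #(trans_i ∩ center_j) / #(trans_i).
    Despite the name "distance", it is an overlap ratio, so larger means closer."""
    return len(trans & center) / len(trans)

# Centers from the slide
c1, c2, c3 = {'H'}, {'D', 'H'}, {'A', 'B', 'I'}
# A hypothetical transaction (not from the slides)
t = {'B', 'D', 'H'}
scores = {'C1': overlap(t, c1), 'C2': overlap(t, c2), 'C3': overlap(t, c3)}
# Assign t to the best-overlapping center if it clears the 1/2 threshold
best = max(scores, key=scores.get)
assigned = best if scores[best] >= 1/2 else None   # 'C2', with overlap 2/3
```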

  15. Motivation →

  16. • We obtain inaccurate clustering results when the chosen number of clusters does not conform to the actual data distribution.
  • A great deal of data cannot be assigned to any cluster if the chosen number of clusters is smaller than the real number of clusters.
  • Some clusters contain no data objects at all if the chosen number of clusters is larger than the real number of clusters.
