
RIC - Refine Initial Cluster Centers in Partitioning Clustering for Clustering Transaction Data



Presentation Transcript


  1. RIC - Refine Initial Cluster Centers in Partitioning Clustering for Clustering Transaction Data
  Department of Computer Science and Information Engineering, 張蕙珠, 01/04/2005

  2. Clustering
  • Clustering is the unsupervised classification of patterns into groups (clusters)
  • Data objects in the same cluster are similar to one another, while objects in different clusters are dissimilar
  • Clustering is useful in pattern analysis, market or customer segmentation, machine learning, information retrieval, and data mining

  3. Classes of clustering
  • Partitioning clustering
  • Hierarchical clustering

  4. Partitioning clustering
  • Given k, the number of partitions to construct
  • Create an initial partitioning
  • Use an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another

  5. Hierarchical clustering
  • The bottom-up approach starts with each object forming a separate group and successively merges the objects or groups closest to one another, until a termination condition holds.
  • The top-down approach starts with all the objects in a single cluster. In each successive iteration, a cluster is split into smaller clusters, until each object is in its own cluster or a termination condition holds.

  6. Motivation
  • Partitioning clustering needs refined initial cluster centers (see example)
  • Partitioning clustering needs an accurate number of clusters (see example)

  7. RIC clustering algorithm
  Definitions:
  (1) Width (W): the number of distinct data items in the dataset
  (2) Height (H): the number of times each data item appears
  (3) Average height (Avg-H): the average of the heights of all data items

  8. RIC clustering algorithm
  Step 1: Count the items of the market-basket data (W, H, Avg-H)
  Step 2: Select the items with H > Avg-H
  Step 3: Combine the selected items
  Step 4: Repeat: count the occurrences of every distinct combined item set, compute its H, W, and Avg-H, and select the combined item sets with H > Avg-H
  Step 5: Until k cluster centers are obtained, or all H(i) < Avg-H, or the last result does not change
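Steps 1 and 2 can be sketched in a few lines; this is a minimal illustration, not the deck's implementation, and the tiny basket dataset below is made up for demonstration:

```python
from collections import Counter

def item_heights(transactions):
    """Steps 1-2 of RIC, assuming each transaction is a set of items:
    count the height H of every item, derive W and Avg-H, and keep the
    items whose height reaches Avg-H."""
    H = Counter(item for t in transactions for item in t)  # Step 1: heights
    W = len(H)                                             # number of distinct items
    avg_h = sum(H.values()) // W                           # Avg-H (integer average)
    selected = {i for i, h in H.items() if h >= avg_h}     # Step 2
    return H, W, avg_h, selected

# Made-up market-basket data (not from the slides)
baskets = [{'B', 'D'}, {'B', 'D', 'H'}, {'B', 'C'}, {'A', 'C'}, {'B'}]
H, W, avg_h, selected = item_heights(baskets)
# Heights: B=4, C=2, D=2, A=1, H=1; W=5, Avg-H = 10 // 5 = 2
# selected -> {'B', 'C', 'D'}
```

The selected items then feed Step 3's pairwise combination, repeated at larger set sizes until one of Step 5's stopping conditions holds.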

  9. Example
  1. H(A)=2, H(B)=9, H(C)=4, H(D)=8, H(E)=2, H(F)=3, H(G)=2, H(H)=5, H(I)=6, W=9, Avg-H = (2+9+4+8+2+3+2+5+6)/9 ≈ 4
  2. Select items with H(item) >= Avg-H: H(B)=9, H(C)=4, H(D)=8, H(H)=5, H(I)=6
  3. Combine the selected items: {BD, BC, CD, BH, CH, DH, HI, BI, CI, DI}
  4. H(B,D)=4, H(B,C)=3, H(C,D)=2, H(D,H)=4, H(B,I)=4, H(C,I)=2, otherwise H=0; W=10, Avg-H = (4+3+2+4+4+2)/10 ≈ 2
  5. Select the combined sets: H(B,D)=4, H(B,C)=3, H(C,D)=2, H(D,H)=4, H(B,I)=4, H(C,I)=2
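The example's arithmetic can be checked directly from the heights given on the slide (the underlying transactions themselves are not shown in the deck):

```python
from itertools import combinations

# Item heights from the slide's example
H = {'A': 2, 'B': 9, 'C': 4, 'D': 8, 'E': 2,
     'F': 3, 'G': 2, 'H': 5, 'I': 6}
W = len(H)                                # 9 distinct items
avg_h = sum(H.values()) // W              # 41 // 9 = 4
selected = sorted(i for i, h in H.items() if h >= avg_h)
# selected -> ['B', 'C', 'D', 'H', 'I']

# Step 3: all pairs of the selected items (10 candidates)
pairs = list(combinations(selected, 2))

# Step 4: pair heights given on the slide; unlisted pairs have H = 0
H2 = {('B', 'D'): 4, ('B', 'C'): 3, ('C', 'D'): 2,
      ('D', 'H'): 4, ('B', 'I'): 4, ('C', 'I'): 2}
avg_h2 = round(sum(H2.values()) / len(pairs))   # 19/10, rounded to 2 as on the slide
survivors = [p for p in H2 if H2[p] >= avg_h2]  # all six listed pairs survive

# Slide 10 then merges pairs that share items into the final centers
centers = [{'B', 'C', 'D'}, {'D', 'H'}, {'B', 'C', 'I'}]
```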

  10. Example (cont'd)
  6. Combine the selected pairs:
  {B,C,D} is combined from (B,D), (B,C), and (C,D)
  {B,C,I} is combined from (B,I), (B,C), and (C,I)
  {D,H} stands alone
  So center1={B,C,D}, center2={D,H}, center3={B,C,I}

  11. Result
  The clustering process then continues from these centers; the result is shown below.

  12. Improvement
  • RIC lets a one-pass clustering process converge to a good solution, while partitioning clustering methods need an unknown number of iterations to converge
  • RIC leads toward an optimal solution, while partitioning clustering methods may settle for a result that is acceptable but far from optimal
  • RIC prevents the inaccurate clustering results caused by the user specifying an unsuitable number of clusters, which partitioning clustering methods cannot prevent

  13. • Partitioning clustering methods use an iterative procedure that converges to one of numerous local minima
  • These iterative techniques are especially sensitive to the initial starting conditions
  • Refined initial cluster centers allow the clustering process to converge to a better local minimum

  14. Arbitrarily selected cluster centers: C1={H} (id=230), C2={D,H} (id=320), C3={A,B,I} (id=250)
  Distance formula: d(transi, centerj) = #(transi ∩ centerj) / #(transi)
  Threshold = 1/2
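The formula above can be sketched as follows; note the sample transaction `t` is hypothetical, since the slide does not list the transactions themselves:

```python
def overlap(trans, center):
    """Slide 14's formula: d(trans_i, center_j) = #(trans_i ∩ center_j) / #(trans_i).
    Despite the name "distance", it is an overlap ratio, so larger means closer."""
    return len(trans & center) / len(trans)

# Centers from the slide
c1, c2, c3 = {'H'}, {'D', 'H'}, {'A', 'B', 'I'}
# A hypothetical transaction (not from the slides)
t = {'B', 'D', 'H'}
scores = {'C1': overlap(t, c1), 'C2': overlap(t, c2), 'C3': overlap(t, c3)}
# Assign t to the best-overlapping center if it clears the 1/2 threshold
best = max(scores, key=scores.get)
assigned = best if scores[best] >= 1/2 else None   # 'C2', with overlap 2/3
```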

  15. Motivation →

  16. • We obtain inaccurate clustering results when the chosen number of clusters does not conform to the actual data distribution.
  • A great deal of data cannot be assigned to any cluster if the chosen number of clusters is smaller than the real number of clusters.
  • Some clusters contain no data objects at all if the chosen number of clusters is larger than the real number of clusters.
