

1. National Yunlin University of Science and Technology (國立雲林科技大學)
Hierarchical Growing Cell Structures: TreeGCS
Advisor: Dr. Hsu
Graduate: Ching-Lung Chen
Authors: Victoria J. Hodge, Jim Austin
IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 2, March/April 2001

2. Outline
• Motivation
• Objective
• Introduction
• GCS
• TreeGCS
• Evaluation
• Single-pass TreeGCS
• Cyclic TreeGCS
• Conclusions
• Personal Opinion
• Review

3. Motivation
• The GCS network topology is susceptible to the ordering of the input vectors.
• Dendrogram visualization does not scale to large data sets, as there are too many leaf nodes and branches to visualize.
• Parameter selection is a combinatorial problem.

4. Objective
• To overcome the instability problem in the GCS approach.
• To overcome the dendrogram visualization problem for large data sets.
• To recommend effective parameter combinations for TreeGCS that are easily derived.

5. Introduction 1/3
• Clustering algorithms have been investigated previously.
• However, nearly all clustering techniques suffer from at least one of the following:
• They assume specific forms for the probability distribution (e.g., normal).
• They require unique global minima of the input probability distribution.
• They cannot handle identical cluster similarities.
• They do not scale well, as the training time is often O(n²).
• They require prior knowledge to set parameters.

6. Introduction 2/3
• The hierarchy may be formed agglomeratively (bottom-up) by progressively merging the most similar clusters.
• TreeGCS is an unsupervised, growing, self-organizing hierarchy of nodes able to form discrete clusters.
• In TreeGCS, high-dimensional inputs are mapped onto a two-dimensional hierarchy reflecting the topological ordering of the input space.
• TreeGCS is similar to HiGS.

7. Introduction 3/3
• However, the structure of HiGS does not match our requirements.
• The topology induced by HiGS is not a tree configuration, as the parent must be a member of a cluster of cardinality at least three.
• The HiGS algorithm generates child clusters and periodically deletes superfluous children, so at any particular time the tree representation may be incorrect.
• Our proposal maintains the correct cluster topology at each epoch.

8. GCS 1/7
• GCS is a two-dimensional structure of cells linked by vertices.
• Each cell has a neighborhood, defined as those cells directly linked by a vertex to the cell.
• The adaptation strength is constant over time and, unlike the SOM, only the best matching unit (bmu) and its direct topological neighbors are adapted.
• Each cell has a winner counter denoting the number of times that cell has been the bmu.
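As a rough illustration of the structure just described, here is a minimal Python sketch; the field names are our own, not prescribed by the paper.

```python
import numpy as np

class Cell:
    """One GCS cell: an attached reference vector, a winner counter,
    and the neighborhood of cells directly linked by a vertex."""
    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)  # attached vector
        self.E = 0.0                         # winner counter
        self.neighbors = set()               # directly linked cells

def link(a, b):
    """Create the (undirected) vertex linking two cells."""
    a.neighbors.add(b)
    b.neighbors.add(a)
```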

9. GCS 2/7
The GCS algorithm is described below: step (1) is the initialization, and steps (2–7) make up one iteration.
1. Initialize a random triangular structure of connected cells with attached vectors (w_c), with E representing the winner counter.
2. Select the next random input vector (ξ) from the input vector density distribution.
3. Determine the bmu for ξ and increment the bmu's winner counter.
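A sketch of steps 1–3, reusing the Cell/link definitions above; the uniform random initialization is an assumption, as the paper only says the starting triangle is random.

```python
import numpy as np

def init_triangle(dim, rng=None):
    """Step 1: a random triangle of three mutually linked cells."""
    if rng is None:
        rng = np.random.default_rng()
    cells = [Cell(rng.random(dim)) for _ in range(3)]
    link(cells[0], cells[1])
    link(cells[1], cells[2])
    link(cells[2], cells[0])
    return cells

def find_bmu(cells, xi):
    """Steps 2-3: the bmu is the cell whose attached vector is closest
    to the input xi; its winner counter is incremented."""
    bmu = min(cells, key=lambda c: np.linalg.norm(c.w - xi))
    bmu.E += 1.0
    return bmu
```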

10. GCS 3/7
4. The bmu and its neighbors are adapted toward ξ by adaptation increments set by the user.
5. If the number of input signals exceeds a threshold set by the user, a new cell (w_new) is inserted between the cell with the highest winner counter (w_h) and its farthest neighbor (w_f) (Fig. 2).
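A sketch of steps 4–5 under the same assumptions: eps_b and eps_n are the user-set adaptation increments, and placing the new cell halfway along the h–f vertex follows Fritzke's original GCS rather than anything stated on this slide.

```python
import numpy as np

def adapt(bmu, xi, eps_b, eps_n):
    """Step 4: move the bmu (step eps_b) and its direct topological
    neighbors (step eps_n) toward the input xi."""
    bmu.w += eps_b * (xi - bmu.w)
    for n in bmu.neighbors:
        n.w += eps_n * (xi - n.w)

def insert_cell(cells):
    """Step 5: insert w_new between the cell with the highest winner
    counter (w_h) and its farthest neighbor (w_f)."""
    h = max(cells, key=lambda c: c.E)
    f = max(h.neighbors, key=lambda n: np.linalg.norm(h.w - n.w))
    common = h.neighbors & f.neighbors   # shared triangle corners
    new = Cell((h.w + f.w) / 2.0)
    h.neighbors.discard(f)               # remove the h-f vertex ...
    f.neighbors.discard(h)
    link(new, h)
    link(new, f)
    for c in common:
        link(new, c)                     # ... keeping the triangles consistent
    cells.append(new)
    return new
```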

11. GCS 4/7
6. The winner counters of all neighbors of w_new are redistributed to donate fractions of the neighboring cells' winner counters to the new cell. The winner counter for the new cell is set to the total decremented: E_new = Σ ΔE_c.
7. After a user-specified number of iterations, the cell with the greatest mean Euclidean distance between itself and its neighbors is deleted, and any cells within the neighborhood that would be left "dangling" are also deleted (see Fig. 3).
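A sketch of steps 6–7; the donated fraction below is an assumed illustrative value, since the slide states only that fractions of the neighbors' counters are donated.

```python
import numpy as np

def redistribute_counters(new, fraction=1.0 / 3.0):
    """Step 6: each neighbor donates a fraction of its winner counter;
    E_new is the total decremented (fraction is an assumption)."""
    donated = 0.0
    for n in new.neighbors:
        d = fraction * n.E
        n.E -= d
        donated += d
    new.E = donated

def delete_worst(cells):
    """Step 7: delete the cell with the greatest mean Euclidean distance
    to its neighbors, then any cells left dangling (no vertices)."""
    worst = max(cells,
                key=lambda c: np.mean([np.linalg.norm(c.w - n.w)
                                       for n in c.neighbors]))
    for n in worst.neighbors:
        n.neighbors.discard(worst)
    cells.remove(worst)
    for dangling in [c for c in cells if not c.neighbors]:
        cells.remove(dangling)
```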

12. GCS 5/7
The winner counter variable of all cells is decreased by a user-specified factor β to implement temporal decay: E_c ← (1 − β) · E_c.
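In code, the decay is a single pass over the cells, with β being the user-specified decay factor listed on the next slide:

```python
def decay_counters(cells, beta):
    """Temporal decay of every winner counter: E_c <- (1 - beta) * E_c."""
    for c in cells:
        c.E *= (1.0 - beta)
```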

13. GCS 6/7
The user-specified parameters are:
• The dimensionality of the GCS, which is fixed.
• The maximum number of neighbor connections per cell.
• The maximum number of cells in the structure.
• The adaptation step for the winning cell.
• The adaptation step for the neighborhood.
• The temporal decay factor.
• The number of iterations for insertion.
• The number of iterations for deletion.
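One way to bundle these eight parameters is a small config object; the identifiers and default values below are purely illustrative assumptions, not figures from the paper.

```python
from dataclasses import dataclass

@dataclass
class GCSParams:
    # Hypothetical names and defaults; the paper lists the parameters
    # but does not prescribe identifiers or values.
    dim: int = 2            # dimensionality of the GCS (fixed)
    max_neighbors: int = 6  # maximum neighbor connections per cell
    max_cells: int = 100    # maximum cells in the structure
    eps_b: float = 0.06     # adaptation step for the winning cell
    eps_n: float = 0.002    # adaptation step for the neighborhood
    beta: float = 0.0005    # temporal decay factor
    lambda_ins: int = 100   # iterations between insertions
    lambda_del: int = 300   # iterations between deletions
```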

14. GCS 7/7
Fritzke has demonstrated superior performance for the GCS over SOMs, with respect to:
• Topology preservation: similar input vectors are mapped onto identical or closely neighboring neurons, ensuring robustness against distortions.
• Neighboring cells having similar attached vectors, ensuring robustness. If the dimensionality of the input vectors is greater than the network dimensionality, the mapping usually preserves the similarities among the input vectors.
• Lower distribution-modeling error (the standard deviation of all counters divided by the mean value of the counters).

15. GCS Evaluation
The run-time complexity of GCS is O(numberCells × dimension × numberInputs × epochs). The GCS algorithm is susceptible to data order. In this paper, we utilize three data orderings to illustrate the initial susceptibility of the algorithm to input data order and how cycling improves the stability.
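As a rough worked example of that cost formula: only the 41 inputs of dimension 47 come from the evaluation data set described later; the cell count and epoch count are assumed figures.

```python
# Distance computations per the O(numberCells * dimension * numberInputs
# * epochs) formula; cell and epoch counts are illustrative assumptions.
number_cells, dimension, number_inputs, epochs = 30, 47, 41, 1000
print(number_cells * dimension * number_inputs * epochs)  # 57,810,000
```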

16. TreeGCS 1/2
• When a cluster subdivides, new nodes are added to the tree to reflect the additional clusters (Fig. 4).
• Only leaf nodes maintain a cluster list.
• The hierarchy generation is run once after each GCS epoch.

17. TreeGCS 2/2
• If the number of clusters has decreased, a cluster has been deleted and the associated tree node is deleted (Fig. 5).
• All tree nodes except leaf nodes have only an identifier and pointers to their children (a sketch of this bookkeeping follows below).
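A minimal sketch of the tree bookkeeping from slides 16–17, assuming hypothetical function and field names: only leaves hold a cluster list, a split adds child nodes, and a deletion removes the associated node and collapses a lone sibling back into its parent.

```python
class TreeNode:
    """Hypothetical node layout: non-leaf nodes keep only an identifier
    and pointers to children; only leaves maintain a cluster list."""
    def __init__(self, ident, cluster=None):
        self.ident = ident
        self.children = []
        self.cluster = cluster if cluster is not None else []

def split_leaf(leaf, cluster_a, cluster_b, next_id):
    """A GCS cluster subdivided: add child nodes for the new clusters."""
    leaf.children = [TreeNode(next_id, cluster_a),
                     TreeNode(next_id + 1, cluster_b)]
    leaf.cluster = []          # the node is no longer a leaf

def remove_cluster(parent, dead_child):
    """A GCS cluster was deleted: remove the associated tree node; if a
    single sibling remains, collapse it back into the parent."""
    parent.children.remove(dead_child)
    if len(parent.children) == 1:
        only = parent.children.pop()
        parent.cluster = only.cluster
        parent.children = only.children
```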

18. Evaluation
• The data set comprises 41 European countries, each represented by a 47-dimensional real-valued vector.
• We use three different orderings of the data to evaluate stability:
• Alphabetical order of the country names.
• Middle to front.
• Numerical order of the first attribute.

19. Dendrogram
If we cut the dendrogram at three clusters, the clusters produced are:
• {Den, Fra, Ger, It, UK}
• {Lux}
• {Alb, And, Aus, Bel, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Mac, Mal, Mon, NL, Nor, Pol, Rom, SM, Ser, Slk, Sln, Spa, Swe, Swi, Ukr}
The parameter settings for TreeGCS were given in a table (not transcribed). There are six permutations of the three data orders: (1,2,3), (1,3,2), (2,3,1), (2,1,3), (3,1,2), (3,2,1).
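Those six cyclic-training orderings are simply the permutations of the three data orders, which can be enumerated directly:

```python
from itertools import permutations

# The six training-order permutations of the three data orderings.
orders = list(permutations((1, 2, 3)))
print(orders)  # [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
```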

20. Single-Pass TreeGCS 1/3 (figure)

21. Single-Pass TreeGCS 2/3
Alphabetical order of countries (see Fig. 6):
• 34 {Lux, Ukr}
• 9 {Den, Fra, Ger, It, Spa, UK}
• 80 {Alb, And, Aus, Bel, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Mac, Mal, Mon, NL, Nor, Pol, Rom, SM, Ser, Slk, Sln, Swi}
• 3 {Swe}
Middle-to-front order of countries (see Fig. 6):
• 11a {Den, Fra, Ger, It, Spa, UK}
• 11b {Aus, Bel, NL, Swe, Swi, Ukr}
• 12 {Cze, Fin, Gre, Nor, Rom}
• 13 {Bul, Eir, Hun, Pol, Slk}
• 14 {Lux, Ice}
• 65 {Alb, And, Bos, Bul, Cro, Cyp, Est, Far, Gib, Lat, Lie, Lit, Mac, Mal, Mon, SM, Ser, Sln}

22. Single-Pass TreeGCS 3/3
Numerical order of first attributes (see Fig. 6):
• 29 {Aus, Bel, Den, Fra, Ger, It, NL, Spa, Swe, Swi, UK}
• 69 {Alb, And, Bos, Bul, Cro, Cyp, Est, Far, Gib, Ice, Lat, Lie, Lit, Mac, Mal, Mon, Pol, Rom, SM, Slk, Sln}
• 8 {Hun, Lux, Ser}
• 20 {Cze, Eir, Fin, Gre, Nor, Ukr}

23. Cyclic TreeGCS 1/4
• D = alphabetical data order
• M = middle to front
• S = sorted numerically by the first attribute

24. Cyclic TreeGCS 2/4
1. DMS (see Fig. 7):
• 18 {Den, Fra, Ger, It, NL, Spa, UK}
• 108 {Alb, And, Aus, Bel, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Lux, Mac, Mal, Mon, Nor, Pol, Rom, SM, Ser, Slk, Sln, Swe, Swi, Ukr}
2. DSM (see Fig. 7):
• 30 {Bel, Den, Fra, Ger, It, NL, Spa, Swe, UK}
• 8 {Aus, Lux, Ser, Swi, Ukr}
• 88 {Alb, And, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Mac, Mal, Mon, Nor, Pol, Rom, SM, Slk, Sln}

25. Cyclic TreeGCS 3/4
3. MSD (see Fig. 7):
• 10 {Den, Fra, Ger, It, Spa, UK}
• 116 {Alb, And, Aus, Bel, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Lux, Mac, Mal, Mon, NL, Nor, Pol, Rom, SM, Ser, Slk, Sln, Swe, Swi, Ukr}
4. MDS (see Fig. 7):
• 17 {Den, Fra, Ger, It, NL, Spa, UK}
• 32 {Aus, Bel, Cze, Fin, Gre, Nor, Rom, Swe, Swi, Ukr}
• 11 {Bul, Eir, Hun, Lux, Ser, Slk}
• 66 {Alb, And, Bos, Cro, Cyp, Est, Far, Gib, Ice, Lat, Lie, Lit, Mac, Mal, Mon, Pol, SM, Sln}

26. Cyclic TreeGCS 4/4
5. SDM (see Fig. 7):
• 18 {Den, Fra, Ger, It, NL, Spa, UK}
• 5 {Cze, Gre, Lux, Ser}
• 15 {Aus, Bel, Rom, Swe, Swi}
• 12 {Eir, Fin, Hun, Nor, Ukr}
• 76 {Alb, And, Bos, Bul, Cro, Cyp, Est, Far, Gib, Ice, Lat, Lie, Lit, Mac, Mal, Mon, Pol, SM, Slk, Sln}
6. SMD (see Fig. 7):
• 23 {Bel, Den, Fra, Ger, It, NL, Spa, UK}
• 90 {Alb, And, Bos, Bul, Cro, Cyp, Cze, Eir, Est, Far, Fin, Gib, Gre, Hun, Ice, Lat, Lie, Lit, Lux, Mac, Mal, Mon, Pol, SM, Ser, Slk, Sln}
• 13 {Aus, Nor, Rom, Swe, Swi, Ukr}

27. Parameter Settings 1/2
For the final column, a "T" indicates a static hierarchy and an "F" indicates that the hierarchy never became static.

28. Parameter Settings 2/2
For the final column, a "T" indicates a static hierarchy and an "F" indicates that the hierarchy never became static.

29. Analysis
• One solution would be to maintain a list of the hierarchy nodes removed, with details of parents and siblings.
• Another solution would be a posteriori manual inspection of the run-time output.

30. Conclusions
• TreeGCS overcomes the instability problem.
• The algorithm adaptively determines the depth of the cluster hierarchy; there is no requirement to prespecify network dimensions, as with most SOM-based algorithms.
• The superimposed hierarchy has no user-specified parameters.
• A further advantage of our approach over dendrograms is that leaf nodes in our hierarchy represent groups of input vectors.

31. Personal Opinion
We can learn from TreeGCS the technique of subdividing a cluster in hierarchical clustering by adding new nodes to the tree.

32. Review
• GCS: the seven steps of one epoch.
• TreeGCS.
• Parameter settings.
