Fully Automatic Clustering System



  1. Fully Automatic Clustering System Advisor: Dr. Hsu Graduate: Sheng-Hsuan Wang Authors: Giuseppe Patanè, Marco Russo Department of Information Management IEEE Transactions on Neural Networks, vol. 13, no. 6, November 2002

  2. Outline • Motivation • Objective • Introduction • VQ • Previous Works: ELBG • FACS • Results • Conclusion • Personal Opinion • Review

  3. Motivation • Can clustering be made fully automatic, so that the right number of codewords is found without the user fixing it in advance? • Can the number of computations per iteration be reduced?

  4. Objective • In this paper, the fully automatic clustering system (FACS) is presented. • The objective is the automatic calculation of the codebook of the right dimension once the desired error is fixed. • In order to save on the number of computations per iteration, greedy techniques are adopted.

  5. Introduction • Cluster analysis (CA, or clustering). • Vector quantization (VQ). • Data are partitioned into groups (or cells). • Each cell is represented by a vector called a codeword. • The set of the codewords is called the codebook. • The difference between CA and VQ. • Both group data into a certain number of groups so that a loss (or error) function is minimized.

  6. Clustering and VQ

  7. VQ-Definition • The objective of VQ is the representation of a set $X = \{x_1, \ldots, x_{N_P}\}$ of feature vectors in $\mathbb{R}^K$ by a set $Y = \{y_1, \ldots, y_{N_C}\}$ of reference vectors (codewords) in $\mathbb{R}^K$, with $N_C \ll N_P$.

  8. VQ-Quantization Error (QE) • Square error (SE): $d(x, y) = \sum_{k=1}^{K} (x_k - y_k)^2$ • Weighted square error (WSE): $d_w(x, y) = \sum_{k=1}^{K} w_k (x_k - y_k)^2$, with one weight $w_k$ per component.
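
The two distortion measures can be written down directly; a minimal sketch in Python, assuming plain NumPy arrays (the function names are ours, not the paper's):

```python
import numpy as np

def square_error(x, y):
    """Square error (SE) between an input pattern x and a codeword y."""
    return float(np.sum((x - y) ** 2))

def weighted_square_error(x, y, w):
    """Weighted square error (WSE); w holds one weight per component."""
    return float(np.sum(w * (x - y) ** 2))
```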

  9. VQ-Nearest neighbor condition (NNC) • Nearest neighbor condition (NNC): given a fixed codebook Y, the NNC consists of assigning each input vector to its nearest codeword.
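
Under the SE distortion, the NNC can be sketched as follows (a minimal illustration with assumed names: X is an (N_P, K) array of patterns, Y an (N_C, K) codebook):

```python
import numpy as np

def nearest_neighbor_partition(X, Y):
    """NNC: assign each input vector to the cell of its nearest codeword.

    Returns, for every pattern in X, the index of the winning codeword.
    """
    # Pairwise squared distances between all patterns and all codewords.
    d = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.argmin(d, axis=1)
```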

  10. VQ-Centroid condition (CC) • Centroid condition (CC): given a fixed partition S, the CC gives the optimal codebook; with the SE distortion, the optimal codeword of each cell is its centroid.
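
Since the optimal codeword of a cell is its centroid under SE, the CC reduces to a per-cell mean (again a sketch with assumed names, reusing the labels produced by the NNC helper above):

```python
import numpy as np

def centroid_codebook(X, labels, n_codewords):
    """CC: replace each codeword by the centroid (mean) of its cell."""
    Y = np.empty((n_codewords, X.shape[1]))
    for i in range(n_codewords):
        Y[i] = X[labels == i].mean(axis=0)  # assumes no cell is empty
    return Y
```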

  11. Previous Works: ELBG • The starting point of the research reported in this paper was the authors' previous work, the ELBG [39]. • 1) Initialization. • 2) Partition calculation, according to the NNC (6). • 3) Termination condition check. • 4) ELBG-block execution. • 5) New codebook calculation, according to the CC (9). • 6) Return to Step 2.
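
The loop above can be sketched by reusing the NNC and CC helpers from the previous slides; elbg_block is a placeholder with an assumed (X, Y, labels) -> (Y, labels) signature for the ELBG-block of the next slides, and the tolerance-based termination test is our simplification of the paper's condition:

```python
import numpy as np

def mqe(X, Y, labels):
    """Mean quantization error of the current partition (SE distortion)."""
    return float(np.mean(np.sum((X - Y[labels]) ** 2, axis=1)))

def elbg(X, Y, elbg_block, tol=1e-4, max_iter=100):
    prev = np.inf
    for _ in range(max_iter):
        labels = nearest_neighbor_partition(X, Y)   # 2) NNC
        err = mqe(X, Y, labels)
        if prev - err < tol * prev:                 # 3) termination check
            break
        prev = err
        Y, labels = elbg_block(X, Y, labels)        # 4) ELBG block
        Y = centroid_codebook(X, labels, len(Y))    # 5) CC
    return Y
```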

  12. A. ELBG-Block • The basic idea of the ELBG-block: • join a low-distortion cell with a cell adjacent to it; • split a high-distortion cell into two smaller ones. • If we define the mean distortion per cell as $D_{\text{mean}} = \frac{1}{N_C} \sum_{i=1}^{N_C} D_i$, where $D_i$ is the distortion of cell $S_i$, then cells with distortion below $D_{\text{mean}}$ are candidates for union and cells above it are candidates for splitting.

  13. A. ELBG-Block

  14. A. ELBG-Block

  15. A. ELBG-Block • 1) SoCA (shift of codeword attempt): the codeword of a low-distortion cell is tentatively shifted into a high-distortion cell. • The destination cell is looked for in a stochastic way, with a probability that grows with its distortion (see the sketch below).
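
One natural realization of this stochastic search is roulette-wheel selection with probabilities proportional to the cells' distortions; the exact probability law is the paper's, so treat this as our sketch of the idea:

```python
import numpy as np

def pick_high_distortion_cell(cell_distortions, rng=None):
    """Pick a cell index with probability proportional to its distortion."""
    if rng is None:
        rng = np.random.default_rng()
    d = np.asarray(cell_distortions, dtype=float)
    return int(rng.choice(len(d), p=d / d.sum()))
```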

  16. A. ELBG-Block • Splitting: we place both codewords on the principal diagonal of the hyperbox enclosing the destination cell; in this sense, we can say that the two codewords are near each other. • Some local rearrangements are then executed. • Union: the cell left without a codeword is joined with an adjacent one.
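
The diagonal placement can be illustrated on the hyperbox spanned by the cell's patterns (a sketch; the symmetric offset frac is our choice, not a value from the paper):

```python
import numpy as np

def split_on_principal_diagonal(cell_patterns, frac=0.25):
    """Place two codewords on the principal diagonal of the hyperbox
    that encloses a cell, symmetrically around its midpoint."""
    lo = cell_patterns.min(axis=0)   # one corner of the hyperbox
    hi = cell_patterns.max(axis=0)   # the opposite corner
    return lo + frac * (hi - lo), hi - frac * (hi - lo)
```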

  17. A. ELBG-Block • 2) Mean quantization error estimation and eventual SoC: • After the shift we have a new codebook (Y') and a new partition (S'), so we can calculate the new MQE. • If it is lower than the value we had before the SoCA, the shift is confirmed (SoC); otherwise, it is rejected.
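
The eventual SoC is therefore a purely greedy accept/reject test on the MQE (a sketch reusing the mqe helper defined earlier):

```python
def maybe_confirm_shift(X, Y_old, labels_old, Y_new, labels_new):
    """Confirm the shift of codeword (SoC) only if it lowers the MQE."""
    if mqe(X, Y_new, labels_new) < mqe(X, Y_old, labels_old):
        return Y_new, labels_new   # shift confirmed
    return Y_old, labels_old       # shift rejected
```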

  18. B. Considerations Regarding the ELBG • Insertions are effected in the regions where the error is higher; deletions where it is lower. • Operations are executed locally. • Several insertions or deletions can be effected during the same iteration, always working locally.

  19. FACS • Introduction. • A CA/VQ technique whose objective is to automatically find the codebook of the right dimension. • In FACS, the increase or decrease of the codebook size happens smartly: • new codewords are inserted where the QE is higher; • codewords are eliminated where the error is lower (see the loop sketched below).
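
Under the stated objective, the FACS iteration can be sketched as a loop that grows the codebook while the error is above the target and shrinks it otherwise; grow_step and shrink_step are placeholders (with an assumed (X, Y, labels) -> Y signature) for the smart growing and reduction phases of the next slides, and the real termination condition (slide 25) is more refined than this skeleton:

```python
def facs(X, Y0, desired_error, grow_step, shrink_step, max_iter=200):
    """Adapt the codebook size until the MQE matches the desired error."""
    Y = Y0
    for _ in range(max_iter):
        labels = nearest_neighbor_partition(X, Y)
        if mqe(X, Y, labels) > desired_error:
            Y = grow_step(X, Y, labels)     # insert codewords where QE is high
        elif len(Y) > 1:
            Y = shrink_step(X, Y, labels)   # remove codewords where QE is low
        labels = nearest_neighbor_partition(X, Y)
        Y = centroid_codebook(X, labels, len(Y))
    return Y
```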

  20. FACS iteration

  21. Smart growing phase.

  22. p versus the number of iterations

  23. Smart reduction phase.

  24. FACS • The cell to eliminate is chosen with a probability that is a decreasing function of its distortion.
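
A probability that decreases with the distortion can be obtained, for instance, from inverse-distortion weights (our choice of decreasing function; the paper only requires that it be decreasing):

```python
import numpy as np

def pick_cell_to_eliminate(cell_distortions, eps=1e-12, rng=None):
    """Pick a cell index with probability decreasing in its distortion."""
    if rng is None:
        rng = np.random.default_rng()
    w = 1.0 / (np.asarray(cell_distortions, dtype=float) + eps)
    return int(rng.choice(len(w), p=w / w.sum()))
```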

  25. Behavior of FACS Versus the Number of Iterations and Termination Condition

  26. Discussion about outliers

  27. Results • Introduction. • Comparison With ELBG. • Comparison With GNG and GNG-U. • Comparison With FOSART. • Comparison With the Competitive Agglomeration Algorithm. • Classification.

  28. B. Comparison with ELBG

  29. C. Comparison With GNG and GNG-U • GNG and GNG-U insert codewords until: • the prefixed number of codewords is reached, or • a given "performance measure" is fulfilled. • In our case, the termination is driven by the desired quantization error.

  30. D. Comparison With FOSART • FOSART belongs to the family of ART algorithms. • It is also used for VQ tasks.

  31. E. Comparison With the Competitive Agglomeration Algorithm.

  32. F. Classification • Comparison between FACS and the GCS algorithm on a supervised classification problem, the two spirals. • Mode 1: • The input consists of 194 two-dimensional vectors representing the two spirals. • The output is the related membership class (0 or 1). • We employed the WSE. • Mode 2: • The clustering phase uses only the input part of the patterns, with the SE.
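
The 194 points match the classic Lang–Witbrock two-spirals benchmark (97 points per spiral); a common construction, which we assume is the one used here, is:

```python
import numpy as np

def two_spirals(n_per_spiral=97):
    """Generate the classic two-spirals benchmark: 194 labeled 2-D points."""
    i = np.arange(n_per_spiral)
    phi = i * np.pi / 16.0
    r = 6.5 * (104 - i) / 104.0
    spiral = np.column_stack((r * np.sin(phi), r * np.cos(phi)))
    X = np.vstack((spiral, -spiral))  # the second spiral mirrors the first
    y = np.hstack((np.zeros(n_per_spiral), np.ones(n_per_spiral)))
    return X, y
```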

  33. F. Classification (cont.)

  34. Conclusion • FACS is a new algorithm for CA/VQ that is able to autonomously find the number of codewords once the desired quantization error is specified. • In comparison to previous similar works, a significant improvement in running time has been obtained. • Further studies will be made regarding the use of different distortion measures.

  35. Personal Opinion • The starting point of the research reported in this paper was the authors' previous work: the ELBG. • The QE is a key index.

  36. Review • Clustering vs. VQ. • Previous works: ELBG. • FACS • Smart Growing • Smart Reduction
