Extensions of vector quantization for incremental clustering

Edwin Lughofer

Pattern Recognition, Vol. 41, 2008, pp. 995–1011

Presenter : Wei-Shen Tai

2011/1/19

- Introduction
- Vector quantization
- Extensions of vector quantization
- Evaluation
- Conclusion and outlook
- Comments

- Incremental clustering processes
- Online measurements are often recorded continuously, producing data streams in various applications.
- Processing such streams online guarantees that queries are up to date and that results are available with only a small time delay.

- An incremental and evolving vector quantization
- Processes data streams in an on-line clustering scheme.
- Omits the pre-definition of the number of clusters and improves the quality of cluster partitions with several strategies.

1. Choose initial values for the C cluster centers.
2. Fetch the next data sample from the data set.
3. Calculate the distance of the selected data point to all cluster centers.
4. Select the cluster center that is closest to the data point (the winning cluster).
5. Update the p components of the winning cluster center by moving it towards the selected point.
6. If the data set contains data points that have not yet been processed through steps 2–5, go to step 2.
7. If any cluster center was moved significantly in the last iteration, say by more than a threshold ε, reset the pointer to the beginning of the data buffer and go to step 2; otherwise stop.
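The seven steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name, the fixed learning rate `eta`, the threshold `eps` and the safety cap `max_epochs` are assumptions, and "moved significantly in the last iteration" is interpreted here as the largest center shift over one full pass through the data.

```python
import math

def vector_quantization(data, initial_centers, eta=0.1, eps=1e-3, max_epochs=100):
    """Batch VQ sketch. eta, eps and max_epochs are illustrative assumptions."""
    centers = [list(c) for c in initial_centers]           # step 1
    for _ in range(max_epochs):                            # safety cap (assumption)
        before = [list(c) for c in centers]
        for x in data:                                     # steps 2 and 6
            dists = [math.dist(x, c) for c in centers]     # step 3
            win = dists.index(min(dists))                  # step 4
            # step 5: move every component of the winner towards x
            centers[win] = [cj + eta * (xj - cj)
                            for cj, xj in zip(centers[win], x)]
        # step 7: stop once no center moved by more than eps in this pass
        shift = max(math.dist(b, c) for b, c in zip(before, centers))
        if shift <= eps:
            break
    return centers
```

Calling `vector_quantization(points, [(1, 1), (9, 9)])` on two well-separated point clouds should leave one center inside each cloud. Note that the number of clusters C is fixed by the initial centers, which is the limitation the incremental variants address.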

- Stability / plasticity dilemma in ART-2
- Using a vigilance parameter ρ to control the tradeoff between adaptation of already learned clusters (stability) and generation of new clusters (plasticity).

- Differences between VQ and VQ-INC
- The starting number of clusters is zero.
- If the distance between the incoming input x and the closest cluster center c_win is larger than ρ and x is not faulty, a new cluster is created. Otherwise, c_win is updated by moving it toward x.
- The ranges of all p variables are updated if x is not faulty. In addition, η decreases monotonically as the number of data points belonging to each cluster grows.
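The incremental rule above (start with no clusters, create a new one when the winner is farther away than ρ, otherwise move the winner) can be sketched as follows. This is an illustration under stated assumptions: the class name is hypothetical, the schedule η = 1/k_win is just one possible monotonically decreasing choice, and the fault check and range update are omitted.

```python
import math

class IncrementalVQ:
    """Single-pass VQ sketch with a vigilance parameter rho (hypothetical class)."""

    def __init__(self, rho):
        self.rho = rho
        self.centers = []   # starts with zero clusters
        self.counts = []    # number of samples assigned to each cluster

    def update(self, x):
        """Process one sample and return the index of its cluster."""
        x = list(x)
        if self.centers:
            dists = [math.dist(x, c) for c in self.centers]
            win = dists.index(min(dists))
            if dists[win] <= self.rho:
                # stability: adapt the existing winner
                self.counts[win] += 1
                eta = 1.0 / self.counts[win]   # assumed decreasing schedule
                self.centers[win] = [cj + eta * (xj - cj)
                                     for cj, xj in zip(self.centers[win], x)]
                return win
        # plasticity: no cluster yet, or the winner is farther away than rho
        self.centers.append(x)
        self.counts.append(1)
        return len(self.centers) - 1
```

With η = 1/k_win each center is the running mean of its samples. Feeding `(0, 0)`, `(0, 2)`, `(10, 10)`, `(10, 12)` with `rho=3` yields two clusters centered at `(0, 1)` and `(10, 11)`.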

- Both ‘over-clustering’ and an incorrect partition of the input space can occur in VQ-INC.
- Instead of the classic Euclidean distance to the cluster center, VQ-INC-EXT uses the ranges of influence of all clusters, i.e. the distance to the cluster surface along the direction towards the cluster center.
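To make the surface-based distance concrete, the sketch below assumes that each cluster's range of influence is an axis-parallel ellipsoid with one half-axis length per variable. That shape, the function name and the clamping to zero inside the ellipsoid are assumptions for illustration; the slides do not give the exact formula.

```python
import math

def surface_distance(x, center, radii):
    """Distance from x to the ellipsoid surface along the ray towards center.

    radii holds the assumed half-axis length per variable.
    Returns 0.0 when x lies inside or on the ellipsoid.
    """
    diff = [xj - cj for xj, cj in zip(x, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return 0.0                      # x coincides with the center
    # scaled norm: equals 1 exactly on the surface, < 1 inside, > 1 outside
    scaled = math.sqrt(sum((d / r) ** 2 for d, r in zip(diff, radii)))
    if scaled <= 1.0:
        return 0.0
    # the ray center -> x crosses the surface at the fraction 1/scaled
    return (1.0 - 1.0 / scaled) * norm
```

For a cluster centered at the origin with half-axes `(2, 1)`, the point `(4, 0)` is 2 units from the surface although it is 4 units from the center, so a wide cluster can win a sample that a plain Euclidean comparison would hand to a smaller neighbour.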

- Cluster satellites
- Undesirable tiny clusters, which lie very close to significantly bigger ones.

- Identify outliers and satellites
- If k_i/N < 1%, cluster i is regarded as an outlier cluster.
- If k_i/N < low_mass and c_i lies inside the range of influence of any other cluster, select the closest center c_win.
- Calculate the distance of c_i to the surface of all other clusters.
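The checks above can be sketched as a labelling pass over the current clusters. Several points are assumptions made for illustration: N is taken as the sum of all cluster counts, ranges of influence are axis-parallel ellipsoids, `low_mass=0.05` is a placeholder value, and the host of a satellite is chosen as the enclosing cluster with the smallest signed surface distance.

```python
import math

def signed_surface_distance(x, center, radii):
    """Signed distance to an axis-parallel ellipsoid surface (negative inside)."""
    diff = [xj - cj for xj, cj in zip(x, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return -min(radii)   # direction undefined; treated as inside (assumption)
    scaled = math.sqrt(sum((d / r) ** 2 for d, r in zip(diff, radii)))
    return (1.0 - 1.0 / scaled) * norm

def find_outliers_and_satellites(centers, radii, counts,
                                 low_mass=0.05, outlier_mass=0.01):
    """Label each cluster as 'outlier', 'satellite' (with host index) or 'regular'."""
    n_total = sum(counts)                       # N, assumed to be all samples seen
    labels = {}
    for i, (c_i, k_i) in enumerate(zip(centers, counts)):
        mass = k_i / n_total
        if mass < outlier_mass:                 # k_i / N < 1%
            labels[i] = ("outlier", None)
            continue
        if mass < low_mass:
            # distance of c_i to the surface of all other clusters
            dists = {j: signed_surface_distance(c_i, centers[j], radii[j])
                     for j in range(len(centers)) if j != i}
            inside = {j: d for j, d in dists.items() if d <= 0.0}
            if inside:
                labels[i] = ("satellite", min(inside, key=inside.get))
                continue
        labels[i] = ("regular", None)
    return labels
```

What happens to a flagged satellite (for example deleting it and crediting its samples to the host) is left to the caller in this sketch.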

- Parameter ρ
- A suitable value cannot be known in advance, and a bad setting may cause an incorrect cluster structure.

- Non-optimal clustering
- It is prevented by merging clusters that have grown together or by splitting big clusters that contain more than one distinct data cloud.
- Calculate the quality of the cluster partition in three phases: before the split, after the split (p results) and after the merge. Then pick the best cluster partition to replace the existing one.

- A new extended vector quantization (VQ-INC-EXT)
- Can be applied to data streams in fast online applications or to huge databases.
- Provides an incremental learning scheme and incorporates a new distance measure, satellite deletion and an online split-and-merge strategy.

- Outlook
- The split-and-merge strategy may slow down computation.
- Reacting to drifts or shifts in the data: a drift changes the distribution of the underlying data smoothly over time, whereas a shift triggers an abrupt change in the data characteristics.

- Advantage
- The proposed method extends VQ to an incremental learning VQ and adds several strategies that improve the quality of the cluster partition at the same time.
- Data streams can be effectively processed by this on-line learning VQ.

- Drawback
- In Algorithm 3, the center vector of the winning cluster is updated by Eq. (1) according to the Manhattan distance between the winning cluster and the input whenever the new distance strategy is applied.

- Application
- On-line learning from data streams.