Very large data sets

Clustering methods: Part 10
Pasi Fränti, 5.5.2014
Speech and Image Processing Unit, School of Computing, University of Eastern Finland

Methods for large data sets
• BIRCH
• CLARANS
• On-line EM
• Scalable EM
• GMG (the method studied here; there is no material for the others)

Gradual model generator (GMG) [Kärkkäinen & Fränti, 2007: Pattern Recognition]

Goal of the GMG algorithm
[Figure panels: GMG, EM]

Contours of probability density distributions
[Figure panels: GMG, EM]

Model update
• New data points are mapped to a model immediately on input.
• Points that are too far from every model remain in the buffer.
• Buffered points are re-tested when new models are created.
[Figure panels: Before update, After update]
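The update rule above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation from the GMG paper: the `Component` class, the diagonal-variance distance, and the `max_dist` threshold are all assumptions introduced here, since the slides do not say how "too far" is measured.

```python
import math


class Component:
    """Running summary of one model: count, mean, per-dimension variance.

    Hypothetical helper; the slides do not specify the model representation.
    """

    def __init__(self, points):
        dims = len(points[0])
        self.n = 0
        self.mean = [0.0] * dims
        self.m2 = [0.0] * dims  # running sum of squared deviations (Welford)
        for p in points:
            self.add(p)

    def add(self, p):
        # Incremental update, so the point itself need not be stored.
        self.n += 1
        for d, x in enumerate(p):
            delta = x - self.mean[d]
            self.mean[d] += delta / self.n
            self.m2[d] += delta * (x - self.mean[d])

    def distance(self, p, min_var=1e-6):
        # Variance-normalized distance (assumed stand-in for "too far").
        total = 0.0
        for d, x in enumerate(p):
            var = max(self.m2[d] / self.n, min_var)
            total += (x - self.mean[d]) ** 2 / var
        return math.sqrt(total)


def process_point(p, components, buffer, max_dist=3.0):
    """Map p to the nearest component, or buffer it if all are too far."""
    best = min(components, key=lambda c: c.distance(p), default=None)
    if best is not None and best.distance(p) <= max_dist:
        best.add(p)
        return True
    buffer.append(p)
    return False


def retest_buffer(components, buffer, max_dist=3.0):
    """Re-test buffered points, e.g. after a new component was created."""
    pending = list(buffer)
    buffer.clear()
    for p in pending:
        process_point(p, components, buffer, max_dist)
```

Each point is touched once when it arrives and again only while it sits in the buffer, which is what keeps the approach single-pass in spirit.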

Generating new components
• When the buffer is full, selected points are used to generate new components.
• The most compact k-neighborhood is selected as the seed for a new component.
[Figure panels: Data in buffer, Selected points and a new component]
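One way to read "most compact k-neighborhood" is sketched below. It is an assumption-laden illustration: compactness is taken to be the distance from a point to its k-th nearest neighbor in the buffer, the search is brute force, and the function names `most_compact_neighborhood` and `seed_new_component` are hypothetical. The paper may define compactness differently.

```python
import math


def most_compact_neighborhood(buffer, k):
    """Return indices of a point and its k nearest neighbors such that the
    neighborhood radius (distance to the k-th neighbor) is smallest."""
    if len(buffer) <= k:
        raise ValueError("need more than k buffered points")
    best_radius, best_idx = math.inf, None
    for i, p in enumerate(buffer):
        # Brute-force neighbor search among the other buffered points.
        neighbors = sorted(
            (math.dist(p, q), j) for j, q in enumerate(buffer) if j != i
        )[:k]
        radius = neighbors[-1][0]
        if radius < best_radius:
            best_radius = radius
            best_idx = [i] + [j for _, j in neighbors]
    return best_idx


def seed_new_component(buffer, k):
    """Remove the most compact k-neighborhood from the buffer in place and
    return (mean, seed_points) for a new component."""
    chosen = set(most_compact_neighborhood(buffer, k))
    seed = [p for i, p in enumerate(buffer) if i in chosen]
    buffer[:] = [p for i, p in enumerate(buffer) if i not in chosen]
    dims = len(seed[0])
    mean = tuple(sum(p[d] for p in seed) / len(seed) for d in range(dims))
    return mean, seed
```

After seeding, the points left in the buffer would be re-tested against the new component, as described under "Model update".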

Example

Post-processing
[Figure panels: Model before processing, Updated model, Updated model + data]

Literature
• I. Kärkkäinen, P. Fränti, Gradual model generator for single-pass clustering, Pattern Recognition 40(3) (2007) 784-795.
• P. Bradley, U. Fayyad, C. Reina, Clustering Very Large Databases Using EM Mixture Models, Proc. of the 15th Int. Conf. on Pattern Recognition, vol. 2, 2000, pp. 76-80.
• R. Ng, J. Han, CLARANS: A Method for Clustering Objects for Spatial Data Mining, IEEE Trans. Knowledge & Data Engineering 14(5) (2002) 1003-1016.
• M. Sato, S. Ishii, On-line EM Algorithm for the Normalized Gaussian Network, Neural Computation 12(2) (2000) 407-432.
• T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: A New Data Clustering Algorithm and Its Applications, Data Mining and Knowledge Discovery 1(2) (1997) 141-182.
