Clustering Functional Data: Methods and Applications

Catherine Sugar

University of Southern California

[email protected]

This is joint work with Gareth James of USC

UCLA, May 1st, 2006

Clustering and Functional Data

  • Cluster Analysis: The art of finding groups in data. Points in the same cluster should be as similar as possible and points in disjoint clusters should be widely separated

  • Functional Data: Observations for a subject consist of curves or trajectories rather than finite dimensional vectors.

    • Growth curves

    • Longitudinal measurements of clinical status

    • Technology evolution

    • Spectra

Outline

  • Traditional approaches to clustering curves and problems with sparse data

  • A new approach using basis functions and a mixture model

  • Applications of our approach in medicine and business

  • Tools, extensions, and model selection issues

Functional Examples

Spinal Bone Mineral Density Data

Technology Evolution Curves

Functional Examples:

Membranous Nephropathy Data

Traditional Approaches To Functional Clustering

  • Regularization:

    • Form a grid of equally spaced time points.

    • Evaluate each curve at the time points, giving a finite representation of each curve.

    • Apply a standard finite dimensional method possibly with a regularization constraint

  • Filtering:

    • Fit a smooth curve to each subject using a finite set of basis functions

    • Perform clustering on the basis coefficients
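The regularization steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `resample_on_grid` is hypothetical, linear interpolation stands in for whatever evaluation rule is actually used, and the toy curves are made up. The second call shows why a sparsely observed curve cannot be placed on the common grid without extrapolating.

```python
def resample_on_grid(times, values, grid):
    """Linearly interpolate one curve onto a common grid (no extrapolation).

    Assumes at least two observations with strictly increasing times.
    """
    out = []
    for g in grid:
        if g < times[0] or g > times[-1]:
            raise ValueError("grid point outside the observed range")
        points = list(zip(times, values))
        for (t0, y0), (t1, y1) in zip(points, points[1:]):
            if t0 <= g <= t1:
                out.append(y0 + (y1 - y0) * (g - t0) / (t1 - t0))
                break
    return out


grid = [0.0, 1.0, 2.0, 3.0, 4.0]

# A densely observed toy curve maps onto the grid without trouble.
dense = resample_on_grid([0.0, 2.0, 4.0], [0.0, 4.0, 0.0], grid)

# A sparse toy curve observed only at t = 1 and t = 3 does not cover the grid.
try:
    resample_on_grid([1.0, 3.0], [2.0, 2.5], grid)
    sparse_ok = True
except ValueError:
    sparse_ok = False
```

Once every curve has a vector on the same grid, any finite dimensional clustering method can be applied to those vectors.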

Problems With the Traditional Approaches

  • Regularization:

    • Cannot be easily applied when curves are measured at different or unevenly spaced time points or when the data are too sparse

    • Even when it can be used, the resulting data vectors are high-dimensional and auto-correlated

  • Filtering:

    • Measurements may be too sparse to fit a curve for each subject

    • Requires fitting many parameters

    • If subjects are measured at different time points, the basis coefficients will not have a common covariance

Our Model

  • Let $g_i(t)$, $Y_i(t)$ and $\epsilon_i(t)$ respectively be the true value, observed value and error for the ith curve at time t, i.e.

    $Y_i(t) = g_i(t) + \epsilon_i(t)$

  • We represent $g_i(t)$ using a natural cubic spline basis:

    $g_i(t) = s(t)^T \eta_i$

    where $s(t)$ is a spline basis vector and $\eta_i$ is the vector of spline coefficients.

  • The coefficients are treated as random effects with

    $\eta_i = \mu_{z_i} + \gamma_i, \quad \gamma_i \sim N(0, \Gamma)$

    where $z_i$ denotes cluster membership

Our Model

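  • Our model becomes

    $Y_i = S_i(\mu_{z_i} + \gamma_i) + \epsilon_i, \quad \gamma_i \sim N(0, \Gamma), \quad \epsilon_i \sim N(0, \sigma^2 I)$

    where $Y_i$ is the vector of observations for curve i, $S_i$ is the spline basis matrix evaluated at its observed time points, $\mu_{z_i}$ is the mean coefficient vector of its cluster and $\gamma_i$ is a curve-specific random effect.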

  • We fit this model using the observed time points and an EM algorithm.
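To make the data-generating process concrete, here is a minimal simulation sketch, assuming the model has the form Y_i = S_i(mu_{z_i} + gamma_i) + eps_i. Everything specific in it is an assumption for illustration: a quadratic polynomial basis stands in for the natural cubic spline basis, the random effects have independent components, and the cluster means, noise levels and function names are invented.

```python
import random


def basis(t):
    # Hypothetical stand-in for the spline basis vector s(t): 1, t, t^2.
    return [1.0, t, t * t]


def simulate_curve(mu, gamma_sd, noise_sd, n_obs, rng):
    """Draw one sparsely observed curve Y_i = S_i (mu + gamma_i) + eps_i."""
    # Curve-specific coefficients: cluster mean plus a random effect
    # (independent components, i.e. a diagonal covariance, assumed here).
    eta = [m + rng.gauss(0.0, gamma_sd) for m in mu]
    # Each curve gets its own irregular set of time points in [0, 1].
    times = sorted(rng.uniform(0.0, 1.0) for _ in range(n_obs))
    values = [
        sum(e * b for e, b in zip(eta, basis(t))) + rng.gauss(0.0, noise_sd)
        for t in times
    ]
    return times, values


rng = random.Random(0)
cluster_means = [[0.0, 1.0, 0.0], [1.0, -1.0, 0.5]]  # made-up cluster means

curves = []
for i in range(6):
    z = i % 2                      # cluster membership z_i
    n_obs = rng.randint(2, 5)      # only a few observations per curve
    curves.append((z,) + simulate_curve(cluster_means[z], 0.1, 0.05, n_obs, rng))
```

Each entry of `curves` holds a cluster label, the observed times and the observed values, so curves differ in both the number and the placement of their measurements.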

Fitting The Model: Bone Density Data In Two Clusters

Fitting The Model: Technology Data In Two Clusters

Model Applications I: Low Dimensional Representations

  • One can plot functional data but it is hard to assess relative “distances” between curves

  • We use the basis coefficients to project data into a low-dimensional space where it can be plotted as points

  • Projecting causes no information loss in terms of cluster assignment

  • The projections are exact analogues of the discriminants used in LDA

Model Applications I: Low Dimensional Representations: Bone Data

Model Applications I: Low Dimensional Representations: Technology Data

Model Applications I: Low Dimensional Representations: Nephrology Data

Model Applications II: Dimensions of Separation

  • It is useful to know what dimensions do the best job of separating the clusters.

  • This depends on a combination of distance between clusters and within cluster covariance, and is equivalent to identifying which dimensions determine cluster assignment

  • The optimal weights for cluster assignment are given by an extension of the classical discriminant function for a Gaussian mixture:
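For reference, the classical finite dimensional discriminant function that this slide says it extends can be written as follows. This is the standard result for a Gaussian mixture with common covariance, not the functional extension itself, which is not reproduced in this transcript. The symbols x (an observation), \mu_k (cluster mean), \Sigma (common covariance) and \pi_k (mixing proportion) are generic notation chosen here.

```latex
\[
\delta_k(x) = x^T \Sigma^{-1} \mu_k - \tfrac{1}{2}\, \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k
\]
```

An observation is assigned to the cluster with the largest \delta_k(x), so the weights \Sigma^{-1}(\mu_k - \mu_l) on x determine which dimensions drive the assignment between clusters k and l.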

Model Applications II: Dimensions of Separation

[Figure panels: Correlation and Covariance; Discriminating Functions]

Model Applications III: Prediction and Confidence Intervals

  • Another advantage of our method is that it provides accurate predictions for missing portions of g(t)

  • Natural estimate:

  • The prediction with minimum mean squared error is

  • CIs and PIs: Two-step procedure: find the set of clusters most likely to contain g(t), then create intervals conditional on cluster membership
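The minimum mean squared error prediction is, by a general result, the conditional expectation of the curve given the observed data. The block below writes that out as a sketch under assumptions not shown in this transcript: the model is taken to be Y_i = S_i(\mu_{z_i} + \gamma_i) + \epsilon_i with \gamma_i \sim N(0, \Gamma), \epsilon_i \sim N(0, \sigma^2 I), and g_i(t) = s(t)^T(\mu_{z_i} + \gamma_i). It is a derivation from those assumptions, not a transcription of the slide's formulas.

```latex
\[
\hat g_i(t) = E[g_i(t) \mid Y_i]
  = \sum_{k=1}^{K} P(z_i = k \mid Y_i)\, s(t)^T
    \bigl(\mu_k + E[\gamma_i \mid Y_i, z_i = k]\bigr)
\]
\[
E[\gamma_i \mid Y_i, z_i = k]
  = \Bigl(\tfrac{1}{\sigma^2} S_i^T S_i + \Gamma^{-1}\Bigr)^{-1}
    \tfrac{1}{\sigma^2} S_i^T (Y_i - S_i \mu_k)
\]
```

The first line averages cluster-specific predictions by the posterior cluster probabilities; the second is the usual posterior mean of a Gaussian random effect.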

Model Applications III: Prediction on Bone Data

Model Applications III: Prediction on Technology Data

Optical bit density

Magnetic storage bit density

HDD 3.5 in. storage capacity

  • Black = Functional Clustering

  • Red = Linear Gompertz

  • Green = Mansfield-Blackman

  • Cyan = Weibull

  • Orange = Bass

  • Blue = S-curve

A Comparison With Standard Approaches

  • We took the first 10 years as training data and tried to predict the following 5 years using several approaches.

  • Here we report the MSE on the held-out data as a percentage of the MSE obtained from a traditional S-curve (logistic curve).
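The reported quantity can be computed as in the sketch below. The function names are hypothetical and the numbers are made-up toy values, not results from the talk; the point is only to show how a held-out MSE is expressed relative to a baseline method.

```python
def mse(pred, actual):
    """Mean squared error between predictions and held-out values."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def relative_mse_percent(pred, baseline_pred, actual):
    """MSE of a method as a percentage of the baseline method's MSE."""
    return 100.0 * mse(pred, actual) / mse(baseline_pred, actual)


# Toy held-out values and two sets of predictions (illustrative only).
actual = [1.0, 2.0, 3.0]
method_pred = [1.0, 2.0, 4.0]      # squared errors 0, 0, 1
baseline_pred = [3.0, 2.0, 5.0]    # squared errors 4, 0, 4

score = relative_mse_percent(method_pred, baseline_pred, actual)
```

A value below 100 means the method predicts the held-out years better than the baseline; here `score` is about 12.5.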

Advantages of Our Model

  • Borrows strength from all curve fragments to simultaneously estimate mixture parameters and requires fitting fewer parameters.

  • Allows one to make more accurate predictions into the future based on only a few observations.

  • Flexible. Can be used effectively when data are sparse, irregularly spaced or sampled at different time points

  • Automatically puts the correct weights on estimated basis coefficients

  • Can be easily extended to include multiple functional and finite dimensional covariates.

Extensions I: Multiple Functional Covariates

  • Just as finite dimensional clustering algorithms can incorporate multiple covariates, one should be able to use multiple functional variables

  • We can do this by creating a block diagonal spline basis matrix using the entries for the p individual curves:

  • More care must be taken with the error structure but the same basic model and fitting procedure apply.
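A block diagonal basis matrix of the kind described above can be assembled as in this sketch. The helper name `block_diag` and the tiny example matrices are assumptions for illustration; in practice each block would be the spline basis matrix of one functional variable evaluated at that variable's observed time points.

```python
def block_diag(blocks):
    """Stack matrices (lists of rows) along the diagonal, zero elsewhere."""
    total_cols = sum(len(b[0]) for b in blocks)
    rows = []
    offset = 0
    for b in blocks:
        width = len(b[0])
        for r in b:
            left = [0.0] * offset
            right = [0.0] * (total_cols - offset - width)
            rows.append(left + list(r) + right)
        offset += width
    return rows


# Toy basis matrices for two functional variables measured on one subject:
# the first observed once with a 2-term basis, the second twice with a 1-term basis.
s1 = [[1.0, 2.0]]
s2 = [[3.0], [4.0]]
combined = block_diag([s1, s2])
```

The combined matrix multiplies a stacked coefficient vector, so each variable's observations depend only on that variable's own coefficients.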

Extensions II: Finite Dimensional Covariates

  • It is just as easy to add finite dimensional covariates to the model

  • Let Xi be the vector of finite dimensional covariates.

  • We replace the spline basis matrix, $S_i$, by the identity matrix, I

  • The model can be fit just as before

  • Note that this provides a way of doing high dimensional standard clustering problems with missing data—just delete the corresponding rows of the identity matrix
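The row-deletion idea for missing data can be sketched as follows. The function names and the toy vector are hypothetical; the sketch only shows that keeping the identity rows for the observed coordinates yields a design matrix that picks those coordinates out of the full vector.

```python
def observed_design(n_features, observed):
    """Rows of the n_features x n_features identity for the observed indices."""
    return [[1.0 if j == i else 0.0 for j in range(n_features)] for i in observed]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# A subject with 4 covariates of which only the 1st and 3rd were measured.
design = observed_design(4, [0, 2])
full_vector = [5.0, 6.0, 7.0, 8.0]   # made-up underlying covariate values
picked = matvec(design, full_vector)
```

Because the design matrix plays the role of the basis matrix, subjects with different missing entries simply get different sets of rows.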

Extensions III: Dimension Reduction

  • Reducing dimensions ahead of time (e.g. by PCA) may be risky.

  • Example below shows a case where the dimensions that explain most of the variability are not the ones determining cluster separation. Our method (right) does a superior job
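A toy numerical version of this risk is sketched below. The data are invented for illustration and are not the example from the talk: two clusters differ only in y, while most of the variability lies along x. Since the covariance between x and y is zero here, the principal axes are the coordinate axes and the first principal component is the x direction, along which the clusters do not separate at all.

```python
def mean(xs):
    return sum(xs) / len(xs)


# Toy data (made up): two clusters that differ only in the y coordinate.
xs = (-10.0, -5.0, 0.0, 5.0, 10.0)
cluster_a = [(x, 1.0) for x in xs]
cluster_b = [(x, -1.0) for x in xs]
points = cluster_a + cluster_b

mx = mean([p[0] for p in points])
my = mean([p[1] for p in points])
var_x = mean([(p[0] - mx) ** 2 for p in points])
var_y = mean([(p[1] - my) ** 2 for p in points])
cov_xy = mean([(p[0] - mx) * (p[1] - my) for p in points])

# Separation of the cluster means along each principal axis.
gap_pc1 = abs(mean([p[0] for p in cluster_a]) - mean([p[0] for p in cluster_b]))
gap_pc2 = abs(mean([p[1] for p in cluster_a]) - mean([p[1] for p in cluster_b]))
```

Keeping only the first principal component would retain almost all of the variance and none of the cluster structure.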

References

  • Bachrach, L. et al. (1999). Bone mineral acquisition in healthy Asian, Hispanic, Black, and Caucasian youth: a longitudinal study. Journal of Clinical Endocrinology & Metabolism 84, 4702-4712

  • Banfield, J. and Raftery, A. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803-821

  • James, G. and Hastie, T. (2001). Functional linear discriminant analysis for irregularly sampled curves. JRSSB 63, 533-550

  • James, G., Hastie, T., and Sugar, C. (2000). Principal component models for sparsely sampled functional data. Biometrika 87, 587-602

  • James, G. and Sugar, C. (2003). Clustering for sparsely sampled functional data. JASA 98, 397-408

  • Sugar, C. and James, G. (2003). Finding the number of clusters in a data set: An information theoretic approach. JASA 98, 750-763

Model Selection Issues:

  • Choosing the spline basis and number and placement of knots

  • Choosing the dimension of the mean space

  • Choosing the covariance structure for the clusters

  • Choosing the number of clusters

How Many Clusters:

  • Raftery et al. suggest using approximate Bayes factors in the finite dimensional setting

  • We propose an approach based on theory from electrical engineering involving distortion

    • Distortion is the average distance, per dimension, between each observation and its closest cluster center

    • Plot distortion as a function of k, the number of clusters

    • Rate distortion theory suggests the form of the resulting “distortion curve”
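For concreteness, the distortion can be written as below, assuming the definition used in Sugar and James (2003), which is listed in the references. Here X is a p-dimensional observation, c_1, ..., c_K are candidate cluster centers, c_X is the center closest to X, and \Gamma denotes the within-cluster covariance in that paper's notation.

```latex
\[
d_K = \frac{1}{p} \min_{c_1, \ldots, c_K}
      E\bigl[(X - c_X)^T \Gamma^{-1} (X - c_X)\bigr]
\]
```

In practice d_K is typically estimated by running a clustering algorithm such as k-means for each K and averaging the squared distances to the fitted centers.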

How Many Clusters: Basic Results

  • If the data are generated from a single cluster in q dimensions then asymptotically the distortion curve can be made linear, specifically $d_K^{-q/2} \propto K$

  • When there are an unknown number, K, of clusters, the inverse distortion plot will be straight both before and after K, and will experience its maximum jump at K subject to certain conditions.
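The jump idea described above can be sketched for one-dimensional data as follows. This is a simplified illustration, not the authors' implementation: it assumes a plain Lloyd-style k-means with a deterministic quantile start, squared Euclidean distance with no covariance scaling, a transformation power of 1/2 (half the dimension), and made-up data with three tight groups. All function names are hypothetical.

```python
def kmeans_1d(data, k, n_iter=100):
    """Lloyd's algorithm in one dimension with a deterministic quantile start."""
    xs = sorted(data)
    n = len(xs)
    centers = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[nearest].append(x)
        # Keep the old center if a group happens to be empty.
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def distortion(data, centers):
    """Average squared distance from each point to its closest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in data) / len(data)


def choose_k(data, max_k, power=0.5):
    """Pick the K with the largest jump in the transformed distortion curve."""
    transformed = [0.0]  # convention: transformed distortion is 0 at K = 0
    for k in range(1, max_k + 1):
        d = max(distortion(data, kmeans_1d(data, k)), 1e-12)  # guard against d = 0
        transformed.append(d ** -power)
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, max_k + 1)]
    return 1 + jumps.index(max(jumps))


# Made-up data: three tight groups centered at 0, 10 and 20.
data = [c + d for c in (0.0, 10.0, 20.0) for d in (-0.2, -0.1, 0.0, 0.1, 0.2)]
best_k = choose_k(data, max_k=5)
```

On this toy data the transformed curve is nearly flat for K below 3, jumps sharply at K = 3, and rises only slowly afterwards, so the largest jump identifies three clusters.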

How Many Clusters: Examples

The figures below show a transformed distortion curve when there is (a) a single component and (b) six components in the generating mixture distribution

A General Functional Model

  • Let g(t) be the curve for a randomly chosen individual. (We will assume g(t) follows a Gaussian process.)

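  • If g(t) is in the kth cluster we write

    $g(t) \mid z = k \sim GP(\mu_k(t), \omega_k(t, t'))$

  • If Y is the vector of observed values at times t1,…,tn and errors are assumed independent then

    $Y \mid z = k \sim N(\mu_k, \Omega_k + \sigma^2 I)$

    where $\mu_k$ and $\Omega_k$ are the cluster mean and covariance functions evaluated at t1,…,tn.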

A General Functional Model

  • Regularization and filtering can be viewed as approaches to fitting the general functional clustering model.

  • The regularization approach estimates $\mu_k(t)$ and $\omega_k(t,t')$ on a fine grid of time points with structural constraints on $\omega_k(t,t')$

  • The filtering approach assumes $g(t) = \eta^T \phi(t)$ where $\phi(t)$ is a vector of basis functions and $\eta$ is the vector of coefficients. The $\eta$'s are estimated separately for each individual and then clustered

Our Model

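  • It is further useful to parameterize the cluster mean coefficients as

    $\mu_k = \lambda_0 + \Lambda \alpha_k$

    where $\lambda_0$ and $\alpha_k$ are respectively q and h dimensional vectors and $\Lambda$ is a $q \times h$ matrix.

  • If h < K-1 then we are assuming the cluster mean coefficients lie in a restricted subspace.

  • Our model becomes

    $Y_i = S_i(\lambda_0 + \Lambda \alpha_{z_i} + \gamma_i) + \epsilon_i$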

Fitting Our Model Via EM

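  • Fitting our model involves estimating $\lambda_0$, $\Lambda$, $\alpha_k$, $\Gamma$, $\sigma^2$ and the cluster proportions $\pi_k$

  • We can do this by maximizing either the classification likelihood or the clustering likelihood, noting that conditional on class membership

    $Y_i \mid z_i = k \sim N(S_i(\lambda_0 + \Lambda \alpha_k), \Sigma_i), \quad \Sigma_i = \sigma^2 I + S_i \Gamma S_i^T$

  • We can use an iterative procedure such as EM to obtain the estimates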
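The EM iteration can be illustrated with a deliberately simplified sketch. It is not the fitting procedure from the talk: it assumes no random effects (the coefficient covariance is set to zero), so each curve is its cluster's mean curve plus independent noise, and it uses a two-term linear basis instead of a spline basis. The function names, starting values and toy curves are all invented for illustration.

```python
import math


def basis(t):
    # Hypothetical stand-in for the spline basis vector s(t): intercept and slope.
    return [1.0, t]


def rss(times, values, coef):
    """Residual sum of squares of one curve around the mean curve s(t)^T coef."""
    return sum((y - sum(c * b for c, b in zip(coef, basis(t)))) ** 2
               for t, y in zip(times, values))


def solve2(a, b):
    """Solve a 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def em_fit(curves, mu_init, n_iter=20):
    """EM for a mixture of mean curves observed at curve-specific time points."""
    K = len(mu_init)
    mu = [list(m) for m in mu_init]
    pi = [1.0 / K] * K
    sigma2 = 1.0
    n_total = sum(len(times) for times, _ in curves)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities for each curve.
        resp = []
        for times, values in curves:
            logp = [math.log(pi[k]) - 0.5 * rss(times, values, mu[k]) / sigma2
                    for k in range(K)]
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            resp.append([x / sum(w) for x in w])
        # M-step: proportions, weighted least squares means, noise variance.
        pi = [max(sum(r[k] for r in resp) / len(curves), 1e-12) for k in range(K)]
        for k in range(K):
            a = [[1e-9, 0.0], [0.0, 1e-9]]  # tiny ridge keeps the system solvable
            b = [0.0, 0.0]
            for r, (times, values) in zip(resp, curves):
                for t, y in zip(times, values):
                    s = basis(t)
                    for p in range(2):
                        b[p] += r[k] * s[p] * y
                        for q in range(2):
                            a[p][q] += r[k] * s[p] * s[q]
            mu[k] = solve2(a, b)
        weighted = sum(r[k] * rss(times, values, mu[k])
                       for r, (times, values) in zip(resp, curves)
                       for k in range(K))
        sigma2 = max(weighted / n_total, 1e-8)
    return pi, mu, sigma2, resp


# Toy sparse curves: three near y = t and three near y = 10 - t,
# each measured at its own small set of time points.
curves = [
    ([0.0, 1.0, 4.0], [0.1, 0.9, 4.0]),
    ([2.0, 3.0], [2.1, 2.9]),
    ([0.5, 3.5], [0.4, 3.6]),
    ([0.0, 2.0], [10.1, 7.9]),
    ([1.0, 4.0], [9.0, 6.1]),
    ([3.0, 3.5], [7.0, 6.4]),
]
pi, mu, sigma2, resp = em_fit(curves, mu_init=[[0.0, 0.0], [10.0, 0.0]])
labels = [max(range(len(mu)), key=lambda k: r[k]) for r in resp]
```

Even though no single curve has enough points to pin down its own shape reliably, pooling the fragments within each cluster recovers the two mean lines and the cluster assignments.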

Model Selection III: Alternate Covariance Structures

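  • So far we have assumed a common covariance matrix, $\Gamma$, for all clusters

  • Raftery et al. suggest a class of covariance structures in their model-based clustering methods for finite-dimensional data:

    $\Sigma_k = \lambda_k D_k A_k D_k^T$

    where, in their notation, $\lambda_k$ is a scalar controlling volume, $D_k$ is an orthogonal matrix of eigenvectors controlling orientation and $A_k$ is a diagonal matrix controlling shape.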

  • James and Hastie suggest regularization or rank reduction in their papers on functional PCA and LDA