Panel Data Analysis: A Survey On Model-Based Clustering Of Time Series - Statswork

Panel Data Analysis: A Survey on Model-Based Clustering of Time Series. An academic presentation by Dr. Nancy Agens, Head, Technical Operations, Statswork Group. www.statswork.com Email: info@statswork.com

Outline of Topics (Today's Discussion): In Brief, Longitudinal Data, Model Based Clustering, Example on Model Based Clustering, Dirichlet Prior, MCMC Simulation, Conclusion

In Brief
Clustering techniques in statistical analysis are used to identify subsets (clusters) in the data using a specified distance measure. This presentation discusses some of the methods used for modeling longitudinal or panel data with cluster analysis.

Longitudinal Data
Longitudinal data are a sample of observations measured repeatedly over time. Nowadays, longitudinal (repeated-measures) or panel data arise in all areas of applied statistics, such as finance, psychology, economics and the social sciences. Most studies deal with analyzing heterogeneity in such time series data. The most common method of capturing the heterogeneity is to assume the presence of latent classes, with each class stratified using the covariates.

Model Based Clustering
Measuring the distance between time series is not appropriate, so a model-based clustering strategy built on finite mixture models is adopted, using Bayes' rule. Model-based clustering treats each time series as a single unit belonging to an unknown latent class. An excellent review of finite mixture models for longitudinal data, especially in psychology, biostatistics and other applied areas, is given by Vermunt (2010).
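As a minimal sketch of the Bayes' rule step, the posterior probability that one series belongs to latent class h is proportional to the class weight times the likelihood of the series under that class. The first-order transition matrices, the equal class weights and the names `stayers` and `movers` below are illustrative assumptions, not values from the survey.

```python
import math

def sequence_log_likelihood(seq, trans):
    """Log-likelihood of a categorical sequence under a first-order
    transition matrix, conditioning on the first observation."""
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))

def class_posterior(seq, weights, kernels):
    """Bayes' rule: P(class h | seq) is proportional to
    weights[h] * p(seq | kernels[h])."""
    log_terms = [math.log(w) + sequence_log_likelihood(seq, k)
                 for w, k in zip(weights, kernels)]
    m = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-class setup with three response categories (0, 1, 2).
weights = [0.5, 0.5]
stayers = [[0.90, 0.08, 0.02], [0.30, 0.50, 0.20], [0.10, 0.30, 0.60]]
movers = [[0.50, 0.30, 0.20], [0.10, 0.50, 0.40], [0.05, 0.15, 0.80]]

seq = [0, 0, 0, 0, 0]  # one unit observed in category 0 on five occasions
post = class_posterior(seq, weights, [stayers, movers])
```

Here `post[0]` is the posterior probability of the hypothetical "stayers" class for a series that never leaves category 0. In a full analysis the weights and kernels would themselves be estimated rather than fixed.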

Example on Model Based Clustering
The data consist of 237 teenagers whose marijuana use was recorded over the years 1976-1980. Marijuana use is categorized into three levels: never, not more than once a month, and more than once a month. The following figure shows a sample of 10 observed response series of marijuana use among the 237 teenagers. The model considered for analyzing marijuana usage is based on a generalized transition model.

Figure: Model Based Clustering
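A transition model for a three-category response works from counts of moves between categories in consecutive years. The sketch below assumes a simple first-order structure, which may be narrower than the generalized transition model used in the study. The coding (0 = never, 1 = not more than once a month, 2 = more than once a month) and the three series are made up for illustration and are not the study data.

```python
def transition_counts(seq, n_states=3):
    """Count moves from category a (year t) to category b (year t+1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts

def pooled_counts(series, n_states=3):
    """Add up the transition counts of several series."""
    total = [[0] * n_states for _ in range(n_states)]
    for seq in series:
        c = transition_counts(seq, n_states)
        for a in range(n_states):
            for b in range(n_states):
                total[a][b] += c[a][b]
    return total

# Hypothetical responses for three teenagers over five years.
series = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 2],
    [0, 1, 2, 2, 2],
]
counts = pooled_counts(series)
# counts[a][b] is the number of observed moves from category a to b.
```

Each series of five yearly responses contributes four transitions, so the pooled table here sums to twelve.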

Dirichlet Prior
A Dirichlet prior is chosen in this case since the observed response variable is categorical in nature. Five different kernel classes are considered, the models are evaluated using the Dirichlet prior distribution, and the results are presented in the following table. The clustering kernels M2 to M5 show that there exists a common behaviour in marijuana usage. If the value is smaller than one, then one may conclude that the method is overfitting; in this case, the H3 class of kernel seems to be overfitting.

Table: Dirichlet Prior Distribution
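The Dirichlet prior is convenient for categorical responses because it is conjugate: after observing category counts, the posterior is again Dirichlet, with the counts added to the prior parameters. The sketch below illustrates this update with a uniform prior and made-up counts. Neither is taken from the table above.

```python
def dirichlet_posterior(prior, counts):
    """Conjugate update: posterior parameters = prior parameters + counts."""
    return [a + n for a, n in zip(prior, counts)]

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution with parameter vector alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical example: uniform Dirichlet(1, 1, 1) prior over the three
# response categories and illustrative observed counts.
prior = [1.0, 1.0, 1.0]
counts = [6, 3, 1]

posterior = dirichlet_posterior(prior, counts)  # [7.0, 4.0, 2.0]
mean = dirichlet_mean(posterior)                # [7/13, 4/13, 2/13]
```

The posterior mean shrinks the raw proportions (0.6, 0.3, 0.1) slightly towards the uniform prior, which keeps rarely observed categories from receiving probability zero.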

MCMC Simulation
An MCMC simulation is carried out for M3 with H2, and the following figure shows a sample of boxplots of the posterior probabilities for the male and female groups. Comparing the likelihood results obtained from the above table (598.5) and the previous table (596.5), the stratified model-based clustering reduces to standard model-based clustering, and it is clear that the use of marijuana is not associated with gender. From these results, it is concluded that marijuana use among teenagers may be clustered into two groups, one of never-users and the other of more frequent users.

Figure: Boxplots for MCMC Simulation

Table: Gender Specific Posterior Inference
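To give a feel for what such an MCMC simulation involves, here is a minimal Gibbs sampler sketch for a mixture of first-order transition matrices. It is not the sampler used in the study: the uniform Dirichlet(1, ..., 1) priors, the function names and the toy data are all assumptions made for illustration, and the sketch ignores label switching and convergence checks.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def transition_counts(seq, n_states):
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts

def gibbs_markov_mixture(seqs, n_classes, n_states, n_iter=200, seed=0):
    """Return, for each series, how often it was allocated to each class
    during the second half of the run."""
    rng = random.Random(seed)
    per_seq = [transition_counts(s, n_states) for s in seqs]
    labels = [rng.randrange(n_classes) for _ in seqs]
    membership = [[0] * n_classes for _ in seqs]
    for it in range(n_iter):
        # 1. Class weights given labels: Dirichlet(1 + class sizes).
        sizes = [labels.count(h) for h in range(n_classes)]
        weights = sample_dirichlet([1.0 + n for n in sizes], rng)
        # 2. Transition rows given labels: Dirichlet(1 + pooled counts).
        kernels = []
        for h in range(n_classes):
            rows = []
            for a in range(n_states):
                alpha = [1.0 + sum(c[a][b] for c, z in zip(per_seq, labels)
                                   if z == h)
                         for b in range(n_states)]
                rows.append(sample_dirichlet(alpha, rng))
            kernels.append(rows)
        # 3. Labels given weights and kernels, via Bayes' rule.
        for i, c in enumerate(per_seq):
            logp = [math.log(weights[h])
                    + sum(c[a][b] * math.log(kernels[h][a][b])
                          for a in range(n_states) for b in range(n_states))
                    for h in range(n_classes)]
            m = max(logp)
            probs = [math.exp(v - m) for v in logp]
            labels[i] = rng.choices(range(n_classes), weights=probs)[0]
            if it >= n_iter // 2:  # discard the first half as burn-in
                membership[i][labels[i]] += 1
    return membership

# Hypothetical toy data: three "stable" and three "escalating" series.
seqs = [[0, 0, 0, 0, 0]] * 3 + [[0, 1, 2, 2, 2]] * 3
membership = gibbs_markov_mixture(seqs, n_classes=2, n_states=3,
                                  n_iter=100, seed=1)
```

Dividing each row of `membership` by the number of retained iterations gives an estimate of the posterior allocation probabilities for that series, which is the kind of quantity summarised by boxplots of posterior probabilities.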

Conclusion
To sum up, model-based clustering with a Bayesian flavor yields better results, since it provides an answer to the most troublesome problems in cluster analysis. In longitudinal or panel data studies, the use of Euclidean distance may not be valid; hence kernel-based clustering for time series data is considered, and selection of the best method is analysed using different information criteria. An MCMC simulation is carried out to find the optimal clustering methodology.

Contact Us: United Kingdom +44-1143520021, India +91-4448137070, Email info@statswork.com
