Statistical Analysis: Multivariate Descriptive & Factor Analysis

Learn about statistical analysis techniques including Multivariate Descriptive Analysis, Factor Analysis, and Clustering. Discover how these methods can help identify important relationships between variables and detect underlying dimensions in a multidimensional space. Perfect for researchers, analysts, and decision-makers.

Presentation Transcript


  1. Instructor: Prof. Louis Chauvel. Statistical Analysis: multivariate descriptive analysis; factor analysis and clustering (PCA and HCA) + k-means; principal components analysis; hierarchical cluster analysis.

  2. This session: descriptive multidimensional analysis • Good for detecting important relations between variables • Not relevant for causality, net effects, confidence intervals, … • "Heuristic" (from Greek εὑρίσκω, "I find, discover") methods • Efficient tools for synthesis • To put in the annexes of your thesis, or in reports • Politicians, decision makers, CEO$, etc. like their results • Useful if you need money • Factor analysis to find the main dimensions in a multidimensional space • Cluster analysis to find subgroups that are internally homogeneous and mutually heterogeneous ("classes") • Part 1 = Principal Component Analysis (PCA) (example: welfare regimes) • Part 2 = Hierarchical Cluster Analysis (HCA) (example: welfare regimes) • Part 3 = Joint PCA and HCA (example: U.S. General Social Survey, GSS)

  3. Principal Components • Simplify N-dimensional tables into 2 (3 or 4) axes • Reduce "noise" and keep "signal" • Identify underlying dimensions or principal components of a distribution • Helps understand the joint or common variation among a set of variables • Commonly used method for detecting "latent dimensions" • Rotation (= eigen-decomposition / spectral decomposition of the correlation matrix)
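
  For reference, the spectral decomposition mentioned above can be written compactly (a standard linear-algebra result, not taken from the original slides):

    C = VΛVᵀ, with VᵀV = I and Λ = diag[λ1, …, λk],

  where the columns of V are the orthonormal eigenvectors of the correlation matrix C (the principal axes) and each λ is the variance captured along its axis.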

  4. Principal Components • The first principal component is identified as the vector (or equivalently the linear combination of variables) on which the most data variation can be projected • The 2nd principal component is a vector perpendicular to the first, chosen so that it contains as much of the remaining variation as possible • And so on for the 3rd principal component, the 4th, the 5th etc.

  5. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt Principal Components Analysis (PCA) • Principle • Linear projection method to reduce the number of parameters • Transforms a set of correlated variables into a new set of uncorrelated variables • Maps the data into a space of lower dimensionality • A form of unsupervised learning • Properties • It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables • The new axes are orthogonal and represent the directions of maximum variability
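
  A minimal Stata sketch of this principle (the variable names x1–x5 are hypothetical placeholders, not from the course material):

    * PCA on a handful of correlated variables (hypothetical names)
    pca x1 x2 x3 x4 x5
    * store the scores on the first two (uncorrelated) components
    predict pc1 pc2
    * verify that the new variables are indeed uncorrelated
    correlate pc1 pc2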

  6. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt Computing the Components • Data points are vectors in a multidimensional space • The projection of a vector x onto an axis (direction) u is u·x • The direction of greatest variability is the one in which the average squared projection is greatest • I.e. the u such that E((u·x)²) over all x is maximized • (For simplicity, we subtract the mean along each dimension and center the original axis system at the centroid of all data points) • This u is the direction of the first principal component

  7. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt Computing the Components • E((uᵀx)²) = E((uᵀx)(uᵀx)ᵀ) = E(uᵀ x xᵀ u) = uᵀCu • The matrix C = E(x xᵀ) contains the correlations (similarities) of the original axes, based on how the data values project onto them • So we are looking for the u that maximizes uᵀCu, subject to u being unit-length • It is maximized when u is the principal eigenvector of the matrix C, in which case uᵀCu = uᵀλu = λ if u is unit-length, where λ is the principal eigenvalue of the correlation matrix C • The eigenvalue denotes the amount of variability captured along that dimension

  8. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt Why the Eigenvectors? • Maximise uᵀxxᵀu subject to uᵀu = 1 • Construct the Lagrangian uᵀxxᵀu − λuᵀu • Set the vector of partial derivatives to zero: xxᵀu − λu = (xxᵀ − λI)u = 0 • Since u ≠ 0, u must be an eigenvector of xxᵀ with eigenvalue λ

  9. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt Singular Value Decomposition • The first root is called the principal eigenvalue, with an associated orthonormal (uᵀu = 1) eigenvector u • Subsequent roots are ordered such that λ1 > λ2 > … > λM, with rank(D) non-zero values • The eigenvectors form an orthonormal basis, i.e. uiᵀuj = δij • The eigenvalue decomposition of xxᵀ is xxᵀ = UΣUᵀ, where U = [u1, u2, …, uM] and Σ = diag[λ1, λ2, …, λM] • Similarly, the eigenvalue decomposition of xᵀx is xᵀx = VΣVᵀ • The SVD is closely related to the above: x = UΣ^(1/2)Vᵀ • The left singular vectors are U, the right singular vectors are V, and the singular values are the square roots of the eigenvalues

  10. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt Computing the Components Similarly for the next axis, etc. So, the new axes are the eigenvectors of the matrix of correlations of the original variables, which captures the similarities of the original variables based on how data samples project to them • Geometrically: centering followed by rotation • Linear transformation

  11. http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt PCA: General From k original variables: x1,x2,...,xk: Produce k new variables: y1,y2,...,yk: y1 = a11x1 + a12x2 + ... + a1kxk y2 = a21x1 + a22x2 + ... + a2kxk ... yk = ak1x1 + ak2x2 + ... + akkxk

  12. http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt PCA: General From k original variables: x1,x2,...,xk: Produce k new variables: y1,y2,...,yk: y1 = a11x1 + a12x2 + ... + a1kxk y2 = a21x1 + a22x2 + ... + a2kxk ... yk = ak1x1 + ak2x2 + ... + akkxk such that: yk's are uncorrelated (orthogonal) y1 explains as much as possible of original variance in data set y2 explains as much as possible of remaining variance etc.

  13. [Figure: scatter plot of the data with the 1st principal component (y1) and 2nd principal component (y2) drawn as new axes] http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt

  14. PCA Scores [Figure: the score of observation i on each component, (yi,1, yi,2), shown as its coordinates along the rotated axes in the original (xi1, xi2) plane] http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt

  15. PCA Eigenvalues [Figure: the eigenvalues λ1 and λ2 shown as the spread of the data along the first and second principal axes] http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt

  16. http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt PCA: Another Explanation From k original variables: x1,x2,...,xk: Produce k new variables: y1,y2,...,yk: y1 = a11x1 + a12x2 + ... + a1kxk y2 = a21x1 + a22x2 + ... + a2kxk ... yk = ak1x1 + ak2x2 + ... + akkxk yk's are Principal Components such that: yk's are uncorrelated (orthogonal) y1 explains as much as possible of original variance in data set y2 explains as much as possible of remaining variance etc.

  17. http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt Principal Components Analysis on: • Covariance matrix: • Variables must be in the same units • Emphasizes variables with the most variance • Mean eigenvalue ≠ 1.0 • Correlation matrix: • Variables are standardized (mean 0.0, SD 1.0) • Variables can be in different units • All variables have the same impact on the analysis • Mean eigenvalue = 1.0
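
  In Stata this choice is governed by the covariance option of pca (the correlation matrix is the default); a hedged sketch with hypothetical variable names:

    * PCA on the correlation matrix (the default): variables are standardized
    pca x1 x2 x3 x4 x5
    * PCA on the covariance matrix: only meaningful if the variables share units
    pca x1 x2 x3 x4 x5, covariance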

  18. http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt PCA: General {a11,a12,...,a1k} is 1st Eigenvector of correlation/covariance matrix, and coefficients of first principal component {a21,a22,...,a2k} is 2nd Eigenvector of correlation/covariance matrix, and coefficients of 2nd principal component … {ak1,ak2,...,akk} is kth Eigenvector of correlation/covariance matrix, and coefficients of kth principal component

  19. http://www.cs.cmu.edu/~16385/s14/lec_slides/lec-18.ppt PCA Summary until now • Rotates multivariate dataset into a new configuration which is easier to interpret • Purposes • simplify data • look at relationships between variables • look at patterns of units

  20. PCA: Yet Another Explanation

  21. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt How Many PCs? For n original dimensions, the correlation matrix is n × n and has up to n eigenvectors. So n PCs. Where does dimensionality reduction come from?

  22. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt Dimensionality Reduction Can ignore the components of lesser significance. You do lose some information, but if the eigenvalues are small, you don’t lose much • n dimensions in original data • calculate n eigenvectors and eigenvalues • choose only the first p eigenvectors, based on their eigenvalues • final data set has only p dimensions
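
  A possible Stata rendering of this step (hypothetical variable names; components(2) is an illustrative choice, and screeplot helps judge how many eigenvalues are worth keeping):

    * inspect the full set of eigenvalues
    pca x1 x2 x3 x4 x5
    screeplot
    * retain only the first two components and keep their scores
    pca x1 x2 x3 x4 x5, components(2)
    predict pc1 pc2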

  23. https://www.cs.princeton.edu/picasso/mats/Lecture1_jps.ppt Eigenvectors of a Correlation Matrix

  24. Graphic presentation • Correlation circle: the variables and their correlations with axis 1, axis 2, etc. • Principal plane: the individuals or groups • Interpretations are to be made in terms of directions from the center • The center (0,0) means the average situation (indiscriminate)
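
  After pca, Stata's loadingplot and scoreplot give rough equivalents of these two graphs; a sketch, assuming a hypothetical label variable country:

    * variables in the plane of the first two components (a close relative of the correlation circle)
    loadingplot
    * individuals in the principal plane, labelled by the hypothetical id variable
    scoreplot, mlabel(country)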

  25. How to do it? • Select "active variables": numeric or pseudo-numeric (ordinal) variables • The active variables should represent the different dimensions of your "field", not be too correlated, and not bias the representation of your dimensions • Begin with a correlation matrix • IMPORTANT: be sure the variables are oriented the way you think; recode if "Height" is coded "1: high; 2: medium; 3: small" • (Test different variants of the set of active variables) • Then process the "internal analysis" (of the active variables) • And then the "external analysis" (other variables and individuals)
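
  A hedged Stata sketch of the orientation check (height and x1–x3 are hypothetical variables; 4 − height flips a 1/2/3 coding so that larger values mean taller):

    * reverse-code so that higher values mean "higher"
    generate height_rev = 4 - height
    * check the orientation in the correlation matrix of the candidate active variables
    correlate height_rev x1 x2 x3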

  26. Cluster Analysis • Techniques for identifying separate groups of similar cases • Similarity of cases is either specified directly in a distance matrix, or defined in terms of some distance function • Also used to summarise data by defining segments of similar cases • 3 main types of cluster analysis methods • Descending hierarchical cluster analysis • Each cluster (starting with the whole dataset) is divided into two, then divided again, and so on • Ascending hierarchical cluster analysis • Individuals are iteratively merged with their nearest neighbours, from N singletons down to a small number of clusters • Iterative methods • k-means clustering (in Stata: cluster kmeans) • An analogous non-parametric density-estimation method

  27. Clustering Techniques: the Ward Method (iterative HCA) [Figure: cloud of individuals in a wage × education plane] • We have an M-dimensional cloud of N dots • i) Compute the matrix of distances and find the 2 "closest" points (weighted distance) • Merge them into a single point whose weight is the sum of their weights • Compute the resulting change in the inertia of the cloud of dots • Go back to i) • => The N dots become a small number of groups

  28. Cluster Analysis Options • Many definitions of distance (Manhattan, Euclidean, squared Euclidean) • Several choices of how to form clusters in hierarchical cluster analysis: • Single linkage • Average linkage • Density linkage • Ward's method • Many others • Ward's method (like k-means) tends to form equal-sized, roundish clusters • Average linkage generally forms roundish clusters with equal variance • Density linkage can identify clusters of different shapes • (See the Stata sketch after this slide for the corresponding commands)
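
  In Stata these linkage choices correspond to different cluster subcommands; a sketch, assuming standardized active variables named z* as in the do-file cited later:

    * hierarchical clustering under three different linkage rules (Euclidean distance)
    cluster singlelinkage z*, measure(L2) name(single)
    cluster averagelinkage z*, measure(L2) name(average)
    cluster wardslinkage z*, measure(L2) name(ward)
    * cut the Ward tree into 5 groups and inspect the partition
    cluster generate g_ward = group(5), name(ward)
    tabulate g_ward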

  29. K-means Clustering • Partitional clustering approach • Each cluster is associated with a centroid (center point) • Each point is assigned to the cluster with the closest centroid • The number of clusters, K, must be specified • The basic algorithm is very simple https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt

  30. K-means Clustering – Details • Initial centroids are often chosen randomly. • Clusters produced vary from one run to another. • The centroid is (typically) the mean of the points in the cluster. • ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc. • K-means will converge for common similarity measures mentioned above. • Most of the convergence happens in the first few iterations. • Often the stopping condition is changed to ‘Until relatively few points change clusters’ • Complexity is O( n * K * I * d ) • n = number of points, K = number of clusters, I = number of iterations, d = number of attributes https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt
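
  A hedged Stata sketch of one such run (z* again stands for standardized variables; k(5) and the random start are illustrative choices, not prescriptions from the slides):

    * k-means with 5 clusters, Euclidean distance, random initial centroids
    cluster kmeans z*, k(5) measure(L2) start(krandom) name(km5)
    * group membership is stored in a variable named after the analysis (km5); profile it
    tabstat z*, by(km5)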

  31. Two different K-means Clusterings [Figure: the same original points partitioned into an optimal clustering and a sub-optimal clustering, depending on the starting centroids] https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt

  32. Importance of Choosing Initial Centroids https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt

  33. Importance of Choosing Initial Centroids https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt

  34. Evaluating K-means Clusters • The most common measure is the Sum of Squared Errors (SSE) • For each point, the error is the distance to the nearest cluster centroid • To get the SSE, we square these errors and sum them • x is a data point in cluster Ci and mi is the representative point of cluster Ci • It can be shown that mi corresponds to the center (mean) of the cluster • Given two clusterings, we can choose the one with the smallest error • One easy way to reduce the SSE is to increase K, the number of clusters • But a good clustering with a smaller K can have a lower SSE than a poor clustering with a higher K https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt
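
  The formula the slide refers to does not survive in the transcript; the standard SSE objective it describes is:

    SSE = Σ_{i=1..K} Σ_{x ∈ Ci} dist(x, mi)²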

  35. Solutions to Initial Centroids Problem • Multiple runs • Helps, but probability is not on your side • Sample and use hierarchical clustering to determine initial centroids • Select more than k initial centroids and then select among these initial centroids • Select most widely separated • Postprocessing • Bisecting K-means • Not as susceptible to initialization issues https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt
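
  The "multiple runs" idea can be sketched in Stata as follows (illustrative, not from the original do-file); the resulting partitions km_1 … km_5 can then be compared on their within-cluster sums of squares:

    * five k-means runs with different random seeds, stored under different names
    forvalues s = 1/5 {
        cluster kmeans z*, k(5) measure(L2) start(krandom(`s')) name(km_`s')
    }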

  36. Limitations of K-means • K-means has problems when clusters are of differing • Sizes • Densities • Non-globular shapes • K-means has problems when the data contains outliers. https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt

  37. Limitations of K-means: Differing Sizes [Figure: original points with clusters of different sizes vs the K-means result with 3 clusters] https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt

  38. Limitations of K-means: Differing Density [Figure: original points with clusters of different densities vs the K-means result with 3 clusters] https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt

  39. Limitations of K-means: Non-globular Shapes [Figure: original points forming non-globular shapes vs the K-means result with 2 clusters] https://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.ppt

  40. Beyond the conclusion: assessment of the typology • How well does Gøsta Esping-Andersen's typology of welfare regimes and countries really fit the social facts? • Fenger (2007) assesses the Esping-Andersen typology • http://www.louischauvel.org/fenger_2007.pdf • We can implement this in Stata: • http://www.louischauvel.org/pca_fenger.do

  41. Assessing the typology near 2005 • Fempart: female participation (% of women in the total workforce) • Fertili: total fertility rate (births per woman) • Gini: inequality (Gini coefficient; 2002 or latest available year) • Govhealth: government health expenditures (% of total government expenditures) • Healthexpen: general health expenditures (% of GDP) • Labormarkt: spending on labor market policies (% of GDP) • Lifeexp: life expectancy (years) • Oldageexpen: spending on old age (% of GDP) • Soccontrib: revenues from social contributions (% of GDP) • Socprotect: spending on social protection (% of GDP) • Spendedu: spending on education (% of GDP) • taxesgdp: taxes on revenue (% of GDP) • Unemployment: unemployment rate (% of labor force)
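
  The analysis in slide 44 works with standardized z* versions of these indicators; a hedged sketch of how such variables could be built, assuming the dataset uses exactly the lower-case names listed above:

    * standardize each indicator to mean 0, SD 1 (producing the z* variables)
    foreach v of varlist fempart fertili gini govhealth healthexpen labormarkt ///
            lifeexp oldageexpen soccontrib socprotect spendedu taxesgdp unemployment {
        egen z`v' = std(`v')
    }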

  42. A. Factor Analysis PCA

  43. A. Factor Analysis PCA

  44. B. Assessment of the typology (…)
  * Ward hierarchical clustering on the standardized variables (Euclidean distance)
  cluster wardslinkage z* , measure(L2)
  * cut the tree into 5 groups and profile them
  cluster gen grp = group(5)
  tabstat z* , by(grp)
  * PCA on the same variables; keep the first three component scores
  pca z*
  predict f1 f2 f3
  * mean scores by country (ISO code) and by cluster
  tabstat f* , by(ISO)
  tabstat f* , by(grp)
  * dendrogram of the stored cluster analysis _clus_2, labelled with ISO country codes
  cluster dendrogram _clus_2, labels(ISO) xlabel(, ///
      angle(90) labsize(*.75))

  45. B. Assessment of the typology

  46. B. Assessment of the typology
