Multivariate statistical methods

Multivariate methods • multivariate dataset – a group of n objects described by m variables (as a rule n > m, if possible) • confirmatory vs. exploratory analysis • confirmatory – focus on parameter estimation and hypothesis testing • exploratory – focus on exploring the data and finding patterns and structure

Multivariate statistical methods Classification of units: • Cluster analysis • Discriminant analysis Analysis of relations among variables: • Canonical correlation analysis • Factor analysis • Principal component analysis

Methods for analysis of relations among variables

Principal component analysis • one of the oldest and most widely used multivariate statistical methods • introduced by Pearson in 1901 and, independently, by Hotelling in 1933 • principal aims: • detecting relations among variables • reducing the number of variables and finding new, meaningful variables

Principal component analysis • based on a linear transformation of the original variables into a smaller number of new artificial variables, the so-called principal components • component characteristics: • the components are mutually uncorrelated • for m original variables, a suitable dimension is r ≤ m; r principal components (ideally far fewer than m) explain a sufficient part of the variability of the original variables

PCA • component characteristics: • all m components together explain the total variability • principal components are ordered by their share of explained variance • the first component explains the most variance, the last component the least

PCA procedure • initial analysis – exploring relations among the variables (graphs, descriptive statistics) • inspecting the correlation matrix (if the original variables are correlated, the number of variables can be reduced) • principal component analysis and choice of a suitable number of components (70 – 90 % of explained variance is usually enough) • interpretation of the principal components
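The rule of thumb for choosing the number of components can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name `choose_n_components`, the 80 % default threshold, and the example eigenvalues are all assumptions made for the sketch.

```python
import numpy as np

def choose_n_components(eigenvalues, threshold=0.8):
    """Return the smallest r whose first r components reach `threshold`
    of the total variance, plus the cumulative shares (hypothetical helper)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(lam) / lam.sum()                     # cumulative share
    # index of the first cumulative share that is >= threshold
    r = int(np.searchsorted(cumulative, threshold, side="left")) + 1
    return min(r, lam.size), cumulative  # guard against rounding at threshold = 1.0

# made-up component variances for m = 4 standardized variables (they sum to 4)
r, cumulative = choose_n_components([2.5, 1.0, 0.3, 0.2], threshold=0.8)
print(r, np.round(cumulative, 3))
```

A stricter threshold keeps more components, so the threshold should be read as a judgment call within the 70 – 90 % range rather than a fixed constant.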

PCA procedure • PCA is based on either • the covariance matrix (variables in the same units and with similar variances) or • the correlation matrix (standardized data, or variables in different units)
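The link between the two options can be checked numerically: the covariance matrix of standardized data equals the correlation matrix of the raw data. The sketch below assumes the sample standard deviation (divisor n − 1); the helper `standardize` and the simulated data are illustrative only.

```python
import numpy as np

def standardize(X):
    """Center each column and divide by its sample standard deviation (ddof=1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
# hypothetical data: 50 objects, 3 variables measured on very different scales
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 1000.0])

Z = standardize(X)
cov_of_Z = np.cov(Z, rowvar=False)        # covariance matrix of standardized data
corr_of_X = np.corrcoef(X, rowvar=False)  # correlation matrix of raw data
print(np.allclose(cov_of_Z, corr_of_X))
```

Without standardization, the variable with the largest scale would dominate the covariance matrix, which is why the correlation matrix is preferred when units differ.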

Model of PCA • notation: standardized original variable; weights of the principal component; principal components in standardized form • indices: i = 1, 2, …, n (n = number of units); j, k = 1, 2, …, p (p = number of variables)

PCA – mathematical model • original data matrix X (n × m): n objects, m variables • Z = [z_ij] is the standardized matrix X, i = 1, …, n, j = 1, …, m • the aim is to find the transformation matrix Q that converts the m standardized variables (matrix Z) into m mutually uncorrelated components (matrix P): P = Z Q

PCA – mathematical model • rearranging P = Z Q yields the matrix Λ
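One way to carry out this rearrangement is sketched below. It rests on assumptions the transcript does not state: covariances are computed with divisor n − 1, Q is orthogonal, Λ denotes the covariance matrix of the components, and R denotes the correlation matrix of the original variables.

```latex
\[
\Lambda
= \frac{1}{n-1}\, P^{\mathsf T} P
= \frac{1}{n-1}\, (ZQ)^{\mathsf T} (ZQ)
= Q^{\mathsf T} \left( \frac{1}{n-1}\, Z^{\mathsf T} Z \right) Q
= Q^{\mathsf T} R\, Q
\]
```

Because the columns of Z are standardized, the bracketed term is the correlation matrix of the original variables, so finding Q amounts to diagonalizing R.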

PCA – mathematical model • Λ is the variance–covariance matrix of the principal components. Because the principal components are uncorrelated, their covariances are 0 and Λ is diagonal, with the variances of the principal components on the diagonal • the sum of the variances of the standardized variables equals m; the ratio of each component's variance to m indicates the share of the first, second, …, last component in explaining the total variance of all variables
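The shares can be written out explicitly. The notation λ_k for the k-th diagonal element of Λ is an assumption introduced here, and the identity relies on Λ = Qᵀ R Q with an orthogonal Q, where R is the correlation matrix of the original variables.

```latex
\[
\sum_{k=1}^{m} \lambda_k
= \operatorname{tr}(\Lambda)
= \operatorname{tr}\!\left(Q^{\mathsf T} R\, Q\right)
= \operatorname{tr}(R)
= m,
\qquad
\text{share of component } k = \frac{\lambda_k}{m}
\]
```

As a made-up illustration, with m = 4 and component variances 2.5, 1.0, 0.3 and 0.2, the shares are 62.5 %, 25 %, 7.5 % and 5 %.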

PCA – mathematical model • R is the correlation matrix of the original variables • the diagonal elements of Λ are the eigenvalues of R; the columns of Q are the eigenvectors belonging to each eigenvalue
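The eigenvalue relationship can be verified with a short NumPy sketch. The function name `pca_from_correlation` and the simulated data are assumptions for illustration; `np.linalg.eigh` is used because R is symmetric, and its output is reordered so that the largest eigenvalue comes first.

```python
import numpy as np

def pca_from_correlation(X):
    """PCA via eigen-decomposition of the correlation matrix (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized data
    R = np.corrcoef(X, rowvar=False)                  # correlation matrix
    eigenvalues, Q = np.linalg.eigh(R)                # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]             # reorder: largest first
    eigenvalues, Q = eigenvalues[order], Q[:, order]
    P = Z @ Q                                         # components: P = Z Q
    return Z, R, Q, eigenvalues, P

rng = np.random.default_rng(1)
# hypothetical data: three variables sharing a common factor plus one unrelated variable
base = rng.normal(size=(200, 1))
columns = [base + 0.3 * rng.normal(size=(200, 1)) for _ in range(3)]
columns.append(rng.normal(size=(200, 1)))
X = np.hstack(columns)

Z, R, Q, eigenvalues, P = pca_from_correlation(X)
Lambda = np.cov(P, rowvar=False)  # covariance matrix of the components

print(np.allclose(Lambda, np.diag(eigenvalues)))      # Lambda is diagonal
print(np.allclose(R @ Q, Q @ np.diag(eigenvalues)))   # columns of Q are eigenvectors
```

The signs of the eigenvectors are arbitrary, so the components from different software may differ by a factor of −1 without changing the explained variance.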

PCA – other notions • the coordinates of an object on the non-standardized principal components are called its "scores" • the matrix of scores for all n objects is called the "score matrix" • each row holds the scores of one object • each column is the score vector of one component

PCA – other notions • the share of the total variability of each original variable X_i, i = 1, 2, …, m, that is explained by the r principal components is called the communality of X_i • it is computed as the squared multiple correlation coefficient, r²
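For standardized data, the communality can also be obtained as the sum of squared loadings over the retained components, which matches the squared multiple correlation because the components are uncorrelated. The sketch below assumes loadings defined as eigenvectors scaled by the square roots of their eigenvalues; the helper `communalities` and the example correlation matrix are hypothetical.

```python
import numpy as np

def communalities(R, r):
    """Communality of each variable for the first r principal components of R."""
    eigenvalues, Q = np.linalg.eigh(R)              # R is symmetric
    order = np.argsort(eigenvalues)[::-1]           # largest eigenvalue first
    eigenvalues, Q = eigenvalues[order], Q[:, order]
    eigenvalues = np.clip(eigenvalues, 0.0, None)   # guard against tiny negatives
    loadings = Q * np.sqrt(eigenvalues)             # column k scaled by sqrt(lambda_k)
    return (loadings[:, :r] ** 2).sum(axis=1)       # row-wise sum of squared loadings

# made-up correlation matrix for three variables
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

print(np.round(communalities(R, r=2), 3))  # part of each variable explained by 2 components
print(np.round(communalities(R, r=3), 3))  # all components: every communality is 1
```

A variable with a low communality is poorly represented by the retained components, which can be a reason to keep one more component.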

PCA – graphical visualization • Cattell's scree plot • a tool for determining the number of principal components

PCA – graphical visualization • plot of the correlation coefficients (1st and 2nd principal components)

PCA – graphical visualization • plot of component scores
