Multivariate statistical methods

- multivariate dataset – a group of n objects described by m variables (as a rule n > m, if possible)
- confirmatory vs. exploratory analysis
- confirmatory – emphasis on parameter estimation and hypothesis testing
- exploratory – emphasis on exploring the data and finding patterns and structure

Unit classification

- Cluster analysis
- Discriminant analysis

Analysis of relations among variables

- Canonical correlation analysis
- Factor analysis
- Principal component analysis

Methods for analysis of relations among variables

- principal component analysis (PCA) is among the oldest and most widely used multivariate statistical methods
- introduced by Pearson in 1901 and, independently of Pearson, by Hotelling in 1933
- principal aims:
- detection of relations among variables
- reduction of variables number and finding of new purposeful variables

- the basis is a linear transformation of the original variables into a smaller number of new artificial variables, the so-called principal components
- component characteristics:
- they are mutually uncorrelated
- for m original variables, r ≤ m components are retained; ideally r is much smaller than m while the r principal components still explain enough of the variability of the original variables

- component characteristics:
- the method aims to explain the total variability in full (all m components together account for it completely)
- principal components are ordered according to their share of explained variance
- the first component explains the most variance, the last component the least

- initial analysis – exploration of relations among the variables (graphs, descriptive statistics)
- exploration of the correlation matrix (if the original variables are correlated, the number of variables can be reduced)
- principal component analysis and choice of a suitable number of components (usually 70–90 % of explained variance is enough)
- interpretation of the principal components
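
The choice of the number of components by share of explained variance can be sketched in a few lines of Python. This is only an illustration: the function name `choose_n_components`, the 80 % threshold and the eigenvalues are hypothetical and not taken from the presentation.

```python
def choose_n_components(eigenvalues, threshold=0.8):
    """Return the smallest r whose first r components reach `threshold`
    as a cumulative share of the total variance."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    cumulative = 0.0
    for r, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return r
    return len(ordered)


# Hypothetical eigenvalues of a correlation matrix for m = 5 variables
# (they sum to 5, the number of standardized variables).
example_eigenvalues = [2.6, 1.6, 0.4, 0.25, 0.15]

r_80 = choose_n_components(example_eigenvalues, threshold=0.8)
r_90 = choose_n_components(example_eigenvalues, threshold=0.9)
```

With these made-up values the first two components carry 84 % of the variance, so two components pass an 80 % threshold and three are needed for 90 %.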

- PCA is based on either
- the covariance matrix (variables in the same units with similar variances), or
- the correlation matrix (standardized data, or variables in different units)
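
Why the units matter can be seen in a small NumPy sketch. The data are hypothetical (two variables on very different scales), and the sample standard deviation (`ddof=1`) is an assumption. The sketch checks that the correlation matrix is the covariance matrix of the standardized data, and compares the share of the first component under both choices.

```python
import numpy as np

# Hypothetical data: 6 objects, 2 variables measured in very different units.
X = np.array([
    [1.0, 1200.0],
    [2.0, 1500.0],
    [3.0, 1100.0],
    [4.0, 1900.0],
    [5.0, 1700.0],
    [6.0, 2100.0],
])

cov = np.cov(X, rowvar=False)        # covariance matrix of the raw variables
corr = np.corrcoef(X, rowvar=False)  # correlation matrix of the raw variables

# Standardize each column (assumption: sample standard deviation, ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_Z = np.cov(Z, rowvar=False)   # equals the correlation matrix

# Share of total variance carried by the first component under each choice.
# eigvalsh returns eigenvalues in ascending order, so [-1] is the largest.
share_cov = np.linalg.eigvalsh(cov)[-1] / np.trace(cov)
share_corr = np.linalg.eigvalsh(corr)[-1] / np.trace(corr)
```

On the covariance matrix the large-scale variable dominates the first component almost entirely; on the correlation matrix both variables contribute on an equal footing.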

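- $z_{ij} = (x_{ij} - \bar{x}_j) / s_j$ → standardized original variable, where $\bar{x}_j$ and $s_j$ are the mean and standard deviation of variable j
- $q_{jk}$ … weights of the k-th principal component
- $p_{ik} = \sum_{j=1}^{m} q_{jk} z_{ij}$ … principal components in standardized expression
- i = 1, 2, …, n, where n is the number of units
- j, k = 1, 2, …, m, where m is the number of variables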

- original matrix – dataset X (n × m), n objects, m variables
- Z = [z_ij] is the standardized matrix X, i = 1, …, n, j = 1, …, m

- the aim is to find a transformation matrix Q that converts the m standardized variables (matrix Z) into m mutually uncorrelated components (matrix P)
P = Z Q
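
The transformation P = Z Q can be sketched numerically as follows. The simulated dataset, the fixed seed and the use of `numpy.linalg.eigh` on the correlation matrix to obtain Q are assumptions made for illustration, not the presentation's own procedure.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n, m = 50, 3

# Hypothetical dataset: m correlated columns built from one shared signal.
base = rng.normal(size=(n, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(n, 1)) for _ in range(m)])

# Standardized matrix Z (assumption: sample standard deviation, ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (n - 1)                 # correlation matrix of X

eigenvalues, Q = np.linalg.eigh(R)      # columns of Q are eigenvectors of R
order = np.argsort(eigenvalues)[::-1]   # largest variance first
eigenvalues, Q = eigenvalues[order], Q[:, order]

P = Z @ Q                               # component matrix, P = Z Q
cov_P = np.cov(P, rowvar=False)         # numerically diagonal
```

The off-diagonal entries of `cov_P` vanish up to rounding error, which is the numerical counterpart of the components being mutually uncorrelated.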

- Rearranging P = Z Q (with the sample divisor n − 1) gives the matrix
$\Lambda = \frac{1}{n-1} P^T P = Q^T \left( \frac{1}{n-1} Z^T Z \right) Q = Q^T R Q$

- matrix Λ is the variance-covariance matrix of the principal components. Because the principal components are uncorrelated, the covariances are 0 and Λ is diagonal, with the variances of the principal components on its diagonal
- the sum of the variances of the standardized variables equals m, so the proportions $\lambda_k / m$ (where $\lambda_k$ is the k-th diagonal element of Λ) indicate how large a share of the total variance of all variables is explained by the first, second, …, last component

- matrix R is the correlation matrix of the original variables, $R = \frac{1}{n-1} Z^T Z$

- the diagonal values of matrix Λ are the eigenvalues of matrix R; the columns of matrix Q are the eigenvectors belonging to the respective eigenvalues
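
A small worked case makes the eigenvalue relation concrete. The 2 × 2 correlation matrix below is a hypothetical example; for a correlation of 0.6 the eigenvalues are 1 + 0.6 and 1 − 0.6, so the first component carries 80 % of the total variance.

```python
import numpy as np

# Hypothetical correlation matrix of m = 2 variables with correlation 0.6.
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])
m = R.shape[0]

eigenvalues, Q = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
order = np.argsort(eigenvalues)[::-1]   # reorder: largest first
eigenvalues, Q = eigenvalues[order], Q[:, order]

Lambda = Q.T @ R @ Q                    # diagonal matrix of component variances
proportions = eigenvalues / m           # share of total variance per component
```

Here `eigenvalues` is approximately `[1.6, 0.4]`, their sum equals m = 2, and `proportions` is approximately `[0.8, 0.2]`.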

- the coordinates of an object on the non-standardized principal components are called "scores"
- the matrix of all scores for all n objects is called the "score matrix"
- the scores of each object are in a row
- the columns of the matrix are the score vectors of the individual components

- the share of the total variability of each original variable Xi, i = 1, 2, …, m, that is explained by the r principal components is called the communality of variable Xi
- it is computed as the square of the multiple correlation coefficient → r²
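
Communalities can be sketched under the usual assumption that the correlation between variable i and component k equals the eigenvector weight times the square root of the eigenvalue. Because the components are uncorrelated, the squared multiple correlation is then the sum of these squared correlations over the r retained components. The function name `communalities` and the correlation matrix are hypothetical.

```python
import numpy as np


def communalities(R, r):
    """Communality of each variable when the first r principal components
    of the correlation matrix R are retained."""
    eigenvalues, Q = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]        # largest eigenvalue first
    eigenvalues, Q = eigenvalues[order], Q[:, order]
    # Correlations between variables (rows) and retained components (columns).
    loadings = Q[:, :r] * np.sqrt(eigenvalues[:r])
    return (loadings ** 2).sum(axis=1)


# Hypothetical correlation matrix of m = 3 variables.
R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

h_one = communalities(R, r=1)   # one component retained
h_two = communalities(R, r=2)   # two components retained
h_all = communalities(R, r=3)   # all components: every communality is 1
```

Retaining more components can only raise a communality, and with all m components each variable is explained completely.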

- Cattell's graph → scree plot
- a tool for determining the number of principal components

- graph of the correlation coefficients between the original variables and the 1st and 2nd principal components
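
The coordinates for such a graph can be computed as below. This is a sketch on simulated data, with a hypothetical seed and variable layout. It obtains the correlations from the eigen-decomposition and checks them against correlations computed directly between the standardized variables and the component scores; the plotting itself is left out.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible
n = 200

# Hypothetical data: two variables sharing one signal, one unrelated variable.
signal = rng.normal(size=n)
X = np.column_stack([
    signal + 0.5 * rng.normal(size=n),
    signal + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
])
m = X.shape[1]

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigenvalues, Q = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]            # largest eigenvalue first
eigenvalues, Q = eigenvalues[order], Q[:, order]
P = Z @ Q                                        # component scores

# Plot coordinates: correlation of each variable with components 1 and 2.
coords = Q[:, :2] * np.sqrt(eigenvalues[:2])

# The same correlations computed directly from the data.
empirical = np.corrcoef(np.hstack([Z, P[:, :2]]), rowvar=False)[:m, m:]
```

Each row of `coords` is one original variable; variables lying close together are explained by the two components in a similar way.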

- graph of component scores