EE3J2 Data Mining, Lecture 11: Vector Data Analysis and Principal Components Analysis (PCA). Martin Russell
Objectives • To review basic data analysis • To review the notions of mean, variance and covariance • To explain Principal Components Analysis (PCA)
Example from speech processing • Plot of high-frequency energy vs low-frequency energy, for 25 ms speech segments, sampled every 10 ms
Basic statistics • Figure: scatter plot of the data showing the sample mean, the sample variances in ‘x’ and ‘y’, and the ‘x’ and ‘y’ min/max ranges
Basic statistics • Denote samples by X = x1, x2, … , xT, where xt = (xt1, xt2, … , xtN) • The sample mean of X is given by: $\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t$
More basic statistics • The sample variance of the nth component of X is given by: $\sigma_n^2 = \frac{1}{T}\sum_{t=1}^{T} (x_{tn} - \bar{x}_n)^2$
Covariance • As the x value increases, the y value also increases • This is (positive) covariance • If y decreases as x increases, the result is negative covariance
Definition of covariance • The covariance between the mth and nth components of the sample data is defined by: $c_{mn} = \frac{1}{T}\sum_{t=1}^{T} (x_{tm} - \bar{x}_m)(x_{tn} - \bar{x}_n)$ • In practice it is useful to subtract the mean $\bar{x}$ from each of the data points xt. The sample mean is then 0 and $c_{mn} = \frac{1}{T}\sum_{t=1}^{T} x_{tm} x_{tn}$
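The mean, variance and covariance definitions above can be sketched in Python/NumPy (the lecture's examples use MATLAB; this is an equivalent sketch on illustrative toy data, with the 1/T normalisation used in the slides):

```python
import numpy as np

# Toy sample data X: T = 4 points, each an N = 2 dimensional vector (rows are points).
X = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 6.2],
              [4.0, 7.9]])

T = X.shape[0]
mean = X.mean(axis=0)                       # sample mean vector
centred = X - mean                          # subtract the mean from each point
variance = (centred ** 2).sum(axis=0) / T   # sample variance per component

# Covariance c_mn: average of products of mean-subtracted components.
cov = centred.T @ centred / T

print(mean)       # per-component means
print(variance)   # per-component variances
print(cov)        # 2x2 covariance matrix; positive off-diagonal => positive covariance
```

Because y grows with x in this toy data, the off-diagonal entry of `cov` comes out positive, matching the "positive covariance" picture on the earlier slide.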
The covariance matrix • C is the N × N symmetric matrix whose (m, n) entry is $c_{mn}$; its diagonal entries are the component variances
Data with mean subtracted • The upward trend in the plot implies positive covariance
Sample data rotated through π/2 • The downward trend implies negative covariance
Data with covariance removed
Principal Components Analysis • PCA is the technique used to diagonalise the sample covariance matrix • The first step is to write the covariance matrix in the form C = U D Uᵀ, where D is diagonal and U is a matrix corresponding to a rotation • Can do this using SVD (see lecture 8) or eigenvalue decomposition
PCA continued • U implements a rotation through an angle θ • e1 is the first column of U, and d11 is the variance in the direction e1 • e2 is the second column of U, and d22 is the variance in the direction e2
Example • Illustration of PCA through an example application • 3D dance motion modelling
Data • Analysis of dance sequence data • Body position represented as a 90-dimensional vector • Dance sequence represented as a sequence of these vectors • MEng FYP 2004/5, Wan Ni Chong
Data Capture (1)
Data Capture (2)
Data Capture (3)
Calculating PCA • Step 1: Arrange the data as a matrix X • Rows correspond to individual data points • Number of columns = dimension of data (= 90) • Number of rows = number of data points = N
Calculating PCA (step 2) • Compute the covariance matrix of the data • In MATLAB: >>C = cov(X) • Alternatively (as in slides from last lecture): calculate the mean vector m, subtract m from each row of X to give Y • Then $C = \frac{1}{N} Y^T Y$
Calculating PCA (step 3) • Do an eigenvector decomposition of C, so that C = U D Uᵀ • Where • U is a unitary (rotation) matrix • D is a diagonal matrix (in fact all elements of D will be real and non-negative) • In MATLAB type >>[U,D] = eig(C)
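Steps 2–3 can be sketched in Python/NumPy as a stand-in for the MATLAB calls cov(X) and [U,D] = eig(C) (a hedged sketch: the toy 5-dimensional data below stands in for the 90-dimensional dance data, np.linalg.eigh is used because C is symmetric, and the 1/N normalisation from the slides is used rather than MATLAB's 1/(N−1)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # N = 200 points, dimension 5 (toy stand-in for 90-D data)

m = X.mean(axis=0)              # step 2: mean vector m
Y = X - m                       # subtract m from each row of X to give Y
C = Y.T @ Y / X.shape[0]        # covariance matrix (slides' 1/N normalisation)

# Step 3: eigenvector decomposition C = U D U^T.
eigvals, U = np.linalg.eigh(C)  # eigh: C is symmetric, so eigenvalues are real
D = np.diag(eigvals)

print(np.allclose(U @ D @ U.T, C))   # U D U^T reconstructs C
print(np.all(eigvals >= -1e-12))     # eigenvalues non-negative: C is positive semi-definite
```

The check at the end mirrors the slide's claim that U is unitary and all elements of D are real and non-negative.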
Calculating PCA (step 4) • Each column of U is a principal vector • The corresponding eigenvalue indicates the variance of the data along that dimension • Large eigenvalues indicate significant components of the data • Small eigenvalues indicate that the variation along the corresponding eigenvectors may be noise
Eigenvalues • Figure: eigenvalues of the dance data plotted in decreasing order, from the 1st to the 90th principal component • The first few (1st, 2nd, 3rd, …) are the more significant components; those near the 90th are insignificant
Calculating PCA (step 6) • It may be advantageous to ignore dimensions which correspond to small eigenvalues and to consider only the projection of the data onto the most significant eigenvectors • In this way the dimension of the data can be reduced
Visualising PCA • Figure: the original pattern (blue) is rotated into eigenspace by U; coordinates 11–90 are set to zero; applying U⁻¹ rotates back to give the reduced pattern (red)
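The rotate–truncate–rotate-back pipeline above can be sketched in Python/NumPy (a hedged sketch: 5-dimensional toy data that mostly varies along 2 directions stands in for the 90-D/10-D dance data, and U⁻¹ = Uᵀ because U is orthogonal):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 5-D points that mostly vary along 2 directions, plus a little noise.
latent = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 5))
X = latent + 0.01 * rng.normal(size=(300, 5))

m = X.mean(axis=0)
Y = X - m
C = Y.T @ Y / X.shape[0]
eigvals, U = np.linalg.eigh(C)      # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]   # sort descending: most significant components first
U = U[:, order]

k = 2                               # keep only the k most significant principal vectors
Z = Y @ U[:, :k]                    # rotate into eigenspace, keep the first k coordinates
X_reduced = Z @ U[:, :k].T + m      # rotate back with U^T and restore the mean

err = np.linalg.norm(X - X_reduced) / np.linalg.norm(X)
print(err)   # small: 2 dimensions capture almost all of the variation
```

Setting `k = 2` here plays the role of "set coordinates 11–90 to zero" on the slide: the discarded coordinates carry only the small-eigenvalue (noise) variation, so the reduced pattern stays close to the original.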
PCA Example • Original 90-dimensional data reduced to just 1 dimension
PCA Example • Original 90-dimensional data reduced to 10 dimensions
Summary • Example of PCA • Analysis of 90-dimensional 3D dance data • Analysis shows that PCA can reduce the 90-dimensional representation to just 10 dimensions with minimal loss of accuracy