
Dynamic graphics, Principal Component Analysis


Presentation Transcript


1. Dynamic graphics, Principal Component Analysis
Ker-Chau Li, UCLA Department of Statistics

2. Xlisp-stat (demo)
• (plot-points x y)
• (scatterplot-matrix (list x y z u w))
• (spin-plot (list x y z))
• Link, remove, select, rescale
• Examples:
• (1) simulated data
• (2) Iris data
• (3) Boston Housing data
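(For readers without Xlisp-stat, a rough static Python/matplotlib approximation of these commands is sketched below. It is only an illustration: the interactive linking, point removal, selection, and rescaling of the Xlisp-stat demo are not reproduced, and the variables x, y, z, u, w are placeholder data.)

```python
# Rough static stand-ins for the Xlisp-stat demo commands,
# using matplotlib/pandas instead of Xlisp-stat.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y, z, u, w = rng.normal(size=(5, 100))   # placeholder data

# (plot-points x y): a simple scatterplot
plt.scatter(x, y)
plt.show()

# (scatterplot-matrix (list x y z u w)): all pairwise scatterplots
pd.plotting.scatter_matrix(
    pd.DataFrame({"x": x, "y": y, "z": z, "u": u, "w": w}))
plt.show()

# (spin-plot (list x y z)): a (non-spinning) 3-D point cloud
ax = plt.figure().add_subplot(projection="3d")
ax.scatter(x, y, z)
plt.show()
```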

3. PCA (principal component analysis)
• A fundamental tool for reducing dimensionality by finding the projections with the largest variance
• (1) Data version
• (2) Population version
• Each has a number of variations
• (3) Let's begin with an illustration using (pca-model (list x y z))

4. Find a 2-D plane in 4-D space
• Generate 100 cases of u from Uniform(0,1)
• Generate 100 cases of v from Uniform(0,1)
• Define x = u + v, y = u - v
• Apply pca-model to (x, y, u, v); demo
• It still works with small errors (e ~ N(0,1)) present:
• x = u + v + .01 e_1 ; y = u - v + .01 e_2
• Define x = u + v^2, y = u - v^2, z = v^2
• Apply pca-model to (x, y, z, u); works fine
• But not so well with a nonlinear manifold; try (pca-model (list x y u v))
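A minimal Python/NumPy sketch of the first experiment above (standing in for the Xlisp-stat pca-model used in the lecture; the random seed is arbitrary): only two eigenvalues of the sample covariance are essentially non-zero, so (x, y, u, v) indeed lies in a 2-D plane, and the small .01 e errors barely change the spectrum.

```python
# Sketch of the "2-D plane in 4-D space" experiment
# (NumPy stands in for the Xlisp-stat pca-model).
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0, 1, 100)       # 100 cases of u ~ Uniform(0,1)
v = rng.uniform(0, 1, 100)       # 100 cases of v ~ Uniform(0,1)
x = u + v
y = u - v

data = np.column_stack([x, y, u, v])        # 100 x 4 data matrix
cov = np.cov(data, rowvar=False)            # 4 x 4 sample covariance
print(np.linalg.eigvalsh(cov)[::-1])        # only two eigenvalues are non-zero

# Small errors are tolerated: the spectrum barely changes.
x_e = x + 0.01 * rng.normal(size=100)
y_e = y + 0.01 * rng.normal(size=100)
cov_e = np.cov(np.column_stack([x_e, y_e, u, v]), rowvar=False)
print(np.linalg.eigvalsh(cov_e)[::-1])
```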

5. Other examples
• 1-D structure in 2-D data
• Rings
• Yin and Yang

6. Data version
• 1. Construct the sample variance-covariance matrix
• 2. Find its eigenvectors
• 3. Projection: use each eigenvector to form a linear combination of the original variables
• 4. The larger, the better: the k-th principal component is the projection with the k-th largest eigenvalue
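Written out in code, the four steps might look like the following Python/NumPy sketch (an illustration assuming the data arrive as an n × p array; this is not the Xlisp-stat implementation used in the demos).

```python
# Data-version PCA, following the four steps on the slide.
import numpy as np

def pca(data):
    """data: n x p array; returns eigenvalues (descending) and the
    principal-component scores (projections on the eigenvectors)."""
    # 1. Construct the sample variance-covariance matrix.
    cov = np.cov(data, rowvar=False)
    # 2. Find its eigenvectors (eigh returns eigenvalues in ascending order).
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 3. Projection: each eigenvector gives a linear combination of the
    #    original (centered) variables.
    scores = (data - data.mean(axis=0)) @ eigvecs
    # 4. The k-th principal component is the projection with the k-th
    #    largest eigenvalue, i.e. scores[:, k-1].
    return eigvals, scores
```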

7. Data version (alternative view)
• 1-D data matrix: rank 1
• 2-D data matrix: rank 2
• k-D data matrix: rank k
• Eigenvectors for a 1-D sample covariance matrix: rank 1
• Eigenvectors for a 2-D sample covariance matrix: rank 2
• Eigenvectors for a k-D sample covariance matrix: rank k
• Adding i.i.d. noise
• Connection with automatic basis-curve finding (to be discussed later)

8. Population version
• Let the sample size tend to infinity
• The sample covariance matrix converges to the population covariance matrix (by the law of large numbers)
• The rest of the steps remain the same
• We shall use the population version for theoretical discussion

9. Some basic facts
• Variance of a linear combination of random variables:
• var(a x + b y) = a^2 var(x) + b^2 var(y) + 2 a b cov(x, y)
• Easier with matrix notation:
• (B.1) var(m' X) = m' Cov(X) m
• here m is a p-vector and X consists of p random variables (x_1, ..., x_p)'
• From (B.1), it follows that …
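For completeness, (B.1) follows in one line from the definition of the covariance matrix, writing μ = E(X):

```latex
\operatorname{var}(m'X)
  = E\!\left[(m'(X-\mu))^{2}\right]
  = E\!\left[m'(X-\mu)(X-\mu)'m\right]
  = m'\,E\!\left[(X-\mu)(X-\mu)'\right] m
  = m'\,\operatorname{Cov}(X)\,m .
```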

10. Basic facts (cont.)
• Maximizing var(m' X) subject to ||m|| = 1 is the same as maximizing m' Cov(X) m subject to ||m|| = 1
• (here ||m|| denotes the length of the vector m)
• Eigenvalue decomposition:
• (B.2) M v_i = λ_i v_i, where λ_1 ≥ λ_2 ≥ … ≥ λ_p (M = Cov(X))
• Basic linear algebra tells us that the first eigenvector will do:
• the solution of max m' M m subject to ||m|| = 1 must satisfy M m = λ_1 m
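The "first eigenvector will do" claim can be justified by a short Lagrange-multiplier argument (not spelled out on the slide): stationary points of the constrained problem are eigenvectors of M, the value attained at such a point is the corresponding eigenvalue, and so the maximum is λ_1, attained at m = v_1.

```latex
L(m,\lambda) = m'Mm - \lambda\,(m'm - 1), \qquad
\nabla_m L = 2Mm - 2\lambda m = 0 \;\Longrightarrow\; Mm = \lambda m,
\qquad\text{and then}\qquad
m'Mm = \lambda\, m'm = \lambda \le \lambda_1 .
```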

11. Basic facts (cont.)
• The covariance matrix is degenerate (i.e., some eigenvalues are zero) if the data are confined to a lower-dimensional space S
• Rank of the covariance matrix = number of non-zero eigenvalues = dimension of the space S
• This explains why PCA works for our first example
• Why can small errors be tolerated?
• Large i.i.d. errors are fine too
• Heterogeneity is harmful, and so are correlated errors
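A small Python/NumPy check of why i.i.d. errors, even large ones, are harmless in the population version (the particular rank-2 covariance below is just an illustrative stand-in): adding independent noise of variance s² turns Cov(X) into Cov(X) + s²·I, which shifts every eigenvalue by s² and leaves the eigenvectors, and hence the principal directions, unchanged.

```python
# Population-version illustration: i.i.d. noise adds s^2 * I to the
# covariance, shifting all eigenvalues equally and preserving eigenvectors.
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  0.0],
              [0.0,  1.0]])          # x, y, u, v as linear functions of (u, v)
sigma = A @ A.T                       # rank-2 population covariance (4 x 4)

s2 = 4.0                              # even a large i.i.d. noise variance
vals = np.linalg.eigvalsh(sigma)
vals_noisy = np.linalg.eigvalsh(sigma + s2 * np.eye(4))

print(vals_noisy - vals)              # every eigenvalue shifted by exactly s2
# The leading eigenvectors span the same 2-D plane in both cases, so PCA
# still finds it; heteroscedastic or correlated noise does not add a
# multiple of I and can tilt these directions.
```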

12. Further discussion
• No guarantee of finding nonlinear structure such as clusters, curves, etc.
• In fact, the sampling properties of PCA are mostly developed for normal data
• Still useful
• Scaling problem
• Projection pursuit: guided; random
