Dimensional Reduction: PCA
Curse of dimensionality

  • The higher the dimension, the more data is needed to draw any conclusion

  • Probability density estimation:

    • Continuous: histograms

    • Discrete: k-factorial designs

  • Decision rules:

    • Nearest-neighbor and K-nearest neighbor

How to reduce dimension?

  • Assume we know something about the distribution

    • Parametric approach: assume data follow distributions within a family H

  • Example: counting histograms for 10-D data needs lots of bins, but knowing it's normal allows us to summarize the data in terms of sufficient statistics

    • (Number of bins)^10 vs. 10 + 10·11/2 = 65 parameters (mean vector plus distinct covariance entries)
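The contrast above can be checked with a quick sketch; the bin count per dimension is a hypothetical choice for illustration:

```python
# Compare parameter counts: a histogram density estimate vs. a Gaussian
# parametric model in d = 10 dimensions (bins_per_dim is illustrative).
d = 10
bins_per_dim = 10

histogram_cells = bins_per_dim ** d       # one count per cell
gaussian_params = d + d * (d + 1) // 2    # mean vector + distinct covariance entries

print(histogram_cells)  # 10_000_000_000
print(gaussian_params)  # 65
```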

Linear dimension reduction

  • Normality assumption is crucial for linear methods

  • Examples:

    • Principal Components Analysis (also Latent Semantic Indexing)

    • Factor Analysis

    • Linear discriminant analysis

Covariance structure of multivariate Gaussian

  • 2-dimensional example

  • No correlations --> diagonal covariance matrix, e.g.

    • Special case:  = I

    • - log likelihood  Euclidean distance to the center

Variance in each dimension

Correlation between dimensions

Covariance structure of multivariate Gaussian

  • Non-zero correlations --> full covariance matrix, Cov(X1, X2) ≠ 0

    • E.g. Σ = [[σ1², σ12], [σ12, σ2²]] with σ12 ≠ 0

  • Nice property of Gaussians: closed under linear transformation

  • This means we can remove correlation by rotation

Covariance structure of multivariate Gaussian

  • Rotation matrix: R = (w1, w2), where w1, w2 are two unit vectors perpendicular to each other

    • Rotation by 90 degrees: R = [[0, -1], [1, 0]]

    • Rotation by 45 degrees: R = (1/√2) · [[1, -1], [1, 1]]

Covariance structure of multivariate Gaussian

  • Matrix diagonalization: any 2×2 covariance matrix A can be written as A = R D Rᵀ, where R is a rotation matrix and D is a diagonal matrix of eigenvalues

  • Interpretation: we can always find a rotation to make the covariance look “nice” -- no correlation between dimensions

  • This IS PCA when applied to N dimensions
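A minimal sketch of this diagonalization with NumPy; the matrix A is an illustrative example:

```python
import numpy as np

# Diagonalize a 2x2 covariance matrix A = R D R^T: the columns of R are
# orthonormal eigenvectors (a rotation, possibly plus reflection) and D
# holds the eigenvalues. Example matrix chosen for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, R = np.linalg.eigh(A)   # eigh: eigendecomposition for symmetric matrices
D = np.diag(eigvals)

# Rotating into the eigenbasis leaves no correlation between dimensions.
assert np.allclose(R @ D @ R.T, A)
print(eigvals)  # [1. 3.]
```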


Computation of PCA

  • The new coordinates uniquely identify the rotation

  • In computation, it’s easier to identify one coordinate at a time.

  • Step 1: centering the data

    • X <-- X - mean(X)

    • Want to rotate around the center



Computation of PCA

  • Step 2: finding a direction of projection that has the maximal variance

  • Linear projection of X onto vector w:

    • Proj_w(X) = X[N×d] · w[d×1]  (X centered)

  • Now measure the stretch

    • This is sample variance = Var(X*w)




Computation of PCA

  • Step 3: formulate this as a constrained optimization problem

    • Objective of optimization: Var(X*w)

    • Need a constraint on w (otherwise the objective can be made arbitrarily large by scaling): consider only the direction, not the scale

  • So formally: argmax_{||w||=1} Var(X·w)

Computation of PCA

  • Recall the single-variable case: Var(a·X) = a² Var(X)

  • Apply to the multivariate case in matrix notation: Var(X·w) = wᵀ (XᵀX / N) w = wᵀ Cov(X) w  (X centered)

  • Cov(X) is a d×d matrix, symmetric (easy)

    • Positive semidefinite: for any y, yᵀ Cov(X) y ≥ 0

Computation of PCA

  • Going back to the optimization problem: max_{||w||=1} Var(X·w) = max_{||w||=1} wᵀ Cov(X) w

  • The maximizer w is the eigenvector of Cov(X) with the largest eigenvalue, and the maximum variance equals that eigenvalue

The first Principal Component! (see demo)
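Steps 1 to 3 can be sketched end to end with NumPy; the synthetic data is an illustrative assumption:

```python
import numpy as np

# Find the first principal component: center the data, build the
# covariance matrix, then take the top eigenvector. The variance of
# the projection equals the top eigenvalue.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.2])  # synthetic data

X = X - X.mean(axis=0)                 # Step 1: centering
C = (X.T @ X) / len(X)                 # d x d covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order

w1 = eigvecs[:, -1]                    # first principal component
top_var = eigvals[-1]

# Var(X w1) attains the maximum over all unit vectors.
assert np.isclose(np.var(X @ w1), top_var)
```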

More principal components

  • We keep looking among all the projections perpendicular to w1

  • Formally: max_{||w2||=1, w2 ⊥ w1} w2ᵀ Cov(X) w2

  • This turns out to be another eigenvector, corresponding to the 2nd largest eigenvalue (see demo)


New coordinates!

Rotation

  • Can keep going until we find all projections/coordinates w1,w2,…,wd

  • Putting them together, we have a big matrix W=(w1,w2,…,wd)

  • W is called an orthogonal matrix

    • This corresponds to a rotation (sometimes plus reflection) of the pancake

    • This pancake has no correlation between dimensions (see demo)
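A small sketch of this decorrelation, assuming illustrative 2-D Gaussian data:

```python
import numpy as np

# Rotating centered data by the orthogonal matrix W = (w1, ..., wd) of
# eigenvectors yields new coordinates whose covariance is diagonal:
# no correlation between dimensions.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 2.0]], size=1000)
X = X - X.mean(axis=0)

C = (X.T @ X) / len(X)
_, W = np.linalg.eigh(C)

assert np.allclose(W.T @ W, np.eye(2))   # W is orthogonal

Y = X @ W                                # rotated ("new") coordinates
C_rot = (Y.T @ Y) / len(Y)
assert abs(C_rot[0, 1]) < 1e-8           # correlation removed
```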

When does dimension reduction occur?

  • Decomposition of the covariance matrix: Cov(X) = W Λ Wᵀ = Σᵢ λᵢ wᵢ wᵢᵀ

  • If only the first few eigenvalues are significant, we can ignore the rest, e.g. keep only the 2-D coordinates of a 3-D X

Measuring “degree” of reduction

  • E.g. the proportion of total variance kept by the first k components: (λ1 + … + λk) / (λ1 + … + λd)

  • Pancake data in 3D: two large eigenvalues, one small
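A sketch of the variance-kept ratio for pancake-like 3-D data; the eigenvalues are illustrative:

```python
import numpy as np

# "Degree" of reduction: cumulative fraction of total variance kept by
# the first k eigenvalues. Two directions dominate, so k = 2 already
# captures almost everything. Eigenvalues below are illustrative.
eigvals = np.array([9.0, 4.0, 0.1])   # sorted in descending order

ratio = np.cumsum(eigvals) / eigvals.sum()
print(ratio)  # fraction of variance kept by the first 1, 2, 3 PCs
```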

An application of PCA

  • Latent Semantic Indexing in document retrieval

    • Documents as vectors of word counts

    • Try to extract some “features” by linear combination of word counts

    • The underlying geometry is unclear (mean? distance?)

    • The meaning of principal components is unclear (rotation?)




Summary of PCA:

  • PCA looks for:

    • A sequence of linear, orthogonal projections that reveal interesting structure in data (rotation)

  • Defining “interesting”:

    • Maximal variance under each projection

    • Uncorrelated structure after projection

Departure from PCA

  • 3 directions of divergence

    • Other definitions of “interesting”?

      • Linear Discriminant Analysis

      • Independent Component Analysis

    • Other methods of projection?

      • Linear but not orthogonal: sparse coding

      • Implicit, non-linear mapping

    • Turning PCA into a generative model

      • Factor Analysis

Re-thinking “interestingness”

  • It all depends on what you want

  • Linear Discriminant Analysis (LDA): supervised learning

  • Example: separating 2 classes

(Figure: LDA chooses the direction of maximal separation between classes; PCA chooses the direction of maximal variance)

Re-thinking “interestingness”

  • Most high-dimensional data look like Gaussian under linear projections

  • Maybe non-Gaussian is more interesting

    • Independent Component Analysis

    • Projection pursuit

  • Example: ICA projection of 2-class data: pick the projection most unlike a Gaussian (e.g. maximize kurtosis)
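A sketch of the "most unlike Gaussian" idea on synthetic 2-class data (the data and the kurtosis criterion here are illustrative, not a full ICA implementation):

```python
import numpy as np

# The projection onto the class-separating axis is bimodal, so its
# excess kurtosis is strongly negative, while an uninformative axis
# stays near the Gaussian value of 0.
rng = np.random.default_rng(2)
a = rng.normal(loc=[-4.0, 0.0], scale=1.0, size=(500, 2))  # class 1
b = rng.normal(loc=[+4.0, 0.0], scale=1.0, size=(500, 2))  # class 2
X = np.vstack([a, b])

def excess_kurtosis(z):
    z = (z - z.mean()) / z.std()
    return np.mean(z ** 4) - 3.0       # 0 for a Gaussian

k_sep = excess_kurtosis(X[:, 0])       # separating direction: bimodal
k_noise = excess_kurtosis(X[:, 1])     # Gaussian direction

print(k_sep, k_noise)  # k_sep well below 0, k_noise near 0
```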

The “efficient coding” perspective

  • Sparse coding:

    • Projections do not have to be orthogonal

    • There can be more basis vectors than the dimension of the space

      • Representation using over-complete basis

  • Basis expansion: represent the data as a combination of p basis vectors

    • p << d: compact coding (PCA)

    • p > d: sparse coding

“Interesting” can be expensive

  • Often faces difficult optimization problems

    • Need many constraints

    • Lots of parameter sharing

    • Expensive to compute, no longer an eigenvalue problem

PCA’s relatives: Factor Analysis

  • PCA is not a generative model: reconstruction error is not likelihood

    • Sensitive to outliers

    • Hard to build into bigger models

  • Factor Analysis: adding measurement noise to account for variability

  • Model: x = Λ z + ε

    • Λ: loading matrix (scaled PCs)

    • z: factors, spherical Gaussian N(0, I)

    • ε: measurement noise, N(0, R) with R diagonal
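The generative view can be sketched by sampling from the model x = Λ z + ε; the loading matrix and noise covariance below are illustrative assumptions:

```python
import numpy as np

# Sample from a Factor Analysis model: draw spherical factors, stretch
# and rotate them with the loading matrix, then add diagonal noise.
# The implied covariance of x is Lambda Lambda^T + R.
rng = np.random.default_rng(3)
n, d, k = 2000, 3, 2

Lam = np.array([[2.0, 0.0],
                [1.0, 1.0],
                [0.0, 0.5]])              # loading matrix (d x k), illustrative
R = np.diag([0.1, 0.2, 0.1])              # diagonal noise covariance, illustrative

z = rng.normal(size=(n, k))               # factors ~ N(0, I)
eps = rng.multivariate_normal(np.zeros(d), R, size=n)
x = z @ Lam.T + eps                       # observed data

model_cov = Lam @ Lam.T + R
sample_cov = np.cov(x, rowvar=False)
print(sample_cov - model_cov)             # small residuals
```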

PCA’s relatives: Factor Analysis

  • Generative view: sphere --> stretch and rotate --> add noise

  • Learning: a version of the EM algorithm