
Dimensionality Reduction

• High-dimensional == many features

• Find concepts/topics/genres:

• Documents:

• Features: Thousands of words, millions of word pairs

• Surveys – Netflix: 480k users x 177k movies

Slides by Jure Leskovec

• Compress / reduce dimensionality:

• Assumption: data lies on or near a low d-dimensional subspace

• Axes of this subspace are an effective representation of the data

Why reduce dimensions?

• Discover hidden correlations/topics

• Words that occur commonly together

• Remove redundant and noisy features

• Not all words are useful

• Interpretation and visualization

• Easier storage and processing of the data


A[m x n] = U[m x r] Σ[r x r] (V[n x r])T

• A: Input data matrix

• m x n matrix (e.g., m documents, n terms)

• U: Left singular vectors

• m x r matrix (m documents, r concepts)

• Σ: Singular values

• r x r diagonal matrix (strength of each ‘concept’; r: rank of the matrix A)

• V: Right singular vectors

• n x r matrix (n terms, r concepts)
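As a quick sketch (not from the slides, assuming NumPy is available), the economy-size SVD returns exactly the m x r, r x r and r x n factors described above:

```python
import numpy as np

m, n = 6, 4                      # e.g., 6 documents, 4 terms
A = np.random.rand(m, n)

# full_matrices=False gives the economy SVD: U is m x r, s holds the r
# singular values (the diagonal of Sigma), Vt is r x n, with r = min(m, n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = min(m, n)
assert U.shape == (m, r) and s.shape == (r,) and Vt.shape == (r, n)

# Reconstruction: A = U * diag(s) * V^T
assert np.allclose(A, U @ np.diag(s) @ Vt)
```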

[Figure: the m x n matrix A factors as U (m x r) times Σ (r x r) times VT (r x n).]

[Figure: A written as a sum of rank-1 matrices, A = σ1 u1 v1T + σ2 u2 v2T + ..., where each σi is a scalar and ui, vi are vectors.]

It is always possible to decompose a real matrix A into A = U Σ VT, where:

• U, Σ, V: unique

• U, V: column orthonormal:

• UT U = I; VT V = I (I: identity matrix)

• (Columns are orthogonal unit vectors)

• Σ: diagonal

• Entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)
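These properties are easy to check numerically; a small sketch with NumPy on a random real matrix:

```python
import numpy as np

A = np.random.rand(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and V are orthonormal: U^T U = I and V^T V = I.
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(Vt @ Vt.T, np.eye(3))

# Singular values are nonnegative and sorted in decreasing order.
assert np.all(s >= 0) and np.all(s[:-1] >= s[1:])
```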

• A = U Σ VT - example: a users-to-movies rating matrix (columns: Matrix, Alien, Serenity, Casablanca, Amelie; rows: SciFi users and Romance users). [Figure: A = U x Σ x VT]

• A = U Σ VT - example: the two groups of users and movies give rise to a SciFi-concept and a Romance-concept. [Figure: A = U x Σ x VT with the concept columns labeled]

• A = U Σ VT - example: U is the “user-to-concept” similarity matrix. [Figure: A = U x Σ x VT]

• A = U Σ VT - example: the diagonal entries of Σ give the ‘strength’ of each concept (e.g., of the SciFi-concept). [Figure: A = U x Σ x VT]

• A = U Σ VT - example: V is the “movie-to-concept” similarity matrix. [Figure: A = U x Σ x VT]


SVD gives ‘movies’, ‘users’ and ‘concepts’:

• U: user-to-concept similarity matrix

• V: movie-to-concept similarity matrix

• Σ: its diagonal elements give the ‘strength’ of each concept
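To make the concepts concrete, here is a sketch with a hypothetical ratings matrix in the spirit of the example (the actual numbers on the slides are not in this transcript):

```python
import numpy as np

# Hypothetical ratings (rows: users; columns: Matrix, Alien, Serenity,
# Casablanca, Amelie): four SciFi fans, then three Romance fans.
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank is 2: only two singular values ('concept strengths') are nonzero.
assert s[0] > s[1] > 1 and s[2] < 1e-8
# The first right singular vector puts all its weight on the three SciFi
# movies (the SciFi-concept); the second on the two Romance movies.
assert np.allclose(Vt[0, 3:], 0) and np.allclose(Vt[1, :3], 0)
```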

SVD - interpretation #2

‘best’ = minimizes the sum of squared projection errors, i.e., minimum reconstruction error

[Figure: user ratings plotted as points (Movie 1 rating vs. Movie 2 rating); v1, the first right singular vector, points along the best-fit direction.]

SVD - Interpretation #2

• A = U Σ VT - example: [Figure: the same ratings scatter with v1, the first right singular vector, drawn through the data.]

• A = U Σ VT - example: σ1 measures the variance (‘spread’) of the data on the v1 axis. [Figure: A = U x Σ x VT]

SVD - Interpretation #2

More details

• Q: How exactly is dim. reduction done?

More details

• Q: How exactly is dim. reduction done?

• A: Set the smallest singular values to zero. [Figure: A = U x Σ x VT]

More details

• Q: How exactly is dim. reduction done?

• A: Set the smallest singular values to zero. [Figure sequence: with the smallest σ zeroed, the corresponding columns of U and V can be dropped as well, giving the approximation A ≈ U' x Σ' x (V')T with fewer concepts.]

More details

• Q: How exactly is dim. reduction done?

• A: Set the smallest singular values to zero; call the result B, so that B ≈ A.

Frobenius norm: ǁMǁF = √(Σij Mij²)

Then ǁA - BǁF = √(Σij (Aij - Bij)²) is “small”.
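As a numerical sanity check of the definition (a sketch using NumPy's built-in Frobenius norm for comparison):

```python
import numpy as np

M = np.array([[1., 2.], [3., 4.]])
# Frobenius norm: square root of the sum of squared entries.
fro = np.sqrt((M ** 2).sum())
assert np.isclose(fro, np.linalg.norm(M, 'fro'))

# It also equals sqrt(sum of sigma_i^2): the fact ||M||F = ||Q||F
# that the theorem on the following slides relies on.
s = np.linalg.svd(M, compute_uv=False)
assert np.isclose(fro, np.sqrt((s ** 2).sum()))
```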

[Figure: A = U x Σ x VT, and the truncated product B = U x S x VT with B ≈ A.]

• Theorem: Let A = U Σ VT (σ1 ≥ σ2 ≥ ..., rank(A) = r) and let B = U S VT, where

• S = diagonal r x r matrix with si = σi (i = 1...k) and si = 0 otherwise.

Then B is a best rank-k approximation to A:

• B is a solution to minB ǁA - BǁF where rank(B) = k

• We will need 2 facts:

• ǁMǁF = ǁQǁF, where M = P Q R is the SVD of M

• U Σ VT - U S VT = U (Σ - S) VT
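The theorem can be checked numerically; in this sketch the rank-k truncation's error matches √(Σi>k σi²), which is exactly what the two facts predict:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# B = U S V^T with s_i = sigma_i for i <= k and 0 otherwise, i.e.,
# keep only the first k columns/rows of the factors.
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
err = np.linalg.norm(A - B, 'fro')

# The minimal error equals sqrt(sum of the discarded sigma_i^2).
assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))
```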

• We will need 2 facts:

• ǁMǁF = ǁQǁF, where M = P Q R is the SVD of M

• U Σ VT - U S VT = U (Σ - S) VT

We apply the first fact to U (Σ - S) VT with:

-- P = U, column orthonormal

-- R = VT, row orthonormal

-- Q = Σ - S, diagonal

• A = U Σ VT, B = U S VT (σ1 ≥ σ2 ≥ ... ≥ 0, rank(A) = r)

• S = diagonal matrix with si = σi (i = 1...k), else si = 0

Then B is a solution to minB ǁA - BǁF with rank(B) = k

• Why?

• U Σ VT - U S VT = U (Σ - S) VT, so by the two facts ǁA - BǁF = ǁΣ - SǁF

• We want to choose the si to minimize ǁΣ - SǁF = √(Σi (σi - si)²); with only k nonzero si allowed, the minimum is achieved by si = σi (i = 1...k), else si = 0

Equivalent: ‘spectral decomposition’ of the matrix:

A = σ1 u1 v1T + σ2 u2 v2T + ...

SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the matrix, keeping k terms:

A = σ1 u1 v1T + σ2 u2 v2T + ... (each term σi ui viT is a rank-1 matrix: a column ui times a row viT)

Assume: σ1 ≥ σ2 ≥ σ3 ≥ ... ≥ 0

Why is setting the small σs to zero the thing to do? The vectors ui and vi have unit length, so σi scales the size of the i-th term. Zeroing the small σs therefore introduces the least error.

SVD - Interpretation #2

Q: How many σs to keep?

A: Rule of thumb: keep enough singular values to retain 80-90% of the ‘energy’ (= Σi σi²)

(assume: σ1 ≥ σ2 ≥ σ3 ≥ ...)
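A sketch of the rule of thumb (the 90% threshold and the test matrix here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10)) @ rng.standard_normal((10, 10))
s = np.linalg.svd(A, compute_uv=False)

# Keep the smallest k whose retained energy (sum of sigma_i^2) is >= 90%.
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(energy, 0.90)) + 1
assert energy[k - 1] >= 0.90
assert k == 1 or energy[k - 2] < 0.90   # k is the smallest such rank
```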

• To compute the SVD: O(nm²) or O(n²m) (whichever is less)

• But there is less work:

• if we just want the singular values

• or if we only want the first k singular vectors

• or if the matrix is sparse

• Implemented in linear algebra packages like LINPACK, Matlab, S-Plus, Mathematica, ...
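For the sparse case, one common route (an assumption here; the slides do not name a specific routine) is an iterative truncated SVD such as SciPy's svds, which computes only the top k singular triplets:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# A large, very sparse matrix (1% nonzeros).
A = sparse_random(1000, 500, density=0.01, random_state=0)
k = 10
U, s, Vt = svds(A, k=k)          # iterative; never forms a dense A
assert U.shape == (1000, k) and Vt.shape == (k, 500)

# Note: svds returns the singular values in ascending order.
assert np.all(s[:-1] <= s[1:])
```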

SVD - Conclusions so far

• SVD: A = U Σ VT: unique

• U: user-to-concept similarities

• V: movie-to-concept similarities

• Σ: strength of each concept

• Dimensionality reduction:

• keep the few largest singular values (80-90% of the ‘energy’)

• SVD picks up linear correlations

Case study: How to query?

Q: Find users that like ‘Matrix’ and ‘Alien’

[Figure: the users-to-movies matrix A (columns: Matrix, Alien, Serenity, Casablanca, Amelie; SciFi and Romance users) and its factorization A = U x Σ x VT]

Case study: How to query?

Q: Find users that like ‘Matrix’

A: Map the query into a ‘concept space’ - how?

[Figure: A = U x Σ x VT]

Case study: How to query?

Q: Find users that like ‘Matrix’

A: Map query vectors into ‘concept space’ - how?

[Figure: the query vector q over the movies (Matrix, Alien, Serenity, Casablanca, Amelie), drawn as a point in the (Matrix, Alien) plane together with the concept axes v1 and v2.]

Project into concept space: inner product with each ‘concept’ vector vi

Q: Find users that like ‘Matrix’

A: Map the vector into ‘concept space’ - how?

[Figure: the same plot, now showing the projection q·v1 of q onto the first concept axis.]

Project into concept space: inner product with each ‘concept’ vector vi

Compactly, we have:

qconcept = q V

E.g., a query q over (Matrix, Alien, Serenity, Casablanca, Amelie) is mapped to its loading on the SciFi-concept by V, the movie-to-concept similarity matrix.
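A sketch of the projection with a hypothetical ratings matrix (the slide's actual numbers are not in this transcript); qconcept = q V picks out the SciFi loading:

```python
import numpy as np

# Hypothetical ratings (rows: users; columns: Matrix, Alien, Serenity,
# Casablanca, Amelie); V comes from the SVD of this matrix.
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)
V = np.linalg.svd(A, full_matrices=False)[2].T[:, :2]  # top 2 concepts

# Query "user" q rated only 'Matrix' (5 stars).
q = np.array([5., 0., 0., 0., 0.])
q_concept = q @ V                # inner product with each concept vector

# q loads on the SciFi-concept (index 0), not on Romance (index 1).
assert abs(q_concept[1]) < 1e-8 and abs(q_concept[0]) > 1.0
```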

How would a user d that rated (‘Alien’, ‘Serenity’) be handled? The same way:

dconcept = d V

E.g., d’s ratings over (Matrix, Alien, Serenity, Casablanca, Amelie) map to a loading on the SciFi-concept via the movie-to-concept similarity matrix V.

Case study: How to query?

Observation: a user d that rated (‘Alien’, ‘Serenity’) will be similar to the query “user” q that rated (‘Matrix’), although d did not rate ‘Matrix’!

In movie space their similarity is 0, but in concept space their similarity is ≠ 0.
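The observation can be sketched numerically with the same hypothetical ratings matrix: q and d are orthogonal in movie space yet similar in concept space:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)
V = np.linalg.svd(A, full_matrices=False)[2].T[:, :2]

q = np.array([5., 0., 0., 0., 0.])   # rated only 'Matrix'
d = np.array([0., 4., 5., 0., 0.])   # rated 'Alien' and 'Serenity'

# In movie space q and d share no rated movie: similarity = 0.
assert q @ d == 0
# In concept space both load on the SciFi-concept: similarity != 0.
qc, dc = q @ V, d @ V
assert abs(qc @ dc) > 1.0
```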

SVD: Drawbacks

• Optimal low-rank approximation:

• in Frobenius norm

• Interpretability problem:

• A singular vector specifies a linear combination of all input columns or rows

• Lack of sparsity:

• Singular vectors are dense!
