Multimedia Indexing and Dimensionality Reduction

Multimedia Data Management

  • The need to query and analyze vast amounts of multimedia data (e.g., images, sound tracks, video tracks) has increased in recent years.

  • Joint research from database management, computer vision, signal processing, and pattern recognition aims to solve problems related to multimedia data management.


Multimedia Data

  • There are four major types of multimedia data: images, video sequences, sound tracks, and text.

  • Of these, text is the easiest type to manage, since we can order, index, and search it using string-processing techniques.

  • Management of simple sounds is also possible by representing audio as signal sequences over different channels.

  • Image retrieval has received a lot of attention in the last decade, in both the computer vision and database communities. The main techniques can also be extended and applied to video retrieval.


Content-based Image Retrieval

  • Images were traditionally managed by first annotating their contents and then using text-retrieval techniques to index them.

  • However, with the growth of information in digital image format, some drawbacks of this approach became apparent:

    • Manual annotation requires a vast amount of labor

    • Different people may perceive the contents of an image differently, so no objective search keywords can be defined

  • A new research field was born in the 1990s: content-based image retrieval, which aims to index and retrieve images based on their visual contents.


Feature Extraction

  • The basis of Content-based Image Retrieval is to extract and index some visual features of the images.

  • There are general features (e.g., color, texture, shape, etc.) and domain-specific features (e.g., objects contained in the image).

    • Domain-specific feature extraction can vary with the application domain and is based on pattern recognition

    • On the other hand, general features can be used independently from the image domain.


Color Features

  • To represent the color of an image compactly, a color histogram is used. Colors are partitioned into k groups according to their similarity, and the percentage of each group in the image is measured.

  • Images are transformed to k-dimensional points and a distance metric (e.g., Euclidean distance) is used to measure the similarity between them.

[Figure: an image's k-bin color histogram maps to a point in k-dimensional space.]
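To make the idea concrete, here is a minimal sketch (not from the original slides; the coarse RGB-grid binning is an assumption) that maps an image to a k-bin color histogram and compares two images by the Euclidean distance between the resulting k-dimensional points:

```python
import numpy as np

def color_histogram(pixels: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """Map an (h, w, 3) RGB image to a k-dimensional point (k = bins_per_channel**3).

    Each pixel is assigned to one of k color groups (a coarse RGB grid), and the
    histogram stores the fraction of the image's pixels falling in each group.
    """
    quantized = (pixels.reshape(-1, 3) * bins_per_channel // 256).clip(0, bins_per_channel - 1)
    codes = (quantized[:, 0] * bins_per_channel + quantized[:, 1]) * bins_per_channel + quantized[:, 2]
    hist = np.bincount(codes, minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()          # percentages, so image size does not matter

def histogram_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    """Euclidean distance between two k-dimensional histogram points."""
    return float(np.linalg.norm(h1 - h2))

# toy example: two random "images"
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(32, 32, 3))
img_b = rng.integers(0, 256, size=(32, 32, 3))
print(histogram_distance(color_histogram(img_a), color_histogram(img_b)))
```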


Using Transformations to Reduce Dimensionality

  • In many cases the embedded dimensionality of a search problem is much lower than the actual dimensionality

  • Some methods apply transformations on the data and approximate them with low-dimensional vectors

  • The aim is to reduce dimensionality and at the same time maintain the data characteristics

  • If d(a,b) is the distance between two objects a, b in the real (high-dimensional) space and d’(a’,b’) is their distance in the transformed low-dimensional space, we want d’(a’,b’) ≈ d(a,b).

[Figure: two objects at distance d(a,b) in the original space map to points at distance d’(a’,b’) in the transformed space.]


Problem - Motivation

  • Given a database of documents, find documents containing “data”, “retrieval”

  • Applications:

    • Web

    • law + patent offices

    • digital libraries

    • information filtering


Problem - Motivation

  • Types of queries:

    • boolean (‘data’ AND ‘retrieval’ AND NOT ...)

    • additional features (‘data’ ADJACENT ‘retrieval’)

    • keyword queries (‘data’, ‘retrieval’)

  • How to search a large collection of documents?



Text – Inverted Files

Q: space overhead?

A: mainly, the postings lists


Text – Inverted Files

how to organize dictionary?

stemming – Y/N?

Keep only the root of each word, e.g., inverted, inversion → invert

insertions?


Text – Inverted Files

how to organize dictionary?

B-tree, hashing, TRIEs, PATRICIA trees, ...

stemming – Y/N?

insertions?
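A minimal toy sketch of such an inverted file (not from the slides; the crude suffix-stripping stand-in for stemming is an assumption): the dictionary maps each term to a postings list of document ids, and an AND query simply intersects postings lists.

```python
from collections import defaultdict

def crude_stem(word: str) -> str:
    """Crude stemming stand-in: strip a few common suffixes (a real system would use Porter's stemmer)."""
    for suffix in ("sion", "tion", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Dictionary of terms -> sorted postings lists of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[crude_stem(word)].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {1: "data retrieval systems", 2: "image retrieval", 3: "markov chains and data"}
index = build_inverted_index(docs)
# Boolean AND query: intersect the postings lists of the query terms
print(sorted(set(index["data"]) & set(index["retrieval"])))   # -> [1]
```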



Text – Inverted Files

  • postings lists – follow a Zipf distribution: e.g., the rank-frequency plot of the ‘Bible’ is roughly a straight line on log(freq) vs. log(rank) axes, with freq ≈ 1 / (rank · ln(1.78·V)), where V is the vocabulary size



Text – Inverted Files

  • postings lists

    • Cutting+Pedersen

      • (keep first 4 in B-tree leaves)

    • how to allocate space: [Faloutsos+92]

      • geometric progression

    • compression (Elias codes) [Zobel+] – down to 2% overhead!

    • Conclusions: needs space overhead (2%-300%), but it is the fastest


Text - Detailed outline

  • Text databases

    • problem

    • full text scanning

    • inversion

    • signature files (a.k.a. Bloom Filters)

    • Vector model and clustering

    • information filtering and LSI


Vector Space Model and Clustering

  • Keyword (free-text) queries (vs Boolean)

  • each document: -> vector (HOW?)

  • each query: -> vector

  • search for ‘similar’ vectors


Vector Space Model and Clustering

  • main idea: each document is a vector of size d: d is the number of different terms in the database

[Figure: a document containing ‘...data...’ maps to a d-dimensional vector with one coordinate per vocabulary term (aaron, ..., data, ..., zoo), where d is the vocabulary size.]


Document Vectors

  • Documents are represented as “bags of words”

    OR as vectors.

    • A vector is like an array of floating-point numbers

    • Has direction and magnitude

    • Each vector holds a place for every term in the collection

    • Therefore, most vectors are sparse
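A toy sketch of this representation (the vocabulary and documents below are made up): each document becomes a vector with one coordinate for every term in the collection, and most coordinates are zero.

```python
from collections import Counter

docs = {
    "A": "nova galaxy heat nova nova",
    "B": "film role film diet",
}

# the vocabulary: one vector coordinate for every term in the collection
vocabulary = sorted({w for text in docs.values() for w in text.split()})

def to_vector(text: str) -> list[int]:
    counts = Counter(text.split())
    return [counts.get(term, 0) for term in vocabulary]

for doc_id, text in docs.items():
    print(doc_id, to_vector(text))   # mostly zeros: the vectors are sparse
```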


Document Vectors: one location for each word.

[Table: term frequencies for documents A–I over the terms nova, galaxy, heat, h’wood, film, role, diet, fur; for example, “nova” occurs 10 times in text A, “galaxy” 5 times, and “heat” 3 times. A blank cell means 0 occurrences.]


Document Vectors: one location for each word.

[The same term-frequency table; for example, “Hollywood” occurs 7 times in text I, “film” 5 times, “diet” once, and “fur” 3 times.]


Document Vectors

[The same term-frequency table, with the document ids A–I labeling the rows.]


We Can Plot the Vectors

[Figure: documents plotted in the 2-D term space with axes ‘Star’ and ‘Diet’: a document about astronomy, a document about movie stars, and a document about mammal behavior.]


Assigning Weights to Terms

  • Binary Weights

  • Raw term frequency

  • tf x idf

    • Recall the Zipf distribution

    • Want to weight terms highly if they are

      • frequent in relevant documents … BUT

      • infrequent in the collection as a whole


Binary Weights

  • Only the presence (1) or absence (0) of a term is included in the vector


Raw Term Weights

  • The frequency of occurrence for the term in each document is included in the vector


Assigning Weights

  • tf x idf measure:

    • term frequency (tf)

    • inverse document frequency (idf) -- a way to deal with the problems of the Zipf distribution

  • Goal: assign a tf * idf weight to each term in each document



Inverse Document Frequency

  • IDF provides high values for rare words and low values for common words

[Example: idf values for a collection of 10,000 documents.]
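The idf formula itself was on a figure that did not survive extraction, so the log-based variant below is an assumption; it illustrates the stated behavior for a collection of 10,000 documents (high values for rare terms, low values for common ones).

```python
import math

N = 10_000   # documents in the collection

def idf(document_frequency: int, n_docs: int = N) -> float:
    """Inverse document frequency: high for rare terms, low for common ones."""
    return math.log(n_docs / document_frequency)

def tf_idf(term_frequency: int, document_frequency: int) -> float:
    return term_frequency * idf(document_frequency)

print(idf(1))       # a term occurring in a single document: large weight
print(idf(9_000))   # a term occurring in almost every document: small weight
```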


Similarity Measures for document vectors

Simple matching (coordination level match)

Dice’s Coefficient

Jaccard’s Coefficient

Cosine Coefficient

Overlap Coefficient
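The coefficient formulas were shown as an image; the sketch below implements their usual set-based forms (plus cosine on weighted vectors), which is an assumption about the exact variants the slide used.

```python
import math

def simple_matching(x: set, y: set) -> int:
    return len(x & y)                              # coordination-level match

def dice(x: set, y: set) -> float:
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard(x: set, y: set) -> float:
    return len(x & y) / len(x | y)

def overlap(x: set, y: set) -> float:
    return len(x & y) / min(len(x), len(y))

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

d1 = {"data", "retrieval", "system"}
d2 = {"data", "mining"}
print(simple_matching(d1, d2), dice(d1, d2), jaccard(d1, d2), overlap(d1, d2))
print(cosine([1, 2, 0], [2, 1, 1]))
```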


tf x idf normalization

  • Normalize the term weights (so longer documents are not unfairly given more weight)

    • to normalize usually means to force all values to fall within a certain range, usually between 0 and 1 inclusive.


Vector space similarity (use the weights to compare the documents)


Computing Similarity Scores


Vector Space with Term Weights and Cosine Matching

D_i = (d_i1, w_di1; d_i2, w_di2; …; d_it, w_dit)

Q = (q_i1, w_qi1; q_i2, w_qi2; …; q_it, w_qit)

[Figure: a 2-D term space with axes Term A and Term B; the query Q = (0.4, 0.8) and the documents D1 = (0.8, 0.3) and D2 = (0.2, 0.7) are plotted as vectors.]
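Working through the figure's numbers with cosine matching (a small check added here, not part of the original slides): the query Q points in a direction much closer to D2 than to D1.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

Q, D1, D2 = (0.4, 0.8), (0.8, 0.3), (0.2, 0.7)
print(round(cosine(Q, D1), 3))   # roughly 0.73
print(round(cosine(Q, D2), 3))   # roughly 0.98 -> D2 is ranked higher
```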


Text - Detailed outline

  • Text databases

    • problem

    • full text scanning

    • inversion

    • signature files (a.k.a. Bloom Filters)

    • Vector model and clustering

    • information filtering and LSI


Information Filtering + LSI

  • [Foltz+,’92] Goal:

    • users specify interests (= keywords)

    • the system alerts them about suitable news documents

  • Major contribution: LSI = Latent Semantic Indexing

    • latent (‘hidden’) concepts


Information Filtering + LSI

Main idea

  • map each document into some ‘concepts’

  • map each term into some ‘concepts’

    ‘Concept’:~ a set of terms, with weights, e.g.

    • “data” (0.8), “system” (0.5), “retrieval” (0.6) -> DBMS_concept


Information Filtering + LSI

Pictorially: term-document matrix (BEFORE)


Information Filtering + LSI

Pictorially: concept-document matrix and...


Information Filtering + LSI

... and concept-term matrix


Information Filtering + LSI

Q: How to search, e.g., for ‘system’?


Information Filtering + LSI

A: find the corresponding concept(s); and the corresponding documents




Information Filtering + LSI

Thus it works like an (automatically constructed) thesaurus:

we may retrieve documents that DON’T contain the term ‘system’ but contain almost everything else (‘data’, ‘retrieval’)


SVD - Detailed outline

  • Motivation

  • Definition - properties

  • Interpretation

  • Complexity

  • Case studies

  • Additional properties


SVD - Motivation

  • problem #1: text - LSI: find ‘concepts’

  • problem #2: compression / dim. reduction


SVD - Motivation

  • problem #1: text - LSI: find ‘concepts’


SVD - Motivation

  • problem #2: compress / reduce dimensionality


Problem - specs

  • ~10^6 rows; ~10^3 columns; no updates;

  • random access to any cell(s); small error: OK




SVD - Definition

A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T

  • A: n x m matrix (e.g., n documents, m terms)

  • U: n x r matrix (n documents, r concepts)

  • Λ: r x r diagonal matrix (strength of each ‘concept’; r = rank of the matrix)

  • V: m x r matrix (m terms, r concepts)
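A minimal numpy sketch of this decomposition; the small document-term matrix below is a made-up stand-in for the CS/MD example used on the following slides.

```python
import numpy as np

# toy document-term matrix: 4 "CS" documents over (data, retrieval) and
# 2 "MD" documents over (brain, lung) -- an assumed stand-in for the slide's example
A = np.array([
    [1, 1, 0, 0],
    [2, 2, 0, 0],
    [1, 1, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 3, 3],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
r = int(np.sum(s > 1e-10))                         # numerical rank = number of 'concepts'
print("rank:", r)
print("strengths (singular values):", np.round(s[:r], 3))
print("doc-to-concept (U):\n", np.round(U[:, :r], 3))
print("term-to-concept (V):\n", np.round(Vt[:r].T, 3))
```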


SVD - Properties

THEOREM [Press+92]: it is always possible to decompose matrix A into A = U Λ V^T, where

  • U, Λ, V: unique (*)

  • U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other)

    • U^T U = I; V^T V = I (I: identity matrix)

  • Λ: diagonal, with the singular values positive and sorted in decreasing order


SVD - Example

  • A = U Λ V^T - example:

[Figure: a document-term matrix (rows: CS and MD documents; columns: the terms data, inf., retrieval, brain, lung) written as the product of three matrices.]


SVD - Example

  • A = U Λ V^T - example:

[Same figure, with the two concepts labeled ‘CS-concept’ and ‘MD-concept’.]


SVD - Example

  • A = U Λ V^T - example:

[Same figure: U is the doc-to-concept similarity matrix.]


SVD - Example

  • A = U Λ V^T - example:

[Same figure: the diagonal of Λ gives the ‘strength’ of the CS-concept (and of the MD-concept).]


SVD - Example

  • A = U Λ V^T - example:

[Same figure: V is the term-to-concept similarity matrix.]




SVD - Detailed outline

  • Motivation

  • Definition - properties

  • Interpretation

  • Complexity

  • Case studies

  • Additional properties


SVD - Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:

  • U: document-to-concept similarity matrix

  • V: term-to-concept sim. matrix

  • Λ: its diagonal elements give the ‘strength’ of each concept


SVD - Interpretation #2

  • best axis to project on: (‘best’ = min sum of squares of projection errors)



SVD - Interpretation #2

SVD gives the best axis to project on (minimum RMS error).

[Figure: a 2-D point cloud with the best projection axis v1.]



SVD - Interpretation #2

  • A = U Λ V^T - example: [Figure: the decomposition of the example matrix; v1 is the first projection axis.]


SVD - Interpretation #2

  • A = U Λ V^T - example: [Figure: the first diagonal entry of Λ gives the variance (‘spread’) along the v1 axis.]


SVD - Interpretation #2

  • A = U Λ V^T - example:

    • U Λ gives the coordinates of the points along the projection axis


SVD - Interpretation #2

  • More details

  • Q: how exactly is dim. reduction done?


SVD - Interpretation #2

  • More details

  • Q: how exactly is dim. reduction done?

  • A: set the smallest eigenvalues (singular values) to zero:

[Figure: the decomposition with the smallest diagonal entries of Λ zeroed out.]






SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the matrix:

[Figure: A written as the product U Λ V^T.]


SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the matrix:

[Figure: the same product written in terms of λ1, λ2 (the diagonal of Λ), the columns u1, u2 of U, and the columns v1, v2 of V.]


SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the (n x m) matrix:

A = λ1 u1 v1^T + λ2 u2 v2^T + ...


SVD - Interpretation #2

‘spectral decomposition’ of the (n x m) matrix, as a sum of r terms:

A = λ1 u1 v1^T + λ2 u2 v2^T + ...

where each u_i is n x 1 and each v_i^T is 1 x m.


SVD - Interpretation #2

approximation / dim. reduction: keep only the first few terms of

A ≈ λ1 u1 v1^T + λ2 u2 v2^T + ...   (Q: how many?)

assuming λ1 >= λ2 >= ... To do the mapping you use V^T: X’ = V^T X.


SVD - Interpretation #2

A (heuristic - [Fukunaga]): keep 80-90% of the ‘energy’ (= sum of squares of the λi’s), assuming λ1 >= λ2 >= ...
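A sketch of this heuristic on an arbitrary random data matrix (the 90% threshold and the data are assumptions): pick the smallest k whose leading singular values keep the desired energy, then map the points with V_k^T.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 30)) @ rng.normal(size=(30, 30))   # some data matrix, rows = points

U, s, Vt = np.linalg.svd(A, full_matrices=False)

energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.90)) + 1        # smallest k keeping >= 90% of the energy
print("keep k =", k, "of", len(s), "terms")

# dimensionality reduction: x' = Vk^T x for every row x of A
A_reduced = A @ Vt[:k].T                          # n x k coordinates (equals U[:, :k] * s[:k])
A_approx = A_reduced @ Vt[:k]                     # rank-k reconstruction of A
print("relative error:", np.linalg.norm(A - A_approx) / np.linalg.norm(A))
```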


SVD - Interpretation #3

  • finds non-zero ‘blobs’ in a data matrix




SVD - Interpretation #3

  • Drill: find the SVD, ‘by inspection’!

  • Q: rank = ??

[Figure: a small data matrix to decompose by inspection.]

SVD - Interpretation #3

  • A: rank = 2 (2 linearly independent rows/cols)

SVD - Interpretation #3

  • A: rank = 2 (2 linearly independent rows/cols)

  • Q: are the column vectors orthogonal?


SVD - Interpretation #3

  • column vectors: are orthogonal - but not unit vectors

SVD - Interpretation #3

  • and the eigenvalues are:

[Figure: the corresponding diagonal matrix Λ.]


SVD - Interpretation #3

  • A: SVD properties:

    • matrix product should give back matrix A

    • matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other

    • ditto for matrix V

    • matrix Λ should be diagonal, with positive values


SVD - Complexity

  • O( n * m * m) or O( n * n * m) (whichever is less)

  • less work, if we just want eigenvalues

  • or if we want first k eigenvectors

  • or if the matrix is sparse [Berry]

  • Implemented: in any linear algebra package (LINPACK, MATLAB, S-Plus, Mathematica, ...)


Optimality of SVD

Def: The Frobenius norm of an n x m matrix M is ||M||_F = sqrt( Σ_i Σ_j M_ij^2 ).

(reminder) The rank of a matrix M is the number of independent rows (or columns) of M.

Let A = U Λ V^T and A_k = U_k Λ_k V_k^T (the rank-k SVD approximation of A), where A_k is an n x m matrix, U_k is n x k, Λ_k is k x k, and V_k is m x k.

Theorem [Eckart and Young]: among all n x m matrices C of rank at most k, A_k is the closest to A:

||A − A_k||_F <= ||A − C||_F


Kleinberg’s Algorithm

  • Main idea: In many cases, when you search the web using some terms, the most relevant pages may not contain these terms (or may contain them only a few times)

    • Harvard : www.harvard.edu

    • Search Engines: yahoo, google, altavista

  • Authorities and hubs


Kleinberg’s algorithm

  • Problem definition: given the web and a query

  • find the most ‘authoritative’ web pages for this query

    Step 0: find all pages containing the query terms (root set)

    Step 1: expand by one move forward and backward (base set)


Kleinberg’s algorithm

  • Step 1: expand by one move forward and backward


Kleinberg’s algorithm

  • on the resulting graph, give high score (= ‘authorities’) to nodes that many important nodes point to

  • give high importance score (‘hubs’) to nodes that point to good ‘authorities’

[Figure: hub nodes on one side pointing to authority nodes on the other.]


Kleinberg’s algorithm

observations

  • recursive definition!

  • each node (say, ‘i’-th node) has both an authoritativeness score ai and a hubness score hi


Kleinberg’s algorithm

Let E be the set of edges and A be the adjacency matrix:

the (i,j) entry is 1 if the edge from i to j exists

Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores.

Then:


Kleinberg’s algorithm

Then:

a_i = h_k + h_l + h_m

that is

a_i = Sum (h_j) over all j such that the edge (j,i) exists

or

a = A^T h

[Figure: nodes k, l, m all point to node i.]


Kleinberg’s algorithm

symmetrically, for the ‘hubness’:

h_i = a_n + a_p + a_q

that is

h_i = Sum (a_j) over all j such that the edge (i,j) exists

or

h = A a

[Figure: node i points to nodes n, p, q.]


Kleinberg’s algorithm

In conclusion, we want vectors h and a such that:

h = A a

a = A^T h

Recall the SVD properties:

C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1]

C(3): u1^T A = λ1 v1^T


Kleinberg’s algorithm

In short, the solutions to

h = A a

a = A^T h

are the left- and right- singular vectors of the adjacency matrix A.

Starting from random a’ and iterating, we’ll eventually converge

(Q: to which of all the eigenvectors? why?)


Kleinberg’s algorithm

(Q: to which of all the eigenvectors? why?)

A: to the ones of the strongest eigenvalue, because of property B(5):

B(5): (A^T A)^k v’ ≈ (constant) · v1
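A compact power-iteration sketch of the algorithm on a tiny made-up graph: alternately applying a = A^T h and h = A a with normalization converges to the strongest eigenvectors, as stated above.

```python
import numpy as np

# adjacency matrix of a tiny assumed web graph: A[i, j] = 1 iff page i links to page j
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
], dtype=float)

h = np.ones(len(A))                # start from uniform hub scores
for _ in range(100):
    a = A.T @ h                    # authorities: pointed to by good hubs
    a /= np.linalg.norm(a)
    h = A @ a                      # hubs: point to good authorities
    h /= np.linalg.norm(h)

print("authority scores:", np.round(a, 3))
print("hub scores:      ", np.round(h, 3))
```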


Kleinberg’s algorithm - results

E.g., for the query ‘java’:

0.328 www.gamelan.com

0.251 java.sun.com

0.190 www.digitalfocus.com (“the java developer”)


Kleinberg’s algorithm - discussion

  • ‘authority’ score can be used to find ‘similar pages’ to page p

  • closely related to ‘citation analysis’, social networks / ‘small world’ phenomena


Google/PageRank algorithm

  • closely related: The Web is a directed graph of connected nodes

  • imagine a particle randomly moving along the edges (*)

  • compute its steady-state probabilities. That gives the PageRank of each page (the importance of this page)

    (*) with occasional random jumps


PageRank Definition

  • Assume a page A and pages T1, T2, …, Tm that point to A. Let d be a damping factor, PR(A) the PageRank of A, and C(A) the out-degree of A. Then:

    PR(A) = (1 − d) + d · ( PR(T1)/C(T1) + … + PR(Tm)/C(Tm) )


Google/PageRank algorithm

  • Computing the PR of each page is an identical problem: given a Markov chain, compute the steady-state probabilities p1 ... p5

[Figure: an example Markov chain with five states (pages) 1-5.]


Computing PageRank

  • Iterative procedure

  • Also, … navigate the web by randomly following links, or with probability p jump to a random page. Let A be the adjacency matrix (n x n) and d_i the out-degree of page i. Then:

    Prob(A_i -> A_j) = p · n^(-1) + (1 − p) · d_i^(-1) · A_ij

    A’[i,j] = Prob(A_i -> A_j)


Google/PageRank algorithm

  • Let A’ be the transition matrix (= adjacency matrix, row-normalized: the sum of each row = 1)

[Figure: the five-page example graph and its row-normalized transition matrix A’.]


Google/PageRank algorithm

  • A p = p

[Figure: the five-page example graph and the corresponding eigenvector equation.]


Google/PageRank algorithm

  • A p = p

  • thus, p is the eigenvector that corresponds to the highest eigenvalue (=1, since the matrix is row-normalized)
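A power-iteration sketch for a small assumed 5-page graph, following the transition-matrix formulation given earlier (the random-jump probability and the example links are assumptions): iterate the random surfer step until the probability vector stops changing.

```python
import numpy as np

# adjacency matrix of a small assumed 5-page graph (A[i, j] = 1 iff page i links to page j)
A = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
], dtype=float)

n = len(A)
p_jump = 0.15                                  # probability of a random jump
out_deg = A.sum(axis=1)
follow = A / out_deg[:, None]                  # row-normalized link-following part
T = p_jump / n + (1 - p_jump) * follow         # transition matrix: Prob(i -> j)

p = np.full(n, 1.0 / n)                        # start from the uniform distribution
for _ in range(100):
    p = T.T @ p                                # one step of the random surfer
    p /= p.sum()
print("PageRank:", np.round(p, 3))
```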


Kleinberg/Google - conclusions

SVD helps in graph analysis:

hub/authority scores: strongest left- and right- eigenvectors of the adjacency matrix

random walk on a graph: steady state probabilities are given by the strongest eigenvector of the transition matrix


Conclusions – so far

  • SVD: a valuable tool

  • given a document-term matrix, it finds ‘concepts’ (LSI)

  • ... and can reduce dimensionality (KL)


Conclusions cont’d

  • ... and can find fixed points or steady-state probabilities (Google / Kleinberg / Markov chains)

  • ... and can solve optimally over- and under-constrained linear systems (least squares)


References

S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl. World Wide Web Conf., 1998.

J. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.


Embeddings

  • Given a metric distance matrix D, embed the objects in a k-dimensional vector space using a mapping F such that

    • D(i,j) is close to D’(F(i),F(j))

  • Isometric mapping:

    • exact preservation of distance

  • Contractive mapping:

    • D’(F(i),F(j)) <= D(i,j)

  • D’ is some Lp measure


PCA

  • Intuition: find the axis that shows the greatest variation, and project all points onto this axis

[Figure: a 2-D point cloud in coordinates (f1, f2); the principal axes e1 and e2 show the directions of greatest variation.]


SVD: The mathematical formulation

  • Normalize the dataset by moving the origin to the center of the dataset

  • Find the eigenvectors of the data (or covariance) matrix

  • These define the new space

  • Sort the eigenvalues in “goodness” order

[Figure: the same point cloud in coordinates (f1, f2), with the eigenvector axes e1 and e2.]
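A short sketch of these steps on made-up 2-D data: center, take the eigenvectors of the covariance matrix, sort by eigenvalue, and project onto the leading axis.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])   # correlated 2-D points

# 1. move the origin to the center of the dataset
Xc = X - X.mean(axis=0)

# 2. eigenvectors of the covariance matrix define the new axes
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. sort the axes in "goodness" (variance) order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. project every point onto the best axis (greatest variation)
projected = Xc @ eigvecs[:, 0]
print("variance captured by the first axis:", round(eigvals[0] / eigvals.sum(), 3))
print("first projected coordinates:", np.round(projected[:3], 3))
```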


SVD Cont’d

  • Advantages:

    • Optimal dimensionality reduction (for linear projections)

  • Disadvantages:

    • Computationally expensive… but can be improved with random sampling

    • Sensitive to outliers and non-linearities


FastMap

What if we have a finite metric space (X, d )?

Faloutsos and Lin (1995) proposed FastMap as a metric analogue to the KL-transform (PCA). Imagine that the points are in a Euclidean space.

  • Select two pivot points x_a and x_b that are far apart.

  • Compute a pseudo-projection of the remaining points along the “line” x_a x_b.

  • “Project” the points to an orthogonal subspace and recurse.


Selecting the Pivot Points

The pivot points should lie along the principal axes, and hence should be far apart.

  • Select any point x0.

  • Let x1 be the furthest from x0.

  • Let x2 be the furthest from x1.

  • Return (x1, x2).

[Figure: pivot selection - an arbitrary point x0, the point x1 furthest from x0, and the point x2 furthest from x1.]


Pseudo-Projections

Given pivots (x_a, x_b), for any third point y, we use the law of cosines to determine the position c_y of y along the line x_a x_b.

The pseudo-projection for y is

c_y = (d_{a,y}^2 + d_{a,b}^2 − d_{b,y}^2) / (2 · d_{a,b})

This is the first coordinate.

[Figure: the triangle x_a, x_b, y with side lengths d_{a,b}, d_{a,y}, d_{b,y}; c_y is the projection of y onto x_a x_b.]


“Project to orthogonal plane”

Given distances along x_a x_b, we can compute distances within the “orthogonal hyperplane” using the Pythagorean theorem:

d’(y’, z’)^2 = d(y, z)^2 − (c_z − c_y)^2

Using d’(.,.), recurse until k features have been chosen.

[Figure: points y, z at distance d_{y,z}; their images y’, z’ in the hyperplane orthogonal to x_a x_b are at distance d’_{y’,z’}, and their projections along x_a x_b differ by c_z − c_y.]
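A compact sketch of the whole recursion under the assumptions above (the function name and the random test points are made up; this is not the authors' reference implementation):

```python
import numpy as np

def fastmap(dist, n, k):
    """Map n objects to k coordinates, given dist(i, j); a sketch of Faloutsos & Lin (1995)."""
    coords = np.zeros((n, k))
    for col in range(k):
        # squared distance in the subspace orthogonal to the axes already chosen
        def d2(i, j):
            return dist(i, j) ** 2 - np.sum((coords[i, :col] - coords[j, :col]) ** 2)

        # pivots: pick any object, take its furthest object, then that object's furthest
        a = 0
        b = max(range(n), key=lambda j: d2(a, j))
        a = max(range(n), key=lambda j: d2(b, j))
        dab2 = d2(a, b)
        if dab2 <= 1e-12:
            break                                   # nothing left to spread out
        for y in range(n):
            # law of cosines: pseudo-projection of y onto the line a-b
            coords[y, col] = (d2(a, y) + dab2 - d2(b, y)) / (2 * np.sqrt(dab2))
    return coords

points = np.random.default_rng(3).normal(size=(8, 5))        # 8 objects in 5-D
euclid = lambda i, j: float(np.linalg.norm(points[i] - points[j]))
print(np.round(fastmap(euclid, len(points), k=2), 3))
```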


Random Projections

  • Based on the Johnson-Lindenstrauss lemma:

  • For:

    • 0 < ε < 1/2,

    • any (sufficiently large) set S of M points in R^n,

    • k = O(ε^-2 ln M),

  • there exists a linear map f: S -> R^k such that

    • (1 − ε) D(u,v) < D(f(u),f(v)) < (1 + ε) D(u,v) for all u, v in S

  • Random projection is good with constant probability


Random Projection: Application

  • Set k = O(ε^-2 ln M)

  • Select k random n-dimensional vectors

    • (an approach is to select k Gaussian-distributed vectors with mean 0 and variance 1, i.e., N(0,1))

  • Project the original points into the k vectors.

  • The resulting k-dimensional space approximately preserves the distances with high probability
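A sketch of this recipe with N(0,1) entries (the constant in the choice of k and the test data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
M, n = 1000, 2000                      # M points in n dimensions
X = rng.normal(size=(M, n))

eps = 0.2
k = int(np.ceil(4 * np.log(M) / eps**2))            # k = O(eps^-2 ln M); the constant 4 is an assumption
R = rng.normal(0.0, 1.0, size=(n, k)) / np.sqrt(k)  # k random N(0,1) directions, scaled

Y = X @ R                                           # project the original points into the k vectors

# distances are approximately preserved (check one pair of points)
i, j = 0, 1
d_orig = np.linalg.norm(X[i] - X[j])
d_proj = np.linalg.norm(Y[i] - Y[j])
print(k, round(d_proj / d_orig, 3))                 # ratio should be within roughly 1 +/- eps
```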


Random Projection

  • A very useful technique,

  • especially when used in conjunction with another technique (for example, SVD):

  • use random projection to reduce the dimensionality from thousands to a few hundred, then apply SVD to reduce it further

