Multimedia Indexing and Dimensionality Reduction

Multimedia Data Management

- The need to query and analyze vast amounts of multimedia data (i.e., images, sound tracks, video tracks) has increased in recent years.
- Joint research from Database Management, Computer Vision, Signal Processing, and Pattern Recognition aims to solve problems related to multimedia data management.

Multimedia Data

- There are four major types of multimedia data: images, video sequences, sound tracks, and text.
- Of these, the easiest type to manage is text, since we can order, index, and search text using string-management techniques.
- Management of simple sounds is also possible by representing audio as signal sequences over different channels.
- Image retrieval has received a lot of attention in the last decade (in both computer vision and databases). The main techniques can also be extended and applied to video retrieval.

Content-based Image Retrieval

- Images were traditionally managed by first annotating their contents and then using text-retrieval techniques to index them.
- However, as the amount of information in digital image format grew, some drawbacks of this technique became apparent:
- Manual annotation requires a vast amount of labor
- Different people may perceive the contents of an image differently; thus no objective search keywords can be defined
- A new research field was born in the 1990s: Content-based Image Retrieval aims at indexing and retrieving images based on their visual contents.

Feature Extraction

- The basis of Content-based Image Retrieval is to extract and index some visual features of the images.
- There are general features (e.g., color, texture, shape, etc.) and domain-specific features (e.g., objects contained in the image).
- Domain-specific feature extraction varies with the application domain and is based on pattern recognition.
- On the other hand, general features can be used independently from the image domain.

Color Features

- To represent the color of an image compactly, a color histogram is used. Colors are partitioned to k groups according to their similarity and the percentage of each group in the image is measured.
- Images are transformed to k-dimensional points and a distance metric (e.g., Euclidean distance) is used to measure the similarity between them.

(Figure: an image's color histogram with k bins is mapped to a point in a k-dimensional space.)
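
As a concrete (hypothetical) illustration of the color-histogram representation above, the sketch below builds a k-bin histogram for toy grayscale images stored as numpy arrays and compares two images with Euclidean distance; the bin count k = 8 and the toy images are arbitrary choices, not values from the slides.

```python
import numpy as np

def color_histogram(image, k=8):
    """Partition pixel values (0-255) into k groups and return the fraction
    of the image's pixels that falls in each group (a k-dimensional point)."""
    hist, _ = np.histogram(image, bins=k, range=(0, 256))
    return hist / hist.sum()

def histogram_distance(img_a, img_b, k=8):
    """Euclidean distance between the two images' k-dimensional histogram points."""
    return np.linalg.norm(color_histogram(img_a, k) - color_histogram(img_b, k))

# toy 4x4 "images": one uniformly dark, one uniformly bright
dark = np.full((4, 4), 30)
bright = np.full((4, 4), 220)
print(histogram_distance(dark, bright))  # large distance: very different color content
print(histogram_distance(dark, dark))    # 0.0: identical histograms
```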

Using Transformations to Reduce Dimensionality

- In many cases the embedded dimensionality of a search problem is much lower than the actual dimensionality
- Some methods apply transformations to the data and approximate them with low-dimensional vectors
- The aim is to reduce dimensionality and at the same time maintain the data characteristics
- If d(a,b) is the distance between two objects a, b in the real (high-dimensional) space and d'(a',b') is their distance in the transformed low-dimensional space, we want d'(a',b') ≈ d(a,b).

Problem - Motivation

- Given a database of documents, find documents containing “data”, “retrieval”
- Applications:
- Web
- law + patent offices
- digital libraries
- information filtering

Problem - Motivation

- Types of queries:
- boolean (‘data’ AND ‘retrieval’ AND NOT ...)
- additional features (‘data’ ADJACENT ‘retrieval’)
- keyword queries (‘data’, ‘retrieval’)
- How to search a large collection of documents?

Text – Inverted Files

- how to organize the dictionary? B-trees, hashing, TRIEs, PATRICIA trees, ...
- stemming – Y/N? stemming keeps only the root of each word, e.g., 'inverted', 'inversion' -> 'invert'
- how to handle insertions?

Text – Inverted Files

- postings lists – their lengths follow a Zipf distribution; e.g., the rank-frequency plot of the 'Bible' is roughly a straight line on log(freq) vs. log(rank) axes, with freq ≈ 1 / (rank · ln(1.78 V)), where V is the vocabulary size

Text – Inverted Files

- postings lists
- Cutting+Pedersen (keep the first 4 postings in the B-tree leaves)
- how to allocate space: [Faloutsos+92]
- geometric progression
- compression (Elias codes) [Zobel+] – down to 2% overhead!
- Conclusions: needs space overhead (2%-300%), but it is the fastest
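
As a minimal sketch of the inverted-file idea (my own illustration, with deliberately naive tokenization and no stemming or compression), the dictionary is a hash map from each term to its postings list of document ids, and a boolean AND query intersects two postings lists:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: dict mapping doc_id -> text. Returns term -> sorted postings list."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "data retrieval systems", 2: "image retrieval", 3: "data mining"}
index = build_inverted_index(docs)
print(index["retrieval"])                                      # [1, 2]
# boolean query 'data' AND 'retrieval': intersect the postings lists
print(sorted(set(index["data"]) & set(index["retrieval"])))    # [1]
```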

Text - Detailed outline

- Text databases
- problem
- full text scanning
- inversion
- signature files (a.k.a. Bloom Filters)
- Vector model and clustering
- information filtering and LSI

Vector Space Model and Clustering

- Keyword (free-text) queries (vs Boolean)
- each document: -> vector (HOW?)
- each query: -> vector
- search for ‘similar’ vectors

Vector Space Model and Clustering

- main idea: each document is a vector of size d: d is the number of different terms in the database

document

zoo

aaron

data

‘indexing’

...data...

d (= vocabulary size)

Document Vectors

- Documents are represented as "bags of words" or, equivalently, as vectors
- A vector is like an array of floating-point numbers
- Has direction and magnitude
- Each vector holds a place for every term in the collection
- Therefore, most vectors are sparse

Document Vectors: one location for each word

|   | nova | galaxy | heat | h'wood | film | role | diet | fur |
|---|------|--------|------|--------|------|------|------|-----|
| A | 10   | 5      | 3    |        |      |      |      |     |
| B | 5    | 10     |      |        |      |      |      |     |
| C |      | 10     | 8    | 7      |      |      |      |     |
| D |      |        |      | 9      | 10   | 5    |      |     |
| E |      |        |      | 10     | 10   |      |      |     |
| F |      |        |      | 9      | 10   |      |      |     |
| G | 5    | 7      | 9    |        |      |      |      |     |
| H |      |        |      |        | 6    | 10   | 2    | 8   |
| I |      |        |      | 7      | 5    |      | 1    | 3   |

- e.g., "nova" occurs 10 times in text A, "galaxy" 5 times in A, and "heat" 3 times in A
- "h'wood" (Hollywood) occurs 7 times in text I, "film" 5 times in I, "diet" once in I, and "fur" 3 times in I
- (a blank cell means 0 occurrences)

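
To make the representation concrete, here is a small sketch (mine, not from the slides) that fills in the term-frequency vectors for the two rows of the table that the annotations spell out (documents A and I); real collections would use a sparse representation, since most entries are 0.

```python
import numpy as np

terms = ["nova", "galaxy", "heat", "h'wood", "film", "role", "diet", "fur"]
counts = {
    "A": {"nova": 10, "galaxy": 5, "heat": 3},
    "I": {"h'wood": 7, "film": 5, "diet": 1, "fur": 3},
}

# one row per document, one column (location) per word; missing terms stay 0
matrix = np.zeros((len(counts), len(terms)))
for row, (doc_id, freqs) in enumerate(counts.items()):
    for term, freq in freqs.items():
        matrix[row, terms.index(term)] = freq

print(matrix[0])  # document A's vector: [10. 5. 3. 0. 0. 0. 0. 0.]
print(matrix[1])  # document I's vector: [ 0. 0. 0. 7. 5. 0. 1. 3.]
```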

Assigning Weights to Terms

- Binary Weights
- Raw term frequency
- tf x idf
- Recall the Zipf distribution
- Want to weight terms highly if they are
- frequent in relevant documents … BUT
- infrequent in the collection as a whole

Binary Weights

- Only the presence (1) or absence (0) of a term is included in the vector

Raw Term Weights

- The frequency of occurrence for the term in each document is included in the vector

Assigning Weights

- tf x idf measure:
- term frequency (tf)
- inverse document frequency (idf) -- a way to deal with the problems of the Zipf distribution
- Goal: assign a tf * idf weight to each term in each document

Inverse Document Frequency

- IDF provides high values for rare words and low values for common words
- a common form is idf(t) = log(N / df(t)), where N is the collection size and df(t) is the number of documents containing term t; e.g., for a collection of 10000 documents, a word occurring in all 10000 gets idf = log(1) = 0, while a word occurring in only 1 gets idf = log(10000)
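
A small sketch (my own, using the common variant idf(t) = log(N / df(t)); the exact weighting formula differs from system to system) of assigning a tf * idf weight to each term of each document:

```python
import math
from collections import Counter

docs = {
    "d1": "data retrieval data systems",
    "d2": "image retrieval",
    "d3": "data mining",
}

N = len(docs)
tokenized = {d: text.split() for d, text in docs.items()}
# document frequency: in how many documents each term occurs
df = Counter(term for tokens in tokenized.values() for term in set(tokens))

def tf_idf(doc_id):
    """tf * idf weight for every term of one document."""
    tf = Counter(tokenized[doc_id])
    return {t: freq * math.log(N / df[t]) for t, freq in tf.items()}

print(tf_idf("d1"))  # 'systems' (rare in the collection) outweighs 'retrieval' (common)
```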

Similarity Measures for Document Vectors

- Simple matching (coordination level match)
- Dice's coefficient
- Jaccard's coefficient
- Cosine coefficient
- Overlap coefficient

tf x idf normalization

- Normalize the term weights (so that longer documents are not unfairly given more weight)
- to normalize usually means to force all values to fall within a certain range, typically between 0 and 1, inclusive

Vector Space with Term Weights and Cosine Matching

Di = (di1, wdi1; di2, wdi2; …; dit, wdit)

Q = (qi1, wqi1; qi2, wqi2; …; qit, wqit)

(Figure: a two-term vector space with Term A on the horizontal axis and Term B on the vertical axis, both from 0 to 1.0, showing the query Q = (0.4, 0.8) and the documents D1 = (0.8, 0.3) and D2 = (0.2, 0.7).)
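
To make the figure concrete, the sketch below (not part of the original deck) computes the cosine coefficient between the query Q and the two documents using the coordinates above; D2 comes out closer to Q than D1, even though D1 has the larger Term-A weight.

```python
import numpy as np

def cosine(u, v):
    """Cosine coefficient between two term-weight vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

Q  = np.array([0.4, 0.8])
D1 = np.array([0.8, 0.3])
D2 = np.array([0.2, 0.7])

print(cosine(Q, D1))  # ~0.73
print(cosine(Q, D2))  # ~0.98 -> D2 is ranked above D1
```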

Text - Detailed outline

- Text databases
- problem
- full text scanning
- inversion
- signature files (a.k.a. Bloom Filters)
- Vector model and clustering
- information filtering and LSI

Information Filtering + LSI

- [Foltz+,’92] Goal:
- users specify interests (= keywords)
- the system alerts them to suitable news documents
- Major contribution: LSI = Latent Semantic Indexing
- latent (‘hidden’) concepts

Information Filtering + LSI

Main idea

- map each document into some ‘concepts’
- map each term into some ‘concepts’

‘Concept’:~ a set of terms, with weights, e.g.

- “data” (0.8), “system” (0.5), “retrieval” (0.6) -> DBMS_concept

Information Filtering + LSI

Pictorially: term-document matrix (BEFORE)

Information Filtering + LSI

Pictorially: concept-document matrix and...

Information Filtering + LSI

... and concept-term matrix

Information Filtering + LSI

Q: How to search, e.g., for 'system'?

Information Filtering + LSI

A: find the corresponding concept(s); and the corresponding documents

Information Filtering + LSI

Thus it works like an (automatically constructed) thesaurus:

we may retrieve documents that DON'T contain the term 'system' but do contain closely related terms ('data', 'retrieval')

SVD - Detailed outline

- Motivation
- Definition - properties
- Interpretation
- Complexity
- Case studies
- Additional properties

SVD - Motivation

- problem #1: text - LSI: find ‘concepts’
- problem #2: compression / dim. reduction

SVD - Motivation

- problem #1: text - LSI: find ‘concepts’

SVD - Motivation

- problem #2: compress / reduce dimensionality

Problem - specs

- ~10^6 rows; ~10^3 columns; no updates
- random access to any cell(s); a small error is OK

SVD - Definition

A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T

- A: n x m matrix (e.g., n documents, m terms)
- U: n x r matrix (n documents, r concepts)
- Λ: r x r diagonal matrix (strength of each 'concept') (r: rank of the matrix)
- V: m x r matrix (m terms, r concepts)
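
A quick sketch (mine) of computing the decomposition with numpy on a toy document-term matrix in the spirit of the CS/MD example that follows; note that numpy returns V^T directly and the singular values as a 1-d array corresponding to the diagonal of Λ.

```python
import numpy as np

# toy document-term matrix: 2 "CS" docs on (data, retrieval), 2 "MD" docs on (brain, lung)
A = np.array([
    [2.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)                                     # singular values, sorted in decreasing order
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True: A = U Lambda V^T
```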

SVD - Properties

THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where

- U, Λ, V: unique (*)
- U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other)
- U^T U = I; V^T V = I (I: identity matrix)
- Λ: diagonal, with its entries (the singular values) positive and sorted in decreasing order

SVD - Example

- A = U Λ V^T - example: a term-document matrix over the terms 'data', 'inf.', 'retrieval', 'brain', 'lung', with a group of CS documents and a group of MD documents, decomposes into two concepts: a CS-concept and an MD-concept.

(Figure: A = U x Λ x V^T for this example, with U highlighted as the doc-to-concept similarity matrix.)

SVD - Example

- A = U Λ V^T - example (continued):

(Figure: the same decomposition, with V highlighted as the term-to-concept similarity matrix.)


SVD - Detailed outline

- Motivation
- Definition - properties
- Interpretation
- Complexity
- Case studies
- Additional properties

SVD - Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:

- U: document-to-concept similarity matrix
- V: term-to-concept sim. matrix
- Λ: its diagonal elements give the 'strength' of each concept

SVD - Interpretation #2

- best axis to project on: (‘best’ = min sum of squares of projection errors)

SVD - Interpretation #2

- A = U Λ V^T - example:
- U Λ gives the coordinates of the points along the projection axes

SVD - Interpretation #2

- More details
- Q: how exactly is dim. reduction done?
- A: set the smallest singular values to zero:

(Figure: A ≈ U x Λ x V^T with λ2, the smaller diagonal entry of Λ, set to zero, leaving only the λ1 u1 v1^T term.)

SVD - Interpretation #2: 'spectral decomposition' of the matrix

A (n x m) = λ1 u1 v1^T + λ2 u2 v2^T + ... (r terms, each the product of an n x 1 column vector ui and a 1 x m row vector vi^T)

SVD - Interpretation #2: approximation / dim. reduction

- by keeping only the first few terms (Q: how many?)
- assume λ1 >= λ2 >= ...; then A ≈ λ1 u1 v1^T + λ2 u2 v2^T + ... (truncated after the first few terms)
- to do the mapping into the reduced space you use V^T: X' = V^T X

SVD - Interpretation #2

- A (heuristic [Fukunaga]): keep enough terms to retain 80-90% of the 'energy' (= sum of squares of the λi's), assuming λ1 >= λ2 >= ...
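
A sketch of the energy heuristic (my own reading of it, taking 'energy' to be the sum of the squared singular values as the slide says): keep the smallest number of terms whose leading singular values retain at least the requested fraction of the energy.

```python
import numpy as np

def truncate_by_energy(A, energy=0.9):
    """Keep the fewest SVD terms whose squared singular values retain `energy` of the total."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, energy)) + 1   # first k terms reach the threshold
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return A_k, k

A = np.random.default_rng(0).normal(size=(100, 20))
A_k, k = truncate_by_energy(A, 0.9)
print(k, np.linalg.norm(A - A_k))  # terms kept, and the Frobenius error of the approximation
```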

SVD - Interpretation #3

- A: SVD properties:
- matrix product should give back matrix A
- matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other
- ditto for matrix V
- matrix Λ should be diagonal, with positive values

SVD - Complexity

- O( n * m * m) or O( n * n * m) (whichever is less)
- less work, if we just want eigenvalues
- or if we want first k eigenvectors
- or if the matrix is sparse [Berry]
- Implemented in any linear-algebra package (LINPACK, MATLAB, S-Plus, Mathematica, ...)

Optimality of SVD

Def: The Frobenius norm of an n x m matrix M is ||M||_F = sqrt( Σ_{i,j} M_{ij}^2 ).

(reminder) The rank of a matrix M is the number of independent rows (or columns) of M.

Let A = U Λ V^T and A_k = U_k Λ_k V_k^T (the k-term SVD approximation of A), where A_k is an n x m matrix, U_k is n x k, Λ_k is k x k, and V_k is m x k.

Theorem [Eckart and Young]: Among all n x m matrices C of rank at most k, A_k minimizes the approximation error, i.e., ||A - A_k||_F <= ||A - C||_F.

Kleinberg’s Algorithm

- Main idea: in many cases, when you search the web using some terms, the most relevant pages may not contain these terms (or contain them only a few times)
- e.g., 'Harvard': www.harvard.edu
- e.g., 'Search Engines': yahoo, google, altavista
- Authorities and hubs

Kleinberg’s algorithm

- Problem definition: given the web and a query,
- find the most ‘authoritative’ web pages for this query

Step 0: find all pages containing the query terms (root set)

Step 1: expand by one move forward and backward (base set)

Kleinberg’s algorithm

- on the resulting graph, give a high score (= 'authorities') to nodes that many important nodes point to
- give a high importance score ('hubs') to nodes that point to good 'authorities'

(Figure: a sketch of hub nodes pointing to authority nodes.)

Kleinberg’s algorithm

observations

- recursive definition!
- each node (say, the i-th node) has both an authoritativeness score a_i and a hubness score h_i

Kleinberg’s algorithm

Let E be the set of edges and A be the adjacency matrix:

the (i,j) entry of A is 1 if the edge from i to j exists

Let h and a be [n x 1] vectors with the 'hubness' and 'authoritativeness' scores.

Kleinberg’s algorithm

Then:

a_i = h_k + h_l + h_m (for a node i that is pointed to by nodes k, l, m)

that is

a_i = Σ h_j over all j such that the edge (j,i) exists

or

a = A^T h

Kleinberg’s algorithm

symmetrically, for the ‘hubness’:

h_i = a_n + a_p + a_q (for a node i that points to nodes n, p, q)

that is

h_i = Σ a_j over all j such that the edge (i,j) exists

or

h = A a

Kleinberg’s algorithm

In conclusion, we want vectors h and a such that:

h = A a

a = A^T h

Recall the properties:

C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1]

C(3): u1^T A = λ1 v1^T

Kleinberg’s algorithm

In short, the solutions to

h = A a

a = A^T h

are the left and right singular vectors of the adjacency matrix A (equivalently, the eigenvectors of A A^T and A^T A).

Starting from a random vector a and iterating, we will eventually converge

(Q: to which of all the eigenvectors? why?)

Kleinberg’s algorithm

(Q: to which of all the eigenvectors? why?)

A: to the ones corresponding to the strongest eigenvalue, because of property B(5):

B(5): (A^T A)^k v' ≈ (constant) v1
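
A compact sketch (my own, not the slides') of that iteration: alternate a = A^T h and h = A a, normalizing each step, so the vectors converge to the dominant singular vectors of A.

```python
import numpy as np

def hits(A, iters=50):
    """A[i, j] = 1 if the edge i -> j exists. Returns (hub, authority) score vectors."""
    n = A.shape[0]
    h = np.ones(n)
    a = np.ones(n)
    for _ in range(iters):
        a = A.T @ h                 # authorities: pointed to by good hubs
        a /= np.linalg.norm(a)
        h = A @ a                   # hubs: point to good authorities
        h /= np.linalg.norm(h)
    return h, a

# tiny graph: nodes 0 and 1 point to nodes 2 and 3
A = np.zeros((4, 4))
A[0, 2] = A[0, 3] = A[1, 2] = A[1, 3] = 1
h, a = hits(A)
print(np.round(h, 3), np.round(a, 3))  # 0, 1 come out as hubs; 2, 3 as authorities
```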

Kleinberg’s algorithm - results

E.g., for the query 'java':

0.328 www.gamelan.com

0.251 java.sun.com

0.190 www.digitalfocus.com (“the java developer”)

Kleinberg’s algorithm - discussion

- ‘authority’ score can be used to find ‘similar pages’ to page p
- closely related to 'citation analysis', social networks / 'small world' phenomena

google/page-rank algorithm

- closely related: The Web is a directed graph of connected nodes
- imagine a particle randomly moving along the edges (*)
- compute its steady-state probabilities; these give the PageRank of each page (the importance of this page)

(*) with occasional random jumps

PageRank Definition

- Assume a page A and pages T1, T2, …, Tm that point to A. Let d be a damping factor, PR(A) the PageRank of A, and C(A) the out-degree of A. Then:

PR(A) = (1 - d) + d ( PR(T1)/C(T1) + … + PR(Tm)/C(Tm) )

google/page-rank algorithm

- Computing the PR of each page is essentially the same problem as: given a Markov Chain, compute the steady-state probabilities p1 ... p5

(Figure: an example graph with 5 nodes, numbered 1-5.)

Computing PageRank

- Iterative procedure
- Equivalently: navigate the web by randomly following links, or, with probability p, jump to a random page. Let A be the adjacency matrix (n x n) and di the out-degree of page i. Then

Prob(Ai -> Aj) = p · n^-1 + (1 - p) · di^-1 · Aij

A'[i,j] = Prob(Ai -> Aj)

google/page-rank algorithm

- Let A' be the transition matrix (= the adjacency matrix, row-normalized: the sum of each row = 1)

(Figure: the 5-node example graph and its row-normalized transition matrix A'.)

google/page-rank algorithm

- p satisfies A'^T p = p (equivalently, p^T A' = p^T)
- thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is row-normalized)
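
A sketch (my own illustration) of the iterative procedure on a tiny graph, using the row-normalized transition matrix with random jumps; the jump probability p = 0.15 is a conventional choice, not a value fixed by the slides.

```python
import numpy as np

def pagerank(A, p=0.15, iters=100):
    """A[i, j] = 1 for an edge i -> j. Returns the steady-state probability vector."""
    n = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1                 # guard against pages with no out-links
    M = p / n + (1 - p) * A / out_deg         # M[i, j] = Prob(page i -> page j)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        pr = M.T @ pr                         # one step of the random walk, in distribution
    return pr / pr.sum()

# 4-node example: nodes 1, 2, 3 link to node 0; node 0 links to node 1
A = np.zeros((4, 4))
A[1, 0] = A[2, 0] = A[3, 0] = A[0, 1] = 1
print(np.round(pagerank(A), 3))  # node 0 gets the highest PageRank
```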

Kleinberg/google - conclusions

SVD helps in graph analysis:

hub/authority scores: strongest left- and right- eigenvectors of the adjacency matrix

random walk on a graph: steady state probabilities are given by the strongest eigenvector of the transition matrix

Conclusions – so far

- SVD: a valuable tool
- given a document-term matrix, it finds ‘concepts’ (LSI)
- ... and can reduce dimensionality (KL)

Conclusions cont'd

- ... and can find fixed points or steady-state probabilities (google / Kleinberg / Markov Chains)
- ... and can solve over- and under-constrained linear systems optimally (least squares)

References

- Brin, S. and L. Page (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl. World Wide Web Conf.
- Kleinberg, J. (1998). Authoritative Sources in a Hyperlinked Environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Embeddings

- Given a metric distance matrix D, embed the objects in a k-dimensional vector space using a mapping F such that
- D(i,j) is close to D’(F(i),F(j))
- Isometric mapping:
- exact preservation of distance
- Contractive mapping:
- D’(F(i),F(j)) <= D(i,j)
- D' is typically some Lp measure

PCA

- Intuition: find the axis that shows the greatest variation, and project all points onto this axis

(Figure: 2-d points in the original axes f1 and f2, with the principal axes e1 (direction of greatest variation) and e2.)

SVD: The mathematical formulation

- Normalize the dataset by moving the origin to the center of the dataset
- Find the eigenvectors of the data (or covariance) matrix
- These define the new space
- Sort the eigenvalues in “goodness” order

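
A small sketch (mine) of those steps with numpy: center the data, take the eigenvectors of the covariance matrix, sort them by eigenvalue, and project onto the leading axis; the toy data is deliberately stretched along one direction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.diag([3.0, 0.5])   # 2-d points, stretched along one axis

X_centered = X - X.mean(axis=0)                # move the origin to the center of the dataset
cov = np.cov(X_centered, rowvar=False)         # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)         # eigen-decomposition (ascending eigenvalues)
order = np.argsort(eigvals)[::-1]              # sort by "goodness" (variance), descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

projected = X_centered @ eigvecs[:, :1]        # project onto the axis of greatest variation
print(eigvals)                                 # the first eigenvalue dominates
print(projected.shape)                         # (200, 1)
```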

SVD Cont’d

- Advantages:
- Optimal dimensionality reduction (for linear projections)
- Disadvantages:
- Computationally expensive… but can be improved with random sampling
- Sensitive to outliers and non-linearities

FastMap

What if we have a finite metric space (X, d )?

Faloutsos and Lin (1995) proposed FastMap as a metric analogue to the KL-transform (PCA). Imagine that the points are in a Euclidean space.

- Select two pivot points xa and xb that are far apart.
- Compute a pseudo-projection of the remaining points along the "line" xa-xb.
- "Project" the points to an orthogonal subspace and recurse.

Selecting the Pivot Points

The pivot points should lie along the principal axes, and hence should be far apart.

- Select any point x0.
- Let x1 be the furthest from x0.
- Let x2 be the furthest from x1.
- Return (x1, x2).


Pseudo-Projections

Given pivots (xa, xb), for any third point y, we use the law of cosines to determine the position of y along the line xa-xb.

The pseudo-projection (first coordinate) of y is

cy = (da,y^2 + da,b^2 - db,y^2) / (2 da,b)

(Figure: the triangle formed by xa, xb, and y, with side lengths da,y, db,y, da,b and the projection cy of y onto the line xa-xb.)

“Project to orthogonal plane”

Given the pseudo-projections along xa-xb, we can compute the distances within the "orthogonal hyperplane" using the Pythagorean theorem:

d'(y', z')^2 = d(y, z)^2 - (cz - cy)^2

Using d'(.,.), recurse until k features have been chosen.

(Figure: points y and z, their projections y' and z' onto the hyperplane orthogonal to the line xa-xb, and their distance d'y',z' within that hyperplane.)
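
Here is a condensed sketch (my own, simplified from the description above) of the whole loop: pick pivots with the furthest-point heuristic, compute every point's pseudo-projection with the law-of-cosines formula, then apply the Pythagorean correction to the squared distances and recurse for the next coordinate.

```python
import numpy as np

def fastmap(D, k):
    """D: symmetric distance matrix (n x n). Returns an n x k coordinate matrix."""
    n = D.shape[0]
    coords = np.zeros((n, k))
    D2 = D.astype(float) ** 2                      # work with squared distances
    for col in range(k):
        # pivot heuristic: any point, its furthest point, then that point's furthest point
        a = 0
        b = int(np.argmax(D2[a]))
        a = int(np.argmax(D2[b]))
        if D2[a, b] == 0:                          # all remaining distances are zero
            break
        # law of cosines: pseudo-projection of every point along the pivot "line"
        c = (D2[a, :] + D2[a, b] - D2[b, :]) / (2 * np.sqrt(D2[a, b]))
        coords[:, col] = c
        # Pythagoras: squared distances in the orthogonal hyperplane for the next level
        D2 = np.maximum(D2 - (c[:, None] - c[None, :]) ** 2, 0)
    return coords

# points on a line: one FastMap coordinate recovers the distances exactly
X = np.array([[0.0], [1.0], [3.0], [7.0]])
D = np.abs(X - X.T)
print(fastmap(D, 1).ravel())   # [0. 1. 3. 7.] (up to translation/sign)
```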

Random Projections

- Based on the Johnson-Lindenstrauss lemma:
- for 0 < ε < 1/2, any (sufficiently large) set S of M points in R^n, and k = O(ε^-2 ln M),
- there exists a linear map f: S -> R^k such that
- (1 - ε) D(u,v) < D(f(u),f(v)) < (1 + ε) D(u,v) for all points u, v in S
- a random projection achieves this with constant probability

Random Projection: Application

- Set k = O(ε^-2 ln M)
- Select k random n-dimensional vectors
- (an approach is to select k Gaussian-distributed vectors with mean 0 and variance 1: N(0,1))
- Project the original points onto the k vectors.
- The resulting k-dimensional space approximately preserves the distances with high probability
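
A sketch (mine) of that recipe: draw k Gaussian N(0, 1) direction vectors, project, and check that a few pairwise distances are roughly preserved; the 1/sqrt(k) scaling is a common convention so that squared distances are preserved in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, k = 1000, 50, 300                  # original dim, number of points, reduced dim
X = rng.normal(size=(M, n))              # M points in R^n

R = rng.normal(0.0, 1.0, size=(n, k))    # k random N(0, 1) n-dimensional vectors
Y = X @ R / np.sqrt(k)                   # project the original points onto the k vectors

# compare a few pairwise distances before and after the projection
for i, j in [(0, 1), (2, 3), (4, 5)]:
    d_orig = np.linalg.norm(X[i] - X[j])
    d_proj = np.linalg.norm(Y[i] - Y[j])
    print(round(d_orig, 1), round(d_proj, 1))   # close, as the JL lemma promises
```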

Random Projection

- A very useful technique,
- especially when used in conjunction with another technique (for example, SVD):
- use random projection to reduce the dimensionality from thousands to hundreds, then apply SVD to reduce the dimensionality further
