Learning Embeddings for Similarity-Based Retrieval



Vassilis Athitsos

Computer Science Department

Boston University

Overview
  • Background on similarity-based retrieval and embeddings.
  • BoostMap.
    • Embedding optimization using machine learning.
  • Query-sensitive embeddings.
    • Ability to preserve non-metric structure.
Problem Definition

[Figure: a database of n objects x_1, x_2, x_3, …, x_n and a query object q.]

  • Goals:
    • find the k nearest neighbors of query q.
  • Brute force time is linear in:
    • n (size of database).
    • time it takes to measure a single distance.
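The brute-force baseline described on this slide can be sketched as follows. This is a minimal illustration, not code from the talk: the function name `brute_force_knn` and the toy data are assumptions, and the distance measure is passed in as a black box.

```python
import heapq

def brute_force_knn(query, database, distance, k):
    """Return the k (distance, index) pairs nearest to query, closest first."""
    # One exact distance computation per database object: n calls in total.
    scored = ((distance(query, x), i) for i, x in enumerate(database))
    return heapq.nsmallest(k, scored)

# Toy data (hypothetical): numbers compared with absolute difference.
database = [1.0, 4.0, 9.0, 10.5]
neighbors = brute_force_knn(9.6, database, lambda a, b: abs(a - b), k=2)
print([index for _, index in neighbors])
```

The cost is n calls to `distance`, which is why an expensive distance measure makes this baseline impractical for large databases.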

Applications
  • Nearest neighbor classification.
  • Similarity-based retrieval.
  • Image/video databases.
  • Biological databases.
  • Time series.
  • Web pages.
  • Browsing music or movie catalogs.

[Figure: example images of faces, letters/digits, and handshapes.]

Expensive Distance Measures
  • Comparing d-dimensional vectors is efficient:
    • O(d) time.
  • Comparing strings of length d with the edit distance is more expensive:
    • O(d^2) time.
  • Reason: alignment.

[Figure: two d-dimensional vectors (x_1, …, x_d) and (y_1, …, y_d) compared coordinate by coordinate, and the strings "immigration" and "imitation" to be aligned.]
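To make the quadratic cost concrete, here is a sketch of the standard dynamic-programming edit distance (unit cost for insertion, deletion, and substitution, which is an assumption; the slides do not state the costs). The nested loops over both strings are where the O(d^2) time comes from.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(t) + 1))          # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        curr = [i]                          # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            substitution = prev[j - 1] + (0 if cs == ct else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]

print(edit_distance("immigration", "imitation"))
```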

Shape Context Distance

  • Proposed by Belongie et al. (2001).
    • Error rate: 0.63%, with database of 20,000 images.
    • Uses bipartite matching (cubic complexity!).
    • 22 minutes/object, heavily optimized.
    • Result preview: 5.2 seconds, 0.61% error rate.

More Examples

  • DNA and protein sequences:
    • Smith-Waterman.
  • Time series:
    • Dynamic Time Warping.
  • Probability distributions:
    • Kullback-Leibler Distance.
  • These measures are non-Euclidean, sometimes non-metric.
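As a small illustration of "non-metric", the sketch below evaluates the Kullback-Leibler divergence in both directions for two hypothetical distributions. The values of `p` and `q` are made up for the example; the point is only that the two directions differ, so symmetry fails.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence(p, q)
backward = kl_divergence(q, p)
print(forward, backward)  # the two directions give different values
```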
Indexing Problem
  • Vector indexing methods NOT applicable.
    • PCA.
    • R-trees, X-trees, SS-trees.
    • VA-files.
    • Locality Sensitive Hashing.
Metric Methods
  • Pruning-based methods.
    • VP-trees, MVP-trees, M-trees, Slim-trees,…
    • Use triangle inequality for tree-based search.
  • Filtering methods.
    • AESA, LAESA…
    • Use the triangle inequality to compute upper/lower bounds of distances.
  • Suffer from curse of dimensionality.
  • Heuristic in non-metric spaces.
  • In many datasets, bad empirical performance.
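The bounds used by filtering methods can be sketched as below. This is a generic illustration of the triangle-inequality argument, not the AESA or LAESA algorithms themselves; the function names and the pivot terminology are assumptions. In a non-metric space the lower bound is no longer guaranteed, which is why these methods become heuristic there.

```python
def triangle_bounds(d_query_pivot, d_object_pivot):
    """Bounds on D(query, object) from distances to a shared pivot.

    Valid only if D obeys the triangle inequality.
    """
    lower = abs(d_query_pivot - d_object_pivot)
    upper = d_query_pivot + d_object_pivot
    return lower, upper

def can_prune(d_query_pivot, d_object_pivot, best_so_far):
    """True if the object cannot beat the best distance found so far."""
    lower, _ = triangle_bounds(d_query_pivot, d_object_pivot)
    return lower > best_so_far

# Toy check on the real line, where |x - y| is a metric.
query, pivot, obj = 2.0, 10.0, 50.0
lower, upper = triangle_bounds(abs(query - pivot), abs(obj - pivot))
true_distance = abs(query - obj)
print(lower, true_distance, upper)
```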
Embeddings

[Figure: an embedding F maps the database objects x_1, x_2, x_3, …, x_n and the query q into R^d.]

  • Measure distances between vectors (typically much faster).
  • Caveat: the embedding must preserve similarity structure.

Reference Object Embeddings

[Figure: a database containing reference objects r_1, r_2, r_3 and an object x.]

F(x) = (D(x, r_1), D(x, r_2), D(x, r_3))


F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))

F(Sacramento)....= ( 386, 1543, 2920)

F(Las Vegas).....= ( 262, 1232, 2405)

F(Oklahoma City).= (1345, 437, 1291)

F(Washington DC).= (2657, 1207, 853)

F(Jacksonville)..= (2422, 1344, 141)
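Using the five embedded vectors from this slide, a short sketch shows how nearby cities end up with similar vectors. Comparing the vectors with an unweighted L1 distance is an assumption made here for illustration; the helper names are hypothetical.

```python
# Embedded vectors copied from the slide: distances to LA, Lincoln, Orlando.
embedded = {
    "Sacramento": (386, 1543, 2920),
    "Las Vegas": (262, 1232, 2405),
    "Oklahoma City": (1345, 437, 1291),
    "Washington DC": (2657, 1207, 853),
    "Jacksonville": (2422, 1344, 141),
}

def l1(u, v):
    """Unweighted L1 distance between two embedded vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def nearest_in_embedding(city):
    """Closest other city when comparing embedded vectors only."""
    others = (c for c in embedded if c != city)
    return min(others, key=lambda c: l1(embedded[city], embedded[c]))

for city in embedded:
    print(city, "->", nearest_in_embedding(city))
```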

Existing Embedding Methods
  • FastMap, MetricMap, SparseMap, Lipschitz embeddings.
    • Use distances to reference objects (prototypes).
  • Question: how do we directly optimize an embedding for nearest neighbor retrieval?
    • FastMap & MetricMap assume Euclidean properties.
    • SparseMap optimizes stress.
      • Large stress may be inevitable when embedding non-metric spaces into a metric space.
    • In practice often worse than random construction.
BoostMap
  • BoostMap: A Method for Efficient Approximate Similarity Rankings. Athitsos, Alon, Sclaroff, and Kollios, CVPR 2004.
  • BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval. Athitsos, Alon, Sclaroff, and Kollios, PAMI 2007 (to appear).
Key Features of BoostMap
  • Maximizes amount of nearest neighbor structure preserved by the embedding.
  • Based on machine learning, not on geometric assumptions.
    • Principled optimization, even in non-metric spaces.
  • Can capture non-metric structure.
    • Query-sensitive version of BoostMap.
  • Better results in practice, in all datasets we have tried.
Ideal Embedding Behavior

[Figure: embedding F maps the original space X to R^d; shown are objects q, a, and b.]

For any query q: we want F(NN(q)) = NN(F(q)).

For any database object b besides NN(q), we want F(q) closer to F(NN(q)) than to F(b).

Embeddings Seen As Classifiers

For triples (q, a, b) such that:

- q is a query object
- a = NN(q)
- b is a database object

Classification task: is q closer to a or to b?

  • Any embedding F defines a classifier F’(q, a, b).
    • F’ checks if F(q) is closer to F(a) or to F(b).
Classifier Definition

  • Given embedding F: X → R^d:
    • F’(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)||.
  • F’(q, a, b) > 0 means “q is closer to a.”
  • F’(q, a, b) < 0 means “q is closer to b.”
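The classifier induced by an embedding can be written in a few lines. This is a sketch under assumptions: the norm is taken to be Euclidean, and `toy_embed` is a made-up embedding used only to exercise the function.

```python
import math

def triple_classifier(embed, q, a, b):
    """F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)|| for an embedding F.

    A positive value means "q is closer to a", a negative value "closer to b".
    """
    fq, fa, fb = embed(q), embed(a), embed(b)
    return math.dist(fq, fb) - math.dist(fq, fa)

# Hypothetical embedding of numbers into R^2, for illustration only.
def toy_embed(x):
    return (x, 2 * x)

print(triple_classifier(toy_embed, 1, 2, 5) > 0)  # q=1 is mapped closer to a=2
```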
Key Observation

[Figure: embedding F maps the original space X to R^d; shown are objects q, a, and b.]

  • If classifier F’ is perfect, then for every q, F(NN(q)) = NN(F(q)).
    • If F(q) is closer to F(b) than to F(NN(q)), then triple (q, a, b) is misclassified.
  • Classification error on triples (q, NN(q), b) measures how well F preserves nearest neighbor structure.

Optimization Criterion

  • Goal: construct an embedding F optimized for k-nearest neighbor retrieval.
  • Method: maximize accuracy of F’ on triples (q, a, b) of the following type:
    • q is any object.
    • a is a k-nearest neighbor of q in the database.
    • b is in database, but NOT a k-nearest neighbor of q.
  • If F’ is perfect on those triples, then F perfectly preserves k-nearest neighbors.
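One way to build training triples of this type is sketched below. It assumes that training queries are drawn from the database itself and that the database has more than k + 1 objects; the function name `sample_triples` and the toy data are hypothetical.

```python
import random

def sample_triples(database, distance, k, num_triples, seed=0):
    """Sample (q, a, b): a is a k-nearest neighbor of q, b is not."""
    rng = random.Random(seed)
    triples = []
    for _ in range(num_triples):
        qi = rng.randrange(len(database))
        q = database[qi]
        # Rank all other objects by exact distance to q.
        others = [x for i, x in enumerate(database) if i != qi]
        others.sort(key=lambda x: distance(q, x))
        a = rng.choice(others[:k])   # one of the k nearest neighbors of q
        b = rng.choice(others[k:])   # a database object outside that set
        triples.append((q, a, b))
    return triples

# Toy data: numbers compared with absolute difference.
points = [0, 1, 3, 7, 15, 31]
triples = sample_triples(points, lambda x, y: abs(x - y), k=2, num_triples=5)
print(triples)
```

Every sampled triple has the same correct answer, "q is closer to a", so the classification error of F’ on this set measures how well F preserves k-nearest neighbors.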
1D Embeddings as Weak Classifiers

  • 1D embeddings define weak classifiers.
    • Better than a random classifier (50% error rate).
  • We can define lots of different classifiers.
    • Every object in the database can be a reference object.

[Figure: 1D embedding example with the cities Lincoln, Detroit, LA, Chicago, New York, and Cleveland.]

Question: how do we combine many such classifiers into a single strong classifier?

Answer: use AdaBoost.

  • AdaBoost is a machine learning method designed for exactly this problem.
Using AdaBoost

[Figure: 1D embeddings F_1, F_2, …, F_n map the original space X to the real line.]

  • Output: H = w_1 F’_1 + w_2 F’_2 + … + w_d F’_d.
    • AdaBoost chooses 1D embeddings and weighs them.
    • Goal: achieve low classification error.
    • AdaBoost trains on triples chosen from the database.
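The training loop can be sketched as follows. This is a simplified illustration, not the exact procedure of the BoostMap papers. Assumptions: each weak classifier outputs only the sign of F’_r for the 1D embedding F_r(x) = D(x, r), every training triple has the correct answer "q is closer to a", and the weight of a chosen classifier uses the confidence-rated AdaBoost formula 0.5 ln((1 + r) / (1 - r)), where r is the weighted agreement. All function names and the toy data are hypothetical.

```python
import math

def weak_output(dist, r, q, a, b):
    """Sign of F'_r(q, a, b) for the 1D embedding F_r(x) = dist(x, r)."""
    margin = abs(dist(q, r) - dist(b, r)) - abs(dist(q, r) - dist(a, r))
    return (margin > 0) - (margin < 0)

def train_boosted_embedding(triples, references, dist, rounds):
    """Return a list of (reference object, weight) pairs."""
    n = len(triples)
    weights = [1.0 / n] * n                      # one weight per triple
    outputs = {r: [weak_output(dist, r, *t) for t in triples]
               for r in references}
    chosen = []
    for _ in range(rounds):
        # Pick the 1D embedding with the highest weighted agreement.
        best_r, best_corr = None, 0.0
        for r, outs in outputs.items():
            corr = sum(w * h for w, h in zip(weights, outs))
            if corr > best_corr:
                best_r, best_corr = r, corr
        if best_r is None:                       # nothing beats chance
            break
        corr = min(best_corr, 1.0 - 1e-9)        # avoid division by zero
        alpha = 0.5 * math.log((1 + corr) / (1 - corr))
        chosen.append((best_r, alpha))
        # Raise the weight of triples the chosen classifier got wrong.
        weights = [w * math.exp(-alpha * h)
                   for w, h in zip(weights, outputs[best_r])]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen

# Toy data: two clusters of 2D points with Euclidean distance.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
triples = [(points[0], points[1], points[3]), (points[3], points[4], points[2]),
           (points[1], points[2], points[4]), (points[5], points[3], points[0])]
chosen = train_boosted_embedding(triples, points, math.dist, rounds=3)
print(chosen)
```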
From Classifier to Embedding

AdaBoost output: H = w_1 F’_1 + w_2 F’_2 + … + w_d F’_d

What embedding should we use?

What distance measure should we use?

BoostMap embedding: F(x) = (F_1(x), …, F_d(x)).

Distance measure: D((u_1, …, u_d), (v_1, …, v_d)) = \sum_{i=1}^{d} w_i |u_i - v_i|

Claim: Let q be closer to a than to b. H misclassifies triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.

Proof

H(q, a, b)

= \sum_{i=1}^{d} w_i F’_i(q, a, b)

= \sum_{i=1}^{d} w_i (|F_i(q) - F_i(b)| - |F_i(q) - F_i(a)|)

= \sum_{i=1}^{d} (w_i |F_i(q) - F_i(b)| - w_i |F_i(q) - F_i(a)|)

= D(F(q), F(b)) - D(F(q), F(a)) = F’(q, a, b)
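The identity can be checked numerically. The sketch below uses made-up weights and embedded vectors (all values are hypothetical) and compares the weighted sum of 1D classifier outputs with the difference of weighted L1 distances.

```python
def weighted_l1(u, v, w):
    """D(u, v) = sum_i w_i |u_i - v_i|."""
    return sum(wi * abs(ui - vi) for wi, ui, vi in zip(w, u, v))

def combined_classifier(fq, fa, fb, w):
    """H(q, a, b) = sum_i w_i (|F_i(q) - F_i(b)| - |F_i(q) - F_i(a)|)."""
    return sum(wi * (abs(qi - bi) - abs(qi - ai))
               for wi, qi, ai, bi in zip(w, fq, fa, fb))

# Hypothetical weights and embedded vectors F(q), F(a), F(b).
w = [0.5, 2.0, 1.0]
fq, fa, fb = [1.0, 4.0, 2.0], [2.0, 3.0, 2.0], [5.0, 0.0, 1.0]

lhs = combined_classifier(fq, fa, fb, w)
rhs = weighted_l1(fq, fb, w) - weighted_l1(fq, fa, w)
print(lhs, rhs)
```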

Significance of Proof
  • AdaBoost optimizes a direct measure of embedding quality.
  • We optimize an indexing structure for similarity-based retrieval using machine learning.
    • Take advantage of training data.
How Do We Use It?

Filter-and-refine retrieval:

  • Offline step: compute embedding F of entire database.
  • Given a query object q:
    • Embedding step:
      • Compute distances from query to reference objects → F(q).
    • Filter step:
      • Find top p matches of F(q) in vector space.
    • Refine step:
      • Measure exact distance from q to top p matches.
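The three online steps can be sketched as below, assuming a reference-object embedding compared with a weighted L1 distance. The function names, the linear scan in the filter step, and the toy data are assumptions made for illustration.

```python
import heapq

def embed(x, references, distance):
    """F(x): exact distances from x to each reference object."""
    return [distance(x, r) for r in references]

def weighted_l1(u, v, w):
    return sum(wi * abs(ui - vi) for wi, ui, vi in zip(w, u, v))

def filter_and_refine(query, database, embedded_db, references, weights,
                      distance, p, k=1):
    """Return indices of the k best matches among the top p filter candidates."""
    # Embedding step: one exact distance per reference object.
    fq = embed(query, references, distance)
    # Filter step: rank the database by the cheap vector distance.
    candidates = heapq.nsmallest(
        p, range(len(database)),
        key=lambda i: weighted_l1(fq, embedded_db[i], weights))
    # Refine step: p exact distance computations.
    candidates.sort(key=lambda i: distance(query, database[i]))
    return candidates[:k]

# Toy data: numbers compared with absolute difference.
dist = lambda a, b: abs(a - b)
database = [0, 2, 5, 9, 14]
references = [0, 14]
weights = [1.0, 1.0]
embedded_db = [embed(x, references, dist) for x in database]  # offline step
result = filter_and_refine(8, database, embedded_db, references, weights,
                           dist, p=2)
print(result)
```

Only the embedding and refine steps call the exact distance, so the number of expensive computations per query no longer grows with the database size.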
Evaluating Embedding Quality

How often do we find the true nearest neighbor?

How many exact distance computations do we need?

What is the nearest neighbor classification error?

  • Embedding step:
    • Compute distances from query to reference objects → F(q).
  • Filter step:
    • Find top p matches of F(q) in vector space.
  • Refine step:
    • Measure exact distance from q to top p matches.
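The first two questions can be tied together with a small calculation. The sketch below assumes that, for each test query, we have recorded the 1-based rank of the true nearest neighbor in the filter-step ordering; the function names and the sample ranks are hypothetical.

```python
import math

def smallest_p_for_accuracy(true_nn_ranks, target):
    """Smallest p such that at least a `target` fraction of queries
    have their true nearest neighbor among the top p filter matches."""
    ranks = sorted(true_nn_ranks)
    needed = math.ceil(target * len(ranks))
    return ranks[needed - 1]

def exact_distance_cost(num_reference_objects, p):
    """Exact distances per query: embedding step plus refine step."""
    return num_reference_objects + p

# Hypothetical filter-step ranks of the true nearest neighbor for 6 queries.
ranks = [1, 3, 2, 10, 1, 4]
p_half = smallest_p_for_accuracy(ranks, 0.5)
p_all = smallest_p_for_accuracy(ranks, 1.0)
print(p_half, p_all, exact_distance_cost(3, p_all))
```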
Results on Hand Dataset

[Figure: a query image and its nearest neighbor in the database.]

Database: 80,640 synthetic images of hands.

Query set: 710 real images of hands.

Chamfer distance: 112 seconds per query

Results on MNIST Dataset
  • MNIST: 60,000 database objects, 10,000 queries.
  • Shape context (Belongie 2001):
    • 0.63% error, 20,000 distances, 22 minutes.
    • 0.54% error, 60,000 distances, 66 minutes.
Query-Sensitive Embeddings
  • Richer models.
    • Capture non-metric structure.
    • Better embedding quality.
  • References:
    • Athitsos, Hadjieleftheriou, Kollios, and Sclaroff, SIGMOD 2005.
    • Athitsos, Hadjieleftheriou, Kollios, and Sclaroff, TODS, June 2007.
Capturing Non-Metric Structure
  • A human is not similar to a horse.
  • A centaur is similar both to a human and a horse.
  • Triangle inequality is violated:
    • Using human ratings of similarity (Tversky, 1982).
    • Using k-median Hausdorff distance.
  • Mapping to a metric space presents a dilemma:
    • If D(F(centaur), F(human)) = D(F(centaur), F(horse)) = C, then by the triangle inequality D(F(human), F(horse)) <= 2C, so the human and the horse cannot be mapped far apart.
  • Query-sensitive embeddings:
    • Have the modeling power to preserve non-metric structure.
Local Importance of Coordinates
  • How important is each coordinate in comparing embeddings?

[Figure: embedding F maps the database objects x_1, x_2, …, x_n and the query q into R^d, giving vectors (x_{11}, …, x_{1d}), (x_{21}, …, x_{2d}), …, (x_{n1}, …, x_{nd}) and (q_1, …, q_d).]


F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))

F(Sacramento)....= ( 386, 1543, 2920)

F(Las Vegas).....= ( 262, 1232, 2405)

F(Oklahoma City).= (1345, 437, 1291)

F(Washington DC).= (2657, 1207, 853)

F(Jacksonville)..= (2422, 1344, 141)

General Intuition

[Figure: objects labeled 1, 2, and 3 in the original space X.]

  • Classifier: H = w_1 F’_1 + w_2 F’_2 + … + w_j F’_j.
  • Observation: accuracy of weak classifiers depends on query.
    • F’_1 is perfect for (q, a, b) where q = reference object 1.
    • F’_1 is good for queries close to reference object 1.
  • Question: how can we capture that?
Query-Sensitive Weak Classifiers

[Figure: objects labeled 1, 2, …, j in the original space X.]

  • V: area of influence (interval of real numbers).
  • Q_{F,V}(q, a, b) =
    • F’(q, a, b), if F(q) is in V.
    • “I don’t know”, if F(q) is not in V.
  • If V includes all real numbers, Q_{F,V} = F’.
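A query-sensitive weak classifier can be sketched as below for a 1D embedding F. Representing "I don't know" by the value 0 is an assumption made here so that an abstaining classifier contributes nothing to a weighted sum; the function name, the closed interval, and the toy embedding are also hypothetical.

```python
import math

def query_sensitive_classifier(f, interval, q, a, b):
    """Q_{F,V}(q, a, b) for a 1D embedding f and area of influence V."""
    lo, hi = interval
    fq = f(q)
    if not (lo <= fq <= hi):
        return 0.0                        # "I don't know"
    return abs(fq - f(b)) - abs(fq - f(a))  # F'(q, a, b) in one dimension

# Hypothetical 1D embedding: distance to reference object 0 on the real line.
f = lambda x: abs(x - 0)

inside = query_sensitive_classifier(f, (0, 3), q=1, a=2, b=5)
outside = query_sensitive_classifier(f, (2, 3), q=1, a=2, b=5)
everywhere = query_sensitive_classifier(f, (-math.inf, math.inf), q=1, a=2, b=5)
print(inside, outside, everywhere)
```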

Applying AdaBoost

[Figure: 1D embeddings F_1, F_2, …, F_d map the original space X to the real line.]

  • AdaBoost forms classifiers Q_{F_i,V_i}.
    • F_i: 1D embedding.
    • V_i: area of influence for F_i.
  • Output: H = w_1 Q_{F_1,V_1} + w_2 Q_{F_2,V_2} + … + w_d Q_{F_d,V_d}.
  • Empirical observation:
    • At late stages of the training, query-sensitive weak classifiers are still useful, whereas query-insensitive classifiers are not.
From Classifier to Embedding

AdaBoost output: H(q, a, b) = \sum_{i=1}^{d} w_i Q_{F_i,V_i}(q, a, b)

What embedding should we use?

What distance measure should we use?

BoostMap embedding: F(x) = (F_1(x), …, F_d(x))

Distance measure: D(F(q), F(x)) = \sum_{i=1}^{d} w_i S_{F_i,V_i}(q) |F_i(q) - F_i(x)|

  • Distance measure is query-sensitive.
    • Weighted L1 distance, weights depend on q.
    • S_{F,V}(q) = 1 if F(q) is in V, 0 otherwise.
Centaurs Revisited
  • Reference objects: human, horse, centaur.
    • For centaur queries, use weights (0,0,1).
    • For human queries, use weights (1,0,0).
  • Query-sensitive distances are non-metric.
    • Combine efficiency of L1 distance and ability to capture non-metric structure.
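The centaur example can be worked through with a query-sensitive weighted L1 distance. All numbers here are hypothetical: the original distances are assumed to be D(human, horse) = 10 and D(centaur, human) = D(centaur, horse) = 1, every coordinate gets weight 1, and each area of influence is assumed to be the interval [0, 0.5], so a coordinate is active only for queries very close to its reference object.

```python
def query_sensitive_distance(fq, fx, weights, intervals):
    """sum_i w_i S_i(q) |F_i(q) - F_i(x)|, with S_i(q) = 1 iff F_i(q) in V_i."""
    total = 0.0
    for qi, xi, wi, (lo, hi) in zip(fq, fx, weights, intervals):
        if lo <= qi <= hi:               # coordinate i is active for this query
            total += wi * abs(qi - xi)
    return total

# F(x) = (D(x, human), D(x, horse), D(x, centaur)) under the assumed distances.
F = {
    "human": (0.0, 10.0, 1.0),
    "horse": (10.0, 0.0, 1.0),
    "centaur": (1.0, 1.0, 0.0),
}
weights = [1.0, 1.0, 1.0]
intervals = [(0.0, 0.5)] * 3

def d(query, obj):
    return query_sensitive_distance(F[query], F[obj], weights, intervals)

# Human queries use only coordinate 1; centaur queries use only coordinate 3.
print(d("human", "horse"), d("human", "centaur"), d("centaur", "horse"))
```

With these values the human is far from the horse while both are close to the centaur, so the triangle inequality is violated in the embedded space as well, which a fixed metric distance could not reproduce.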

F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))

F(Sacramento)....= ( 386, 1543, 2920)

F(Las Vegas).....= ( 262, 1232, 2405)

F(Oklahoma City).= (1345, 437, 1291)

F(Washington DC).= (2657, 1207, 853)

F(Jacksonville)..= (2422, 1344, 141)

Recap of Advantages
  • Capturing non-metric structure.
  • Finding most informative reference objects for each query.
  • Richer model overall.
    • Choosing a weak classifier now also involves choosing an area of influence.
Dynamic Time Warping on Time Series

Database: 31818 time series.

Query set: 1000 time series.

Dynamic Time Warping on Time Series

Database: 32768 time series.

Query set: 50 time series.

BoostMap Recap - Theory
  • Machine-learning method for optimizing embeddings.
    • Explicitly maximizes amount of nearest neighbor structure preserved by embedding.
    • Optimization method is independent of underlying geometry.
    • Query-sensitive version can capture non-metric structure.
BoostMap Recap - Practice
  • BoostMap can significantly speed up nearest neighbor retrieval and classification.
    • Useful in real-world datasets:
      • Hand shape classification.
      • Optical character recognition (MNIST, UNIPEN).
    • In all four datasets, better results than other methods.
      • In three benchmark datasets, better than methods custom-made for those distance measures.
    • Domain-independent formulation.
      • Distance measures are used as a black box.
      • Application to proteins/DNA matching…