Learning Embeddings for Similarity-Based Retrieval

Vassilis Athitsos

Computer Science Department

Boston University


Overview

  • Background on similarity-based retrieval and embeddings.

  • BoostMap.

    • Embedding optimization using machine learning.

  • Query-sensitive embeddings.

    • Ability to preserve non-metric structure.


Problem Definition

[figure: database of n objects x1, x2, …, xn and a query object q]

  • Goal:

    • find the k nearest neighbors of query q.

  • Brute-force retrieval time is linear in:

    • n (the size of the database).

    • the time it takes to measure a single distance.


Applications

  • Nearest neighbor classification.

  • Similarity-based retrieval:

    • Image/video databases.

    • Biological databases.

    • Time series.

    • Web pages.

    • Browsing music or movie catalogs.

[figure: example domains: faces, letters/digits, handshapes]


Expensive Distance Measures

  • Comparing d-dimensional vectors is efficient:

    • O(d²) time is not needed; a single pass takes O(d) time.

  • Comparing strings of length d with the edit distance is more expensive:

    • O(d²) time.

    • Reason: alignment.

i m m i g r a t i o n

i m i t a t i o n

Matching Handwritten Digits

[figure: example pairs of handwritten digit images being matched]



Shape Context Distance

  • Proposed by Belongie et al. (2001).

    • Error rate: 0.63%, with database of 20,000 images.

    • Uses bipartite matching (cubic complexity!).

    • 22 minutes per query, heavily optimized.

    • Result preview: 5.2 seconds, 0.61% error rate.



More Examples

  • DNA and protein sequences:

    • Smith-Waterman.

  • Time series:

    • Dynamic Time Warping.

  • Probability distributions:

    • Kullback-Leibler divergence.

  • These measures are non-Euclidean, sometimes non-metric.


Indexing Problem

  • Vector indexing methods NOT applicable.

    • PCA.

    • R-trees, X-trees, SS-trees.

    • VA-files.

    • Locality Sensitive Hashing.


Metric Methods

  • Pruning-based methods.

    • VP-trees, MVP-trees, M-trees, Slim-trees,…

    • Use triangle inequality for tree-based search.

  • Filtering methods.

    • AESA, LAESA…

    • Use the triangle inequality to compute upper/lower bounds of distances.

  • Suffer from curse of dimensionality.

  • Heuristic in non-metric spaces.

  • Poor empirical performance on many datasets.


Embeddings

[figure: embedding F maps the database objects x1, …, xn and the query q from the original space into Rd]

  • Measure distances between vectors (typically much faster).

  • Caveat: the embedding must preserve similarity structure.


Reference Object Embeddings

[figure: database with reference objects r1, r2, r3 and an object x]

F(x) = (D(x, r1), D(x, r2), D(x, r3))



F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))

F(Sacramento)....= ( 386, 1543, 2920)

F(Las Vegas).....= ( 262, 1232, 2405)

F(Oklahoma City).= (1345, 437, 1291)

F(Washington DC).= (2657, 1207, 853)

F(Jacksonville)..= (2422, 1344, 141)
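As a concrete illustration, here is a minimal sketch of a reference-object embedding, assuming only a black-box distance function D; the city names mirror the example above, and the partial distance table is ours:

```python
# Minimal reference-object embedding sketch, assuming a black-box distance D.

def make_embedding(reference_objects, D):
    """Return F(x) = (D(x, r1), ..., D(x, rd)) for reference objects r1..rd."""
    def F(x):
        return tuple(D(x, r) for r in reference_objects)
    return F

# Hypothetical distance table reproducing two rows of the slide's numbers (miles).
ROAD_MILES = {
    ("Sacramento", "LA"): 386, ("Sacramento", "Lincoln"): 1543, ("Sacramento", "Orlando"): 2920,
    ("Las Vegas", "LA"): 262, ("Las Vegas", "Lincoln"): 1232, ("Las Vegas", "Orlando"): 2405,
}

def D(x, r):
    return ROAD_MILES[(x, r)]

F = make_embedding(["LA", "Lincoln", "Orlando"], D)
print(F("Sacramento"))   # (386, 1543, 2920)
```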


Existing Embedding Methods

  • FastMap, MetricMap, SparseMap, Lipschitz embeddings.

    • Use distances to reference objects (prototypes).

  • Question: how do we directly optimize an embedding for nearest neighbor retrieval?

    • FastMap & MetricMap assume Euclidean properties.

    • SparseMap optimizes stress.

      • Large stress may be inevitable when embedding non-metric spaces into a metric space.

    • In practice, often worse than random construction.


BoostMap

  • BoostMap: A Method for Efficient Approximate Similarity Rankings. Athitsos, Alon, Sclaroff, and Kollios, CVPR 2004.

  • BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval. Athitsos, Alon, Sclaroff, and Kollios, PAMI 2007 (to appear).


Key Features of BoostMap

  • Maximizes amount of nearest neighbor structure preserved by the embedding.

  • Based on machine learning, not on geometric assumptions.

    • Principled optimization, even in non-metric spaces.

  • Can capture non-metric structure.

    • Query-sensitive version of BoostMap.

  • Better results in practice, in all datasets we have tried.


Ideal Embedding Behavior

[figure: embedding F from the original space X into Rd; query q with nearest neighbor a and another database object b]

  • For any query q: we want F(NN(q)) = NN(F(q)).

  • For any database object b besides NN(q), we want F(q) closer to F(NN(q)) than to F(b).


Embeddings Seen As Classifiers

Consider triples (q, a, b) such that:

- q is a query object,

- a = NN(q),

- b is a database object.

Classification task: is q closer to a or to b?

  • Any embedding F defines a classifier F’(q, a, b).

    • F’ checks if F(q) is closer to F(a) or to F(b).


Classifier Definition

  • Given an embedding F: X → Rd:

    • F’(q, a, b) = ||F(q) – F(b)|| – ||F(q) – F(a)||.

  • F’(q, a, b) > 0 means “q is closer to a.”

  • F’(q, a, b) < 0 means “q is closer to b.”
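In code, F’ is a one-liner. A minimal sketch, assuming F maps objects to numpy vectors (the function name is ours, not from the talk):

```python
import numpy as np

def triple_classifier(F, q, a, b):
    # F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)||
    # Positive output: q embeds closer to a; negative: closer to b.
    return np.linalg.norm(F(q) - F(b)) - np.linalg.norm(F(q) - F(a))
```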


Key Observation

[figure: embedding F from the original space X into Rd, with query q, nearest neighbor a, and another object b]

  • If classifier F’ is perfect, then for every q, F(NN(q)) = NN(F(q)).

    • If F(q) is closer to F(b) than to F(NN(q)), then the triple (q, a, b) is misclassified.

  • Classification error on triples (q, NN(q), b) measures how well F preserves nearest neighbor structure.



Optimization Criterion

  • Goal: construct an embedding F optimized for k-nearest neighbor retrieval.

  • Method: maximize accuracy of F’ on triples (q, a, b) of the following type:

    • q is any object.

    • a is a k-nearest neighbor of q in the database.

    • b is in database, but NOT a k-nearest neighbor of q.

  • If F’ is perfect on those triples, then F perfectly preserves k-nearest neighbors.


1D Embeddings as Weak Classifiers

[figure: cities such as LA, Chicago, Detroit, Cleveland, New York projected onto the real line by their distance to a reference city, e.g., Lincoln]

  • 1D embeddings define weak classifiers.

    • Better than a random classifier (50% error rate).

  • We can define lots of different classifiers.

    • Every object in the database can be a reference object.

Question: how do we combine many such classifiers into a single strong classifier?

Answer: use AdaBoost.

  • AdaBoost is a machine learning method designed for exactly this problem.


Using AdaBoost

[figure: 1D embeddings F1, F2, …, Fd mapping the original space X to the real line]

  • Output: H = w1F’1 + w2F’2 + … + wdF’d.

    • AdaBoost chooses 1D embeddings and weighs them.

    • Goal: achieve low classification error.

    • AdaBoost trains on triples chosen from the database.
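As a concrete illustration, here is a minimal sketch of this training loop under simplifying assumptions (every candidate reference object defines one weak classifier, the correct label on each training triple is always “q is closer to a”, and distances are recomputed rather than cached); it is not the authors’ implementation:

```python
import numpy as np

def weak_output(r, q, a, b, D):
    # F'_r(q, a, b) for the 1D embedding F_r(x) = D(x, r);
    # positive output votes "q is closer to a" (the correct answer on training triples).
    fq, fa, fb = D(q, r), D(a, r), D(b, r)
    return abs(fq - fb) - abs(fq - fa)

def train_boostmap(triples, candidates, D, rounds=50):
    """Pick reference objects and weights with AdaBoost on training triples."""
    n = len(triples)
    w = np.full(n, 1.0 / n)            # AdaBoost weights over training triples
    chosen = []                        # list of (reference object, alpha)
    for _ in range(rounds):
        best_r, best_err, best_preds = None, np.inf, None
        for r in candidates:
            # +1 if the weak classifier ranks the triple correctly, -1 otherwise
            preds = np.array([1 if weak_output(r, q, a, b, D) > 0 else -1
                              for (q, a, b) in triples])
            err = w[preds < 0].sum()   # weighted error (true label is always +1)
            if err < best_err:
                best_r, best_err, best_preds = r, err, preds
        eps = np.clip(best_err, 1e-9, 1 - 1e-9)
        alpha = 0.5 * np.log((1 - eps) / eps)
        w *= np.exp(-alpha * best_preds)   # misclassified triples gain weight
        w /= w.sum()
        chosen.append((best_r, alpha))
    return chosen
```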


From Classifier to Embedding

AdaBoost output (classifier):

H = w1F’1 + w2F’2 + … + wdF’d

BoostMap embedding:

F(x) = (F1(x), …, Fd(x)).

Distance measure:

D((u1, …, ud), (v1, …, vd)) = Σ_{i=1}^{d} wi |ui – vi|

Claim: Let q be closer to a than to b. H misclassifies the triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.
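A minimal sketch of the resulting embedding and distance, assuming the reference objects refs and the weights come from training (helper names are ours):

```python
import numpy as np

def embed(x, refs, D):
    # BoostMap embedding: F(x) = (D(x, r_1), ..., D(x, r_d))
    return np.array([D(x, r) for r in refs])

def weighted_l1(u, v, weights):
    # D(u, v) = sum_{i=1..d} w_i * |u_i - v_i|
    return float(np.sum(weights * np.abs(u - v)))
```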


Proof

H(q, a, b)

= Σ_{i=1}^{d} wi F’i(q, a, b)

= Σ_{i=1}^{d} wi (|Fi(q) – Fi(b)| – |Fi(q) – Fi(a)|)

= Σ_{i=1}^{d} (wi |Fi(q) – Fi(b)| – wi |Fi(q) – Fi(a)|)

= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)
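A quick numeric check of this identity on random vectors (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
Fq, Fa, Fb, w = (rng.random(5) for _ in range(4))
H = np.sum(w * (np.abs(Fq - Fb) - np.abs(Fq - Fa)))   # sum of weighted weak outputs
D = lambda u, v: np.sum(w * np.abs(u - v))            # weighted L1 distance
assert np.isclose(H, D(Fq, Fb) - D(Fq, Fa))           # H(q,a,b) = F'(q,a,b)
```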


Significance of Proof

  • AdaBoost optimizes a direct measure of embedding quality.

  • We optimize an indexing structure for similarity-based retrieval using machine learning.

    • Take advantage of training data.


How Do We Use It?

Filter-and-refine retrieval:

  • Offline step: compute embedding F of the entire database.

  • Given a query object q:

    • Embedding step: compute distances from the query to the reference objects → F(q).

    • Filter step: find the top p matches of F(q) in vector space.

    • Refine step: measure the exact distance from q to the top p matches.
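A minimal sketch of the whole pipeline, assuming the database embeddings db_vectors are precomputed and D is the exact (expensive) distance; the function names are ours:

```python
import numpy as np

def filter_and_refine(q, database, db_vectors, embed, D, weights, p=100, k=1):
    Fq = embed(q)                                             # embedding step
    filt = np.sum(weights * np.abs(db_vectors - Fq), axis=1)  # weighted L1 to all vectors
    shortlist = np.argsort(filt)[:p]                          # filter step: top p matches
    refined = sorted(shortlist, key=lambda i: D(q, database[i]))  # refine: exact distances
    return refined[:k]                                        # approximate k-NN of q
```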


Evaluating Embedding Quality

  • How often do we find the true nearest neighbor?

  • What is the nearest neighbor classification error?

  • How many exact distance computations do we need?

Recall the three online steps being evaluated:

  • Embedding step: compute distances from the query to the reference objects → F(q).

  • Filter step: find the top p matches of F(q) in vector space.

  • Refine step: measure the exact distance from q to the top p matches.


Results on Hand Dataset

[figure: a query image and its nearest neighbor in the database of 80,640 images]

  • Database: 80,640 synthetic images of hands.

  • Query set: 710 real images of hands.

  • Chamfer distance: 112 seconds per query.


Results on MNIST Dataset

  • MNIST: 60,000 database objects, 10,000 queries.

  • Shape context (Belongie 2001):

    • 0.63% error, 20,000 distances, 22 minutes.

    • 0.54% error, 60,000 distances, 66 minutes.

[figure: results table not preserved in the transcript]


Query-Sensitive Embeddings

  • Richer models.

    • Capture non-metric structure.

    • Better embedding quality.

  • References:

    • Athitsos, Hadjieleftheriou, Kollios, and Sclaroff, SIGMOD 2005.

    • Athitsos, Hadjieleftheriou, Kollios, and Sclaroff, TODS, June 2007.


Capturing Non-Metric Structure

  • A human is not similar to a horse.

  • A centaur is similar both to a human and to a horse.

  • The triangle inequality is violated:

    • using human ratings of similarity (Tversky, 1982);

    • using the k-median Hausdorff distance.

  • Mapping to a metric space presents a dilemma:

    • If D(F(centaur), F(human)) = D(F(centaur), F(horse)) = C, then D(F(human), F(horse)) ≤ 2C.

  • Query-sensitive embeddings:

    • have the modeling power to preserve non-metric structure.


Local Importance of Coordinates

[figure: embedding F maps database objects x1, …, xn and query q into Rd; different coordinates matter for different queries]

  • How important is each coordinate in comparing embeddings?


(Recall the city embedding F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando)) and its table of values above.)


General Intuition

[figure: reference objects 1, 2, and 3 in the original space X]

  • Classifier: H = w1F’1 + w2F’2 + … + wjF’j.

  • Observation: the accuracy of a weak classifier depends on the query.

    • F’1 is perfect for triples (q, a, b) where q = reference object 1.

    • F’1 is good for queries close to reference object 1.

  • Question: how can we capture that?


Query-Sensitive Weak Classifiers

  • V: area of influence (an interval of real numbers).

  • QF,V(q, a, b) = F’(q, a, b) if F(q) is in V, and “I don’t know” (output 0) if F(q) is not in V.

  • If V includes all real numbers, QF,V = F’.
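A minimal sketch of such a weak classifier, assuming the 1D embedding values of q, a, and b are already computed and the area of influence V is an interval (names are ours):

```python
def query_sensitive_weak(Fq, Fa, Fb, V):
    """Q_{F,V}: vote only when the query's 1D embedding lies in the area of influence V."""
    lo, hi = V                                # V is an interval (lo, hi) of real numbers
    if lo <= Fq <= hi:
        return abs(Fq - Fb) - abs(Fq - Fa)    # F'(q, a, b) restricted to 1D
    return 0.0                                # "I don't know": abstain
```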


Applying AdaBoost

[figure: 1D embeddings F1, F2, …, Fd mapping the original space X to the real line]

  • AdaBoost forms classifiers QFi,Vi.

    • Fi: 1D embedding.

    • Vi: area of influence for Fi.

  • Output: H = w1QF1,V1 + w2QF2,V2 + … + wdQFd,Vd.

  • Empirical observation: at late stages of training, query-sensitive weak classifiers are still useful, whereas query-insensitive classifiers are not.


From Classifier to Embedding

AdaBoost output (classifier):

H(q, a, b) = Σ_{i=1}^{d} wi QFi,Vi(q, a, b)

BoostMap embedding:

F(x) = (F1(x), …, Fd(x))

Distance measure:

D(F(q), F(x)) = Σ_{i=1}^{d} wi SFi,Vi(q) |Fi(q) – Fi(x)|

  • The distance measure is query-sensitive.

    • It is a weighted L1 distance whose weights depend on q.

    • SF,V(q) = 1 if F(q) is in V, 0 otherwise.
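A minimal sketch of this query-sensitive distance, assuming each coordinate's area of influence is stored as an interval (helper names are ours):

```python
import numpy as np

def query_sensitive_distance(Fq, Fx, weights, areas):
    """D(F(q), F(x)) = sum_i w_i * S_i(q) * |F_i(q) - F_i(x)|."""
    # S_i(q) = 1 when the query's i-th coordinate falls in the area of influence V_i
    S = np.array([1.0 if lo <= fq <= hi else 0.0
                  for fq, (lo, hi) in zip(Fq, areas)])
    return float(np.sum(weights * S * np.abs(Fq - Fx)))
```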


Centaurs Revisited

  • Reference objects: human, horse, centaur.

    • For centaur queries, use weights (0,0,1).

    • For human queries, use weights (1,0,0).

  • Query-sensitive distances are non-metric.

    • Combine efficiency of L1 distance and ability to capture non-metric structure.




Recap of Advantages

  • Capturing non-metric structure.

  • Finding most informative reference objects for each query.

  • Richer model overall.

    • Choosing a weak classifier now also involves choosing an area of influence.


Dynamic Time Warping on Time Series

[results figures not preserved in the transcript]

  • Experiment 1: database of 31,818 time series; query set of 1,000 time series.

  • Experiment 2: database of 32,768 time series; query set of 50 time series.


BoostMap Recap - Theory

  • Machine-learning method for optimizing embeddings.

    • Explicitly maximizes amount of nearest neighbor structure preserved by embedding.

    • Optimization method is independent of underlying geometry.

    • Query-sensitive version can capture non-metric structure.


BoostMap Recap - Practice

  • BoostMap can significantly speed up nearest neighbor retrieval and classification.

    • Useful in real-world datasets:

      • Hand shape classification.

      • Optical character recognition (MNIST, UNIPEN).

    • In all four datasets, better results than other methods.

      • In three benchmark datasets, better than methods custom-made for those distance measures.

    • Domain-independent formulation.

      • Distance measures are used as a black box.

      • Application to proteins/DNA matching…


END