Learning Embeddings for Similarity-Based Retrieval

Vassilis Athitsos, Computer Science Department, Boston University

Overview
• Background on similarity-based retrieval and embeddings.
• BoostMap.
• Embedding optimization using machine learning.
• Query-sensitive embeddings.
• Ability to preserve non-metric structure.

Problem Definition
(Figure: a database of n objects x1, x2, x3, …, xn and a query object q.)
• Goal: find the k nearest neighbors of query q.
• Brute-force time is linear in:
  • n (the size of the database).
  • the time it takes to measure a single distance.
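The brute-force search described on this slide can be sketched as follows. This is only an illustration: the function name `brute_force_knn` and the `dist` callback are assumptions, standing in for whatever distance measure the application uses. The call evaluates `dist` once per database object, which is where the linear dependence on n and on the cost of a single distance comes from.

```python
import heapq

def brute_force_knn(query, database, dist, k):
    """Return the k database objects closest to `query`.

    `dist` is a placeholder for the (possibly expensive) distance measure.
    It is evaluated exactly once for every object in the database.
    """
    return heapq.nsmallest(k, database, key=lambda x: dist(query, x))

# Toy usage with numbers and absolute difference as the distance.
neighbors = brute_force_knn(5, [1, 4, 9, 6, 20], lambda a, b: abs(a - b), k=2)
```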

Applications
• Nearest neighbor classification.
• Similarity-based retrieval.
• Image/video databases.
• Biological databases.
• Time series.
• Web pages.
• Browsing music or movie catalogs.
(Figure: example images of faces, letters/digits, and handshapes.)

Expensive Distance Measures
(Figure: coordinates x1, x2, …, xd of one vector matched against y1, y2, …, yd of another.)
• Comparing d-dimensional vectors is efficient: O(d) time.
• Comparing strings of length d with the edit distance is more expensive:
  • O(d^2) time.
  • Reason: alignment.
(Figure: the strings "immigration" and "imitation" being aligned.)
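To make the quadratic cost concrete, here is a minimal sketch of the standard dynamic-programming edit distance. It assumes unit costs for insertion, deletion, and substitution (the Levenshtein variant), which the slide does not specify. The two nested loops over the strings are the source of the O(d^2) running time.

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t (unit costs assumed)."""
    # prev[j] holds the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

print(edit_distance("immigration", "imitation"))  # prints 3
```

For the slide's example, one optimal alignment deletes one "m", substitutes "g" with "t", and deletes "r".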

Matching Handwritten Digits

Shape Context Distance
• Proposed by Belongie et al. (2001).
• Error rate: 0.63%, with a database of 20,000 images.
• Uses bipartite matching (cubic complexity!).
• 22 minutes/object, heavily optimized.
• Result preview: 5.2 seconds, 0.61% error rate.

More Examples
• DNA and protein sequences:
  • Smith-Waterman.
• Time series:
  • Dynamic Time Warping.
• Probability distributions:
  • Kullback-Leibler distance.
• These measures are non-Euclidean, sometimes non-metric.

Indexing Problem
• Vector indexing methods are NOT applicable:
  • PCA.
  • R-trees, X-trees, SS-trees.
  • VA-files.
  • Locality Sensitive Hashing.

Metric Methods
• Pruning-based methods.
  • VP-trees, MVP-trees, M-trees, Slim-trees, …
  • Use the triangle inequality for tree-based search.
• Filtering methods.
  • AESA, LAESA, …
  • Use the triangle inequality to compute upper/lower bounds of distances.
• Suffer from the curse of dimensionality.
• Heuristic in non-metric spaces.
• In many datasets, bad empirical performance.
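The bounds that the filtering methods rely on can be written out explicitly. The inequality below is a sketch under the assumption that D is a metric and that r is some object whose distances to the query q and to a database object x are already known (the symbol r is introduced here for illustration).

```latex
% Assumes D satisfies the triangle inequality; r is an object with
% known distances D(q, r) and D(x, r).
\[
  |D(q, r) - D(x, r)| \;\le\; D(q, x) \;\le\; D(q, r) + D(x, r)
\]
```

If the lower bound already exceeds the distance to the best candidate found so far, x can be discarded without computing D(q, x). When D is not a metric the inequality need not hold, which is why these methods become heuristic in non-metric spaces.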

Embeddings
(Figure: embedding F maps the database objects x1, x2, x3, …, xn and the query q to vectors in R^d.)
• Measure distances between vectors (typically much faster).
• Caveat: the embedding must preserve similarity structure.

Reference Object Embeddings
(Figure: a database with reference objects r1, r2, r3 and an object x.)
• F(x) = (D(x, r1), D(x, r2), D(x, r3)).

F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))
F(Sacramento)    = ( 386, 1543, 2920)
F(Las Vegas)     = ( 262, 1232, 2405)
F(Oklahoma City) = (1345,  437, 1291)
F(Washington DC) = (2657, 1207,  853)
F(Jacksonville)  = (2422, 1344,  141)
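A reference-object embedding like the one above can be sketched in a few lines. The helper names (`reference_embedding`, `l1`, `nearest_in_embedding`) are illustrative, and comparing embedded vectors with an unweighted L1 distance is an assumption made for this example. The city vectors are copied from the slide.

```python
def reference_embedding(x, refs, dist):
    """Map x to the vector of its distances to the reference objects."""
    return tuple(dist(x, r) for r in refs)

def l1(u, v):
    """Unweighted L1 distance between two vectors (assumed for this sketch)."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Embedded vectors from the slide: distances to LA, Lincoln, Orlando.
CITY_VECTORS = {
    "Sacramento": (386, 1543, 2920),
    "Las Vegas": (262, 1232, 2405),
    "Oklahoma City": (1345, 437, 1291),
    "Washington DC": (2657, 1207, 853),
    "Jacksonville": (2422, 1344, 141),
}

def nearest_in_embedding(name, vectors):
    """Nearest other city, measured only in the embedded space."""
    query = vectors[name]
    others = (city for city in vectors if city != name)
    return min(others, key=lambda city: l1(query, vectors[city]))

print(nearest_in_embedding("Sacramento", CITY_VECTORS))  # prints Las Vegas
```

Cities that are geographically close end up with similar vectors, so comparing the vectors recovers plausible neighbors without using the original distance between the two cities.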

Existing Embedding Methods
• FastMap, MetricMap, SparseMap, Lipschitz embeddings.
• Use distances to reference objects (prototypes).
• Question: how do we directly optimize an embedding for nearest neighbor retrieval?
• FastMap and MetricMap assume Euclidean properties.
• SparseMap optimizes stress.
• Large stress may be inevitable when embedding non-metric spaces into a metric space.
• In practice often worse than random construction.

BoostMap
• BoostMap: A Method for Efficient Approximate Similarity Rankings. Athitsos, Alon, Sclaroff, and Kollios, CVPR 2004.
• BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval. Athitsos, Alon, Sclaroff, and Kollios, PAMI 2007 (to appear).

Key Features of BoostMap
• Maximizes the amount of nearest neighbor structure preserved by the embedding.
• Based on machine learning, not on geometric assumptions.
• Principled optimization, even in non-metric spaces.
• Can capture non-metric structure.
• Query-sensitive version of BoostMap.
• Better results in practice, in all datasets we have tried.

Ideal Embedding Behavior
(Figure: embedding F maps the original space X to R^d; objects q, a, b are shown.)
• For any query q: we want F(NN(q)) = NN(F(q)).
• For any database object b besides NN(q), we want F(q) closer to F(NN(q)) than to F(b).

Embeddings Seen As Classifiers
For triples (q, a, b) such that:
- q is a query object
- a = NN(q)
- b is a database object
Classification task: is q closer to a or to b?
• Any embedding F defines a classifier F’(q, a, b).
• F’ checks if F(q) is closer to F(a) or to F(b).

Classifier Definition
For triples (q, a, b) such that:
- q is a query object
- a = NN(q)
- b is a database object
Classification task: is q closer to a or to b?
• Given embedding F: X → R^d:
  • F’(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)||.
  • F’(q, a, b) > 0 means “q is closer to a.”
  • F’(q, a, b) < 0 means “q is closer to b.”
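The classifier induced by an embedding can be sketched directly from this definition. The slide leaves the norm unspecified; the sketch below assumes the Euclidean norm, and the function name `embedding_classifier` is illustrative.

```python
import math

def embedding_classifier(F, q, a, b):
    """Return F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)||.

    F maps an object to a sequence of d numbers. The Euclidean norm is
    an assumption of this sketch. A positive result means "q is closer
    to a"; a negative result means "q is closer to b".
    """
    fq, fa, fb = F(q), F(a), F(b)
    return math.dist(fq, fb) - math.dist(fq, fa)

# Toy usage: objects are numbers, embedded as 2D points.
F = lambda x: (x, 2 * x)
vote = embedding_classifier(F, 0, 1, 3)  # positive: 0 is closer to 1 than to 3
```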

Key Observation
(Figure: embedding F maps the original space X to R^d; objects q, a, b are shown.)
• If classifier F’ is perfect, then for every q, F(NN(q)) = NN(F(q)).
• If F(q) is closer to F(b) than to F(NN(q)), then triple (q, a, b) is misclassified.
• Classification error on triples (q, NN(q), b) measures how well F preserves nearest neighbor structure.

Optimization Criterion
• Goal: construct an embedding F optimized for k-nearest neighbor retrieval.
• Method: maximize the accuracy of F’ on triples (q, a, b) of the following type:
  • q is any object.
  • a is a k-nearest neighbor of q in the database.
  • b is in the database, but NOT a k-nearest neighbor of q.
• If F’ is perfect on those triples, then F perfectly preserves k-nearest neighbors.
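One way to produce triples of this type is to sample them at random. The sketch below is an assumption about how that might look, not the procedure used in the BoostMap papers. It draws q from the database itself and excludes it from its own neighbor list, and the name `sample_knn_triples` is hypothetical.

```python
import random

def sample_knn_triples(database, dist, k, num_triples, seed=0):
    """Sample triples (q, a, b): a is a k-nearest neighbor of q, b is not."""
    n = len(database)
    if n < k + 2:
        raise ValueError("need at least k + 2 objects in the database")
    rng = random.Random(seed)
    triples = []
    for _ in range(num_triples):
        qi = rng.randrange(n)
        q = database[qi]
        # Rank every other object by its true distance to q.
        ranked = sorted((i for i in range(n) if i != qi),
                        key=lambda i: dist(q, database[i]))
        a = database[rng.choice(ranked[:k])]  # among the k nearest
        b = database[rng.choice(ranked[k:])]  # outside the k nearest
        triples.append((q, a, b))
    return triples
```

Ranking the neighbors of each sampled q costs n - 1 distance computations, so with an expensive distance measure this step would normally be done offline on a limited number of query objects.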

1D Embeddings as Weak Classifiers
• 1D embeddings define weak classifiers.
  • Better than a random classifier (50% error rate).
• We can define lots of different classifiers.
  • Every object in the database can be a reference object.
(Figure: example with the cities Lincoln, Detroit, LA, Chicago, New York, and Cleveland.)
Question: how do we combine many such classifiers into a single strong classifier?
Answer: use AdaBoost.
• AdaBoost is a machine learning method designed for exactly this problem.

Using AdaBoost
(Figure: 1D embeddings F1, F2, …, Fn map the original space X to the real line.)
• Output: H = w1F’1 + w2F’2 + … + wdF’d.
• AdaBoost chooses 1D embeddings and weighs them.
• Goal: achieve low classification error.
• AdaBoost trains on triples chosen from the database.
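The following is a simplified sketch of how boosting might pick and weight 1D reference-object embeddings. It is not the training procedure from the BoostMap papers. It assumes the confidence-rated AdaBoost update for weak outputs scaled into [-1, 1], it assumes every training triple (q, a, b) has q closer to a (label +1), and all function names are hypothetical.

```python
import math

def f_prime(r, triple, dist):
    """F'_r(q, a, b) for the 1D embedding F_r(x) = dist(x, r)."""
    q, a, b = triple
    fq, fa, fb = dist(q, r), dist(a, r), dist(b, r)
    return abs(fq - fb) - abs(fq - fa)

def train_boosted_embedding(triples, references, dist, rounds=10):
    """Return {reference index: weight w_j} so that H = sum_j w_j * F'_j."""
    n = len(triples)
    scales, scaled = [], []
    for r in references:
        raw = [f_prime(r, t, dist) for t in triples]
        scale = max(abs(v) for v in raw) or 1.0  # avoid dividing by zero
        scales.append(scale)
        scaled.append([v / scale for v in raw])  # outputs now in [-1, 1]
    weights = [1.0 / n] * n                      # weights on training triples
    model = {}
    for _ in range(rounds):
        # Weighted correlation of each weak classifier with the labels (+1).
        corr = [sum(w * h for w, h in zip(weights, hs)) for hs in scaled]
        j = max(range(len(references)), key=corr.__getitem__)
        c = min(corr[j], 1.0 - 1e-10)
        if c <= 0:
            break                                # nothing beats random guessing
        alpha = 0.5 * math.log((1 + c) / (1 - c))
        model[j] = model.get(j, 0.0) + alpha / scales[j]
        # Emphasize the triples that the chosen classifier handles badly.
        weights = [w * math.exp(-alpha * h) for w, h in zip(weights, scaled[j])]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def strong_classifier(model, references, triple, dist):
    """H(q, a, b) = sum_j w_j * F'_j(q, a, b)."""
    return sum(w * f_prime(references[j], triple, dist) for j, w in model.items())
```

Only reference objects with positive correlation are selected, so every learned weight is positive. Dividing alpha by the scale converts it back to a weight on the unscaled F'_j, so H keeps the form w1F’1 + … + wdF’d given on the slide.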

From Classifier to Embedding
• AdaBoost output: H = w1F’1 + w2F’2 + … + wdF’d.
• What embedding should we use? What distance measure should we use?
• BoostMap embedding: F(x) = (F1(x), …, Fd(x)).
• Distance measure: $D((u_1, \dots, u_d), (v_1, \dots, v_d)) = \sum_{i=1}^{d} w_i |u_i - v_i|$.
• Claim: Let q be closer to a than to b. H misclassifies triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.
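The claim follows by expanding H with the classifier definition given earlier. The short derivation below is a sketch that assumes each F’i is the classifier of the 1D embedding Fi, so the norm reduces to an absolute value.

```latex
\begin{align*}
H(q, a, b) &= \sum_{i=1}^{d} w_i F'_i(q, a, b) \\
           &= \sum_{i=1}^{d} w_i \bigl( |F_i(q) - F_i(b)| - |F_i(q) - F_i(a)| \bigr) \\
           &= D(F(q), F(b)) - D(F(q), F(a)).
\end{align*}
```

So H(q, a, b) < 0, which is a misclassification when q is closer to a, holds exactly when D(F(q), F(b)) < D(F(q), F(a)), that is, when F maps q closer to b than to a under D.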
