Learning Embeddings for Similarity-Based Retrieval

Presentation Transcript

  1. Learning Embeddings for Similarity-Based Retrieval Vassilis Athitsos Computer Science Department Boston University

  2. Overview • Background on similarity-based retrieval and embeddings. • BoostMap. • Embedding optimization using machine learning. • Query-sensitive embeddings. • Ability to preserve non-metric structure.

  3. Problem Definition • Database (n objects): x1, x2, x3, …, xn.

  4. Problem Definition • Database (n objects): x1, x2, x3, …, xn. • Goal: find the k nearest neighbors of query q.

  5. Problem Definition • Database (n objects): x1, x2, x3, …, xn. • Goal: find the k nearest neighbors of query q. • Brute-force time is linear in: • n (the size of the database). • the time it takes to measure a single distance.

  6. Problem Definition • Database (n objects): x1, x2, x3, …, xn. • Goal: find the k nearest neighbors of query q. • Brute-force time is linear in: • n (the size of the database). • the time it takes to measure a single distance.
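The brute-force baseline can be written down directly. The following is a minimal Python sketch (the function names are illustrative, not from the slides); its cost is one call to the possibly expensive distance measure per database object, hence linear in n and in the per-distance time.

```python
import heapq

def brute_force_knn(query, database, distance, k):
    """Return (distance, index) pairs for the k nearest neighbors of `query`.

    Scans the whole database: n calls to `distance`, so runtime is linear in
    the database size and in the time of a single distance computation.
    """
    # Compute the exact distance from the query to every database object.
    scored = [(distance(query, x), i) for i, x in enumerate(database)]
    # Keep the k smallest distances.
    return heapq.nsmallest(k, scored)
```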

  7. Applications • Nearest neighbor classification. • Similarity-based retrieval: image/video databases, biological databases, time series, web pages, browsing music or movie catalogs. • Example image domains: faces, letters/digits, handshapes.

  8. Expensive Distance Measures • Comparing d-dimensional vectors (x1, …, xd) and (y1, …, yd) is efficient: O(d) time.

  9. Expensive Distance Measures • Comparing d-dimensional vectors is efficient: O(d) time. • Comparing strings of length d with the edit distance is more expensive: O(d²) time. Reason: alignment. • Example: "immigration" vs. "imitation".

  10. Expensive Distance Measures • Comparing d-dimensional vectors is efficient: O(d) time. • Comparing strings of length d with the edit distance is more expensive: O(d²) time. Reason: alignment. • Example: "immigration" vs. "imitation".
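To see where the O(d²) cost comes from, here is a standard dynamic-programming edit distance (a generic sketch, not code from the presentation): for strings of lengths m and n it fills an (m+1)×(n+1) table of alignment costs, so two length-d strings cost O(d²).

```python
def edit_distance(s, t):
    """Levenshtein edit distance via dynamic programming: O(len(s) * len(t)) time."""
    m, n = len(s), len(t)
    # dp[i][j] = cost of aligning the first i characters of s with the first j of t.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete i characters
    for j in range(n + 1):
        dp[0][j] = j          # insert j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    return dp[m][n]

print(edit_distance("immigration", "imitation"))  # 3
```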

  11. Matching Handwritten Digits

  12. Matching Handwritten Digits

  13. Matching Handwritten Digits

  14. Shape Context Distance • Proposed by Belongie et al. (2001). • Error rate: 0.63%, with database of 20,000 images. • Uses bipartite matching (cubic complexity!). • 22 minutes/object, heavily optimized. • Result preview: 5.2 seconds, 0.61% error rate.

  15. More Examples • DNA and protein sequences: • Smith-Waterman. • Time series: • Dynamic Time Warping. • Probability distributions: • Kullback-Leibler (KL) divergence. • These measures are non-Euclidean, sometimes non-metric.

  16. Indexing Problem • Vector indexing methods NOT applicable. • PCA. • R-trees, X-trees, SS-trees. • VA-files. • Locality Sensitive Hashing.

  17. Metric Methods • Pruning-based methods. • VP-trees, MVP-trees, M-trees, Slim-trees,… • Use triangle inequality for tree-based search. • Filtering methods. • AESA, LAESA… • Use the triangle inequality to compute upper/lower bounds of distances. • Suffer from curse of dimensionality. • Heuristic in non-metric spaces. • In many datasets, bad empirical performance.
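To make the filtering idea concrete, here is a small hedged sketch of pivot-based pruning in the style of AESA/LAESA (the function names are assumptions for illustration): precomputed distances to a pivot object p bound the unknown distance D(q, x) via the triangle inequality, so some exact distance computations can be skipped.

```python
def triangle_bounds(d_q_p, d_p_x):
    """Bounds on D(q, x) from distances to a pivot p, assuming D is a metric:
    |D(q,p) - D(p,x)| <= D(q,x) <= D(q,p) + D(p,x)."""
    lower = abs(d_q_p - d_p_x)
    upper = d_q_p + d_p_x
    return lower, upper

def filter_candidates(d_q_p, d_p_to_db, radius):
    """Keep only database indices whose lower bound does not exceed `radius`.
    `d_p_to_db[i]` is the precomputed distance D(p, x_i)."""
    return [i for i, d in enumerate(d_p_to_db)
            if abs(d_q_p - d) <= radius]
```

In a non-metric space these bounds need not hold, which is why such pruning becomes heuristic there.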

  18. Embeddings • Map the database objects x1, x2, …, xn into Rd via an embedding F.

  19. Embeddings • Map the database objects x1, x2, …, xn into Rd via an embedding F. • A query q arrives.

  20. Embeddings • Map the database objects x1, x2, …, xn into Rd via an embedding F. • Map the query q to F(q) in Rd.

  21. Embeddings • Map the database objects and the query into Rd via an embedding F. • Measure distances between vectors (typically much faster).

  22. Embeddings • Map the database objects and the query into Rd via an embedding F. • Measure distances between vectors (typically much faster). • Caveat: the embedding must preserve similarity structure.
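Put together, embedding-based retrieval looks roughly like the sketch below (illustrative names; the vector distance is left as a parameter): embed the database once offline, embed each query online, and rank by cheap vector distances instead of the expensive original distance.

```python
def embed_database(database, F):
    """Offline step: map every database object into R^d once."""
    return [F(x) for x in database]

def retrieve(query, database, embedded_db, F, vector_distance, k):
    """Online step: embed the query and rank database objects by vector distance."""
    Fq = F(query)
    ranked = sorted(range(len(database)),
                    key=lambda i: vector_distance(Fq, embedded_db[i]))
    return [database[i] for i in ranked[:k]]
```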

  23. Reference Object Embeddings • Start from the database.

  24. Reference Object Embeddings • Pick reference objects r1, r2, r3 from the database.

  25. Reference Object Embeddings • Pick reference objects r1, r2, r3 from the database. • For any object x: F(x) = (D(x, r1), D(x, r2), D(x, r3)).
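This construction is a one-liner in code. The sketch below is illustrative; the distance function and the reference objects are whatever the application provides.

```python
def reference_object_embedding(x, reference_objects, distance):
    """F(x) = (D(x, r1), D(x, r2), ..., D(x, rd)) for reference objects r1..rd."""
    return tuple(distance(x, r) for r in reference_objects)
```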

  26. F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando)) • F(Sacramento) = (386, 1543, 2920) • F(Las Vegas) = (262, 1232, 2405) • F(Oklahoma City) = (1345, 437, 1291) • F(Washington DC) = (2657, 1207, 853) • F(Jacksonville) = (2422, 1344, 141)

  27. Existing Embedding Methods • FastMap, MetricMap, SparseMap, Lipschitz embeddings. • Use distances to reference objects (prototypes). • Question: how do we directly optimize an embedding for nearest neighbor retrieval? • FastMap & MetricMap assume Euclidean properties. • SparseMap optimizes stress. • Large stress may be inevitable when embedding non-metric spaces into a metric space. • In practice often worse than random construction.

  28. BoostMap • BoostMap: A Method for Efficient Approximate Similarity Rankings. Athitsos, Alon, Sclaroff, and Kollios, CVPR 2004. • BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval. Athitsos, Alon, Sclaroff, and Kollios, PAMI 2007 (to appear).

  29. Key Features of BoostMap • Maximizes amount of nearest neighbor structure preserved by the embedding. • Based on machine learning, not on geometric assumptions. • Principled optimization, even in non-metric spaces. • Can capture non-metric structure. • Query-sensitive version of BoostMap. • Better results in practice, in all datasets we have tried.

  30. Ideal Embedding Behavior • F maps the original space X into Rd. • For any query q: we want F(NN(q)) = NN(F(q)).

  31. Ideal Embedding Behavior • F maps the original space X into Rd. • For any query q: we want F(NN(q)) = NN(F(q)).

  32. Ideal Embedding Behavior • F maps the original space X into Rd. • For any query q: we want F(NN(q)) = NN(F(q)).

  33. Ideal Embedding Behavior • F maps the original space X into Rd. • For any query q: we want F(NN(q)) = NN(F(q)). • For any database object b besides NN(q), we want F(q) closer to F(NN(q)) than to F(b).

  34. Embeddings Seen As Classifiers • For triples (q, a, b) such that: - q is a query object, - a = NN(q), - b is a database object. • Classification task: is q closer to a or to b?

  35. Embeddings Seen As Classifiers • For triples (q, a, b) such that: - q is a query object, - a = NN(q), - b is a database object. • Classification task: is q closer to a or to b? • Any embedding F defines a classifier F’(q, a, b). • F’ checks if F(q) is closer to F(a) or to F(b).

  36. Classifier Definition • For triples (q, a, b) such that: - q is a query object, - a = NN(q), - b is a database object. • Classification task: is q closer to a or to b? • Given embedding F: X → Rd: • F’(q, a, b) = ||F(q) – F(b)|| – ||F(q) – F(a)||. • F’(q, a, b) > 0 means “q is closer to a.” • F’(q, a, b) < 0 means “q is closer to b.”
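The classifier on this slide translates directly into code. The sketch below assumes a Euclidean vector distance for concreteness; any vector distance could be substituted.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def classifier_F_prime(F, q, a, b, vector_distance=euclidean):
    """F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)||.

    Positive  -> the embedding says q is closer to a.
    Negative  -> the embedding says q is closer to b.
    """
    Fq, Fa, Fb = F(q), F(a), F(b)
    return vector_distance(Fq, Fb) - vector_distance(Fq, Fa)
```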

  37. Key Observation • If classifier F’ is perfect, then for every q, F(NN(q)) = NN(F(q)). • If F(q) is closer to F(b) than to F(NN(q)), then the triple (q, a, b) is misclassified.

  38. Key Observation • Classification error on triples (q, NN(q), b) measures how well F preserves nearest neighbor structure.

  39. Optimization Criterion • Goal: construct an embedding F optimized for k-nearest neighbor retrieval. • Method: maximize accuracy of F’ on triples (q, a, b) of the following type: • q is any object. • a is a k-nearest neighbor of q in the database. • b is in database, but NOT a k-nearest neighbor of q. • If F’ is perfect on those triples, then F perfectly preserves k-nearest neighbors.
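One plausible way to generate such training triples is sketched below; this is an illustrative outline under those assumptions, not necessarily the exact sampling procedure used in the paper.

```python
import random

def training_triples(queries, database, distance, k, triples_per_query=10):
    """Yield triples (q, a, b): a is a k-nearest neighbor of q, b is not."""
    for q in queries:
        # Rank the whole database by exact distance to q (done once per training query).
        ranked = sorted(range(len(database)), key=lambda i: distance(q, database[i]))
        knn, rest = ranked[:k], ranked[k:]
        for _ in range(triples_per_query):
            a = database[random.choice(knn)]   # a k-nearest neighbor of q
            b = database[random.choice(rest)]  # a database object that is not
            yield (q, a, b)
```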

  40. 1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers. • Better than a random classifier (50% error rate).

  41. [Figure: a 1D embedding example placing LA, Chicago, Detroit, New York, and Cleveland on the real line, with Lincoln as the reference object.]

  42. 1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers. • Better than a random classifier (50% error rate). • We can define lots of different classifiers. • Every object in the database can be a reference object.

  43. 1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers. • Better than a random classifier (50% error rate). • We can define lots of different classifiers. • Every object in the database can be a reference object. Question: how do we combine many such classifiers into a single strong classifier?

  44. 1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers. • Better than a random classifier (50% error rate). • We can define lots of different classifiers. • Every object in the database can be a reference object. Question: how do we combine many such classifiers into a single strong classifier? Answer: use AdaBoost. • AdaBoost is a machine learning method designed for exactly this problem.

  45. Using AdaBoost • 1D embeddings F1, F2, …, Fn map the original space X to the real line. • Output: H = w1F’1 + w2F’2 + … + wdF’d. • AdaBoost chooses 1D embeddings and weighs them. • Goal: achieve low classification error. • AdaBoost trains on triples chosen from the database.
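A simplified sketch of this training loop is shown below: each round selects the 1D embedding whose classifier has the lowest weighted error on the training triples, assigns it the usual AdaBoost weight, and reweights the triples. This is an outline under those assumptions, not the paper's exact algorithm.

```python
import math

def boostmap_train(candidate_embeddings, triples, rounds):
    """Greedy AdaBoost-style selection of 1D embeddings F_i and weights w_i.

    candidate_embeddings: functions mapping an object to a real number.
    triples: list of (q, a, b) where a is known to be closer to q than b is.
    Returns (embedding, weight) pairs defining H = sum_i w_i * F'_i.
    """
    D = [1.0 / len(triples)] * len(triples)   # weights over training triples
    chosen = []
    for _ in range(rounds):
        best = None
        for F in candidate_embeddings:
            # F'_i(q,a,b) = |F(q)-F(b)| - |F(q)-F(a)|; the triple is misclassified if <= 0.
            err = sum(D[t] for t, (q, a, b) in enumerate(triples)
                      if abs(F(q) - F(b)) - abs(F(q) - F(a)) <= 0)
            if best is None or err < best[1]:
                best = (F, err)
        F, err = best
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)   # standard AdaBoost weight
        chosen.append((F, alpha))
        # Reweight: increase the weight of misclassified triples, then normalize.
        for t, (q, a, b) in enumerate(triples):
            correct = abs(F(q) - F(b)) - abs(F(q) - F(a)) > 0
            D[t] *= math.exp(-alpha if correct else alpha)
        Z = sum(D)
        D = [d / Z for d in D]
    return chosen
```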

  46. From Classifier to Embedding • AdaBoost output: H = w1F’1 + w2F’2 + … + wdF’d. • What embedding should we use? What distance measure should we use?

  47. From Classifier to Embedding • AdaBoost output: H = w1F’1 + w2F’2 + … + wdF’d. • BoostMap embedding: F(x) = (F1(x), …, Fd(x)).

  48. From Classifier to Embedding • AdaBoost output: H = w1F’1 + w2F’2 + … + wdF’d. • BoostMap embedding: F(x) = (F1(x), …, Fd(x)). • Distance measure: D((u1, …, ud), (v1, …, vd)) = Σi=1..d wi|ui – vi|.

  49. From Classifier to Embedding • AdaBoost output: H = w1F’1 + w2F’2 + … + wdF’d. • BoostMap embedding: F(x) = (F1(x), …, Fd(x)). • Distance measure: D((u1, …, ud), (v1, …, vd)) = Σi=1..d wi|ui – vi|. • Claim: Let q be closer to a than to b. H misclassifies triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.
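The weighted L1 distance and the combined classifier H can be checked against each other numerically. The sketch below (illustrative names and toy 1D embeddings) verifies the identity behind the claim on one example.

```python
def weighted_l1(weights, u, v):
    """D((u1,...,ud), (v1,...,vd)) = sum_i w_i * |u_i - v_i|."""
    return sum(w * abs(ui - vi) for w, ui, vi in zip(weights, u, v))

def H(weights, embeddings, q, a, b):
    """H(q,a,b) = sum_i w_i * (|F_i(q) - F_i(b)| - |F_i(q) - F_i(a)|)."""
    return sum(w * (abs(F(q) - F(b)) - abs(F(q) - F(a)))
               for w, F in zip(weights, embeddings))

# Toy check of the claim: H(q,a,b) == D(F(q), F(b)) - D(F(q), F(a)).
weights = [0.5, 2.0, 1.0]
embeddings = [lambda x, r=r: abs(x - r) for r in (0.0, 3.0, 7.0)]  # toy 1D embeddings
q, a, b = 1.0, 2.0, 6.0
Fq, Fa, Fb = [tuple(F(x) for F in embeddings) for x in (q, a, b)]
assert abs(H(weights, embeddings, q, a, b)
           - (weighted_l1(weights, Fq, Fb) - weighted_l1(weights, Fq, Fa))) < 1e-9
```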

  50. i=1 i=1 i=1 d d d Proof H(q, a, b) = = wiF’i(q, a, b) = wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|) = (wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|) = D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)