Learning Embeddings for Similarity-Based Retrieval

##### Presentation Transcript

1. Learning Embeddings for Similarity-Based Retrieval • Vassilis Athitsos • Computer Science Department • Boston University

2. Overview • Background on similarity-based retrieval and embeddings. • BoostMap. • Embedding optimization using machine learning. • Query-sensitive embeddings. • Ability to preserve non-metric structure.

3. Problem Definition • [Figure: a database of n objects, $x_1, x_2, x_3, \dots, x_n$.]

4. Problem Definition • [Figure: a database of n objects, $x_1, x_2, x_3, \dots, x_n$, and a query q.] • Goals: • find the k nearest neighbors of query q.

5. Problem Definition • [Figure: a database of n objects, $x_1, x_2, x_3, \dots, x_n$; query q shown with $x_2$ and $x_n$.] • Goals: • find the k nearest neighbors of query q. • Brute force time is linear in: • n (size of database). • time it takes to measure a single distance.

6. Problem Definition • [Figure: a database of n objects, $x_1, x_2, x_3, \dots, x_n$, and a query q.] • Goals: • find the k nearest neighbors of query q. • Brute force time is linear in: • n (size of database). • time it takes to measure a single distance.

7. Applications • Nearest neighbor classification. • Similarity-based retrieval. • Image/video databases. • Biological databases. • Time series. • Web pages. • Browsing music or movie catalogs. • [Figure: example images of faces, letters/digits, and handshapes.]

8. Expensive Distance Measures • Comparing d-dimensional vectors is efficient: $O(d)$ time. • [Figure: coordinates $x_1, \dots, x_d$ paired with $y_1, \dots, y_d$.]

9. Expensive Distance Measures • Comparing d-dimensional vectors is efficient: $O(d)$ time. • Comparing strings of length d with the edit distance is more expensive: $O(d^2)$ time. • Reason: alignment. • [Figure: coordinates $x_1, \dots, x_d$ paired with $y_1, \dots, y_d$; alignment of the strings "immigration" and "imitation".]

10. Expensive Distance Measures • Comparing d-dimensional vectors is efficient: $O(d)$ time. • Comparing strings of length d with the edit distance is more expensive: • $O(d^2)$ time. • Reason: alignment. • [Figure: coordinates $x_1, \dots, x_d$ paired with $y_1, \dots, y_d$; alignment of the strings "immigration" and "imitation".]
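
The quadratic cost of the edit distance comes from the dynamic-programming table that scores every pair of string prefixes. A minimal Python sketch, assuming unit costs for insertion, deletion, and substitution (the slides do not state the cost model, and `edit_distance` is an illustrative name):

```python
def edit_distance(s, t):
    """Edit distance with unit costs, filled in row by row.

    Runs in O(len(s) * len(t)) time, i.e. O(d^2) for two strings of length d.
    """
    # prev[j] = distance between the empty prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance between s[:i] and the empty prefix of t
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# The two strings shown on the slide.
print(edit_distance("immigration", "imitation"))
```

Every cell of the table depends on three neighbors, which is why the whole table has to be filled, in contrast to the single pass that suffices for comparing two vectors.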

11. Matching Handwritten Digits

12. Matching Handwritten Digits

13. Matching Handwritten Digits

14. Shape Context Distance • Proposed by Belongie et al. (2001). • Error rate: 0.63%, with database of 20,000 images. • Uses bipartite matching (cubic complexity!). • 22 minutes/object, heavily optimized. • Result preview: 5.2 seconds, 0.61% error rate.

15. More Examples • DNA and protein sequences: • Smith-Waterman. • Time series: • Dynamic Time Warping. • Probability distributions: • Kullback-Leibler Distance. • These measures are non-Euclidean, sometimes non-metric.
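
As a small illustration of a non-metric measure from the list above, the following sketch computes the Kullback-Leibler distance between two discrete distributions and shows that it is not symmetric. The distributions and the name `kl_distance` are made up for illustration.

```python
import math


def kl_distance(p, q):
    """Kullback-Leibler distance KL(p || q) for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.9, 0.1]
q = [0.5, 0.5]
forward = kl_distance(p, q)   # KL(p || q)
backward = kl_distance(q, p)  # KL(q || p)
print(forward, backward)      # the two values differ, so symmetry fails
```

A measure that is not symmetric cannot be a metric, so index structures that rely on metric properties lose their guarantees on it.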

16. Indexing Problem • Vector indexing methods NOT applicable. • PCA. • R-trees, X-trees, SS-trees. • VA-files. • Locality Sensitive Hashing.

17. Metric Methods • Pruning-based methods. • VP-trees, MVP-trees, M-trees, Slim-trees,… • Use triangle inequality for tree-based search. • Filtering methods. • AESA, LAESA… • Use the triangle inequality to compute upper/lower bounds of distances. • Suffer from curse of dimensionality. • Heuristic in non-metric spaces. • In many datasets, bad empirical performance.
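
The bounds used by filtering methods follow from the triangle inequality: for any reference object r, |D(q, r) - D(x, r)| ≤ D(q, x) ≤ D(q, r) + D(x, r). The sketch below is a simplified pivot-based filter in that spirit, not the actual AESA or LAESA algorithm; the function name, the toy data, and the choice of pivots are assumptions for illustration.

```python
def nn_with_pivots(query, database, pivots, dist):
    """Exact nearest neighbor search that skips objects using lower bounds.

    The pruning is only guaranteed to be correct when dist is a metric.
    """
    # In practice this table is computed once, offline.
    table = [[dist(x, p) for p in pivots] for x in database]
    q_to_p = [dist(query, p) for p in pivots]

    best, best_d, computed = None, float("inf"), 0
    for x, row in zip(database, table):
        # Triangle inequality: D(q, x) >= |D(q, p) - D(x, p)| for every pivot p.
        lower = max(abs(qp - xp) for qp, xp in zip(q_to_p, row))
        if lower >= best_d:
            continue  # x cannot beat the current best; skip the exact distance
        d = dist(query, x)
        computed += 1  # counts exact query-to-database distances only
        if d < best_d:
            best, best_d = x, d
    return best, best_d, computed


database = [0, 1, 2, 10, 20, 30]
best, best_d, computed = nn_with_pivots(1.2, database, [0], lambda a, b: abs(a - b))
print(best, computed)
```

In a non-metric space the lower bound can exceed the true distance, so the same loop may discard the true nearest neighbor, which is why these methods become heuristic there.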

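18. Embeddings • [Figure: embedding F maps the database objects $x_1, x_2, x_3, \dots, x_n$ to vectors in $\mathbb{R}^d$.]

19. Embeddings • [Figure: embedding F maps the database objects $x_1, x_2, x_3, \dots, x_n$ to vectors in $\mathbb{R}^d$; a query q appears in the original space.]

20. Embeddings • [Figure: embedding F maps the database objects $x_1, x_2, x_3, \dots, x_n$ and the query q to vectors in $\mathbb{R}^d$.]

21. Embeddings • [Figure: embedding F maps the database objects $x_1, x_2, x_3, \dots, x_n$ and the query q to vectors in $\mathbb{R}^d$.] • Measure distances between vectors (typically much faster).

22. Embeddings • [Figure: embedding F maps the database objects $x_1, x_2, x_3, \dots, x_n$ and the query q to vectors in $\mathbb{R}^d$.] • Measure distances between vectors (typically much faster). • Caveat: the embedding must preserve similarity structure.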

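23. Reference Object Embeddings • [Figure: the database.]

24. Reference Object Embeddings • [Figure: the database, with reference objects $r_1, r_2, r_3$.]

25. Reference Object Embeddings • [Figure: the database, with reference objects $r_1, r_2, r_3$ and an object x.] • $F(x) = (D(x, r_1), D(x, r_2), D(x, r_3))$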

26. F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando)) • F(Sacramento) = (386, 1543, 2920) • F(Las Vegas) = (262, 1232, 2405) • F(Oklahoma City) = (1345, 437, 1291) • F(Washington DC) = (2657, 1207, 853) • F(Jacksonville) = (2422, 1344, 141)
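
A reference object embedding can be sketched in a few lines. In the code below, `embed` and `l1` are illustrative names, the embedded vectors are copied from the slide, and comparing them with the L1 distance is an assumption (the slide does not say which vector distance is used).

```python
def embed(x, references, dist):
    """Map x to the vector of its distances to the reference objects."""
    return tuple(dist(x, r) for r in references)


def l1(u, v):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Vectors from the slide: distances to LA, Lincoln, and Orlando.
embedded = {
    "Sacramento": (386, 1543, 2920),
    "Las Vegas": (262, 1232, 2405),
    "Oklahoma City": (1345, 437, 1291),
    "Washington DC": (2657, 1207, 853),
    "Jacksonville": (2422, 1344, 141),
}

query = "Sacramento"
nearest = min((city for city in embedded if city != query),
              key=lambda city: l1(embedded[query], embedded[city]))
print(nearest)

# embed() on toy data: points on a line, absolute difference as the distance.
print(embed(5, [0, 10], lambda a, b: abs(a - b)))
```

Only the vectors are compared at query time, so the expensive distance D is evaluated just once per reference object for each new query.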

27. Existing Embedding Methods • FastMap, MetricMap, SparseMap, Lipschitz embeddings. • Use distances to reference objects (prototypes). • Question: how do we directly optimize an embedding for nearest neighbor retrieval? • FastMap & MetricMap assume Euclidean properties. • SparseMap optimizes stress. • Large stress may be inevitable when embedding non-metric spaces into a metric space. • In practice often worse than random construction.

28. BoostMap • BoostMap: A Method for Efficient Approximate Similarity Rankings. Athitsos, Alon, Sclaroff, and Kollios, CVPR 2004. • BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval. Athitsos, Alon, Sclaroff, and Kollios, PAMI 2007 (to appear).

29. Key Features of BoostMap • Maximizes amount of nearest neighbor structure preserved by the embedding. • Based on machine learning, not on geometric assumptions. • Principled optimization, even in non-metric spaces. • Can capture non-metric structure. • Query-sensitive version of BoostMap. • Better results in practice, in all datasets we have tried.

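30. Ideal Embedding Behavior • [Figure: F maps the original space X to $\mathbb{R}^d$; a query q and an object a are shown.] • For any query q: we want F(NN(q)) = NN(F(q)).

31. Ideal Embedding Behavior • [Figure: F maps the original space X to $\mathbb{R}^d$; a query q and an object a are shown.] • For any query q: we want F(NN(q)) = NN(F(q)).

32. Ideal Embedding Behavior • [Figure: F maps the original space X to $\mathbb{R}^d$; a query q and an object a are shown.] • For any query q: we want F(NN(q)) = NN(F(q)).

33. Ideal Embedding Behavior • [Figure: F maps the original space X to $\mathbb{R}^d$; a query q and objects a and b are shown.] • For any query q: we want F(NN(q)) = NN(F(q)). • For any database object b besides NN(q), we want F(q) closer to F(NN(q)) than to F(b).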

34. Embeddings Seen As Classifiers • [Figure: objects q, a, and b.] • For triples (q, a, b) such that: • q is a query object • a = NN(q) • b is a database object • Classification task: is q closer to a or to b?

35. Embeddings Seen As Classifiers • [Figure: objects q, a, and b.] • For triples (q, a, b) such that: • q is a query object • a = NN(q) • b is a database object • Classification task: is q closer to a or to b? • Any embedding F defines a classifier F’(q, a, b). • F’ checks if F(q) is closer to F(a) or to F(b).

36. Classifier Definition • [Figure: objects q, a, and b.] • For triples (q, a, b) such that: • q is a query object • a = NN(q) • b is a database object • Classification task: is q closer to a or to b? • Given embedding $F: X \to \mathbb{R}^d$: • $F'(q, a, b) = \|F(q) - F(b)\| - \|F(q) - F(a)\|$. • $F'(q, a, b) > 0$ means “q is closer to a.” • $F'(q, a, b) < 0$ means “q is closer to b.”
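
The classifier induced by an embedding can be written directly from its definition. This is a minimal sketch that assumes the Euclidean norm for ||·|| (the slide does not name the norm); `triple_classifier` and the toy embedding `F` are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triple_classifier(F, q, a, b):
    """F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)||.

    Positive output: F places q closer to a. Negative: closer to b.
    """
    fq, fa, fb = F(q), F(a), F(b)
    return euclidean(fq, fb) - euclidean(fq, fa)


# Toy embedding of numbers on a line, using reference objects 0 and 10.
def F(x):
    return (abs(x - 0), abs(x - 10))


print(triple_classifier(F, 2, 3, 8) > 0)  # q=2 is embedded closer to a=3
print(triple_classifier(F, 2, 8, 3) < 0)  # swapping a and b flips the sign
```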

37. Key Observation • [Figure: F maps the original space X to $\mathbb{R}^d$; a query q and objects a and b are shown.] • If classifier F’ is perfect, then for every q, F(NN(q)) = NN(F(q)). • If F(q) is closer to F(b) than to F(NN(q)), then triple (q, a, b) is misclassified.

38. Key Observation • [Figure: F maps the original space X to $\mathbb{R}^d$; a query q and objects a and b are shown.] • Classification error on triples (q, NN(q), b) measures how well F preserves nearest neighbor structure.

39. Optimization Criterion • Goal: construct an embedding F optimized for k-nearest neighbor retrieval. • Method: maximize accuracy of F’ on triples (q, a, b) of the following type: • q is any object. • a is a k-nearest neighbor of q in the database. • b is in database, but NOT a k-nearest neighbor of q. • If F’ is perfect on those triples, then F perfectly preserves k-nearest neighbors.
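
The criterion above can be turned into a measurable quantity: build triples (q, a, b) of the stated type and count how often F’ gets them wrong. The sketch below assumes a toy database of numbers with the absolute difference as the original distance, and compares embedded vectors with the L1 distance; all function names are illustrative.

```python
def knn(q, database, dist, k):
    """The k nearest neighbors of q in the database, excluding q itself."""
    return sorted((x for x in database if x != q), key=lambda x: dist(q, x))[:k]


def training_triples(queries, database, dist, k):
    """Triples (q, a, b): a is a k-nearest neighbor of q, b is not."""
    triples = []
    for q in queries:
        neighbors = knn(q, database, dist, k)
        others = [x for x in database if x != q and x not in neighbors]
        triples.extend((q, a, b) for a in neighbors for b in others)
    return triples


def l1(u, v):
    return sum(abs(ui - vi) for ui, vi in zip(u, v))


def triple_error(F, triples):
    """Fraction of triples on which F' fails to say 'q is closer to a'."""
    wrong = 0
    for q, a, b in triples:
        score = l1(F(q), F(b)) - l1(F(q), F(a))
        if score <= 0:  # ties count as errors
            wrong += 1
    return wrong / len(triples)


database = [0, 1, 3, 7, 15, 31]
dist = lambda x, y: abs(x - y)
triples = training_triples(database, database, dist, k=2)

# Two 1D reference object embeddings: reference 0 and reference 7.
error_ref0 = triple_error(lambda x: (dist(x, 0),), triples)
error_ref7 = triple_error(lambda x: (dist(x, 7),), triples)
print(error_ref0, error_ref7)
```

On this toy data the reference object at the end of the line preserves every neighbor relation, while the one in the middle folds the line onto itself and misclassifies some triples.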

40. 1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers. • Better than a random classifier (50% error rate).

41. [Figure: cities labeled Lincoln, Detroit, LA, Chicago, New York, and Cleveland; the labels Chicago, LA, Detroit, and New York appear a second time.]

42. 1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers. • Better than a random classifier (50% error rate). • We can define lots of different classifiers. • Every object in the database can be a reference object.

43. 1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers. • Better than a random classifier (50% error rate). • We can define lots of different classifiers. • Every object in the database can be a reference object. Question: how do we combine many such classifiers into a single strong classifier?

44. 1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers. • Better than a random classifier (50% error rate). • We can define lots of different classifiers. • Every object in the database can be a reference object. Question: how do we combine many such classifiers into a single strong classifier? Answer: use AdaBoost. • AdaBoost is a machine learning method designed for exactly this problem.

45. Using AdaBoost • [Figure: 1D embeddings $F_1, F_2, \dots, F_n$ map the original space X to the real line.] • Output: $H = w_1 F'_1 + w_2 F'_2 + \dots + w_d F'_d$. • AdaBoost chooses 1D embeddings and weighs them. • Goal: achieve low classification error. • AdaBoost trains on triples chosen from the database.
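
The selection and weighting step can be sketched with textbook discrete AdaBoost. This is a simplification under stated assumptions, not the BoostMap training procedure itself: each weak classifier here votes with the sign of $F'_i$ rather than its real value, the 1D embeddings are reference object embeddings $F_r(x) = D(x, r)$, and the 2D points with Euclidean distance stand in for an expensive distance measure. All names are illustrative.

```python
import math


def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def weak_output(r, triple, dist):
    """F'_r(q, a, b) for the 1D embedding F_r(x) = dist(x, r)."""
    q, a, b = triple
    return abs(dist(q, r) - dist(b, r)) - abs(dist(q, r) - dist(a, r))


def sign(v):
    return 1 if v > 0 else -1


def adaboost(triples, labels, candidates, dist, rounds):
    """Discrete AdaBoost: returns a list of (reference object, weight)."""
    n = len(triples)
    weights = [1.0 / n] * n
    # Precompute every weak classifier's +1/-1 vote on every triple.
    votes = {r: [sign(weak_output(r, t, dist)) for t in triples]
             for r in candidates}
    chosen = []
    for _ in range(rounds):
        def weighted_error(r):
            return sum(w for w, v, y in zip(weights, votes[r], labels) if v != y)

        best = min(candidates, key=weighted_error)
        eps = weighted_error(best)
        if eps >= 0.5:  # no candidate is better than chance; stop
            break
        eps = max(eps, 1e-10)  # avoid division by zero for a perfect classifier
        alpha = 0.5 * math.log((1 - eps) / eps)
        chosen.append((best, alpha))
        # Increase the weight of misclassified triples, then renormalize.
        weights = [w * math.exp(-alpha * y * v)
                   for w, v, y in zip(weights, votes[best], labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


points = [(0, 0), (1, 0), (0, 1), (4, 4), (5, 4), (4, 5), (9, 0), (9, 1)]
triples, labels = [], []
for q in points:
    for a in points:
        for b in points:
            if len({q, a, b}) == 3 and euclid(q, a) != euclid(q, b):
                triples.append((q, a, b))
                labels.append(1 if euclid(q, a) < euclid(q, b) else -1)

chosen = adaboost(triples, labels, points, euclid, rounds=5)


def strong_vote(triple):
    return sign(sum(alpha * sign(weak_output(r, triple, euclid))
                    for r, alpha in chosen))


train_error = sum(strong_vote(t) != y for t, y in zip(triples, labels)) / len(triples)
print(chosen, train_error)
```

Each round picks the reference object whose 1D embedding best handles the triples that earlier choices got wrong, which is how the combined classifier ends up using complementary reference objects.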

46. From Classifier to Embedding • AdaBoost output: $H = w_1 F'_1 + w_2 F'_2 + \dots + w_d F'_d$ • What embedding should we use? • What distance measure should we use?

47. From Classifier to Embedding • AdaBoost output: $H = w_1 F'_1 + w_2 F'_2 + \dots + w_d F'_d$ • BoostMap embedding: $F(x) = (F_1(x), \dots, F_d(x))$.

48. From Classifier to Embedding • AdaBoost output: $H = w_1 F'_1 + w_2 F'_2 + \dots + w_d F'_d$ • BoostMap embedding: $F(x) = (F_1(x), \dots, F_d(x))$. • Distance measure: $D((u_1, \dots, u_d), (v_1, \dots, v_d)) = \sum_{i=1}^{d} w_i |u_i - v_i|$

49. From Classifier to Embedding • AdaBoost output: $H = w_1 F'_1 + w_2 F'_2 + \dots + w_d F'_d$ • BoostMap embedding: $F(x) = (F_1(x), \dots, F_d(x))$. • Distance measure: $D((u_1, \dots, u_d), (v_1, \dots, v_d)) = \sum_{i=1}^{d} w_i |u_i - v_i|$ • Claim: Let q be closer to a than to b. H misclassifies triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.
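
The claim can be checked numerically on toy data. In the sketch below the 1D embeddings are reference object embeddings of integers, and the weights are arbitrary values standing in for AdaBoost's output; both are assumptions for illustration.

```python
def weighted_l1(u, v, w):
    """D(u, v) = sum_i w_i * |u_i - v_i|."""
    return sum(wi * abs(ui - vi) for wi, ui, vi in zip(w, u, v))


def H(q, a, b, embeddings_1d, w):
    """H(q, a, b) = sum_i w_i * F'_i(q, a, b), with 1D classifiers F'_i."""
    return sum(wi * (abs(Fi(q) - Fi(b)) - abs(Fi(q) - Fi(a)))
               for wi, Fi in zip(w, embeddings_1d))


# Hypothetical 1D embeddings F_i(x) = |x - r_i| and hypothetical weights.
refs = [0, 4, 10]
embeddings_1d = [lambda x, r=r: abs(x - r) for r in refs]
w = [0.5, 1.0, 2.0]


def F(x):
    return tuple(Fi(x) for Fi in embeddings_1d)


objects = range(12)
max_gap = max(
    abs(H(q, a, b, embeddings_1d, w)
        - (weighted_l1(F(q), F(b), w) - weighted_l1(F(q), F(a), w)))
    for q in objects for a in objects for b in objects
)
print(max_gap)  # H and the difference of weighted L1 distances coincide
```

Because H equals the difference of the two embedded distances, its sign is negative exactly when F(q) is closer to F(b) than to F(a) under D.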

50. i=1 i=1 i=1 d d d Proof H(q, a, b) = = wiF’i(q, a, b) = wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|) = (wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|) = D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)