MindReader: Querying databases through multiple examples

Presentation Transcript


  1. MindReader: Querying databases through multiple examples Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan) Ravishankar Subramanya (Pittsburgh Supercomputing Center) Christos Faloutsos (Carnegie Mellon University)

  2. Outline • Background & Introduction • Query by Example • Our Approach • Relevance Feedback • What’s New in MindReader? • Proposed Method • Problem Formulation • Theorems • Experimental Results • Discussion & Conclusion

  3. Query-by-Example: an example Searching for “mildly overweight” patients • The doctor selects examples by browsing the patient database, marking each as “good” or “very good” • The examples (plotted as Weight vs. Height) have an “oblique” correlation • We can “guess” the implied query point q

  10. Query-by-Example: the question Assume that • the user gives multiple examples • the user optionally assigns scores to the examples • the samples have spatial correlation How can we “guess” the implied query?

  13. Our Approach • Automatically derive the distance measure from the given examples • Two important notions: 1. diagonal query: the isosurfaces of a query are ellipsoids 2. multiple-level scores: the user can assign “goodness” scores to samples

  14. Isosurfaces of Distance Functions (figure: three isosurfaces around the query point q) • Euclidean • weighted Euclidean • generalized ellipsoid distance

  15. Distance Function Formulas • Euclidean: D(x, q) = Σi (xi − qi)² • Weighted Euclidean: D(x, q) = Σi mi (xi − qi)² • Generalized ellipsoid distance: D(x, q) = (x − q)ᵀ M (x − q)
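The three formulas can be sketched in plain Python; the function names and the nested-list representation of M are illustrative choices, not the authors' notation:

```python
def euclidean(x, q):
    # D(x, q) = sum_i (x_i - q_i)^2
    return sum((xi - qi) ** 2 for xi, qi in zip(x, q))

def weighted_euclidean(x, q, w):
    # D(x, q) = sum_i w_i (x_i - q_i)^2, with w the diagonal of M
    return sum(wi * (xi - qi) ** 2 for wi, xi, qi in zip(w, x, q))

def ellipsoid(x, q, M):
    # D(x, q) = (x - q)^T M (x - q), with M a symmetric matrix
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    return sum(M[j][k] * d[j] * d[k] for j in range(n) for k in range(n))
```

With M equal to the identity, the ellipsoid distance reduces to the Euclidean one, and with a diagonal M it reduces to the weighted Euclidean one.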

  17. Relevance Feedback • Popular method in IR • Query is modified based on relevance judgment from the user • Two major approaches 1. query-point movement 2. re-weighting

  18. Relevance Feedback: Query-point Movement • The query point is moved towards the “good” examples (Rocchio’s formula in IR) • Q0: original query point; relevance judgments on the retrieved data move the query to a new point Q1
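Rocchio's update can be sketched as follows; the default constants alpha, beta, gamma are conventional IR choices, not values from this talk:

```python
def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    # Q1 = alpha*Q0 + beta*centroid(relevant) - gamma*centroid(nonrelevant)
    # (constant values here are illustrative defaults)
    def centroid(points):
        n = len(points)
        if n == 0:
            return [0.0] * len(q0)
        return [sum(p[i] for p in points) / n for i in range(len(points[0]))]
    cr = centroid(relevant)
    cn = centroid(nonrelevant)
    return [alpha * q + beta * r - gamma * s for q, r, s in zip(q0, cr, cn)]
```

The new query point Q1 is pulled towards the centroid of the relevant examples and pushed away from the centroid of the non-relevant ones.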

  22. Relevance Feedback: Re-weighting • Standard deviation method in the MARS (UIUC) image retrieval system • Assumption: if the deviation of the “good” examples along a feature is high, the feature is not important • For each feature fi, the weight wi = 1/σi is assigned (a low-deviation “good” feature gets a high weight; a high-deviation “bad” feature gets a low weight) • MARS didn’t provide any justification for this formula
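The re-weighting heuristic can be sketched as follows (assuming the population standard deviation; the slides do not specify which estimator MARS uses):

```python
import statistics

def mars_weights(good_examples):
    # MARS heuristic: weight each feature by the inverse of the standard
    # deviation of the "good" examples along that feature (w_i = 1/sigma_i).
    n = len(good_examples[0])
    sigmas = [statistics.pstdev([x[i] for x in good_examples]) for i in range(n)]
    return [1.0 / s for s in sigmas]
```

A feature along which the good examples barely vary ends up with a large weight, so it dominates the weighted Euclidean distance.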

  32. What’s New in MindReader? MindReader • does not use ad hoc heuristics (cf. Rocchio’s formula and re-weighting in MARS) • can handle multiple levels of scores • can derive a generalized ellipsoid distance

  33. What’s New in MindReader? MindReader can derive generalized ellipsoid distances (figure: ellipsoid isosurface around the query point q)

  34. Isosurfaces of Distance Functions (figure: three isosurfaces around the query point q) • Euclidean (Rocchio) • weighted Euclidean (MARS) • generalized ellipsoid distance (MindReader)

  39. Method: distance function Generalized ellipsoid distance function • D(x, q) = (x − q)ᵀ M (x − q), or equivalently • D(x, q) = Σj Σk mjk (xj − qj)(xk − qk) • q: query point vector • x: data point vector • M = [mjk]: symmetric distance matrix

  40. Method: definitions • N: no. of samples • n: no. of dimensions (features) • xi: n-d sample data vectors, xi = [xi1, …, xin]ᵀ • X: N×n sample data matrix, X = [x1, …, xN]ᵀ • v: N-d score vector, v = [v1, …, vN]

  41. Method: problem formulation Given • N sample n-d vectors • multiple-level scores (optional) Estimate • the optimal distance matrix M • the optimal new query point q

  42. Method: optimality • How do we measure “optimality”? By minimizing a “penalty” • What is the “penalty”? The score-weighted sum of the distances between the query point and the sample vectors • Therefore: minimize Σi vi (xi − q)ᵀ M (xi − q) under the constraint det(M) = 1

  44. Theorems: theorem 1 • Solved with Lagrange multipliers • Theorem 1: optimal query point q = x̄ = [x̄1, …, x̄n]ᵀ = Xᵀv / Σi vi • The optimal query point is the score-weighted average of the sample data vectors
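Theorem 1 in code: a pure-Python sketch of the score-weighted average (the representation of X as a list of row vectors is an illustrative choice):

```python
def optimal_query_point(X, v):
    # Theorem 1: q = X^T v / sum_i(v_i), i.e. the score-weighted
    # average of the N sample vectors.
    total = sum(v)
    n = len(X[0])
    return [sum(vi * x[i] for vi, x in zip(v, X)) / total for i in range(n)]
```

A sample with score 3 pulls the query point three times as hard as a sample with score 1.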

  45. Theorems: theorems 2 & 3 • Theorem 2: optimal distance matrix M = (det(C))^(1/n) C⁻¹, where C = [cjk] is the weighted covariance matrix with cjk = Σi vi (xij − x̄j)(xik − x̄k) • Theorem 3: if we restrict M to a diagonal matrix, our method reduces to the standard deviation method • MindReader includes MARS!
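A two-dimensional sketch of Theorem 2, recentered on the Theorem-1 query point; the 2×2 determinant and inverse are written out by hand so no linear-algebra library is needed, and by construction det(M) = 1:

```python
def optimal_matrix_2d(X, v):
    # Theorem 2 in two dimensions: C = weighted covariance of the samples
    # around the score-weighted mean, M = det(C)^(1/n) * C^(-1) with n = 2.
    total = sum(v)
    mean = [sum(vi * x[i] for vi, x in zip(v, X)) / total for i in range(2)]
    c = [[sum(vi * (x[j] - mean[j]) * (x[k] - mean[k]) for vi, x in zip(v, X))
          for k in range(2)] for j in range(2)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    c_inv = [[ c[1][1] / det, -c[0][1] / det],
             [-c[1][0] / det,  c[0][0] / det]]
    scale = det ** 0.5                      # det(C)^(1/n) for n = 2
    return [[scale * e for e in row] for row in c_inv]
```

The high-variance direction of the samples gets a small entry in M (a cheap direction to move in), which is exactly the inverse-deviation intuition of the standard deviation method, generalized to oblique correlations.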

  47. Experiments 1. Estimation of the optimal distance function • Can MindReader estimate the hidden target distance matrix Mhidden appropriately? • Based on synthetic data • Comparison with the standard deviation method 2. Query-point movement 3. Application to real data sets • GIS data

  48. Experiment 1: target data Two-dimensional normal distribution

  49. Experiment 1: idea • Assume that the user has a “hidden” distance Mhidden in mind • Simulate iterative query refinement • Q: How fast can we discover the “hidden” distance? • The query point is fixed to (0, 0)

  50. Experiment 1: iteration steps 1. Make initial samples: compute the k-NNs with the Euclidean distance 2. For each object x, calculate its score so that it reflects the hidden distance Mhidden 3. MindReader estimates the matrix M 4. Retrieve the k-NNs with the derived matrix M 5. If the result is improved, go to step 2
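The steps above can be sketched as a small self-contained simulation; the scoring function (inverse hidden distance) is an assumption on my part, and the covariance is accumulated around the origin since the query point is fixed at (0, 0):

```python
def simulate_refinement(data, M_hidden, k=8, rounds=3):
    # Sketch of the experiment-1 loop in 2-D. Assumed details (not in the
    # slides): scores are the inverse hidden distance; the Theorem-2
    # estimate is centered at the origin because the query point is fixed.

    def dist(x, M):                         # x^T M x for q = (0, 0)
        return (M[0][0] * x[0] * x[0] + 2 * M[0][1] * x[0] * x[1]
                + M[1][1] * x[1] * x[1])

    def estimate(samples, v):               # Theorem 2, 2x2 case by hand
        c = [[sum(vi * x[j] * x[kk] for vi, x in zip(v, samples))
              for kk in range(2)] for j in range(2)]
        det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
        s = det ** 0.5                      # det(C)^(1/n) for n = 2
        return [[ s * c[1][1] / det, -s * c[0][1] / det],
                [-s * c[1][0] / det,  s * c[0][0] / det]]

    M = [[1.0, 0.0], [0.0, 1.0]]            # step 1: start from Euclidean
    for _ in range(rounds):
        knn = sorted(data, key=lambda x: dist(x, M))[:k]    # steps 1/4: k-NN
        v = [1.0 / (1.0 + dist(x, M_hidden)) for x in knn]  # step 2: scores
        M = estimate(knn, v)                                # step 3: new M
    return M
```

With a hidden metric that penalizes the x direction, the estimated M ends up penalizing x as well, which is the sense in which the loop "discovers" the hidden distance.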
