MindReader is a Query-by-Example (QBE) method for querying databases with multiple user-supplied examples. A user such as a doctor browses a patient database, marks good examples, and MindReader infers the implied query from them. Using relevance feedback, it adapts the inferred query to the user's preferences. This presentation covers the background, the proposed method, experimental results, and a discussion of the approach.
MindReader: Querying Databases Through Multiple Examples
Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan)
Ravishankar Subramanya (Pittsburgh Supercomputing Center)
Christos Faloutsos (Carnegie Mellon University)
Outline • Background & Introduction • Query by Example • Our Approach • Relevance Feedback • What’s New in MindReader? • Proposed Method • Problem Formulation • Theorems • Experimental Results • Discussion & Conclusion
Query-by-Example: an example Searching "mildly overweight" patients • The doctor selects examples by browsing the patient database • The examples have an "oblique" correlation • We can "guess" the implied query [figure: Weight vs. Height scatter plot of the selected examples (good / very good) with the implied query point q]
Query-by-Example: the question Assume that • user gives multiple examples • user optionally assigns scores to the examples • samples have spatial correlation How can we "guess" the implied query?
Outline • Background & Introduction • Query by Example • Our Approach • Relevance Feedback • What’s New in MindReader? • Proposed Method • Problem Formulation • Theorems • Experimental Results • Discussion & Conclusion
Our Approach • Automatically derive distance measure from the given examples • Two important notions: 1. diagonal query: isosurfaces of queries have ellipsoid shapes 2. multiple-level scores: user can specify “goodness scores” on samples
Isosurfaces of Distance Functions [figure: isosurfaces around the query point q for Euclidean, weighted Euclidean, and generalized ellipsoid distance]
Distance Function Formulas • Euclidean: D(x, q) = (x − q)^T (x − q) • Weighted Euclidean: D(x, q) = Σ_i m_i (x_i − q_i)^2 • Generalized ellipsoid distance: D(x, q) = (x − q)^T M (x − q)
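To make the three formulas concrete, here is a minimal numpy sketch (not part of the original slides; function and variable names are illustrative):

```python
import numpy as np

def euclidean(x, q):
    """Euclidean distance (squared): D(x, q) = (x - q)^T (x - q)."""
    d = x - q
    return float(d @ d)

def weighted_euclidean(x, q, m):
    """Weighted Euclidean: D(x, q) = sum_i m_i (x_i - q_i)^2."""
    d = x - q
    return float(np.sum(m * d * d))

def generalized_ellipsoid(x, q, M):
    """Generalized ellipsoid distance: D(x, q) = (x - q)^T M (x - q),
    with M a symmetric distance matrix."""
    d = x - q
    return float(d @ M @ d)
```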
Outline • Background & Introduction • Query by Example • Our Approach • Relevance Feedback • What’s New in MindReader? • Proposed Method • Problem Formulation • Theorems • Experimental Results • Discussion & Conclusion
Relevance Feedback • Popular method in IR • Query is modified based on relevance judgment from the user • Two major approaches 1. query-point movement 2. re-weighting
Relevance Feedback: Query-point Movement • Query point is moved towards "good" examples (Rocchio's formula in IR) [figure: Q0 is the original query point, retrieved data are marked with relevance judgments, and Q1 is the new query point]
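Rocchio's formula combines the old query point with the centroids of the relevant and non-relevant retrieved items. A minimal sketch, assuming numpy arrays; the weights alpha, beta, gamma are illustrative defaults, not values from the slides:

```python
import numpy as np

def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Query-point movement: pull the query toward 'good' examples and
    (optionally) push it away from 'bad' ones."""
    q1 = alpha * np.asarray(q0, dtype=float)
    if len(relevant) > 0:
        q1 += beta * np.mean(relevant, axis=0)
    if len(nonrelevant) > 0:
        q1 -= gamma * np.mean(nonrelevant, axis=0)
    return q1
```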
Relevance Feedback: Re-weighting • Standard Deviation Method in the MARS (UIUC) image retrieval system • Assumption: if the deviation of a feature is high, the feature is not important • For each feature i, the weight w_i = 1/s_i is assigned • MARS didn't provide any justification for this formula [figure: f1 is a high-deviation "bad" feature, f2 is a low-deviation "good" feature; the implied query is an axis-aligned ellipse]
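A minimal sketch of the standard deviation re-weighting idea, assuming the "good" examples are rows of a numpy array and that no feature has zero deviation:

```python
import numpy as np

def std_dev_weights(good_examples):
    """Re-weighting: w_i = 1 / s_i, where s_i is the standard deviation
    of feature i over the user's 'good' examples."""
    s = np.std(good_examples, axis=0, ddof=1)
    return 1.0 / s   # high deviation -> low weight ("not important")

def weighted_distance(x, q, w):
    """Weighted Euclidean distance using the derived feature weights."""
    d = x - q
    return float(np.sum(w * d * d))
```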
Outline • Background & Introduction • Query by Example • Our Approach • Relevance Feedback • What’s New in MindReader? • Proposed Method • Problem Formulation • Theorems • Experimental Results • Discussion & Conclusion
What’s New in MindReader? MindReader • does not use ad-hoc heuristics • cf. Rocchio’s expression, re-weighting in MARS • can handle multiple levels of scores • can derive generalized ellipsoid distance
What's New in MindReader? MindReader can derive generalized ellipsoid distances
Isosurfaces of Distance Functions [figure: isosurfaces around the query point q] • Euclidean (Rocchio) • weighted Euclidean (MARS) • generalized ellipsoid distance (MindReader)
Outline • Background & Introduction • Query by Example • Our Approach • Relevance Feedback • What’s New in MindReader? • Proposed Method • Problem Formulation • Theorems • Experimental Results • Discussion & Conclusion
Method: distance function Generalized ellipsoid distance function • D(x, q) = (x − q)^T M (x − q), or equivalently • D(x, q) = Σ_j Σ_k m_jk (x_j − q_j)(x_k − q_k) • q: query point vector • x: data point vector • M = [m_jk]: symmetric distance matrix
Method: definitions • N: no. of samples • n: no. of dimensions (features) • x_i: n-d sample data vectors, x_i = [x_i1, …, x_in]^T • X: N×n sample data matrix, X = [x_1, …, x_N]^T • v: N-d score vector, v = [v_1, …, v_N]
Method: problem formulation Given • N sample n-d vectors • multiple-level scores (optional) Estimate • optimal distance matrix M • optimal new query point q
Method: optimality • How do we measure "optimality"? • minimization of a "penalty" • What is the "penalty"? • the weighted sum of distances between the query point and the sample vectors • Therefore, • minimize Σ_i v_i (x_i − q)^T M (x_i − q) • under the constraint det(M) = 1
Outline • Background & Introduction • Query by Example • Our Approach • Relevance Feedback • What’s New in MindReader? • Proposed Method • Problem Formulation • Theorems • Experimental Results • Discussion & Conclusion
Theorems: theorem 1 • Solved with Lagrange multipliers • Theorem 1: optimal query point • q = x̄ = [x̄_1, …, x̄_n]^T = X^T v / Σ_i v_i • the optimal query point is the weighted average of the sample data vectors
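A sketch of the step behind Theorem 1: assuming M is nonsingular, setting the gradient of the penalty with respect to q to zero yields the weighted average:

```latex
\frac{\partial}{\partial \mathbf{q}} \sum_{i=1}^{N} v_i (\mathbf{x}_i - \mathbf{q})^{T} M (\mathbf{x}_i - \mathbf{q})
  = -2\, M \sum_{i=1}^{N} v_i (\mathbf{x}_i - \mathbf{q}) = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{q} = \frac{\sum_{i=1}^{N} v_i \mathbf{x}_i}{\sum_{i=1}^{N} v_i} = \frac{X^{T}\mathbf{v}}{\sum_i v_i}
```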
Theorems: theorems 2 & 3 • Theorem 2: optimal distance matrix • M = (det(C))^{1/n} C^{−1} • C = [c_jk] is the weighted covariance matrix • c_jk = Σ_i v_i (x_ij − x̄_j)(x_ik − x̄_k) • Theorem 3 • If we restrict M to a diagonal matrix, our method is equal to the standard deviation method • MindReader includes MARS!
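A compact numpy sketch of Theorems 1 and 2, assuming the weighted covariance matrix C is invertible (enough independent examples); names are illustrative:

```python
import numpy as np

def mindreader_estimate(X, v):
    """Estimate the optimal query point q (Theorem 1) and the optimal
    distance matrix M (Theorem 2) from samples X (N x n) and scores v (N)."""
    X = np.asarray(X, dtype=float)
    v = np.asarray(v, dtype=float)
    q = (X.T @ v) / v.sum()          # Theorem 1: weighted average of the samples
    d = X - q                        # deviations from q
    C = (d * v[:, None]).T @ d       # c_jk = sum_i v_i (x_ij - q_j)(x_ik - q_k)
    n = X.shape[1]
    M = np.linalg.det(C) ** (1.0 / n) * np.linalg.inv(C)   # Theorem 2
    return q, M
```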
Outline • Background & Introduction • Query by Example • Our Approach • Relevance Feedback • What’s New in MindReader? • Proposed Method • Problem Formulation • Theorems • Experimental Results • Discussion & Conclusion
Experiments 1. Estimation of the optimal distance function • Can MindReader estimate the hidden target distance matrix M_hidden appropriately? • Based on synthetic data • Comparison with the standard deviation method 2. Query-point movement 3. Application to real data sets • GIS data
Experiment 1: target data Two-dimensional normal distribution
Experiment 1: idea • Assume that the user has a "hidden" distance M_hidden in his mind • Simulate iterative query refinement • Q: How fast can we discover the "hidden" distance? • Query point is fixed to (0, 0)
Experiment 1: iteration steps 1. Make initial samples: compute k-NNs with Euclidean distance 2. For each object x, calculate its score so that it reflects the hidden distance M_hidden 3. MindReader estimates the matrix M 4. Retrieve k-NNs with the derived matrix M 5. If the result is improved, go to step 2
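A sketch of this simulation loop under stated assumptions: synthetic data, the query point fixed at the origin, scores derived from the hidden distance with an illustrative exp(-d) scoring, and the mindreader_estimate helper from the earlier sketch:

```python
import numpy as np

def ellipsoid_dists(data, q, M):
    """Generalized ellipsoid distance of every row of data to q."""
    d = data - q
    return np.einsum('ij,jk,ik->i', d, M, d)

def simulate(M_hidden, data, k=20, iters=5):
    """Iterative refinement: retrieve k-NNs, score them with the hidden
    distance, re-estimate M with MindReader, and retrieve again."""
    q = np.zeros(data.shape[1])            # query point fixed at the origin
    M = np.eye(data.shape[1])              # step 1: start from Euclidean distance
    for _ in range(iters):
        knn = data[np.argsort(ellipsoid_dists(data, q, M))[:k]]   # steps 1/4: k-NN retrieval
        scores = np.exp(-ellipsoid_dists(knn, q, M_hidden))       # step 2: scores reflect M_hidden
        _, M = mindreader_estimate(knn, scores)                   # step 3: estimate M
    return M
```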