
Approximate Selection Queries over Imprecise Data

Iosif Lazaridis and Sharad Mehrotra, University of California, Irvine. ICDE Conference, March 2004, Boston, MA, USA. Talk outline: Regular vs. Approximate Selection Queries; Quality-Aware Queries (QaQs); Optimization for QaQs; Performance Study.


Presentation Transcript


  1. Approximate Selection Queries over Imprecise Data • Iosif Lazaridis and Sharad Mehrotra, University of California, Irvine • ICDE Conference, March 2004, Boston, MA, USA

  2. Talk Outline • Regular vs. Approximate Selection Queries • Quality-Aware Queries (QaQs) • Optimization for QaQs • Performance Study • Conclusions

  3. Regular Selection Queries • Set of precise objects T • Selection predicate λ • Exact set $E = \sigma_\lambda(T)$

  4. Imprecise Objects • [Figure: examples of imprecise objects] • An imprecise object o corresponds to a precise object $\omega_o$, which can be retrieved (at a cost) via a probe operation

  5. Approximate Selection Queries • Set of imprecise objects T • Selection predicate λ • Approximate answer $A \approx \sigma_\lambda(T)$

  6. Formal Problem Setting • Let T be a set of imprecise objects • Let λ be a selection predicate that maps an imprecise object to the set {YES, NO, MAYBE} • The exact set is $E = \{\omega_o \mid o \in T \wedge \lambda(\omega_o) = \text{YES}\}$ • The goal is to produce an approximate answer A with associated “quality guarantees” • A will potentially contain both precise and imprecise objects
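To make the three-valued predicate concrete, here is a minimal Python sketch under a hypothetical representation: each imprecise object is an interval [lo, hi] known to contain its precise value, and λ is a range selection a ≤ value ≤ b. The names Verdict, Interval, range_predicate and exact_set are illustrative, not from the original work. A precise object ω_o is modeled as a zero-width interval, on which λ can only return YES or NO, so E is obtained by filtering the precise versions of all objects.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    YES = "YES"
    NO = "NO"
    MAYBE = "MAYBE"


@dataclass(frozen=True)
class Interval:
    """Hypothetical imprecise object: its precise value lies somewhere in [lo, hi]."""
    lo: float
    hi: float


def range_predicate(a: float, b: float):
    """Build λ for the hypothetical selection 'a <= value <= b'."""
    def lam(o: Interval) -> Verdict:
        if a <= o.lo and o.hi <= b:
            return Verdict.YES    # every possible value satisfies the predicate
        if o.hi < a or o.lo > b:
            return Verdict.NO     # no possible value satisfies it
        return Verdict.MAYBE      # some possible values do, others do not
    return lam


def exact_set(precise_objects, lam):
    """E = {ω_o | o ∈ T and λ(ω_o) = YES}, given every object's precise version."""
    return [w for w in precise_objects if lam(w) is Verdict.YES]


lam = range_predicate(10, 20)
print(lam(Interval(12, 15)))  # Verdict.YES
print(lam(Interval(25, 30)))  # Verdict.NO
print(lam(Interval(18, 25)))  # Verdict.MAYBE
print(exact_set([Interval(5, 5), Interval(12, 12), Interval(20, 20)], lam))
```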

  7. Quality Metrics • Set-based quality • Precision: fraction of objects in A that are also in E, $p = |A \cap E| / |A|$ • Recall: fraction of objects in E that are also in A, $r = |A \cap E| / |E|$ • Value-based quality • Each imprecise object o has laxity l(o) • Each precise object $\omega_o$ has laxity 0 • Answer laxity: $l_{max} = \max_{x \in A} l(x)$
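When E is known (for example, in an offline evaluation), the set-based metrics and the answer laxity can be computed directly. The Python sketch below assumes, hypothetically, that each object is identified by an id shared by o and ω_o, so an imprecise object in A counts as being in E when its precise version is. Treating an empty A or E as having precision or recall 1.0 is also an assumption.

```python
def precision(A: set, E: set) -> float:
    """p = |A ∩ E| / |A|; an empty answer is treated as perfectly precise (assumption)."""
    return len(A & E) / len(A) if A else 1.0


def recall(A: set, E: set) -> float:
    """r = |A ∩ E| / |E|; an empty exact set is treated as fully recalled (assumption)."""
    return len(A & E) / len(E) if E else 1.0


def answer_laxity(A: set, laxity: dict) -> float:
    """l_max = max over x in A of l(x); precise objects have laxity 0."""
    return max((laxity[x] for x in A), default=0.0)


E = {"a", "b", "c", "d"}                  # hypothetical exact set
A = {"a", "b", "x"}                       # hypothetical answer
laxity = {"a": 0.0, "b": 5.0, "x": 12.0}  # "a" was probed, so its laxity is 0
print(precision(A, E))                    # 0.6666666666666666
print(recall(A, E))                       # 0.5
print(answer_laxity(A, laxity))           # 12.0
```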

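  8. Quality Guarantees • [Figure: the total set T is partitioned into NO (N), YES (Y) and MAYBE (M) objects; M is split into seen ($M_s$) and not-yet-seen ($M_{ns}$) objects; the answer A contains YES objects $A_Y$ and forwarded MAYBE objects $A_{M_s}$] • Precision guarantee: $p_G = \frac{|A_Y|}{|A_{M_s}| + |A_Y|}$ • Recall guarantee: $r_G = \frac{|A_Y|}{|Y| + |M_{ns}| + |M_s - A|}$ • Laxity guarantee: $l_{max} = \max_{x \in A} l(x)$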
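Because the true status of forwarded MAYBE objects is unknown, these guarantees act as worst-case bounds: precision assumes every forwarded MAYBE object is a false positive, and recall assumes every MAYBE object outside A is actually YES. The Python sketch below simply evaluates the two formulas from counts. Reading $M_s$ and $M_{ns}$ as the MAYBE objects seen and not yet seen is an interpretation of the slide's notation, and the function names are hypothetical.

```python
def precision_guarantee(n_AY: int, n_AMs: int) -> float:
    """p_G = |A_Y| / (|A_Ms| + |A_Y|): forwarded MAYBE objects count as false positives."""
    total = n_AMs + n_AY
    return n_AY / total if total else 1.0


def recall_guarantee(n_AY: int, n_Y: int, n_Mns: int, n_Ms_minus_A: int) -> float:
    """r_G = |A_Y| / (|Y| + |M_ns| + |M_s - A|): MAYBE objects outside A count as missed YES."""
    total = n_Y + n_Mns + n_Ms_minus_A
    return n_AY / total if total else 1.0


# Hypothetical counts: 45 YES and 5 MAYBE objects in A; 60 YES objects overall;
# 20 MAYBE objects not yet seen and 10 seen but left out of A.
print(precision_guarantee(45, 5))        # 0.9
print(recall_guarantee(45, 60, 20, 10))  # 0.5
```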

  9. Quality-Aware Query (QaQ) • Input consists of: • Set T • Predicate λ • Quality requirements $p_q$, $r_q$, $l^q_{max}$ • Answer A should be such that $p_G \ge p_q$, $r_G \ge r_q$ and $l_{max} \le l^q_{max}$
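Checking whether an answer meets a QaQ then amounts to three comparisons. A minimal Python sketch with a hypothetical QualityRequirements record; the example thresholds mirror the 90% precision, 50% recall and laxity 50 used in the performance study, while the guarantee values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRequirements:
    p_q: float     # required precision
    r_q: float     # required recall
    lq_max: float  # maximum allowed laxity


def meets(req: QualityRequirements, p_G: float, r_G: float, l_max: float) -> bool:
    """True when p_G >= p_q, r_G >= r_q and l_max <= lq_max."""
    return p_G >= req.p_q and r_G >= req.r_q and l_max <= req.lq_max


req = QualityRequirements(p_q=0.9, r_q=0.5, lq_max=50)
print(meets(req, p_G=0.9, r_G=0.5, l_max=40))  # True
print(meets(req, p_G=0.9, r_G=0.5, l_max=60))  # False: laxity guarantee too loose
```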

  10. QaQ Selection Operator • Requires O(1) memory/processing per input object • Each object o is read, and λ(o) is evaluated • Three choices for each object o: • Forward it to A • Ignore it • Probe it to get $\omega_o$, then Forward or Ignore $\omega_o$

  11. Handling Objects • Ignore NO objects • YES objects • If $l(o) > l^q_{max}$: Probe or Ignore • Else: Forward • MAYBE objects • If $l(o) > l^q_{max}$: Probe or Ignore • Else: all three choices are feasible
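These rules fix, for each object, the set of actions the operator may choose from; which action is actually taken is left to the optimization. A small Python sketch of the feasible-action sets, with λ(o) passed as a string (the Action enum and feasible_actions function are hypothetical names):

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    IGNORE = "ignore"
    PROBE = "probe"


def feasible_actions(verdict: str, laxity: float, lq_max: float) -> set:
    """Actions allowed for an object o with λ(o) = verdict and l(o) = laxity."""
    if verdict == "NO":
        return {Action.IGNORE}  # NO objects are always ignored
    if verdict not in ("YES", "MAYBE"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    if laxity > lq_max:
        # Forwarding o as is would violate the laxity requirement.
        return {Action.PROBE, Action.IGNORE}
    if verdict == "YES":
        return {Action.FORWARD}
    return {Action.FORWARD, Action.IGNORE, Action.PROBE}  # MAYBE within lq_max


print(feasible_actions("YES", 80, 50) == {Action.PROBE, Action.IGNORE})  # True
print(feasible_actions("MAYBE", 10, 50) == set(Action))                  # True
```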

  12. Ensuring Correctness • No object with laxity $l(o) > l^q_{max}$ may be forwarded • The precision guarantee $p_G$ may not fall below $p_q$ • If no other YES objects remain to be seen, then $p_q$ will be violated • If $|A_Y| / (|Y| + |M_s - A|) < r_q$, then an object o cannot be ignored • If no other YES objects remain to be seen, then $r_q$ will be violated

  13. QaQ Evaluation Cost • R: number of objects read ($R \le |T|$) • Y, M: number of objects that were YES/MAYBE at the input • $Y_f$, $Y_p$: number of YES objects that are forwarded/probed ($Y_f + Y_p \le Y$) • $M_f$, $M_p$: number of MAYBE objects that are forwarded/probed ($M_f + M_p \le M$) • $M_{py}$: number of probed MAYBE objects that become YES • Cost: $W = \underbrace{R\,c_r}_{\text{read}} + \underbrace{(Y_p + M_p)\,c_p}_{\text{probe}} + \underbrace{(Y_f + M_f)\,c_{wi} + (Y_p + M_{py})\,c_{wp}}_{\text{write}}$
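The cost model is a weighted sum of reads, probes and writes. A direct Python transcription follows. The Counts record and cost function are hypothetical names, reading $c_{wi}$ and $c_{wp}$ as the cost of writing an unprobed (imprecise) versus a probed (precise) object is an interpretation of the subscripts, and the counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    R: int    # objects read
    Yf: int   # YES objects forwarded without a probe
    Yp: int   # YES objects probed
    Mf: int   # MAYBE objects forwarded without a probe
    Mp: int   # MAYBE objects probed
    Mpy: int  # probed MAYBE objects that became YES


def cost(n: Counts, c_r: float, c_p: float, c_wi: float, c_wp: float) -> float:
    """W = R*c_r + (Yp+Mp)*c_p + (Yf+Mf)*c_wi + (Yp+Mpy)*c_wp."""
    read = n.R * c_r
    probe = (n.Yp + n.Mp) * c_p
    write = (n.Yf + n.Mf) * c_wi + (n.Yp + n.Mpy) * c_wp
    return read + probe + write


# Hypothetical run over |T| = 10,000 objects, with unit read/write cost and a
# probe cost of 100 units (the ratio used in the performance study).
n = Counts(R=1000, Yf=150, Yp=20, Mf=30, Mp=40, Mpy=25)
W = cost(n, c_r=1.0, c_p=100.0, c_wi=1.0, c_wp=1.0)
print(W)           # 7225.0
print(W / 10_000)  # 0.7225, the cost normalized by |T|
```

Even with far fewer probes than reads, the probe term (6000 of 7225 here) dominates once probes cost 100 times a read.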

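  14. The “Map” • [Figure: objects are placed by s(o), the probability that a MAYBE object turns out YES (s(o)=0 for NO, 0<s(o)<1 for MAYBE, s(o)=1 for YES objects), and by laxity l(o) relative to $l^q_{max}$, giving seven numbered regions. Region actions include Ignore, Forward, Probe, “Probe with probability $p_{py}$ or Ignore” and “Forward with probability $p_{fm}$ or Ignore”; thresholds $s_3$ and $s_5$ bound MAYBE regions.]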

  15. Query Optimization • Free parameters $p_{py}$, $s_3$, $s_5$, $p_{fm}$ • Estimate # of YES, NO, MAYBE objects • Estimate # of YES, MAYBE objects above the $l^q_{max}$ laxity requirement • Requires some knowledge of the distribution of l(o) • Distribution of s(o) • Minimize cost W subject to $p_q$, $r_q$, $l^q_{max}$ • 4-parameter optimization problem

  16. Query Evaluation • Get selectivity estimates • Solve the optimization problem for $p_{py}$, $s_3$, $s_5$, $p_{fm}$, thus instantiating the “Map” • Read one object at a time, handle it according to the “Map” • Make sure correctness criteria are enforced! • Finish when $r_G \ge r_q$

  17. Performance Study • Size of input |T| = 10,000 • Laxity ranges in [0, 100] • Probe cost = 100 × the unit read/write cost • We vary: • Precision, Recall, Laxity requirement • Query selectivity • Input uncertainty (ratio of YES/MAYBE objects) • Costs are normalized by dividing by |T|

  18. Competing Algorithms • We devised two simple heuristics: • STINGY avoids probes: it ignores MAYBE objects and objects exceeding the $l^q_{max}$ threshold. • STINGY is conservative, but sometimes it is forced to probe to meet the quality guarantees. • GREEDY forwards all MAYBE objects and probes all objects that exceed the $l^q_{max}$ threshold. • GREEDY tries to produce the result quickly by not ignoring objects, but sometimes it uses too many probes and forwards too many objects.
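A rough Python sketch of the GREEDY rule as described here: ignore NO objects, probe anything above $l^q_{max}$, and forward every other YES or MAYBE object unprobed. The Obj record, its truth_yes field (standing in for the outcome a probe would reveal) and the greedy function are hypothetical. The sketch also processes the whole input and omits the stopping rule and correctness checks, so it only approximates the heuristic used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    verdict: str     # λ(o): "YES", "NO" or "MAYBE"
    laxity: float    # l(o)
    truth_yes: bool  # hypothetical: whether λ(ω_o) = YES, revealed only by a probe


def greedy(objects, lq_max):
    """Ignore NO objects, probe anything above lq_max, forward the rest unprobed."""
    counts = {"R": 0, "Yf": 0, "Yp": 0, "Mf": 0, "Mp": 0, "Mpy": 0}
    answer = []
    for o in objects:
        counts["R"] += 1
        if o.verdict == "NO":
            continue
        kind = "Y" if o.verdict == "YES" else "M"
        if o.laxity > lq_max:
            counts[kind + "p"] += 1  # probe: too lax to forward as is
            if o.truth_yes:
                if kind == "M":
                    counts["Mpy"] += 1  # probed MAYBE object became YES
                answer.append(("precise", o))
        else:
            counts[kind + "f"] += 1  # forward the imprecise object
            answer.append(("imprecise", o))
    return answer, counts


objects = [
    Obj("YES", 10, True), Obj("YES", 80, True),
    Obj("MAYBE", 20, False), Obj("MAYBE", 90, True),
    Obj("MAYBE", 70, False), Obj("NO", 5, False),
]
answer, counts = greedy(objects, lq_max=50)
print(counts)       # {'R': 6, 'Yf': 1, 'Yp': 1, 'Mf': 1, 'Mp': 2, 'Mpy': 1}
print(len(answer))  # 4
```

The forwarded MAYBE object with laxity 20 turns out to be NO, which shows how forwarding MAYBE objects unprobed can cost precision. The resulting counts are exactly the inputs of the cost W on slide 13.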

  19. Varying Laxity • Input has 20% YES, 20% MAYBE objects • 90% Precision and 50% Recall are requested • As the laxity requirement becomes looser, the cost is reduced, since imprecise objects can be forwarded without a probe

  20. Varying Precision • Input has 20% YES, 20% MAYBE objects • 50% Recall and laxity=50 are requested • Cost increases as the Precision requirement increases, since fewer objects can be forwarded unprobed

  21. Varying Recall • Input has 20% YES, 20% MAYBE objects • 90% Precision and laxity=50 are requested • Cost increases as the Recall requirement increases • When the Recall requirement is low, only part of the input needs to be read • As the Recall requirement tends to 100%, all the input must be read and no objects can be ignored

  22. Varying Selectivity • Input has 20% YES, 20% MAYBE objects • 90% Precision, 50% Recall, and laxity=50 are requested • Cost increases as selectivity increases, since more objects need to be output

  23. Varying Input Uncertainty • Input has 20% YES, 20% MAYBE objects • 90% Precision, 50% Recall, and laxity=50 are requested • When MAYBE objects are few, no probe cost needs to be paid: the few MAYBE objects can be ignored • When MAYBE objects are many, they can be neither ignored (Recall might be violated) nor forwarded (Precision might be violated), so they are probed, increasing the cost

  24. Conclusions • Quality-Aware Queries (QaQs) • Query: predicate + quality requirement • Response: answer + quality guarantee • Quality metrics for set-based answers • On-line algorithm for evaluating QaQs • Outperforms the simple STINGY and GREEDY heuristics • Takes into account input characteristics/user requirements • Combines data read/write + probing cost • Future Work: • Indexes, Joins

  25. Thank You!
