
Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data

Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel, Matthias Renz, Stefan Zankl and Andreas Zuefle

Presentation Transcript


  1. Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel, Matthias Renz, Stefan Zankl and Andreas Zuefle

  2. Outline
      • Background
        • Uncertain data model
        • Reverse k-nearest neighbor queries
        • Reverse k-nearest neighbor queries on uncertain objects
      • Framework for probabilistic RkNN processing
        • Approximation
        • Spatial filter
        • Probabilistic filter
        • Verification
      • Evaluation + Summary

  3. Background: Data Model
      [Figure: user ratings for "Life of Brian" modeled as an uncertain object X whose PDF spans two uncertain attributes, a (Action) and b (Humor)]
      • Objects are described by a multi-dimensional probability distribution
      • Object independence assumption
      • Queries are answered according to possible-worlds semantics
      • Object PDFs can be spatially bounded
      • Continuous or discrete representation

  4. Background: RkNN Queries
      RkNN(q) = {o ∈ DB | q ∈ kNN(o)}
      [Figure: objects o1 to o7 and query q; R1NN(q) = {o7}, R2NN(q) = {o7, o5, o4}]
      • What is it good for?
        • Market segmentation
        • Outlier detection
        • Incremental algorithms
        • …
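As a minimal sketch of the RkNN definition for certain (non-uncertain) points, assuming Euclidean distance and a small in-memory database, the query can be answered by brute force: o qualifies when fewer than k other database objects are strictly closer to o than q is. The function name `rknn` and the example points are hypothetical and not taken from the slides.

```python
import math

def rknn(q, db, k):
    """Brute-force RkNN(q) = {o in DB | q in kNN(o)} for certain points.

    db maps object names to points (tuples); q is the query point.
    o qualifies if fewer than k other objects are strictly closer to o
    than q is (ties are resolved in favor of q).
    """
    result = []
    for name, o in db.items():
        d_q = math.dist(o, q)
        closer = sum(1 for other, p in db.items()
                     if other != name and math.dist(o, p) < d_q)
        if closer < k:
            result.append(name)
    return result

db = {"a": (1, 0), "b": (3, 0), "c": (0, 5)}
print(rknn((0, 0), db, 1))  # q is the nearest neighbor of a and c
```

With k = 1, b is excluded because a lies closer to b than q does; raising k to 2 admits b as well.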

  5. Background: PRkNN Queries
      "Is O' R1NN of Q?"
      [Figure: uncertain objects O1, O2, O' and query Q]
      Note: the query object may be uncertain as well!

  6. Background: PRkNN Queries
      "Is O' R1NN of Q?" ⇒ In some worlds it is.
      [Figure: uncertain objects O1, O2, O' and query Q]

  7. Background: PRkNN Queries
      "Is O' R1NN of Q?" ⇒ In other worlds it is not.
      [Figure: uncertain objects O1, O2, O' and query Q]

  8. Background: PRkNN Queries
      Definition of probabilistic RkNN:
      PRkNN(Q, τ) = {O ∈ DB | P(O ∈ RkNN(Q)) ≥ τ} = {O ∈ DB | P(Q ∈ kNN(O)) ≥ τ}
      Example: P(Q ∈ 1NN(O')) = 21/24 ≥ 0.5, hence O' ∈ PR1NN(Q, 0.5)
      [Figure: uncertain objects O1, O2, O' and query Q]
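The possible-worlds semantics behind this definition can be illustrated with a brute-force sketch. It assumes discrete uncertain objects given as lists of (point, probability) instances, mutual independence, Euclidean distance, and ties counted in Q's favor. `prob_q_in_knn`, `prknn` and the toy data are hypothetical; the enumeration is exponential in the number of objects, so it only serves as a reference for tiny inputs.

```python
import itertools
import math

def prob_q_in_knn(target, query, others, k):
    """P(Q in kNN(O')) by enumerating all possible worlds.

    Each uncertain object is a list of (point, probability) instances.
    Objects are independent, so a world's probability is the product of
    the probabilities of the instances chosen for it.
    """
    total = 0.0
    for t_pt, t_p in target:
        for q_pt, q_p in query:
            d_q = math.dist(t_pt, q_pt)
            for world in itertools.product(*others):
                p = t_p * q_p
                closer = 0
                for pt, p_inst in world:
                    p *= p_inst
                    if math.dist(t_pt, pt) < d_q:
                        closer += 1
                if closer < k:  # Q is among the kNN of O' in this world
                    total += p
    return total

def prknn(query, db, k, tau):
    """PRkNN(Q, tau) = {O in DB | P(Q in kNN(O)) >= tau}."""
    result = []
    for name, obj in db.items():
        others = [o for n, o in db.items() if n != name]
        if prob_q_in_knn(obj, query, others, k) >= tau:
            result.append(name)
    return result
```

For example, with a certain Q at (0, 0), a certain O' at (2, 0), O1 = [((3, 0), 0.5), ((10, 0), 0.5)] and O2 = [((2, 1), 0.25), ((2, 9), 0.75)], Q is the nearest neighbor of O' only in the worlds where neither O1 nor O2 lies close to O', giving 0.5 · 0.75 = 0.375.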

  9. Framework for PRkNN query processing
      • Approximation (indexing)
        • Simplification of spatial-probabilistic keys
      • Spatial filter
        • Filter objects according to simple spatial keys
      • Probabilistic filter
        • Derive lower/upper bounds of the qualification probability (by means of simple spatial-probabilistic keys)
        • Filter objects according to the lower/upper probability bounds
      • Verification
        • Computation of the exact probability (very expensive)
        • Monte-Carlo sampling (many samples required)

  10. Framework: Approximation
      R*-tree for indexing objects (global index)
      [Figure: R*-tree over the uncertain objects, with query Q]

  11. Framework: Approximation
      AR*-tree for indexing instances (local index)
      [Figure: AR*-tree over the instances of an uncertain object, with nodes annotated by probability values]

  12. Framework: Spatial Filter
      Pruning based on rectangular approximations only [1].
      Task: find k objects O ∈ DB \ {O'} that are closer to O' than Q is.
      [Figure: rectangles O, Q and B, with regions annotated as follows]
      • For any O' in this region, O is closer than Q.
      • For any O' intersecting this region, Q may possibly be closer than O.
      • For any O' in this region, O is not closer than Q.
      [1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50.
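A much simpler spatial test than the optimal MBR pruning of [1] can still illustrate the idea. This sketch assumes axis-aligned boxes given as (low corner, high corner) tuples and Euclidean distance, and compares plain MinDist/MaxDist values. These checks are correct but decide fewer cases than the criterion of [1], because they do not exploit that both distances are measured from the same point of O'. All function names are hypothetical.

```python
import math

def min_dist(a, b):
    """Smallest Euclidean distance between points of boxes a and b."""
    (alo, ahi), (blo, bhi) = a, b
    return math.sqrt(sum(max(0.0, bl - ah, al - bh) ** 2
                         for al, ah, bl, bh in zip(alo, ahi, blo, bhi)))

def max_dist(a, b):
    """Largest Euclidean distance between points of boxes a and b."""
    (alo, ahi), (blo, bhi) = a, b
    return math.sqrt(sum(max(abs(ah - bl), abs(bh - al)) ** 2
                         for al, ah, bl, bh in zip(alo, ahi, blo, bhi)))

def dominance(o, q, target):
    """Is box o closer to box target than box q is?

    Returns "closer" if this holds in every possible world, "not_closer"
    if it holds in none, and "maybe" otherwise (conservative test).
    """
    if max_dist(o, target) < min_dist(q, target):
        return "closer"
    if min_dist(o, target) >= max_dist(q, target):
        return "not_closer"
    return "maybe"

def spatial_filter(target, q, others, k):
    """Classify target as 'pruned', 'result' or 'candidate' for an RkNN query."""
    states = [dominance(o, q, target) for o in others]
    if states.count("closer") >= k:
        return "pruned"      # at least k objects are surely closer than Q
    if len(states) - states.count("not_closer") < k:
        return "result"      # fewer than k objects can possibly be closer
    return "candidate"       # left for the probabilistic filter
```

Objects classified as "maybe" are exactly those for which the following slides derive probability bounds.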

  13. Framework: Probabilistic Filter
      What is the probability that O is closer to O' than Q?
      [Figure: uncertain objects O, Q, O' and rectangle B]

  14. Framework: Probabilistic Filter
      What is the probability that O is closer to O' than Q?
      "O is closer to O' than Q with at least x% probability."
      [Figure: uncertain objects O, Q and O']

  15. Framework: Probabilistic Filter
      What is the probability that O is closer to O' than Q?
      "O is closer to O' than Q with at most x% probability."
      [Figure: uncertain objects O, Q and O']

  16. Framework: Probabilistic Filter
      • Exemplary statements
        • O1 is closer to O' than Q with at least 20% and at most 50% probability
        • O2 is closer to O' than Q with at least 60% and at most 80% probability
        • Correctly deriving these bounds is not trivial (see paper)
      • How many objects O ∈ DB are closer to O' than Q?
      • Consider the following uncertain generating function
        • x-term: probability of the object to be closer to O' than Q (lower bound)
        • z-term: probability of the object to be farther from O' than Q (1 − upper bound)
        • y-term: remaining uncertainty (upper bound − lower bound)
        • ⇒ (0.2x + 0.3y + 0.5z) · (0.6x + 0.2y + 0.2z)
        • Expansion yields 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
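The expansion can be reproduced mechanically. The sketch below assumes each object is summarized by a (lower, upper) bound on its probability of being closer to O' than Q, and stores the product polynomial as a map from (x-exponent, y-exponent) to coefficient; the z-exponent is implied by the number of objects. `ugf` is a hypothetical helper name.

```python
from collections import defaultdict

def ugf(bounds):
    """Expand the uncertain generating function for (lower, upper) bounds.

    Each object contributes lo*x + (up - lo)*y + (1 - up)*z. The result
    maps (i, j) to the coefficient of x^i y^j z^(n-i-j): with that
    probability, i objects are surely closer and j are undecided.
    """
    poly = {(0, 0): 1.0}
    for lo, up in bounds:
        nxt = defaultdict(float)
        for (i, j), c in poly.items():
            nxt[(i + 1, j)] += c * lo          # surely closer (x)
            nxt[(i, j + 1)] += c * (up - lo)   # undecided (y)
            nxt[(i, j)] += c * (1.0 - up)      # surely not closer (z)
        poly = dict(nxt)
    return poly

poly = ugf([(0.2, 0.5), (0.6, 0.8)])  # O1 and O2 from the slide
```

The result reproduces the six coefficients of the expansion above (up to floating-point rounding); for instance, the key (1, 0), i.e. the term xz, carries 0.2 · 0.2 + 0.5 · 0.6 = 0.34. The cost grows only polynomially with the number of objects, unlike enumerating possible worlds.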

  17. Framework: Probabilistic Filter
      • 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
      [Figure: probability (%) vs. number of objects O ∈ DB that are closer to O' than Q (0, 1, 2)]

  23. Framework: Probabilistic Filter
      • Example PRkNN queries
        • PR1NN(Q, 50%) ⇒ O' is not part of the result
        • PR2NN(Q, 40%) ⇒ O' is part of the result
        • PR2NN(Q, 80%) ⇒ O' has to be investigated further
      [Figure: two charts of probability (%) vs. the exact number and the maximum number of objects O ∈ DB that are closer to O' than Q (0, 1, 2)]
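The three example decisions follow from the expanded polynomial. For O' to be an RkNN of Q, fewer than k objects may be closer to O' than Q. A term x^i y^j guarantees this when even i + j < k (lower bound) and merely allows it when i < k (upper bound). A minimal sketch, assuming the same (i, j) ↦ coefficient layout as before, with hypothetical helper names:

```python
def rknn_prob_bounds(poly, k):
    """Lower/upper bounds on P(fewer than k objects are closer to O' than Q).

    poly maps (i, j) to the probability that i objects are surely closer
    to O' than Q and j objects are undecided.
    """
    lower = sum(c for (i, j), c in poly.items() if i + j < k)  # even the maximum count < k
    upper = sum(c for (i, j), c in poly.items() if i < k)      # the certain count < k
    return lower, upper

def decide(poly, k, tau):
    """Probabilistic filter decision for O' in a PRkNN(Q, tau) query."""
    lower, upper = rknn_prob_bounds(poly, k)
    if upper < tau:
        return "not in result"
    if lower >= tau:
        return "in result"
    return "verify"

# Expansion of (0.2x + 0.3y + 0.5z)(0.6x + 0.2y + 0.2z), keyed by (i, j).
POLY = {(2, 0): 0.12, (1, 0): 0.34, (0, 0): 0.10,
        (1, 1): 0.22, (0, 1): 0.16, (0, 2): 0.06}
```

For k = 1 the upper bound is 0.1 + 0.16 + 0.06 = 0.32 < 0.5, so O' is pruned. For k = 2 the bounds are 0.1 + 0.34 + 0.16 = 0.60 and 1 − 0.12 = 0.88, which accepts O' at τ = 40% but leaves it undecided at τ = 80%.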

  27. Framework: Verification
      Options for verification
      • Consideration of all possible worlds (exponential)
      • Adapting probabilistic nearest neighbor ranking [2] on the instance level of objects (polynomial)
      • Monte-Carlo based (linear in the number of samples)
      [2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).
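For discrete instances, a polynomial-time exact computation and a Monte-Carlo estimate can be sketched as follows. This is not the ranking algorithm of [2]. It is a simpler dynamic program that fixes one instance of O' and one of Q, turns every other object into an independent "closer / not closer" event, and accumulates the distribution of the number of closer objects. Instance lists of (point, probability) pairs, Euclidean distance, and all function names are assumptions.

```python
import math
import random

def exact_prob(target, query, others, k):
    """P(Q in kNN(O')) in polynomial time, working on instance level."""
    total = 0.0
    for t, pt in target:
        for q, pq in query:
            d_q = math.dist(t, q)
            # counts[c] = P(c objects closer) for c < k; counts[k] = P(>= k)
            counts = [1.0] + [0.0] * k
            for obj in others:
                # probability that this object is closer to t than q is
                p = sum(w for x, w in obj if math.dist(t, x) < d_q)
                new = [0.0] * (k + 1)
                for c, v in enumerate(counts):
                    new[min(c + 1, k)] += v * p
                    new[c] += v * (1.0 - p)
                counts = new
            total += pt * pq * sum(counts[:k])
    return total

def monte_carlo_prob(target, query, others, k, samples=10_000, seed=0):
    """Monte-Carlo estimate of P(Q in kNN(O')) from sampled worlds."""
    rng = random.Random(seed)

    def draw(obj):
        points, weights = zip(*obj)
        return rng.choices(points, weights=weights)[0]

    hits = 0
    for _ in range(samples):
        t, q = draw(target), draw(query)
        d_q = math.dist(t, q)
        closer = sum(1 for obj in others if math.dist(t, draw(obj)) < d_q)
        hits += closer < k
    return hits / samples
```

The exact variant costs roughly |O'| · |Q| · (number of instances in DB + n · k) operations, whereas the Monte-Carlo estimate trades accuracy for a cost that is linear in the number of samples.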

  28. Evaluation: Spatial Filter

  29. Evaluation: Probabilistic Filter

  30. Evaluation: Comparison to other algorithms

  31. Conclusion
      • Framework for PRkNN query processing
      • Deriving probabilistic pruning bounds for single objects
      • Accumulating these bounds using uncertain generating functions
      • Cost model for choosing the optimal tree depth
      • Comparison to existing algorithms for PRNN processing

  32. Thanks! Questions?

  33. Dependency on k

  34. Problem of dependency
      [Figure: uncertain objects O', O1, O2 and query Q]
