
Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl

Database Group. A Cost Model for Interval Intersection Queries on RI-Trees. SSDBM 2002, Edinburgh. Institute for Computer Science, University of Munich, Germany.


Presentation Transcript


  1. Database Group. A Cost Model for Interval Intersection Queries on RI-Trees. SSDBM 2002, Edinburgh. Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl. Institute for Computer Science, University of Munich, Germany.

  2. Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

  3. Extended Objects in Databases • 1D Objects (interval query): temporal data, approximate values, interval constraints, … • 2D Objects (window query): geographic data, VLSI design, bitemporal data, … • 3D Objects (box query): CAD documents, digital mockup, haptic rendering, …

  4. Integration of Access Methods • Declarative Embedding: object-relational DML and DDL • Extensible Indexing Framework: – maintenance: index_create(), index_drop(), index_insert(), index_delete(), index_update() – query processing: index_open(), index_fetch(), index_close()

  5. Integration of Access Methods • Declarative Embedding: object-relational DML and DDL • Extensible Indexing Framework: object-relational interface for index maintenance and querying functions • User-defined Index Structure • Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing • Physical Implementation: block manager, caches, locking, logging, …

  6. Integration of Access Methods • Declarative Embedding: object-relational DML and DDL • Extensible Indexing Framework: object-relational interface for index maintenance and querying functions • User-defined Index Structure • Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing • Physical Implementation: block manager, caches, locking, logging, … • Extensible Optimization Framework, optimization functions: stats_collect(), stats_delete(), predicate_sel(), index_io_cost()

  7. Integration of Access Methods • Declarative Embedding: object-relational DML and DDL • Extensible Indexing Framework: object-relational interface for index maintenance and querying functions • User-defined Index Structure • Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing • Extensible Optimization Framework: object-relational interface for selectivity estimation and cost prediction functions • User-defined Cost Model • Relational Implementation: mapping to built-in statistics facilities; SQL-based evaluation of cost model • Physical Implementation: block manager, caches, locking, logging, …

  8. Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

  9. Relational Interval Tree (RI-Tree) [Kriegel, Pötke, Seidl VLDB 2000] • Foundation: Interval Tree [Edelsbrunner 1980] • primary structure: binary search tree on possible endpoints • secondary structure: sorted lists of stored endpoints • each interval is registered at exactly one node [Figure: intervals alice, bob, chris, dave over the data space 1..15; binary tree with root 8; endpoint lists at node 4 (1b / 7b), node 8 (3a, 5c / 12c, 15a) and node 13 (13d / 13d)]

  10. RI-Tree: Virtual Primary Structure • first step: virtualize the primary structure • no materialization of the binary tree • storage cost O(1): parameter root • fixed data space: root = 2^(h-1) covers [1..2^h - 1] [Figure: binary tree over nodes 1..15 with root 8]
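
Because the primary structure is virtual, the node at which an interval is registered can be found by arithmetic alone. The following is a minimal Python sketch, assuming the usual bisection descent from the root; the function name `fork_node` is hypothetical and does not appear on the slides. The four intervals a to d are read off the index tables on the next slide.

```python
def fork_node(lower: int, upper: int, root: int) -> int:
    """Descend the virtual binary tree from `root` and return the first
    node that lies inside [lower, upper]. No tree is materialized; only
    the parameter `root` (a power of two) is needed."""
    node = root
    step = root // 2
    while step >= 1:
        if upper < node:        # interval lies entirely left of node
            node -= step
        elif node < lower:      # interval lies entirely right of node
            node += step
        else:                   # lower <= node <= upper: fork found
            break
        step //= 2
    return node


h = 4
root = 2 ** (h - 1)             # root = 8 covers the data space [1..15]
intervals = {"a": (3, 15), "b": (1, 7), "c": (5, 12), "d": (13, 13)}
nodes = {name: fork_node(lo, hi, root) for name, (lo, hi) in intervals.items()}
print(nodes)                    # {'a': 8, 'b': 4, 'c': 8, 'd': 13}
```

The loop runs at most h - 1 times, which is why only the single parameter root has to be stored.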

  11. RI-Tree: Relational Secondary Structure • second step: manage secondary structure by two B+-trees • storage of n intervals: O(n/b) disk blocks of size b • insert and delete: O(log_b n) disk block accesses in the indexes • lowerIndex (node, lower, id): (4, 1, b), (8, 3, a), (8, 5, c), (13, 13, d) • upperIndex (node, upper, id): (4, 7, b), (8, 12, c), (8, 15, a), (13, 13, d)
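
The two relations can be sketched with any relational engine. The example below is an illustration only: SQLite stands in for the ORDBMS, its built-in ordered indexes stand in for the two B+-trees, and the names `lowerIdx`, `upperIdx` and `insert_interval` are hypothetical. The rows are the four intervals shown on the slide, with their node values taken as given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table lowerIndex (node integer, lower integer, id text);
    create table upperIndex (node integer, upper integer, id text);
    -- composite indexes play the role of the two B+-trees
    create index lowerIdx on lowerIndex (node, lower, id);
    create index upperIdx on upperIndex (node, upper, id);
""")


def insert_interval(conn, node, lower, upper, id_):
    """Register one interval: one row per index relation."""
    conn.execute("insert into lowerIndex values (?, ?, ?)", (node, lower, id_))
    conn.execute("insert into upperIndex values (?, ?, ?)", (node, upper, id_))


# (node, lower, upper, id) as shown on the slide
for row in [(8, 3, 15, "a"), (4, 1, 7, "b"), (8, 5, 12, "c"), (13, 13, 13, "d")]:
    insert_interval(conn, *row)

lower_rows = conn.execute(
    "select node, lower, id from lowerIndex order by node, lower").fetchall()
upper_rows = conn.execute(
    "select node, upper, id from upperIndex order by node, upper").fetchall()
print(lower_rows)
print(upper_rows)
```

Reading each relation in (node, endpoint) order reproduces the sorted endpoint lists of the original interval tree, one list per node.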

  12. RI-Tree: Interval Intersection Query • query interval t: lower = 22, upper = 25 • h = 5, root = 16, fork = 24 [Figure: virtual tree with levels 5 to 1; visited nodes 16, 24, 20, 28, 22, 26, 23, 25]

  13. RI-Tree: Interval Intersection Query • query interval t: lower = 22, upper = 25 • select id from upperIndex i, leftNodes left where i.node = left.node and i.upper >= t.lower [Figure: nodes 16 (root) and 20 highlighted]

  14. RI-Tree: Interval Intersection Query • query interval t: lower = 22, upper = 25 • select id from upperIndex i, leftNodes left where i.node = left.node and i.upper >= t.lower union all select id from upperIndex i where i.node between t.lower and t.upper [Figure: nodes 24 (fork), 22, 23, 25 highlighted]

  15. RI-Tree: Interval Intersection Query • query interval t: lower = 22, upper = 25 • select id from upperIndex i, leftNodes left where i.node = left.node and i.upper >= t.lower union all select id from upperIndex i where i.node between t.lower and t.upper union all select id from lowerIndex i, rightNodes right where i.node = right.node and i.lower <= t.upper [Figure: nodes 28 and 26 highlighted]

  16. RI-Tree: Interval Intersection Query • query interval t: lower = 22, upper = 25 • h = 5, root = 16, fork = 24 • select id from upperIndex i, leftNodes left where i.node = left.node and i.upper >= t.lower union all select id from upperIndex i where i.node between t.lower and t.upper union all select id from lowerIndex i, rightNodes right where i.node = right.node and i.lower <= t.upper • I/O complexity: O(h·log_b n + r/b)
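
The three-branch query can be run end to end as a sketch. Several things here are assumptions rather than content of the slides: SQLite stands in for the ORDBMS, the helper `path_nodes` and the six stored intervals are hypothetical, and the table aliases are shortened to `l` and `r` because `left` and `right` are join keywords in SQLite. The query interval [22, 25] and root = 16 are the ones used on the slides.

```python
import sqlite3


def path_nodes(target, root, keep):
    """Descend the virtual tree from `root` to `target` and collect the
    visited nodes for which keep(node) holds."""
    if not 1 <= target <= 2 * root - 1:
        raise ValueError("target outside the data space")
    nodes, node, step = [], root, root // 2
    while node != target:
        if keep(node):
            nodes.append(node)
        node += step if node < target else -step
        step //= 2
    return nodes


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table lowerIndex (node integer, lower integer, id text);
    create table upperIndex (node integer, upper integer, id text);
    create index lowerIdx on lowerIndex (node, lower, id);
    create index upperIdx on upperIndex (node, upper, id);
    create temp table leftNodes (node integer);
    create temp table rightNodes (node integer);
""")

# Hypothetical data (node, lower, upper, id); node is the fork node of
# [lower, upper] in a virtual tree with root 16, worked out by hand.
data = [(16, 3, 23, "a"), (20, 17, 21, "b"), (23, 23, 23, "c"),
        (28, 25, 30, "d"), (27, 27, 27, "e"), (12, 10, 12, "f")]
for node, lo, hi, id_ in data:
    conn.execute("insert into lowerIndex values (?, ?, ?)", (node, lo, id_))
    conn.execute("insert into upperIndex values (?, ?, ?)", (node, hi, id_))

root, t_lower, t_upper = 16, 22, 25
left = path_nodes(t_lower, root, lambda n: n < t_lower)    # nodes left of t
right = path_nodes(t_upper, root, lambda n: n > t_upper)   # nodes right of t
conn.executemany("insert into leftNodes values (?)", [(n,) for n in left])
conn.executemany("insert into rightNodes values (?)", [(n,) for n in right])

query = """
    select id from upperIndex i, leftNodes l
     where i.node = l.node and i.upper >= :lower
    union all
    select id from upperIndex i
     where i.node between :lower and :upper
    union all
    select id from lowerIndex i, rightNodes r
     where i.node = r.node and i.lower <= :upper
"""
params = {"lower": t_lower, "upper": t_upper}
result = sorted(row[0] for row in conn.execute(query, params))
print(left, right, result)      # [16, 20] [28, 26] ['a', 'c', 'd']
```

`union all` is sufficient because every interval is registered at exactly one node, so no branch can report an id that another branch also reports.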

  17. Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

  18. I/O Cost Model for Interval Intersections • I/O complexity: O(h·log_b n + r/b) • outputI/O(T,t) = σ(T,t)·B • joinI/O(T,t) = (formula not preserved in the transcript) [Figure: virtual trees of height h = 5 over upperIndex(node, upper, id) with Gaps_left(t) and over lowerIndex(node, lower, id) with Gaps_right(t)]

  19. Selectivity Estimation • Histogram-based (equi-width histogram): – replication of intervals intersecting multiple buckets – statistics management requires user-defined code • Quantile-based (equi-count histogram): analogously to r_left + better adaptation to the data distribution + exploits built-in statistics of the ORDBMS
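
To make the quantile-based idea concrete, here is a small Python sketch. It is an assumption about how such an estimator could look, not the estimator from the talk: it keeps equi-count bucket boundaries for the lower and the upper endpoints separately, assumes integer endpoints, and uses the fact that an interval misses the query t only if it ends before t.lower or starts after t.upper. All function names are hypothetical.

```python
from bisect import bisect_right


def equi_count_bounds(values, k):
    """k + 1 bucket boundaries; each bucket holds about n / k values."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(k)] + [s[-1]]


def cdf_estimate(bounds, x):
    """Estimated fraction of values <= x, with linear interpolation
    inside the bucket that contains x."""
    k = len(bounds) - 1
    if x < bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    i = bisect_right(bounds, x) - 1          # bounds[i] <= x < bounds[i + 1]
    return (i + (x - bounds[i]) / (bounds[i + 1] - bounds[i])) / k


def selectivity(lower_bounds, upper_bounds, t_lower, t_upper):
    """sel = P(lower <= t.upper) - P(upper < t.lower), integer endpoints."""
    sel = (cdf_estimate(lower_bounds, t_upper)
           - cdf_estimate(upper_bounds, t_lower - 1))
    return max(0.0, sel)


# Hypothetical data: 1000 intervals of length 10 with lower = 0..999
intervals = [(i, i + 10) for i in range(1000)]
lower_bounds = equi_count_bounds([lo for lo, _ in intervals], 10)
upper_bounds = equi_count_bounds([hi for _, hi in intervals], 10)

t_lower, t_upper = 300, 400
estimate = selectivity(lower_bounds, upper_bounds, t_lower, t_upper)
exact = sum(lo <= t_upper and hi >= t_lower for lo, hi in intervals) / len(intervals)
print(estimate, exact)
```

Only two small boundary arrays are kept, and no interval is replicated across buckets, which is the drawback the slide notes for the equi-width variant.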

  20. Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

  21. Experimental Evaluation: Datasets [Figure panels: UNI, REAL]

  22. Experimental Evaluation: Computation of Statistics

  23. Experimental Evaluation: Selectivity Estimation [Figure panels: UNI, REAL]

  24. Experimental Evaluation: Selectivity Estimation

  25. Experimental Evaluation: Cost Estimation [Figure panels: UNI, REAL]

  26. Experimental Evaluation: Cost Estimation [Figure panels: UNI, REAL]

  27. Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

  28. Conclusions and Future Work • Conclusions: – Relational access methods: employ an ORDBMS as virtual machine; extensible indexing and optimizing framework – Indexing extended objects: Relational Interval Tree – Development of cost models: estimation of selectivity and I/O cost • Future Work: – Cost models: general interval relationships; interval sequences

  29. Any questions?
