1 / 23

Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases

Data Management and Exploration Prof. Dr. Thomas Seidl. Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases. Jost Enderle , Nicole Schneider, Thomas Seidl RWTH Aachen University, Germany VLDB 2005, Trondheim. Outline.

Download Presentation

Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Management and Exploration Prof. Dr. Thomas Seidl Efficiently Processing Querieson Interval-and-Value Tuplesin Relational Databases Jost Enderle, Nicole Schneider, Thomas Seidl RWTH Aachen University, Germany VLDB 2005, Trondheim

  2. Outline • Interval-and-Value (IaV) Data and Applications • Relational Interval Tree (RI-tree) • Managing Interval-and-Value Tuples Using RI-tree • Experimental Results Queries on Interval-and-Value Tuples in RDBs

  3. Interval-and-Value Data: Example Contracts table: storing period and budget of contracts CREATE TABLE contracts (// key:c_no VARCHAR(10),// simple-valued attribute:c_budget DECIMAL(10,2),// interval:c_period ROW ( c_start DATE, c_end DATE)) Queries on Interval-and-Value Tuples in RDBs

  4. Interval-and-Value Data: Query • Sample query on contracts table // Find all contractsSELECT c_no FROM contracts// within certain budget rangeWHERE c_budget BETWEEN 500 AND 2000 // running during certain time interval AND c_period OVERLAPS (DATE ‘2003-03-01’, DATE ‘2004-01-31’) • Special Cases of this general Range-Interval query: • Value-Interval Query // value range is a single point • Range-Stabbing Query // query interval is a single point • Value-Stabbing Query // both restrictions hold Queries on Interval-and-Value Tuples in RDBs

  5. No DB Motivation of Relational Indexing • Main Memory Structures • no persistency, no disk block structure • Secondary Storage Structures + persistency, high block-oriented efficiency • integration into DBMS kernel typically not supported (GiST?) • Relational Storage Structures + basic idea: don‘t extend, just use RDBMS (virtual storage machine) + sound formal fundament, little implementation effort + immediate industrial strength (availability, robustness, ACID, …) + high efficiency by exploiting built-in indexing structures (B+-tree) Disk SQL Queries on Interval-and-Value Tuples in RDBs

  6. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 C4 C3 C2 root = 2h-1 8 C1 4 12 34 52 132 2 6 10 14 52 154 13 73 101 151 1 3 5 7 9 11 13 15 4, 1, C3 8, 3, C4 8, 5, C2 12, 10, C1 4, 1, C3 8, 5, C2 12, 10, C1 8, 5, C2 12, 10, C1 4, 7, C3 8, 13, C2 8, 15, C4 12, 15, C1 12, 10, C1 4, 7, C3 8, 13, C2 12, 15, C1 8, 13, C2 12, 15, C1 12, 15, C1 Relational Interval Tree [Kriegel, Pötke, Seidl: VLDB 2000] based on [Edelsbrunner 1980] • Two relational indexes (B+-trees) store the interval bounds lowerIndex (node,start,id): upperIndex (node,end,id): • Supported by any RDBMS: No modification of built-in B+-trees • Optimal complexities for space, updates, and intersection queries Queries on Interval-and-Value Tuples in RDBs

  7. Single Interval Query Processing Two steps to process an interval query 1. Transform interval query into a set of range queries • The generated queries are collected in transient tables (no I/Os) 2. Perform a single SQL query • Join the transient query tables with the relational indexes start end Queries on Interval-and-Value Tuples in RDBs

  8. 48 16 8 40 24 56 32 4 36 12 44 20 52 60 28 54 22 58 26 2 34 6 38 42 10 14 46 18 50 62 30 1 33 35 3 37 5 39 7 9 41 11 43 45 13 47 15 49 17 51 19 21 53 23 55 57 25 27 59 29 61 31 63 Preprocessing: Generate Query Ranges start end • Generate a set of range queries for lowerIndex and upperIndex • At nodes left of start:report entries i with i.end ³ start (32,48,52) • At nodes right of end: report entries i with i.start £ end (56) • For nodes between start and end: report all entries (54 - 55) upperIndex 32 48 52 lowerIndex 54 to 55 56 Queries on Interval-and-Value Tuples in RDBs

  9. Processing by a Single SQL Query • Join transient query tables with B+-tree indexes SELECT idFROM upperIndex AS i JOIN :leftQueries USING (node)WHERE i.end >= :start UNION ALL SELECT idFROM lowerIndex AS i JOIN :rightQueries USING (node)WHERE i.start <= :end UNION ALL SELECT idFROM lowerIndex // or upperIndexWHERE node BETWEEN :start AND :end • No duplicates are produced → UNION ALL • Blocked output of index range scans is guaranteed Queries on Interval-and-Value Tuples in RDBs

  10. Extending the RI-tree for IaV Support (1) • Add value predicate to RI-tree query SELECT id // lower subqueryFROM upperIndex AS i JOIN :leftQueries USING (node)WHERE i.end >= :startAND i.value BETWEEN :Value1 and :Value2 UNION ALL ... // upper subquery UNION ALL SELECT id // inner subqueryFROM lowerIndex // or upperIndexWHERE node BETWEEN :start AND :endAND value BETWEEN :Value1 and :Value2 • Integrate simple value attribute into lower-/upperIndex • old schema: (node, bound, id) • new schema: ? → depends on type of query to support Queries on Interval-and-Value Tuples in RDBs

  11. Extending the RI-tree for IaV Support (2) • Viable schemas for new lower-/upperIndexes • (value, node, bound, id) • (node, value, bound, id) estimate access cost for each query type • (node, bound, value, id) • Observations (see paper for details): • Value queries best supported by (value, node, bound, id) index • simple attribute predicates = point queries • evaluation requires same number of disk accesses as original proceeding • Range Queries: choice of index not obvious • inner subquery of Range-Stabbing Queries best supported by(node, value, bound, id) • otherwise: depends on stored data and values of query variables • Question: Can Range Queries be further enhanced? Queries on Interval-and-Value Tuples in RDBs

  12. Improving Range Query Processing (1) • Problem of composite indexes for multiple attributes • queries may contain range predicates on two or more of the indexed attributes • tuples satisfying first predicate lie in contiguous disk area • tuples satisfying both/all predicates are scattered within this area • Common solution: using space-filling curves • mapping multi-dimensional data to one-dimensional values • similar values of original data are mapped on similar index data • ranges of indexed attributes will be found in adjacent disk areas • Application on RI-tree scenario • combining some attributes of lower-/upperIndex • depends on type of query to support Queries on Interval-and-Value Tuples in RDBs

  13. Improving Range Query Processing (2) Identifying viable schemas for new lower-/upperIndexes • find subqueries containing several range predicates • for Range Queries: lower and upper subqueries (bound, value) • for Range-Interval Queries:inner subquery (node, value) • combine respective attributes (x,y)within space-filling curve {x,y} • useful combinations forlower-/upperIndex: • (node, {value, bound}) • ({node, value}, bound) Queries on Interval-and-Value Tuples in RDBs

  14. Improving Range Query Processing (3) • Observations: • lower and upper subqueries of Range Queries will profit by a(node, {value, bound}) index • inner subquery of Range-Interval Queries will profit by a({node, value}, bound) index • Value Queries will not profit by “space-filling indexes” • Intermediate result • space-filling indexes can reduce disk accesses in certain cases • there is no “universal” index supporting all queries to the same extent • different subqueries will profit by different indexes Queries on Interval-and-Value Tuples in RDBs

  15. Employing index mixes Identifying best indexes for each query type • Value Queries: best supported by (value, node, bound, id) index • Range Queries: depends on data and space-filling curve (if used) • different subqueries best supported by different indexes • subqueries may be evaluated separately using best index • drawback: higher cost for index updates and storage requirements Queries on Interval-and-Value Tuples in RDBs

  16. Adapting the RI-tree Algorithms (1) Example: Evaluate a contracts query using „space-filling index“ Contracts table: • Node and Z-order value calculated for each tuple • B-tree index on (node, Z(budget, start), no) Queries on Interval-and-Value Tuples in RDBs

  17. Adapting the RI-tree Algorithms (2) Range-Interval Query: value range (1,12); interval (3,6) Evaluation ofupper subquerywith Z-order index Queries on Interval-and-Value Tuples in RDBs

  18. Access Cost with Varying Table Sizes Value-Stabbing Queries Value-Interval Queries Queries on Interval-and-Value Tuples in RDBs

  19. Access Cost with Varying Table Sizes Range-Stabbing Queries Range-Interval Queries Queries on Interval-and-Value Tuples in RDBs

  20. Access cost for varying length of ranges Stabbing Queries Interval Queries Queries on Interval-and-Value Tuples in RDBs

  21. Access cost for varying length of ranges Range Queries Queries on Interval-and-Value Tuples in RDBs

  22. Comparison with competing techniques Queries on Interval-and-Value Tuples in RDBs

  23. Conclusions • Processing Interval-and-Value Tuples in SQL databases • Extensions of the Relational Interval Tree • Various types of queries • Range vs. Value Queries • Interval vs. Stabbing Queries • Experiments demonstrate high performance • Future work: • Extend proposed techniques to more complex queries (joins) • Cost models to predict benefits for evolving query workload Queries on Interval-and-Value Tuples in RDBs

More Related