
Efficient Computation of Temporal Aggregates with Range Predicates

D. Zhang*, A. Markowetz**, V. J. Tsotras*, D. Gunopulos* and B. Seeger** (* University of California, Riverside; ** Philipps Universität Marburg, Germany)


Presentation Transcript


  1. Efficient Computation of Temporal Aggregates with Range Predicates D. Zhang*, A. Markowetz**, V. J. Tsotras*, D. Gunopulos* and B. Seeger** * University of California, Riverside ** Philipps Universität Marburg, Germany

  2. Outline • Introduction & Motivation • Problem Decomposition • The MVSB-tree • Performance Results • Conclusions

  3. Introduction & Motivation • Consider a collection of temporal records. • Each record: key k, value v, time interval [t1, t2]. • E.g.: employees and their salaries over time. • Temporal Aggregation: aggregate values over time. • Focus on SUM/COUNT/AVG.

  4-6. Previous Work • ‘Given time t, aggregate over all records that contain t’. [Tum92, KS95, YK97, GHR+, MLI00] E.g. the sum at t2 is 13. • ‘Given interval [t1, t2], aggregate over all records that intersect [t1, t2]’. (SB-tree [YW01]) E.g. the sum over [t1, t2] is 28.

  7-8. Range-Temporal Aggregation (RTA) • ‘Aggregate over all records intersecting interval [t1, t2] with keys in range [k1, k2]’. E.g. the RTA-sum over [k1, k2]x[t1, t2] is 19. • Example query: find the AVG salary over the past ten years of all employees whose last names start with ‘B’.
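
As a point of reference for the RTA definition, here is a minimal Python sketch that answers the query with a plain linear scan. The `Record` tuple, the function name `rta_scan`, the half-open interval convention [start, end) and the half-open key range [k1, k2) are assumptions made for illustration; the slides do not fix these details.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Record(NamedTuple):
    key: int
    start: int    # first instant at which the record is valid (inclusive)
    end: int      # first instant at which it is no longer valid (exclusive)
    value: float


def rta_scan(records: Sequence[Record], k1: int, k2: int,
             t1: int, t2: int) -> Tuple[float, int, Optional[float]]:
    """Return (SUM, COUNT, AVG) over records with k1 <= key < k2
    whose interval [start, end) intersects the query interval [t1, t2]."""
    hits = [r.value for r in records
            if k1 <= r.key < k2 and r.start <= t2 and r.end > t1]
    total = sum(hits)
    count = len(hits)
    average = total / count if count else None  # AVG is undefined on no hits
    return total, count, average
```

The scan touches every record, so it only illustrates what is being computed, not how to compute it efficiently.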

  9. Previous approaches would need a separate index for each possible key range (inefficient). • Alternative: • index the records; • selection query: ‘find all records intersecting [k1, k2]x[t1, t2]’. • Query time is O(n). • Our solution: O(log_b n).

  10. Problem Decomposition • Decompose RTA into LKST and LKLT queries. • LKST query: given k, t, aggregate over all records with keys less than k and intervals containing t. E.g. LKST(k2, t2) = 11.

  11. LKLT query: given k, t, aggregate over all records with keys less than k and intervals ending before t. E.g. LKLT(k2, t2) = 20.
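
The two query types can be written down directly as brute-force definitions. The following Python sketch does so for SUM; the record layout `(key, start, end, value)`, the half-open interval [start, end), and the reading of "ending before t" as `end <= t` are assumptions, since the slides leave the boundary conventions open.

```python
def lkst(records, k, t):
    """LKST(k, t): sum of values over records with key < k
    whose interval [start, end) contains t."""
    return sum(v for key, start, end, v in records
               if key < k and start <= t < end)


def lklt(records, k, t):
    """LKLT(k, t): sum of values over records with key < k
    whose interval has already ended by time t (end <= t)."""
    return sum(v for key, start, end, v in records
               if key < k and end <= t)
```

Under these conventions every record with key < k is, at any time t, either not yet started, counted by LKST, or counted by LKLT, and never counted by both.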

  12-21. Step-by-step construction of the decomposition (animation frames): RTA([k1, k2]x[t1, t2]) is assembled from LKST(k2, t2) - LKST(k1, t2), plus LKLT(k2, t2) - LKLT(k1, t2), minus (LKLT(k2, t1) - LKLT(k1, t1)).

  22. RTA([k1, k2]x[t1, t2]) = LKST(k2, t2) - LKST(k1, t2) + LKLT(k2, t2) - LKLT(k1, t2) - LKLT(k2, t1) + LKLT(k1, t1) • The RTA query is decomposed into LKST and LKLT queries.
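
The six-term formula can be sanity-checked against a linear scan on random data. The sketch below is illustrative only: the function names, the record layout `(key, start, end, value)`, the half-open interval [start, end) and the half-open key range [k1, k2) are assumptions chosen so that "keys less than k" lines up exactly at the range boundaries.

```python
import random


def lkst(records, k, t):
    # Keys less than k, interval [start, end) containing t.
    return sum(v for key, start, end, v in records
               if key < k and start <= t < end)


def lklt(records, k, t):
    # Keys less than k, interval already ended by t.
    return sum(v for key, start, end, v in records
               if key < k and end <= t)


def rta_decomposed(records, k1, k2, t1, t2):
    # The six point queries from the decomposition formula.
    return (lkst(records, k2, t2) - lkst(records, k1, t2)
            + lklt(records, k2, t2) - lklt(records, k1, t2)
            - lklt(records, k2, t1) + lklt(records, k1, t1))


def rta_scan(records, k1, k2, t1, t2):
    # Direct definition: k1 <= key < k2 and [start, end) meets [t1, t2].
    return sum(v for key, start, end, v in records
               if k1 <= key < k2 and start <= t2 and end > t1)


def check(trials=200, seed=0):
    """Compare both computations on random integer data; True if they agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        records = []
        for _ in range(20):
            start = rng.randint(0, 30)
            length = rng.randint(1, 10)
            records.append((rng.randint(0, 10), start, start + length,
                            rng.randint(1, 9)))
        k1, k2 = sorted(rng.sample(range(12), 2))
        t1, t2 = sorted(rng.sample(range(45), 2))
        if rta_decomposed(records, k1, k2, t1, t2) != rta_scan(records, k1, k2, t1, t2):
            return False
    return True
```

The reasoning behind the agreement: a record intersects [t1, t2] exactly when it either contains t2 or ended after t1 but no later than t2, and the second group is LKLT at t2 minus LKLT at t1.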

  23. Index Design • Both LKST and LKLT are point queries: ‘given k, t, return value’. • An index for LKST and LKLT should: • store points in key-time space; • maintain a value for each point; • support point queries.

  24. Model • Assume updates come in increasing time order (transaction-time model). [Figure: a record as inserted at t1 and as updated at t2]

  25. The LKST index • The effect of inserting record (k, [t1, t2], v): [Figure: index updates at t1 and at t2]

  26. The LKLT index • The effect of inserting record (k, [t1, t2], v): no update at t1. [Figure: index update at t2]

  27. Update Operation • Common update operation for both: insert (k, t):v. • That is: add v to all points in [k, t] x [kmax, tmax]. • Conclusion: an index supporting point queries and the above update can be used for both LKLT and LKST.
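
To make the shared update concrete, here is a deliberately naive Python sketch of an index that supports `insert (k, t):v` and point queries, plus one way the two indices could be maintained from it. The class and method names are hypothetical. The specific updates (+v at t1 and -v at t2 for LKST, +v at t2 for LKLT) are an inference from the LKST and LKLT definitions, because the slide figures are not part of the transcript. Note also that, following the update as stated, a query at key k includes records whose key equals k.

```python
class NaiveDominanceIndex:
    """Keeps a list of updates (k, t, v). Not efficient: queries are linear."""

    def __init__(self):
        self.updates = []

    def insert(self, k, t, v):
        # Conceptually adds v to every point in [k, kmax] x [t, tmax];
        # here the update is simply recorded and applied lazily at query time.
        self.updates.append((k, t, v))

    def point_query(self, k, t):
        # Point (k, t) is affected by an update (uk, ut) iff uk <= k and ut <= t.
        return sum(v for uk, ut, v in self.updates if uk <= k and ut <= t)


class RtaIndexes:
    """Hypothetical wrapper holding one index for LKST and one for LKLT."""

    def __init__(self):
        self.lkst = NaiveDominanceIndex()
        self.lklt = NaiveDominanceIndex()

    def record_starts(self, key, t1, v):
        # The record becomes alive: it contributes to LKST from t1 onward.
        self.lkst.insert(key, t1, v)

    def record_ends(self, key, t2, v):
        # The record stops being alive at t2 and counts as ended from t2 onward.
        self.lkst.insert(key, t2, -v)
        self.lklt.insert(key, t2, v)
```

Because events are applied as they happen, this matches the transaction-time assumption that updates arrive in increasing time order.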

  28. The MVSB-tree • A partially persistent SB-tree. It inherits features from both the SB-tree [YW01] and the MVBT [BGO+96].

  29. Insertion [Figure: insertion example]

  30-31. Insertion (cont.) • To handle overflow, copy records with end=tmax to a new page. • Strong overflow: limit the number of records in a new page. [Figure: after the copy, root1 covers [1, 4) and root2 covers [4, tmax)]

  32-33. Point Query (k, t) • Follows a single path: the nodes containing (k, t). • Aggregates the values found in this path. • E.g.: PointQuery(23, 7) = 5+2 = 7.
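
The single-path traversal can be illustrated with a toy tree. This Python sketch is not the MVSB-tree page layout: the `Entry` and `Node` classes, the rectangle bounds and the two-level toy tree are all hypothetical, chosen only so that the path for (23, 7) carries the values 5 and 2 from the slide's example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    k_lo: int              # key range [k_lo, k_hi)
    k_hi: int
    t_lo: float            # time range [t_lo, t_hi)
    t_hi: float
    value: int             # aggregate stored for this rectangle
    child: Optional["Node"] = None


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def point_query(root: Optional[Node], k: int, t: float) -> int:
    """Walk down one path, adding the value of the entry covering (k, t)."""
    total = 0
    node = root
    while node is not None:
        entry = next((e for e in node.entries
                      if e.k_lo <= k < e.k_hi and e.t_lo <= t < e.t_hi), None)
        if entry is None:      # no entry covers the point: stop descending
            break
        total += entry.value
        node = entry.child
    return total


def build_toy_tree() -> Node:
    # Hypothetical data: a root entry with value 5 pointing to a leaf
    # whose covering entry holds value 2.
    t_max = float("inf")
    leaf = Node([Entry(20, 30, 5, t_max, 2)])
    return Node([Entry(0, 50, 1, t_max, 5, child=leaf)])
```

With this toy tree, `point_query(build_toy_tree(), 23, 7)` visits the root entry and then the leaf entry and returns 5 + 2 = 7.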

  34. Efficiency • Theorem: with 2 MVSBT indices, we achieve: • RTA query: O(log_b n); • Update: O(log_b K); • Space: O( * log_b K). • n = number of updates; • K = number of different keys; • b = page capacity (in records).

  35. Performance Results • Sun Enterprise 250 Server; two 300 MHz UltraSPARC-II processors; Solaris 2.8; GNU C++. • Datasets: created using the TimeIT [KS98] software and transformed to add record keys. • Each dataset has a million records (10k unique keys; on average 100 intervals per key). • Compared against the straightforward approach using the MVBT [BGO+96] as the temporal index.

  36. Index Sizes [Figure: index size comparison]

  37. Query Speedup • Query time is averaged over 100 queries of the same query rectangle size.

  38. Conclusions • We addressed the range-temporal aggregation (RTA) problem; • New index structure (MVSB-tree) for incrementally maintaining and efficiently computing RTAs; • Query time reduced from O(n) to O(log_b n) with small space overhead; • Open problems: • Min/Max range-temporal aggregation; • Valid-time environment; • Multi-dimensional aggregation over objects with extents.
