Index Based Processing of Semi-Restrictive Temporal Joins Donghui Zhang, Vassilis J. Tsotras University of California, Riverside TIME 2002, Manchester, UK
Contents
• Background
• Join problem definition
• Straightforward approaches
• Proposed join algorithms
• Performance study
• Conclusions
Background
• Temporal record: (key, time interval) and some attributes.
• TE-Join: two records qualify for the join if
  • their time intervals intersect; and
  • their keys are equal.
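The TE-Join condition can be written as a small predicate. This is a minimal sketch, not the authors' code: the `Record` type, the function names, and the half-open interval convention [start, end) are all assumptions made for illustration.

```python
from typing import NamedTuple


class Record(NamedTuple):
    key: int
    start: int  # assumed inclusive start of the time interval
    end: int    # assumed exclusive end of the time interval


def intervals_intersect(a: Record, b: Record) -> bool:
    # Two half-open intervals overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def te_join_qualifies(a: Record, b: Record) -> bool:
    # TE-Join condition: equal keys AND intersecting time intervals.
    return a.key == b.key and intervals_intersect(a, b)
```

Under the half-open assumption, two intervals that merely touch (one ends exactly where the other starts) do not intersect.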
Background
• Our earlier work [ICDE02] solved a general TE-Join (GTE-Join), where a portion of each relation is joined:
  • each portion is selected via a range-interval selection: record keys must fall in range r and time intervals must intersect interval i;
  • this is interesting because (1) temporal relations are large; and (2) TE-Join is the special case where r and i are both (-∞, +∞).
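A range-interval selection can be sketched as a filter over a list of records. The names `Record` and `range_interval_select`, the inclusive key bounds, and the half-open time intervals are illustrative assumptions; a real system would answer this query through an index rather than a scan.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    key: int
    start: int  # assumed inclusive
    end: int    # assumed exclusive


def range_interval_select(
    records: List[Record],
    key_range: Tuple[float, float],
    interval: Tuple[float, float],
) -> List[Record]:
    lo, hi = key_range  # assumed inclusive key bounds
    t1, t2 = interval   # assumed half-open query interval [t1, t2)
    # Keep records whose key is in range r and whose interval intersects i.
    return [r for r in records
            if lo <= r.key <= hi and r.start < t2 and t1 < r.end]
```

Passing `(float("-inf"), float("inf"))` for both arguments selects every record, which corresponds to the special case where GTE-Join reduces to TE-Join.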
Problem Definition
• Semi-restrictive joins: records join if their keys are equal (GE-Join) or if their intervals intersect (GT-Join); only one of the two conditions is imposed, not both.
• GE-Join: select a subset from X, a subset from Y, and join records from the subsets if their keys are equal.
• GT-Join: select a subset from X, a subset from Y, and join records from the subsets if their intervals intersect.
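As a reference for what the two joins compute, here is a brute-force sketch that selects from each relation first and then joins with nested loops. It only illustrates the definitions; it is not one of the index-based algorithms from the talk. The `Record` type, the `select` helper, and the choice to give each relation its own query rectangle are assumptions.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    key: int
    start: int  # assumed inclusive
    end: int    # assumed exclusive


# A query rectangle: (key range, time interval); hypothetical representation.
Rect = Tuple[Tuple[float, float], Tuple[float, float]]


def select(records: List[Record], rect: Rect) -> List[Record]:
    (lo, hi), (t1, t2) = rect
    return [r for r in records
            if lo <= r.key <= hi and r.start < t2 and t1 < r.end]


def gt_join(x, y, rect_x: Rect, rect_y: Rect):
    # GT-Join: pair selected records whose time intervals intersect.
    sx, sy = select(x, rect_x), select(y, rect_y)
    return [(a, b) for a in sx for b in sy
            if a.start < b.end and b.start < a.end]


def ge_join(x, y, rect_x: Rect, rect_y: Rect):
    # GE-Join: pair selected records whose keys are equal.
    sx, sy = select(x, rect_x), select(y, rect_y)
    return [(a, b) for a in sx for b in sy if a.key == b.key]
```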
Problem Definition
• GT-Join example: find employees whose last names start with ‘B’ and who co-worked during 1995 with the employees whose last names start with ‘S’.
• GE-Join example: find the 1998 IBM employees who were UC Riverside students in 1995.
GT-Join Solutions...
Straightforward Solutions for GT-Join
• Unsynchronized join.
• Synchronized join using B+-trees.
• Synchronized join using R-trees.
Straightforward Solutions for GT-Join
• Unsynchronized join: separate the selection and join phases. Not efficient because:
  • the intermediate result can be large and must be stored;
  • selection in one relation ignores the data distribution of the other relation.
Straightforward Solutions for GT-Join
• Synchronized join using B+-trees.
• If records are clustered on start time:
  • not efficient: a record y must be checked against every record whose start is before the end of y.
• Clustering on end time is similar.
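The inefficiency of clustering on start time can be seen in a small sketch. A sorted Python list stands in for the leaf level of a B+-tree here, which is a simplification; the function name, the `(key, start, end)` tuple layout, and the half-open intervals are assumptions.

```python
import bisect
from typing import List, Tuple

Rec = Tuple[int, int, int]  # (key, start, end), half-open interval assumed


def probe_start_clustered(x_sorted: List[Rec], y: Rec):
    """Probe records sorted by start time with one record y.

    Returns the matching records and the number of candidates inspected.
    """
    starts = [r[1] for r in x_sorted]
    # The ordering only bounds one side: every record with start < y.end
    # is a candidate, however early it started.
    hi = bisect.bisect_left(starts, y[2])
    candidates = x_sorted[:hi]
    # The second half of the overlap test must be checked record by record.
    matches = [r for r in candidates if r[2] > y[1]]
    return matches, len(candidates)
```

For a record y late in the time line, the candidate set is nearly the whole relation, even though only a few candidates still overlap y.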
Straightforward Solutions for GT-Join
• Synchronized join using R-trees.
  • Store each record as a two-dimensional interval in the R-tree;
  • use existing R-tree join algorithms [BKS93, HJR97];
  • modifications: (1) integrate the selection condition; (2) join index records as long as they intersect in the time dimension, ignoring the key dimension.
• However, not efficient, since R-trees do not handle long intervals well.
Our Solutions
• Synchronized join using temporal indices.
• Multi-version B+-tree (MVBT) [BGO+96]: asymptotically optimal space, update, and query costs.
• We propose three synchronized, MVBT-based join algorithms (they apply to other temporal indices as well).
Review of MVBT
• A “forest” of trees: different trees may overlap.
• Root nodes correspond to contiguous, non-intersecting time intervals.
• A record may be stored in multiple pages.
• Efficient range-interval selection algorithms.
Top-down GT-Join
• Idea: for each pair of trees, one from each MVBT forest, perform a synchronized tree traversal (STT).
• STT for two trees:
  • initially, join the root nodes;
  • to join two nodes, join their children;
  • eventually, join the elements in leaf pages.
• Note that special care is needed to avoid duplicates, since a record can have multiple copies.
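The traversal pattern can be sketched on simplified in-memory trees. This is not an MVBT: the `Node` class is a hypothetical stand-in in which each node's time interval is assumed to cover everything below it, the selection condition is omitted, and duplicates are suppressed with a result set, which is just one simple option and not necessarily how the authors handle them.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Rec = Tuple[int, int, int]  # (key, start, end), half-open interval assumed


@dataclass
class Node:
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)
    records: List[Rec] = field(default_factory=list)  # used in leaves only


def overlaps(s1: int, e1: int, s2: int, e2: int) -> bool:
    return s1 < e2 and s2 < e1


def stt(a: Node, b: Node, out: Set[Tuple[Rec, Rec]]) -> None:
    # Prune: subtrees whose time intervals are disjoint cannot join.
    if not overlaps(a.start, a.end, b.start, b.end):
        return
    if not a.children and not b.children:
        # Two leaf pages: join their records on interval intersection.
        for ra in a.records:
            for rb in b.records:
                if overlaps(ra[1], ra[2], rb[1], rb[2]):
                    out.add((ra, rb))  # the set drops repeated pairs
        return
    # Descend on every side that still has children.
    for ca in (a.children or [a]):
        for cb in (b.children or [b]):
            stt(ca, cb, out)
```

Because the same record can sit in several leaf pages, the same pair may be reached more than once; collecting results in a set keeps each pair once.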
Link-based GT-Join
• In each leaf page, store a pointer to its predecessor.
• For GT-Join:
  • find pairs of data pages that (1) intersect the right border of the query rectangle; and (2) intersect each other in the time dimension;
  • keep such pairs in a priority queue;
  • sweep left synchronously.
Plane Sweep GT-Join
• Similar to link-based.
• Maintain two priority queues, one for each MVBT.
• At each step, access the leaf page with the largest end time and add its records to the buffer.
• When adding records to the buffer, join them with the buffered records from the other MVBT.
• Throw away useless records.
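The sweep can be illustrated at record level. This sketch simplifies the slide in several ways that should be read as assumptions: the priority queues hold individual records instead of leaf pages, there is no selection condition, intervals are half-open and non-empty, and "useless" is taken to mean a buffered record that starts at or after the current sweep position.

```python
import heapq
from typing import List, Tuple

Rec = Tuple[int, int, int]  # (key, start, end), half-open interval assumed


def plane_sweep_gt_join(x: List[Rec], y: List[Rec]) -> List[Tuple[Rec, Rec]]:
    # One max-heap per input, ordered by end time (negated for heapq).
    heaps = [[(-r[2], r) for r in x], [(-r[2], r) for r in y]]
    for h in heaps:
        heapq.heapify(h)
    buffers: List[List[Rec]] = [[], []]
    out: List[Tuple[Rec, Rec]] = []
    while heaps[0] or heaps[1]:
        # Take the record with the largest end time across both inputs.
        if not heaps[1] or (heaps[0] and heaps[0][0][0] <= heaps[1][0][0]):
            side = 0
        else:
            side = 1
        _, r = heapq.heappop(heaps[side])
        other = 1 - side
        sweep = r[2]
        # Throw away buffered records that start at or after the sweep
        # position: nothing still to come can overlap them.
        buffers[other] = [o for o in buffers[other] if o[1] < sweep]
        # Every surviving record on the other side overlaps r.
        for o in buffers[other]:
            out.append((r, o) if side == 0 else (o, r))
        buffers[side].append(r)
    return out
```

Sweeping from the largest end time downward means a buffered record only needs its start time checked, because its end time is already known to be at least the sweep position.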
GE-Join Solutions...
GE-Join Solutions...
Similarly, we have:
• unsynchronized
• synchronized using B+-trees
• synchronized using R-trees
• top-down using MVBT
• link-based using MVBT
Note: some of them, especially the link-based algorithm, are quite different due to the different join condition.
Implemented Algorithms
Common to both GT-Join and GE-Join:
Implemented Algorithms
Specific to GT-Join:
Specific to GE-Join:
Experimental Setup
• Implemented in GNU C++.
• Sun Enterprise 250 Server machine with two UltraSPARC-II processors using Solaris 2.8.
• Page size = 8KB.
• Buffer size = 10MB; LRU buffer.
• Each data set: 10 million records.
• R/I ratio: length of the query key range divided by length of the query time interval. It describes the shape of the query rectangle.
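The R/I ratio is a single division. The function name and the tuple arguments below are illustrative assumptions, and the units of the key and time axes are whatever the data set uses.

```python
from typing import Tuple


def ri_ratio(key_range: Tuple[float, float],
             interval: Tuple[float, float]) -> float:
    # Length of the query key range divided by the length of the
    # query time interval.
    key_len = key_range[1] - key_range[0]
    time_len = interval[1] - interval[0]
    return key_len / time_len
```

A ratio of 10 means the query rectangle is ten times longer along the key axis than along the time axis; a ratio of 0.1 means the opposite.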
GT-Join Performance
R/I ratio = 10.
GT-Join Performance
R/I ratio = 0.1.
GE-Join Performance
R/I ratio = 10.
GE-Join Performance
R/I ratio = 0.1.
Conclusions
• We addressed index-based GT-Join and GE-Join.
• Joins using traditional indices (B+-tree, R-tree) are not efficient.
• We proposed various synchronized approaches based on temporal indices (MVBT).
• Experiments:
  • for GT-Join, link-based and plane-sweep are the best;
  • for GE-Join, link-based and sort-merge are the best;
  • overall, link-based is the best: multi-fold improvement over B+-tree/R-tree joins.
Thank you!