
Evaluating Window Joins over Punctuated Streams

Evaluating Window Joins over Punctuated Streams. Many slides taken from a talk by Luping Ding and Elke A. Rundensteiner (CIKM'04), Database Systems Research Group, Worcester Polytechnic Institute.


Presentation Transcript


  1. Evaluating Window Joins over Punctuated Streams. Many slides taken from a talk by Luping Ding and Elke A. Rundensteiner (CIKM'04), Database Systems Research Group, Worcester Polytechnic Institute.

  2. Stream Data Processing
     • Online Transaction Management
     • Sensor Network Monitoring
     • Network Usage Analysis
     • Online Auction
     [Figure: continuous queries are registered with a stream query engine, which consumes streaming data and produces a streaming result.]

  3. New Challenges in Stream Context
     • Potentially infinite data streams vs. stateful operators (e.g., join, distinct, …)
     • Problem: potentially unbounded state
     • Reason: no hint about which data is no longer useful

  4. Example: Symmetric Hash Join [WA93]
     • Memory overflow resolution: state relocation
     • Examples: XJoin [UF00], Hash-Merge Join [MLA04]
     • Problems
       • Join state still grows without bound
       • Delivery of some join results may be highly deferred
     [Figure: tuples from streams A and B are inserted into their own in-memory state (SA, SB) and probe the opposite state; the figure also marks memory overflow.]
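As a rough illustration of the insert/probe pattern named on this slide, the Python sketch below keeps one hash table per stream: each arriving tuple probes the opposite table and is then inserted into its own. The class and method names are hypothetical and not taken from the slides. Nothing is ever removed here, which is why the state grows without bound.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Illustrative symmetric hash join on a single join key (hypothetical names)."""

    def __init__(self):
        # One in-memory hash table per input stream: join key -> list of payloads.
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}

    def on_tuple(self, side, key, payload):
        """Process one tuple from stream `side` ("A" or "B") and return join results."""
        other = "B" if side == "A" else "A"
        # Probe the opposite state for tuples with the same join key.
        matches = self.state[other].get(key, [])
        if side == "A":
            results = [(key, payload, m) for m in matches]
        else:
            results = [(key, m, payload) for m in matches]
        # Insert into the tuple's own state so later arrivals can find it.
        self.state[side][key].append(payload)
        return results

    def state_size(self):
        """Total number of tuples held in both states."""
        return sum(len(v) for table in self.state.values() for v in table.values())
```

Every call to `on_tuple` adds one tuple to `state_size()`, regardless of whether that tuple can still produce results.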

  5. Avoiding Unbounded State
     • Solution: exploit constraints to detect data that is no longer useful
     • Sliding window [MWA+03]: identifies a bounded set of input data based on time
     • K-constraint [BW04]: models clustered or ordered data arrival patterns
     • Punctuation [TMS+03]: dynamically announces the termination of certain values

  6. Sliding Window [KNV03]. [Figure: streams A and B on a shared timeline, with sliding windows Wa and Wb.]

  7. Punctuation
     • Meta-knowledge embedded inside data streams
     • An ordered set of patterns corresponding to the attributes of tuples
     • Pattern types: wildcard (*), constant (9), list ({1,2,3}), range ([1, 20]), empty ()
     • Semantics: tuples arriving after a punctuation p will NOT match p
     Example (Bid stream):
       180  Marlie     820.00   Nov-13-03 11:02:00
       182  Ultrasale  1000.00  Nov-13-03 11:05:00
       180  Jocelyn    850.00   Nov-13-03 11:14:00
       180  *          *        *
       181  pcfan      50.00    Nov-13-03 11:36:00
     The punctuation (180, *, *, *) announces that no more tuples will contain Item_id 180.
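The pattern types listed on this slide can be sketched as small matcher classes. This is a minimal Python illustration, not the authors' implementation: the class names are hypothetical, the range is assumed to be inclusive at both ends, and the empty pattern is assumed to match no value.

```python
from dataclasses import dataclass


class Wildcard:
    """Pattern '*': matches any value."""
    def matches(self, value):
        return True


class Empty:
    """Pattern '()': assumed here to match nothing."""
    def matches(self, value):
        return False


@dataclass(frozen=True)
class Constant:
    value: object
    def matches(self, value):
        return value == self.value


@dataclass(frozen=True)
class ListPattern:
    values: frozenset
    def matches(self, value):
        return value in self.values


@dataclass(frozen=True)
class Range:
    low: float
    high: float
    def matches(self, value):
        # Assumption: both bounds are inclusive, as the notation [1, 20] suggests.
        return self.low <= value <= self.high


def matches(punctuation, tup):
    """A tuple matches a punctuation if every attribute matches its pattern."""
    return len(punctuation) == len(tup) and all(
        pattern.matches(value) for pattern, value in zip(punctuation, tup)
    )


# The punctuation (180, *, *, *) from the Bid example.
p = (Constant(180), Wildcard(), Wildcard(), Wildcard())
print(matches(p, (180, "Jocelyn", 850.00, "Nov-13-03 11:14:00")))
print(matches(p, (181, "pcfan", 50.00, "Nov-13-03 11:36:00")))
```

Under the punctuation semantics, any tuple for which `matches` returns true is guaranteed not to arrive after the punctuation.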

  8. Punctuation-Aware Join [DMR+04]. [Figure: a join on item_id over Stream A and Stream B with join states SA and SB; the punctuation (175, *) announces that no more tuples will have A = 175.]

  9. Features of Punctuation
     • Purge rule. For any tuple ta from stream A, if a punctuation pb has already been received from stream B such that match(ta, pb) holds, then ta will not join with any future tuples from stream B, so ta does not need to be kept in the A state after it has been processed.
     • Propagation rule. The join operator can also propagate punctuations to its output stream in order to help downstream operators.
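The purge rule can be illustrated with a small Python sketch. It makes two simplifying assumptions that are not in the slides: punctuations are reduced to a single constant on the join attribute, and probing the B state before a tuple is discarded is left out. `PurgingState` and its methods are hypothetical names.

```python
class PurgingState:
    """A-side join state that applies the purge rule against punctuations from B."""

    def __init__(self):
        self.tuples = {}              # join value -> list of stored A tuples
        self.punctuated_by_b = set()  # join values already punctuated by stream B

    def on_tuple_a(self, key, payload):
        """Store an A tuple unless a B punctuation already covers its join value.

        Returns True if the tuple was stored, False if it was not kept.
        """
        if key in self.punctuated_by_b:
            # No future B tuple can join with it, so it need not be kept.
            return False
        self.tuples.setdefault(key, []).append(payload)
        return True

    def on_punctuation_b(self, key):
        """Record a B punctuation and purge matching A tuples; return how many."""
        self.punctuated_by_b.add(key)
        return len(self.tuples.pop(key, []))
```

For example, after `on_punctuation_b(175)` every stored A tuple with join value 175 is removed, and later A tuples with that value are no longer stored.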

  10. Based on punctuation semantics, we derive the following theorem as the foundation of our punctuation propagation algorithm.
      • Theorem 3.1. Let pa and pb be punctuations retrieved from streams A and B at times TSa and TSb respectively, both specifying the same punctuated value val of join attribute att. Then no output tuple with val as the value of attribute att will be generated after time max(TSa, TSb).
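Theorem 3.1 suggests a simple bookkeeping scheme: remember when each side punctuated a join value, and once both sides have done so, max(TSa, TSb) is the time after which a punctuation for that value could be emitted. The Python sketch below is an illustration under that reading; `PropagationTracker` is a hypothetical name, and punctuations are again reduced to a single join value.

```python
class PropagationTracker:
    """Tracks per-value punctuation times on both inputs (hypothetical helper)."""

    def __init__(self):
        # For each stream: punctuated join value -> time the punctuation arrived.
        self.seen = {"A": {}, "B": {}}

    def on_punctuation(self, side, value, ts):
        """Record a punctuation from `side` ("A" or "B").

        Returns (value, time) once both streams have punctuated `value`,
        where time = max(TSa, TSb) as in Theorem 3.1; otherwise returns None.
        """
        # Keep the first arrival time if the same value is punctuated again.
        self.seen[side].setdefault(value, ts)
        ts_a = self.seen["A"].get(value)
        ts_b = self.seen["B"].get(value)
        if ts_a is None or ts_b is None:
            return None
        return (value, max(ts_a, ts_b))
```

With punctuations for value 180 arriving from A at time 5 and from B at time 9, the tracker reports `(180, 9)` on the second call.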

  11. Sliding Window Join
      • Suppose Ta and Tb are the time windows for streams A and B respectively. We define the rule for invalidating tuples from the join state based on the sliding window:
      • Let ta be the latest tuple from stream A that has been processed, with timestamp TSa. A tuple in the B state with timestamp TSb such that TSb + Tb < TSa is called a time-expired tuple and can be invalidated. The same invalidation rule applies to tuples in the A state.
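The invalidation rule translates almost directly into code. The Python sketch below assumes the B state is kept in arrival order with non-decreasing timestamps, so expired tuples always sit at the head; the function name and the `(timestamp, tuple)` layout are illustrative choices.

```python
from collections import deque


def invalidate(b_state, ts_a, window_b):
    """Remove time-expired tuples from the head of the B state.

    b_state  : deque of (timestamp, tuple) pairs, oldest first (assumed ordering)
    ts_a     : timestamp TSa of the latest processed tuple from stream A
    window_b : time window Tb on stream B
    Returns the list of invalidated entries.
    """
    expired = []
    # A B tuple is time-expired when TSb + Tb < TSa.
    while b_state and b_state[0][0] + window_b < ts_a:
        expired.append(b_state.popleft())
    return expired


b_state = deque([(1, "b1"), (4, "b2"), (9, "b3")])
print(invalidate(b_state, ts_a=10, window_b=5))
print(list(b_state))
```

Note the strict inequality: a B tuple with TSb + Tb equal to TSa is still inside the window and is kept.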

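  12. Basic Window Join. [Figure: timeline of streams A and B, showing the window of length Tb that starts at TSa-Tb and ends at TSa, and the window of length Ta that starts at TSb-Ta and ends at TSb.]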

  13. Optimization Opportunities
      • Maintain smaller state than either a pure window join or a pure punctuation-exploiting join
      • Bid tuples that have been joined don't need to be maintained in state (punctuation)
      • Drop tuples without affecting the precision of the result
      • Bid tuples outside the 24-hour window of the corresponding Auction tuple don't need to be processed
      • Aggregate results for some Auction tuples can be produced in less than 24 hours

  14. Features of the PWJoin Algorithm. The Punctuation-exploiting Window Join (PWJoin) is composed of three operations:
      • Probing the state to find matching tuples for producing join results.
      • Purging no-longer-joining tuples using punctuations.
      • Invalidating expired tuples using windows.

  15. Window and Punctuation Occur Simultaneously
      SELECT A.item_id, Count(*)
      FROM Auction [Range 24 Hours] A, Bid B
      WHERE A.item_id = B.item_id
      GROUP BY A.item_id
      [Figure: query plan. The Auction stream and the Bid stream feed Join(item_id), whose output Out1 (item_id) feeds Group-by item_id (count(*)), producing Out2 (item_id, count). Annotations: contains punctuations on item_id; applies a 24-hour window on the Auction stream.]

  16. PWJoin Basics and Issue
      • On receiving a new tuple ta from stream A, the operations are: probe the B state, invalidate tuples from the B state, insert ta into the A state.
      • On receiving a new punctuation pa from stream A, the operations are: purge tuples from the B state, insert pa into the A state.
      • Issue: how to design the PWJoin state to facilitate all search-based operations?
        • Invalidate conducts a time-based search
        • Probe and Purge need a value-based search

  17. PWJoin State with Two-dimensional Index. [Figure: the state consists of an I-Node index (hash table), a time list running from window begin to window end, and a punctuation time list. Each I-Node has the fields Key, Head, Tail and PunctFlag (for example "none" or "punctuated"); each T-Node holds a tuple together with NextValueListTNode and NextTimeListTNode pointers.]

  18. PWJoin Algorithm
      • Invalidate: once a new tuple t is retrieved from stream A, its timestamp is used to invalidate expired tuples from the head of the time list of stream B.
      • Probe: probe the I-Node index and join with the tuples in the value list of the matching I-Node. After invalidation is done, the join value of t is used to probe the I-Node index of the B state. If a matching I-Node iNode is found, the corresponding value list is located by following the Head pointer of iNode. Tuple t then joins with all tuples in this value list by following the NextValueListTNode pointer of each T-Node. Finally, the PunctFlag of iNode is checked. If it is "punctuated", t is discarded. If it is "none", t is inserted into the A state.
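The invalidate and probe steps can be sketched in Python as follows. This is a simplified illustration, not the authors' data structure: deques and a dict stand in for the linked T-Nodes and the hash-table I-Node index, all names are hypothetical, timestamps are assumed to be non-decreasing, and a tuple whose join value has no I-Node in the B state is assumed to be inserted into the A state.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class INode:
    # Value list: (timestamp, payload) pairs with this join key, oldest first.
    values: deque = field(default_factory=deque)
    punct_flag: str = "none"  # "none" or "punctuated"


class PWState:
    """Join state for one stream: a value index plus a time list."""

    def __init__(self, window):
        self.window = window
        self.index = {}           # I-Node index: join key -> INode
        self.time_list = deque()  # (timestamp, key), oldest first

    def insert(self, ts, key, payload):
        self.index.setdefault(key, INode()).values.append((ts, payload))
        self.time_list.append((ts, key))

    def invalidate(self, now):
        """Drop expired tuples from the head of the time list."""
        while self.time_list and self.time_list[0][0] + self.window < now:
            _, key = self.time_list.popleft()
            # The oldest tuple overall is also the oldest in its own value list.
            self.index[key].values.popleft()


def on_tuple_from_a(a_state, b_state, ts, key, payload):
    """Invalidate, probe, then insert or discard; returns the join results."""
    b_state.invalidate(ts)
    inode = b_state.index.get(key)
    results = []
    if inode is not None:
        results = [(key, payload, b_payload) for _, b_payload in inode.values]
        if inode.punct_flag == "punctuated":
            return results  # t is discarded rather than stored
    a_state.insert(ts, key, payload)
    return results
```

Keeping both a time list and per-key value lists is what lets invalidation search by time while probing searches by value.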

  19. PWJoin Algorithm
      • Purge: probe the I-Node index and delete the tuples in the value list of the matching I-Node. When a new punctuation p is retrieved from stream A, p is used to probe the I-Node index of the B state. If a matching I-Node iNode is found, all tuples in the corresponding value list are deleted, and iNode is removed from the I-Node index as well. If the PunctFlag of iNode is "punctuated", p is discarded. If iNode is not found or iNode's PunctFlag is "none", p is used to probe the I-Node index of the A state and to set the PunctFlag of the matching I-Node iNodea to "punctuated". If iNodea does not exist, a new I-Node is created with its PunctFlag set to "punctuated" and inserted into the I-Node index of the A state.
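A compact way to read the purge step is as a function over the two I-Node indexes. The Python sketch below follows the steps as worded on this slide, with plain dicts and lists standing in for the hash table and linked value lists; the names are hypothetical and punctuations are reduced to a single join value.

```python
from dataclasses import dataclass, field


@dataclass
class INode:
    values: list = field(default_factory=list)  # tuples sharing one join key
    punct_flag: str = "none"                    # "none" or "punctuated"


def on_punctuation_from_a(a_index, b_index, key):
    """Apply a punctuation from stream A on join value `key`.

    a_index, b_index : dicts mapping join key -> INode (the I-Node indexes)
    Returns the number of B tuples purged.
    """
    # Probe the B state; a matching I-Node is removed from the index.
    inode = b_index.pop(key, None)
    purged = 0
    if inode is not None:
        purged = len(inode.values)
        inode.values.clear()
        if inode.punct_flag == "punctuated":
            # B had already punctuated this value, so p is discarded.
            return purged
    # I-Node not found or flag "none": record the punctuation in the A state,
    # creating the I-Node if it does not exist yet.
    a_index.setdefault(key, INode()).punct_flag = "punctuated"
    return purged
```

The flag recorded in the A state is what a later tuple from stream B sees when it probes the A state.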

  20. Punctuation Propagation [CIKM04]
      • An operator may propagate punctuations to benefit downstream operators
      [Figure: the Auction and Bid streams feed Join(item_id), which propagates punctuations on item_id, for example (180, *, *) over (Item_id, Bidder_id, Bid_price); the downstream Group-by item_id (count(*)) can be unblocked by the punctuations propagated by the join operator.]

  21. Optimizations Enabled by Combined Constraints. [Figure: two panels, "Early Punctuation Propagation" and "Tuple Dropping", each showing tuples of Stream S1 and Stream S2, with propagation point 1 and propagation point 2 marked.]

  22. Achieving Optimizations by Combined Constraints
      • Early propagation
        • Invalidate punctuations in the punctuation time list in the same way as tuples
        • Expired punctuations can be propagated
      • Tuple dropping
        • When early propagation happens, set the PunctFlag of the matching I-Node to "propagated"
        • Drop new tuples that match an I-Node whose PunctFlag is "propagated"
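One possible reading of these two optimizations is sketched below in Python. It assumes that a punctuation expires under the same condition as a tuple (timestamp + window < current time), which the slide implies but does not spell out; the class and method names are hypothetical.

```python
from collections import deque


class PunctuationWindow:
    """Punctuation time list plus per-key PunctFlag (illustrative only)."""

    def __init__(self, window):
        self.window = window
        self.punct_time_list = deque()  # (timestamp, key), oldest first
        self.punct_flag = {}            # key -> "punctuated" or "propagated"

    def on_punctuation(self, ts, key):
        self.punct_time_list.append((ts, key))
        self.punct_flag[key] = "punctuated"

    def invalidate(self, now):
        """Expire punctuations like tuples; return the keys to propagate early."""
        propagated = []
        # Assumption: same expiry test as for tuples in the sliding window.
        while self.punct_time_list and self.punct_time_list[0][0] + self.window < now:
            _, key = self.punct_time_list.popleft()
            self.punct_flag[key] = "propagated"
            propagated.append(key)
        return propagated

    def should_drop(self, key):
        """A new tuple matching a 'propagated' I-Node can be dropped."""
        return self.punct_flag.get(key) == "propagated"
```

Once `invalidate` has returned a key, `should_drop` reports true for it, so matching tuples that arrive later never enter the state.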

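  23. Memory Cost Analysis
      $$|S_b|_T = |S_b|_T^{insert} - |S_b|_T^{purge} = |S_b|_T^{arrive} - |S_b|_T^{purge} = \lambda_b T_b - \lambda_b T_b \left( \frac{\lambda_{p_a} T}{NK_{b,T}} \right)$$
      • $\lambda_b$: tuple input rate of stream B
      • $\lambda_{p_a}$: punctuation input rate of stream A
      • $NK_{b,T}$: number of distinct join values that have occurred in stream B up to the T-th time unit
      • $T_b$: time window on stream B
      Term labels: $\lambda_b T_b$ is the window join state; the subtracted term is the saving by punctuation.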
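To get a feel for the formula, it can be evaluated numerically. The Python sketch below is a direct transcription of the expression on this slide; the function name and the sample numbers are hypothetical and are not experimental values from the talk.

```python
def pwjoin_state_size(rate_b, window_b, punct_rate_a, t, distinct_keys):
    """Estimated size of the B state at time unit t.

    rate_b        : tuple input rate of stream B
    window_b      : time window Tb on stream B
    punct_rate_a  : punctuation input rate of stream A
    t             : current time unit T
    distinct_keys : distinct join values seen in stream B up to time T
    """
    window_join_state = rate_b * window_b              # state of a plain window join
    purged_fraction = punct_rate_a * t / distinct_keys  # share removed by punctuations
    return window_join_state - window_join_state * purged_fraction


# Hypothetical numbers: 100 tuples/unit, window of 10 units,
# 2 punctuations/unit, time unit 5, 20 distinct join values so far.
print(pwjoin_state_size(100, 10, 2, 5, 20))  # 500.0
```

With these numbers the plain window join would hold 1000 tuples, and punctuations covering half of the distinct join values cut that to 500.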

  24. PWJoin vs. WJoin: Memory and Tuple Output Rate. [Figure: experimental plots. Streams A and B: punct-asc-100-40.]

  25. PWJoin vs. PJoin: Punctuation Output Rate. [Figure: experimental plot. Stream A: punct-asc-100-40; Stream B: punct-random-30-40; window: 1 second.]

  26. Conclusion
      • PWJoin algorithm
      • Designed storage structure for PWJoin state
      • Memory cost analysis of PWJoin

  27. Thanks
      • WPI Database Research Group
      • Many slides are from davis.wpi.edu/~dsrg/CAPE/slides

  28. References
      • [CIKM04] L. Ding and E. A. Rundensteiner. Evaluating Window Joins over Punctuated Streams. CIKM'04.
      • [KNV03] J. Kang, J. F. Naughton and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. ICDE'03.
      • [UF00] T. Urhan and M. Franklin. XJoin: A Reactively Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, 23(2), 2000.
      • [HH99] P. Haas and J. Hellerstein. Ripple Joins for Online Aggregation. SIGMOD'99.
      • [GO03] L. Golab and M. T. Ozsu. Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams. VLDB'03.
      • [GGO04] L. Golab, S. Garg and M. T. Ozsu. On Indexing Sliding Windows over On-line Data Streams. EDBT'04.
      • [RDS+04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. VLDB Demo, 2004.
      • [BW04] S. Babu and J. Widom. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams.
      • [TMS+03] P. A. Tucker, D. Maier, T. Sheard and L. Fegaras. Exploiting Punctuation Semantics in Continuous Data Streams. TKDE, 15(3), 2003.
      • [DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman. Joining Punctuated Streams. EDBT'04.
      • [MWA+03] R. Motwani, J. Widom, A. Arasu et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. CIDR'03.

  29. Thanks!
