Run-Time Operator State Spilling for Memory Intensive Long-Running Queries

Presentation Transcript


  1. Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Database Systems Research Laboratory Worcester Polytechnic Institute Bin Liu, Yali Zhu and Elke A. Rundensteiner SIGMOD'06

  2. Motivating Example: Decision-Making Applications • Analyze the relationship among stock prices, reports, and news in a Decision Support System: an equi-join of stock prices, reports, and news on stock symbols • Complex queries such as multi-joins are common! • A real-time data integration server combines stock prices and volumes with reviews, external reports, news, ... • Produce as many results as possible at run-time (i.e., 9:00am-4:00pm) • Require complete query results (i.e., for offline analysis after 4:00pm) SIGMOD'06

  3. Challenges • As Many Run-Time Results As Possible: demands main-memory-based query processing • Push-Based Processing with Complex Queries: demands main memory space to store operator states; operator states may monotonically increase over time • Run-Time Main Memory Overflow? [Figure: symmetric join A ⋈ B with State A holding tuples a1-a4 and State B holding tuples b1-b3] SIGMOD'06

  4. Problem : Memory Overflow • High Demand on Main Memory : • High input rates and large windows result in huge states • Bursty streams cause temporary accumulation of tuples • Long-running queries exhibit monotonic state increases • Potential Solutions : • Query Optimization • Distributed Processing • Load Shedding • Memory Management SIGMOD'06

  5. State Spill • Push operator states temporarily to disk • Spilled operator states are temporarily inactive • New incoming tuples are processed only against the partial in-memory states [Figure: join over A, B, C with part of its state moved to secondary storage] SIGMOD'06
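
The slide describes the spill mechanism only at a high level; the following is a minimal Python sketch of the idea, assuming a hypothetical PartitionGroup container and a pickle-based spill file (names and storage format are illustrative, not CAPE's actual implementation).

```python
import os
import pickle

class PartitionGroup:
    """Hypothetical container for one partition ID across all join inputs."""
    def __init__(self, pid, inputs=("A", "B", "C")):
        self.pid = pid
        self.states = {name: [] for name in inputs}  # in-memory tuples per input
        self.active = True                           # False once spilled

def spill_to_disk(group, spill_dir):
    """Append the group's in-memory state to a disk file and mark it inactive.
    New incoming tuples with this partition ID are then probed only against
    whatever state remains in memory, so run-time results become partial."""
    path = os.path.join(spill_dir, f"partition_{group.pid}.spill")
    with open(path, "ab") as f:          # append: a group may be pushed many times
        pickle.dump(group.states, f)
    group.states = {name: [] for name in group.states}   # release main memory
    group.active = False
    return path
```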

  6. State of the Art: State Flushing • XJoin [UF00]: three-staged processing (hash) • Hash-Merge Join [MLA04]: two algorithms (hash + merge) • Flux [SHCF03]: single-input, distributed environment • Observation: single-operator focus!!! SIGMOD'06

  7. Problem: What about Multi-Operator Plans? • Maximize run-time throughput of Join1!! Its output then increases the memory consumption of Join2: it may quickly fill main memory, may require state spill again, and causes more work downstream • But states in Join2 may not contribute to the final output: low selectivity • Observation: interdependency among pipelined operators; spilling in bottom operators affects their downstream operators! [Figure: plan A, B, C → Join1 → (+D) Join2] SIGMOD'06

  8. Outline • Basics on State Spill • Plan-level Spill Strategies • Experimental Evaluation SIGMOD'06

  9. Granularity: State Partitioning • Divide input streams into a large number of partitions • At run time, only need to choose which partitions to spill [DNS92,SH03] • Avoid expensive run-time repartitioning • Does not affect partitions that are not spilled • Example: 300 partitions; machine M1 gets the odd partition IDs, M2 the even IDs [Figure: Split operators on inputs A, B, C routing partitions to Join instances on m1 and m2] SIGMOD'06
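
As a rough illustration of this partitioning scheme, here is a small Python sketch; the hash-based split function and the odd/even machine assignment follow the slide's example, but the function names are assumptions.

```python
NUM_PARTITIONS = 300   # the large partition count from the slide's example

def partition_id(join_key):
    """Split operator: map a tuple's join-column value to one of the partitions."""
    return hash(join_key) % NUM_PARTITIONS

def machine_for(pid):
    """Placement from the example: machine m1 holds odd IDs, m2 holds even IDs."""
    return "m1" if pid % 2 == 1 else "m2"

# A spill decision then only names whole partition IDs, so no tuple ever has to
# be repartitioned at run time, and partitions that are not spilled are untouched.
```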

  10. Partition Granularity: Which States to Choose? • Multiple states exist from the different inputs • Option 1: select states from one input only • Option 2: select the states with the same partition ID from every input • Choosing same-ID states avoids across-machine processing, simplifies spill management, and streamlines the cleanup process • Partition group granularity! SIGMOD'06

  11. Clean Up Stage • Partition groups could be pushed multiple times: (P_A1^0, P_B1^0, P_C1^0), (P_A1^1, P_B1^1, P_C1^1), (P_A1^2, P_B1^2, P_C1^2), ..., (P_A1^k, P_B1^k, P_C1^k) • The results V^0 = P_A1^0 ⋈ P_B1^0 ⋈ P_C1^0, V^1 = P_A1^1 ⋈ P_B1^1 ⋈ P_C1^1, ..., V^k = P_A1^k ⋈ P_B1^k ⋈ P_C1^k have already been generated • Incremental view maintenance algorithm [ZMH+95]: treat the multi-join as a materialized view and the partition groups as source updates SIGMOD'06

  12. Merge Disk-Resident States • To merge two partition groups with the same ID, e.g., (P_A1^0, P_B1^0, P_C1^0) and (P_A1^1, P_B1^1, P_C1^1), with V^0 = P_A1^0 ⋈ P_B1^0 ⋈ P_C1^0 and V^1 = P_A1^1 ⋈ P_B1^1 ⋈ P_C1^1 • After the merge, combined states: P_A1^0 ∪ P_A1^1, P_B1^0 ∪ P_B1^1, P_C1^0 ∪ P_C1^1 • Final result: V = (P_A1^0 ∪ P_A1^1) ⋈ (P_B1^0 ∪ P_B1^1) ⋈ (P_C1^0 ∪ P_C1^1) • Missing results: ΔV = V - V^0 - V^1 • V - V^0 = P_A1^1 ⋈ P_B1^0 ⋈ P_C1^0 ∪ (P_A1^0 ∪ P_A1^1) ⋈ P_B1^1 ⋈ P_C1^0 ∪ (P_A1^0 ∪ P_A1^1) ⋈ (P_B1^0 ∪ P_B1^1) ⋈ P_C1^1 SIGMOD'06
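
To make the algebra above concrete, here is a toy Python sketch (not the paper's implementation) that enumerates exactly the result combinations the cleanup stage still owes after two spill instances of one partition group. It assumes a single shared join attribute named 'k' and disjoint spill instances, in which case the enumeration equals V - V^0 - V^1.

```python
from itertools import product

def join(*relations):
    """Toy natural equi-join of lists of dicts on a shared key attribute 'k'."""
    out = relations[0]
    for rel in relations[1:]:
        out = [dict(x, **y) for x in out for y in rel if x["k"] == y["k"]]
    return out

def missing_results(A, B, C):
    """A, B, C map spill instance (0 or 1) to that instance's tuples for one
    partition group. The cleanup stage must still produce every combination of
    instances except (0,0,0) and (1,1,1), which were already generated at
    run time."""
    delta = []
    for i, j, k in product((0, 1), repeat=3):
        if (i, j, k) in ((0, 0, 0), (1, 1, 1)):
            continue
        delta += join(A[i], B[j], C[k])
    return delta
```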

  13. State Spill Strategies SIGMOD'06

  14. Which Partitions to Push? • Throughput-Oriented State Spill • Productivity of a partition group: Poutput = number of output tuples generated from the partition group; Psize = size of the partition group in number of tuples • Productivity = Poutput / Psize SIGMOD'06

  15. Direct Extension: Local Output Method • Rank partition groups based on their productivity Poutput/Psize • Choose the globally least productive partition groups to spill, across all operators in the plan [Figure: plan A, B, C → Join1 → (+D) Join2 → (+E) Join3, with the chosen states spilled to disk] SIGMOD'06
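
A minimal sketch of this selection step, assuming each partition group exposes hypothetical p_output, p_size, and mem_bytes statistics:

```python
def choose_partitions_to_spill(groups, bytes_to_free):
    """Rank all partition groups in the plan by productivity = P_output / P_size
    and pick the globally least productive ones until enough memory is released.
    Attribute names are illustrative, not CAPE's API."""
    ranked = sorted(groups, key=lambda g: g.p_output / max(g.p_size, 1))
    chosen, freed = [], 0
    for g in ranked:
        if freed >= bytes_to_free:
            break
        chosen.append(g)
        freed += g.mem_bytes
    return chosen
```

Sorting in ascending order of productivity means the partitions that contribute the least output per unit of state kept in memory are evicted first.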

  16. Bottom-Up Pushing Strategy • Spill states from the bottom operators first: choose partitions from Join1 until the threshold k% is reached; if that is not enough, choose partitions from Join2, and so on • Partition selection: randomly or using local productivity • Minimize intermediate results in upstream operators (memory) • Minimize the number of state spill processes • Fewer spill processes → higher overall query throughput? [Figure: plan A, B, C → Join1 → (+D) Join2 → (+E) Join3] SIGMOD'06
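
A corresponding sketch of the bottom-up policy, again with illustrative attribute names and random partition selection within each operator:

```python
import random

def bottom_up_spill(operators_bottom_up, bytes_to_free):
    """Bottom-up strategy sketch: exhaust the lowest operator's partitions first
    (Join1, then Join2, ...), chosen randomly here (local productivity would
    work too), until the requested amount of memory has been released."""
    chosen, freed = [], 0
    for op in operators_bottom_up:
        parts = list(op.partition_groups)
        random.shuffle(parts)
        for g in parts:
            if freed >= bytes_to_free:
                return chosen
            chosen.append(g)
            freed += g.mem_bytes
    return chosen
```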

  17. Partition Interdependency • Smaller number of spill processes → high throughput!! • A partition pushed in a bottom operator may be the parent of productive partitions in its downstream operators • It may be worthwhile to push P21 instead of P11! • Global strategy: account for dependency relationships! [Figure: partitions p11, p12 of OP1 feeding partitions p21, p22 of OP2 with different output counts] SIGMOD'06

  18. "True" Global Output Strategy • Poutput: contribution to the final query output • Employ a lineage tracing algorithm to update the Poutput statistics: update the Poutput values of partitions in Join3 from the final output tuples; apply Split2 to each output tuple to find the corresponding partition in Join2 and update its Poutput value; and so on down the plan [Figure: plan SplitA, SplitB, SplitC → Join1 → Split1, SplitD → Join2 → Split2, SplitE → Join3] SIGMOD'06
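
The lineage tracing step could look roughly like the following sketch; the splits_top_down and partition_stats structures are assumptions made for illustration, not CAPE's API.

```python
def trace_global_output(final_tuples, splits_top_down, partition_stats):
    """Lineage-tracing sketch for the 'true' global output strategy: every tuple
    of the FINAL result is traced back through the Split functions, so each
    partition group's P_output counts its contribution to the final query output
    rather than to its operator's local output.
    'splits_top_down' is a list of (operator_name, split_fn) pairs ordered from
    the top join downwards; 'partition_stats[(op, pid)].p_output' is the counter."""
    for t in final_tuples:
        for op_name, split_fn in splits_top_down:
            pid = split_fn(t)      # locate the source partition at this level
            partition_stats[(op_name, pid)].p_output += 1
```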

  19. Global Output with Penalty • Incorporate intermediate result sizes: intermediate result factor Pinter • Productivity value: Poutput / (Psize + Pinter) • Example: P11 (Psize = 10, Poutput = 20) and P12 (Psize = 10, Poutput = 20) look equally productive locally, but differ in the intermediate results they generate [Figure: partitions p11, p12 of OP1 feeding partitions p2i, p2j of OP2] SIGMOD'06
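
The penalty-adjusted metric itself is a one-liner; it is shown here only to make the formula explicit.

```python
def penalized_productivity(p_output, p_size, p_inter):
    """Global output with penalty: a partition that produces large intermediate
    results elsewhere in the plan (P_inter) is ranked lower, so the spill
    ordering uses P_output / (P_size + P_inter) instead of P_output / P_size."""
    return p_output / (p_size + p_inter)
```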

  20. Global Penalty: Tracing Pinter • Penalty Pinter: contribution to intermediate result sizes • Apply a similar lineage tracing algorithm for Pinter [Figure: partitions across OP1-OP4 with intermediate result sizes accumulated along the plan, e.g., 4, 3+4, 2+3+4] SIGMOD'06

  21. CAPE System Overview [LZ+05, TLJ+05] • CAPE: Continuous Query Processing Engine [Architecture diagram: Query Processor (Connection Manager, Local Statistics Gatherer, Local Adaptation Controller, Query Plan Manager, Runtime Monitor, Repository), Distribution Manager (Global Adaptation Controller, Data Distributor, Data Receiver, Repository), Streaming Data Network, Stream Generator, Application Server, End User] SIGMOD'06

  22. Experimental Setup: Queries and Data • Inputs: A, B, C, D, and E data streams • Query: Join1: A1=B1=C1, Join2: C2=D1, Join3: D2=E1 • Query operators: symmetric hash join • Each input stream is partitioned into 300 partitions • Query is partitioned and run on two machines • Memory threshold for spill: 60MB • Push 30% of the states in each state spill • Average tuple inter-arrival time: 50ms for each input SIGMOD'06

  23. Experimental Setup • High-performance PC cluster • Dual 2.4GHz CPUs, 2GB memory, gigabit network • 3 machines for the Stream Generator, Application Server, and Distribution Manager • Each Query Processor on a separate machine • Generated data streams with integer join column values • Data value V appears R times for every K input tuples • Tuple Range: K • Join Ratio: R • Average Join Rate: average number of tuples with the same join value per input SIGMOD'06
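
One plausible reading of this data model, as a Python sketch (the actual generator used in the experiments may differ):

```python
import random

def generate_stream(num_tuples, tuple_range_k, join_ratio_r):
    """Within every window of roughly K consecutive tuples, each join-column
    value that occurs is repeated R times, so K and R together control the
    average join rate.  Illustrative only."""
    stream = []
    while len(stream) < num_tuples:
        values = random.sample(range(tuple_range_k), tuple_range_k // join_ratio_r)
        block = [v for v in values for _ in range(join_ratio_r)]
        random.shuffle(block)
        stream.extend(block[: num_tuples - len(stream)])
    return stream
```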

  24. Amount of State Pushed Each Adaptation [Charts: percentage of state spilled per adaptation (percentage = # of tuples pushed / total # of tuples), run-time query throughput, and run-time main memory usage. Settings: input rate 30ms/input, tuple range 30K, join ratio 3, adaptation threshold 200MB] SIGMOD'06

  25. Experiment: Throughput & Memory [Charts for a query with average join rates Join1: 3, Join2: 1, Join3: 1] SIGMOD'06

  26. Experiment: Throughput Comparison [Charts for queries with average join rates Join1: 1, Join2: 3, Join3: 3 and Join1: 3, Join2: 2, Join3: 3] SIGMOD'06

  27. Experimental Summary • The productivity metric improves run-time throughput • Global-output-with-penalty is the overall winner • Global output (with and without penalty) outperforms the alternatives in run-time throughput • Global output (with and without penalty) has similar, good cleanup costs • The bottom-up strategy needs the fewest adaptations, yet performs poorly and has high cleanup costs SIGMOD'06

  28. Conclusions • Identified the problem of plan-level state spill • State spill using "productivity" is viable • Proposed plan-level spill policies that consider dependencies among operators in multi-operator plans • Evaluated the spill policies: global spill solutions improve throughput SIGMOD'06

  29. Thank You ! Questions ? SIGMOD'06

  30. Acknowledgments • DSRG students contributed to CAPE code base, including Luping Ding, Bin Liu, Tim Sutherland, Brad Pielech, Rimma Nehme, Mariana Jbantova, Brad Momberger, Song Wang, Natasha Bogdanova • Thanks to National Science Foundation for partial support via IDM and equipment grants, to WPI for RDC grant, and to NEC for student support SIGMOD'06
