
Load Shedding Techniques for Data Stream Systems

Brian Babcock, Mayur Datar, Rajeev Motwani. Stanford University.


Presentation Transcript


  1. Load Shedding Techniques for Data Stream Systems. Brian Babcock, Mayur Datar, Rajeev Motwani, Stanford University

  2. Differences from Previous Talk • Our focus: Aggregation queries • No quality of service specifications • Instead, focus on accuracy of query answers • Compensate for dropped data by scaling answers • Random drops only (no semantic drops)

  3. Problem Setting • Sliding window aggregate queries (SUM and COUNT) • Filters, UDFs, and joins with relations • Operator sharing [Figure: shared query plan in which queries Q1, Q2, and Q3 each end in a Σ aggregate over inputs R, S1, and S2]

  4. Inputs to the Problem • Stream rate r • Processing time t • Selectivity s • Mean μ • Std dev σ [Figure: query plan for Q1, Q2, and Q3 over inputs R, S1, and S2, annotated with these inputs]

  5. Σ3 Scaleanswer by 1/p 2 1 Load = rt1 + p(rs1t2 + rs1s2t3) S Load Shedding via Random Drops (time, selectivity) (t3, s3) Load = rt1 + rs1t2 + rs1s2t3 (t2, s2) Sampling Rate p (t1, s1) Need Load ≤ 1 Stream Rate r

  6. Problem Statement • Relative error is metric of choice: |Estimate − Actual| / Actual • Goal: Minimize the maximum relative error across queries, subject to Load ≤ 1 • Want low error with high probability
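As a small illustration of the objective, the following sketch computes the relative error of each query and the maximum across queries. The function names and the sample numbers are hypothetical, and the code assumes a positive actual answer, as the slide's formula implies.

```python
def relative_error(estimate, actual):
    """|Estimate - Actual| / Actual, assuming actual > 0."""
    if actual <= 0:
        raise ValueError("relative error here assumes a positive actual answer")
    return abs(estimate - actual) / actual


def max_relative_error(pairs):
    """Objective to minimize: worst relative error over (estimate, actual) pairs."""
    return max(relative_error(est, act) for est, act in pairs)


# Two hypothetical queries: (estimate, actual)
worst = max_relative_error([(90, 100), (210, 200)])
```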

  7. Relating Load Shedding and Error • [Equation relating the relative error for query i, the sampling rate for query i, and a query-dependent constant Ci] • Equation derived from Hoeffding bounds • Constant Ci depends on: • Variance of aggregated attribute • Sliding window size

  8. Calculate Ratio of Sampling Rates • Minimize maximum relative error → Equal relative error across queries • Express all sampling rates in terms of common variable λ
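One way to see why equal relative error ties all sampling rates to a single variable is sketched below. It assumes the error relationship has the inverse form error_i = Ci / Pi; that form is an assumption of this example, since the transcript does not reproduce the equation. Under it, setting every Pi = Ci · λ gives every query the same error 1/λ. The function names and constants are hypothetical.

```python
def target_sampling_rates(constants, lam):
    """Sampling rate P_i = C_i * lam for each query (assumed model)."""
    rates = [c * lam for c in constants]
    if any(r <= 0.0 or r > 1.0 for r in rates):
        raise ValueError("each sampling rate must lie in (0, 1]")
    return rates


def relative_errors(constants, rates):
    """Error per query under the assumed form error_i = C_i / P_i."""
    return [c / p for c, p in zip(constants, rates)]


# Hypothetical constants for two queries
constants = [0.08, 0.06]
rates = target_sampling_rates(constants, lam=10.0)   # about 0.8 and 0.6
errors = relative_errors(constants, rates)           # equal across queries
```

Raising λ raises every sampling rate together and lowers the common error, so the remaining question is how large λ can be while keeping Load ≤ 1.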

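  9. Placing Load Shedders [Figure: two Σ queries sharing part of a plan, with target sampling rates .8λ and .6λ. One load shedder has sampling rate .8λ; a second has sampling rate .75 = .6λ / .8λ, so the query with target .6λ sees .8λ × .75 = .6λ overall.]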
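The arithmetic on this slide (.75 = .6λ / .8λ) suggests one natural placement rule, sketched below for the simplest case of a single shared segment feeding several queries: shed on the shared segment at the largest target rate, then shed on each branch by the ratio of that query's target to the shared rate. This reading of the figure, the function name `place_shedders`, and the value of λ are assumptions for illustration.

```python
def place_shedders(targets):
    """Split per-query target rates into a shared shedder and branch shedders.

    targets maps a query name to its target end-to-end sampling rate.
    Returns (shared_rate, branch_rates) such that
    shared_rate * branch_rates[q] equals targets[q] for every query q.
    """
    shared = max(targets.values())  # never drop more than the neediest query allows
    branches = {q: t / shared for q, t in targets.items()}
    return shared, branches


lam = 0.9  # hypothetical value of the common variable
shared, branches = place_shedders({"Q1": 0.8 * lam, "Q2": 0.6 * lam})
# shared is 0.8 * lam; the Q1 branch needs no extra drops; the Q2 branch samples at about 0.75
```

Shedding as early as possible on the shared segment saves work in every downstream operator, while the branch shedder removes only the extra tuples that the lower-target query can afford to lose.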

  10. Conclusion • Load shedding helps cope with bursts • Minimizing relative error is natural objective for aggregate queries • Algorithm for load shedding: • Relate target sampling rates for all queries • Place random drop operators based on target sampling rates • Adjust sampling rates to achieve desired load
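The last step of the algorithm, adjusting sampling rates to reach the desired load, can be sketched as a search over the common variable λ. The example below uses bisection and assumes that load never decreases as λ grows; the function names and the linear cost model `example_load` are made up for illustration and are not taken from the talk.

```python
def largest_feasible_lambda(load_at, lam_max, capacity=1.0, iters=60):
    """Largest lam in [0, lam_max] with load_at(lam) <= capacity.

    Assumes load_at is nondecreasing in lam. Returns 0.0 if even
    lam = 0 is infeasible, which the caller should treat as overload.
    """
    if load_at(lam_max) <= capacity:
        return lam_max
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if load_at(mid) <= capacity:
            lo = mid
        else:
            hi = mid
    return lo


def example_load(lam):
    """Hypothetical cost model: fixed work plus work proportional to
    the two target sampling rates 0.8 * lam and 0.6 * lam."""
    return 0.2 + (0.8 * lam) * 1.0 + (0.6 * lam) * 0.5


# lam_max = 1 / 0.8 keeps the larger sampling rate at or below 1
lam = largest_feasible_lambda(example_load, lam_max=1.25)
```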

  11. Thanks for listening! • Questions?

  12. Choosing Target Sampling Rates [Equation with labeled terms: relative error, sampling rate for query, variance of aggregated attribute, sliding window size]

  13. Measuring Inaccuracy • Tuple w/ value x contributes: • x / (p1p2) with pr. p1p2 • 0 with pr. 1 − p1p2 • Scale answer by 1/(p1p2) • Key point: Product of sampling rates determines quality of approximate answer [Figure: operators 1 and 2 followed by Σ3, with load shedders of sampling rate p1 and sampling rate p2]
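A short simulation can illustrate why scaling by 1/(p1p2) compensates for the drops: each tuple survives both shedders with probability p1p2, so its scaled contribution has expected value x. The sketch below is illustrative only; the function name, the fixed seed, and the input values are assumptions.

```python
import random


def shed_and_sum(values, p1, p2, rng):
    """SUM over tuples that survive two independent random-drop
    shedders, scaled by 1 / (p1 * p2) to compensate for the drops."""
    total = 0.0
    for x in values:
        # A tuple must pass both shedders to reach the aggregate
        if rng.random() < p1 and rng.random() < p2:
            total += x
    return total / (p1 * p2)


rng = random.Random(0)          # fixed seed so the run is repeatable
values = [1.0] * 1000           # true SUM is 1000
estimates = [shed_and_sum(values, 0.8, 0.5, rng) for _ in range(200)]
mean_estimate = sum(estimates) / len(estimates)
```

Individual estimates scatter around the true sum, and the scatter grows as the product p1p2 shrinks, which is the slide's key point about the product of sampling rates.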
