
Window-aware Load Shedding for Aggregation Queries over Data Streams



Presentation Transcript


  1. Window-aware Load Shedding for Aggregation Queries over Data Streams Nesime Tatbul Stan Zdonik

  2. Talk Outline • Background • Load shedding in Aurora • Windowed aggregation queries • Window-aware load shedding • Experimental results • Related work • Conclusions and Future directions Nesime Tatbul, Brown University

  3. Load Shedding in Aurora [Figure: an Aurora query network with QoS functions attached to its outputs] • Problem: When load > capacity, latency QoS degrades. • Solution: Insert drop operators into the query plan. • Result: Deliver “approximate answers” with low latency. Nesime Tatbul, Brown University
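To make the drop idea concrete, here is a minimal sketch of a probabilistic drop operator of the kind the slide refers to (illustrative only; Aurora's load shedder also decides where to place drops and at what rate):

```python
# A minimal sketch of a probabilistic drop operator: when load exceeds capacity,
# such an operator is inserted into the query plan to shed tuples and keep
# latency low, at the cost of an approximate answer.
import random

def random_drop(stream, p, seed=None):
    """Yield each input tuple with probability 1 - p."""
    rng = random.Random(seed)
    for t in stream:
        if rng.random() >= p:
            yield t

print(list(random_drop(range(10), p=0.5, seed=42)))
```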

  4. Subset-based Approximation • For all queries, the delivered tuple set must be a subset of the original query answer set. • Maximum subset measure (e.g., [SIGMOD’03, VLDB’04]) • Loss-tolerance QoS of Aurora [VLDB’03] • For each query, the number of consecutive result tuples missed (i.e., gap tolerance) must be below a certain threshold. Nesime Tatbul, Brown University
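As a small illustration of the loss-tolerance idea, here is a sketch of the gap check only (the QoS model in the VLDB'03 paper is richer than this; the function name is mine):

```python
# Check that no run of consecutive missing result tuples exceeds the
# application's gap tolerance, given the original result ids and the ids of
# the subset actually delivered.
def within_gap_tolerance(original_ids, delivered_ids, max_gap):
    delivered = set(delivered_ids)
    gap = 0
    for rid in original_ids:
        gap = 0 if rid in delivered else gap + 1
        if gap > max_gap:
            return False
    return True

print(within_gap_tolerance(range(10), [0, 1, 4, 5, 8, 9], max_gap=2))  # True
```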

  5. Why Subset Results? • The application may expect to get correct values. • Missing tuples get updated with more recent ones. • Preserving the subset guarantee anywhere in the query plan makes it easier to understand how error propagates. • Ultimately, it depends on the application. Nesime Tatbul, Brown University

  6. Windows • Finite excerpts of a data stream • Two parameters: size (ω) and slide (δ) • Example: StockQuote(symbol, time, price), with size = 10 min, sliding by 5 min: (“IBM”, 10:00, 20) (“INTC”, 10:00, 15) (“MSFT”, 10:00, 22) (“IBM”, 10:05, 18) (“MSFT”, 10:05, 21) (“IBM”, 10:10, 18) (“MSFT”, 10:10, 20) (“IBM”, 10:15, 20) (“INTC”, 10:15, 20) (“MSFT”, 10:15, 20) … Nesime Tatbul, Brown University
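The window model can be made concrete with a small sketch (timestamps in minutes, half-open windows assumed; the helper name is mine):

```python
# Which sliding windows contain a given timestamp, for windows of size omega
# that slide by delta (window [s, s + omega) for s = origin, origin + delta, ...).
def window_starts_containing(ts, omega, delta, origin=0):
    first = origin + ((ts - origin) // delta) * delta   # latest start at or before ts
    starts, s = [], first
    while s > ts - omega and s >= origin:
        starts.append(s)
        s -= delta
    return sorted(starts)

# size = 10 min, slide = 5 min: the 10:15 quotes fall into the windows that
# start at 10:10 and 10:15 (minutes 10 and 15 relative to 10:00).
print(window_starts_containing(15, omega=10, delta=5))   # [10, 15]
```

Note that a tuple falls into roughly ω/δ windows, which is what the early-drop condition later in the talk relies on.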

  7. Windowed Aggregation • Apply an aggregate function on the window • Average, Sum, Count, Min, Max • User-defined function • Can have grouping, nesting, and sharing • Example: [Figure: Filter(symbol = “IBM”) → Aggregate(ω = 5 min, δ = 5 min, diff = high - low) → Filter(diff > 5) → Aggregate(ω = 60 min, δ = 60 min, count) → Filter(count > 0)] Nesime Tatbul, Brown University
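A sketch of the example query chain, using tumbling windows over plain lists rather than an incremental stream engine (the function names and the exact reading of the figure are my reconstruction):

```python
# Input: StockQuote(symbol, time_min, price); times are minutes.
from collections import defaultdict

def diff_per_5min(quotes, symbol="IBM"):
    """Filter(symbol) -> Aggregate(omega = 5 min, delta = 5 min, diff = high - low)."""
    buckets = defaultdict(list)
    for sym, t, price in quotes:
        if sym == symbol:
            buckets[t // 5].append(price)
    return {w * 5: max(ps) - min(ps) for w, ps in buckets.items()}

def volatile_count_per_hour(diffs, threshold=5):
    """Filter(diff > threshold) -> Aggregate(omega = 60 min, delta = 60 min, count) -> Filter(count > 0)."""
    counts = defaultdict(int)
    for window_start, d in diffs.items():
        if d > threshold:
            counts[window_start // 60] += 1
    return {h * 60: c for h, c in counts.items() if c > 0}

quotes = [("IBM", 0, 20), ("IBM", 3, 27), ("INTC", 3, 15), ("IBM", 7, 21)]
print(volatile_count_per_hour(diff_per_5min(quotes)))   # {0: 1}
```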

  8. Dropping from an Aggregation Query: Tuple-based Approach • Drop before the aggregate: non-subset result of nearly the same size • Drop after the aggregate: subset result of smaller size [Figure: the input …, 30, 15, 30, 20, 10, 30 through Average(ω = 3, δ = 3) yields the answers …, 25, 20; with a tuple-based Drop(p = 0.5) placed before the aggregate, the surviving tuples regroup and produce values (e.g., 15) that are not in the original answer set, whereas Drop(p = 0.5) placed after the aggregate delivers a smaller subset of …, 25, 20] Nesime Tatbul, Brown University
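The contrast can be reproduced in a few lines; the tuples to drop are picked by hand here so the effect is easy to see:

```python
# Dropping tuples before vs. after a tumbling average over the slide's stream.
def tumbling_average(values, omega=3):
    return [sum(values[i:i + omega]) / omega
            for i in range(0, len(values) - omega + 1, omega)]

stream = [30, 15, 30, 20, 10, 30]
print(tumbling_average(stream))          # [25.0, 20.0] -- the original answers

# Drop before the aggregate (tuple-based): the survivors regroup, so the
# delivered value need not appear in the original answer set.
survivors = [15, 20, 10]                 # one possible outcome of a 50% tuple drop
print(tumbling_average(survivors))       # [15.0] -- not a subset of {25.0, 20.0}

# Drop after the aggregate: whole answers are removed, the rest stay exact.
answers = tumbling_average(stream)
print(answers[:1])                       # [25.0] -- a subset, just smaller
```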

  9. Dropping from an Aggregation Query: Window-based Approach • Drop before the aggregate: subset result of smaller size → “window-aware load shedding” [Figure: the same input …, 30, 15, 30, 20, 10, 30 passes through Window Drop(ω = 3, δ = 3, p = 0.5) before Average(ω = 3, δ = 3); entire windows are either kept or shed, so the delivered values are a subset of the original answers …, 25, 20] Nesime Tatbul, Brown University

  10. Window Drop Operator • Functionality: • Attach window keep/drop specifications to tuples. • Discard tuples that can be early-dropped. • Key parameters: • ω: window size • δ: window slide • p: drop probability (one per group) • B: batch size (one per group) Nesime Tatbul, Brown University
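A minimal sketch of such an operator, assuming a simplified encoding in which the keep/drop decision is attached as a boolean to window-start tuples (the Aurora/Borealis operator and its specification values are richer than this):

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Marked:
    value: float
    keep_window: Optional[bool] = None    # window spec attached to window-start tuples

def window_drop(values: List[float], omega: int, delta: int,
                p: float, B: int = 1, seed: int = 0) -> List[Marked]:
    rng = random.Random(seed)
    n_windows = max(0, (len(values) - omega) // delta + 1)
    # One keep/drop decision per batch of B consecutive windows.
    keep: List[bool] = []
    for b in range(0, n_windows, B):
        decision = rng.random() >= p
        keep.extend([decision] * min(B, n_windows - b))

    out = []
    for i, v in enumerate(values):
        # Windows containing position i start at s <= i < s + omega, s a multiple of delta.
        starts = [s for s in range(0, n_windows * delta, delta) if s <= i < s + omega]
        if starts and not any(keep[s // delta] for s in starts):
            # Early drop: every window this tuple belongs to is dropped.
            # (In the paper a fake tuple may still carry the spec downstream; omitted here.)
            continue
        m = Marked(v)
        if i % delta == 0 and i // delta < n_windows:
            m.keep_window = keep[i // delta]   # attach the keep/drop spec to the start tuple
        out.append(m)
    return out

# Tumbling windows of three over the slide's stream: each window is either kept
# intact or shed as a whole, so a downstream average returns a subset.
print(window_drop([30, 15, 30, 20, 10, 30], omega=3, delta=3, p=0.5))
```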

  11. Window Drop for Nested Aggregation (Pipeline Arrangements) [Figure: a Window Drop(ω, δ, B) placed upstream of a pipeline Aggregate1(ω1, δ1) → … → Aggregaten(ωn, δn). In the example, an inner aggregate with ω1 = 3 feeding Aggregate2(ω2 = 3, δ2 = 3) over the input 1, 2, 3, …, 11, … behaves like a single window specification with ω = 5, δ = 3, whose windows start at 1, 4, 7, 10, …] Nesime Tatbul, Brown University
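The composed specification in the example can be sanity-checked with a small sketch that tracks which raw input positions each top-level window covers (illustrative code under assumed standard window semantics, not the paper's derivation rule; the inner slide δ1 = 1 is an assumption consistent with the ω = 5, δ = 3 result):

```python
from typing import List, Tuple

def compose_pipeline(specs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """specs = [(omega_1, delta_1), ..., (omega_n, delta_n)], innermost aggregate first.
    Returns an (omega, delta) over the raw input covering the same spans as the pipeline."""
    # Represent each tuple by the half-open interval of raw input positions it
    # summarizes; raw tuples are the unit intervals [i, i + 1).
    level = [(i, i + 1) for i in range(200)]
    for omega, delta in specs:
        nxt, start = [], 0
        while start + omega <= len(level):
            group = level[start:start + omega]          # one window of this aggregate
            nxt.append((group[0][0], group[-1][1]))     # raw-input span it covers
            start += delta
        level = nxt
    omega = level[0][1] - level[0][0]                   # raw extent of one top-level window
    delta = level[1][0] - level[0][0]                   # distance between window starts
    return omega, delta

# The slide's pipeline example: an inner aggregate with omega1 = 3 (delta1 assumed
# to be 1) feeding an outer aggregate with omega2 = 3, delta2 = 3.
print(compose_pipeline([(3, 1), (3, 3)]))               # (5, 3)
```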

  12. Window Drop for Shared Aggregation (Fan-out Arrangements) [Figure: a single Window Drop(ω, δ, B) feeding aggregates Aggregate1(ω1, δ1, B1), …, Aggregaten(ωn, δn, Bn) that share the same input. In the example, Aggregate1(ω1 = 3, δ1 = 3) and Aggregate2(ω2 = 3, δ2 = 2) have a common window start every 6 units; their extents beyond that period are 0 and 1 (max extent = 1), so the window drop uses ω = 7, δ = 6.] Nesime Tatbul, Brown University
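My reading of the fan-out example as code (a sketch reconstructed from the numbers on the slide, not code from the paper): the window drop slides by the period after which the shared aggregates' window starts realign, and its window size additionally covers how far the latest-starting window in a period spills past it:

```python
from math import lcm

def compose_fanout(specs):
    """specs = [(omega_i, delta_i), ...] for the aggregates sharing one input.
    Returns (omega, delta) for the upstream window drop."""
    period = lcm(*(delta for _, delta in specs))        # common window start every `period` units
    max_extent = 0
    for omega_i, delta_i in specs:
        last_start = delta_i * ((period - 1) // delta_i)   # last window started inside the period
        extent = max(0, last_start + omega_i - period)     # how far it spills past the period
        max_extent = max(max_extent, extent)
    return period + max_extent, period

# The slide's example: Aggregate1(omega=3, delta=3) and Aggregate2(omega=3, delta=2).
print(compose_fanout([(3, 3), (3, 2)]))   # (7, 6) -> omega = 7, delta = 6
```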

  13. Window Drop for Nesting + Sharing (Composite Arrangements) • Apply the two basic rules in any order. [Figure: a Window Drop(ω, δ, B) upstream of Aggregate0(ω0 = 4, δ0 = 1), which fans out to Aggregate1(ω1 = 3, δ1 = 3, B1) and Aggregate2(ω2 = 3, δ2 = 2, B2). Fan-out then pipeline gives ω = 7, δ = 6 and then ω = 10, δ = 6; pipeline then fan-out gives ω = 6, δ = 3 and ω = 6, δ = 2 for the two branches and then the same ω = 10, δ = 6.] Nesime Tatbul, Brown University

  14. Window Drop in Action • Mark tuples & apply early drops at the Window Drop. • Decode the marks at the Aggregate. [Figure: the input …, 30, 15, 30, 20, 10, 30 enters Window Drop(ω = 3, δ = 3, p = 0.5), which attaches window specifications (keep/drop marks such as …, 0, -1, -1, 4) and early-drops tuples whose windows are all shed; the downstream Average(ω = 3, δ = 3) decodes the marks, skipping dropped windows (signalled by fake tuples when needed) and opening kept ones, and outputs a subset of the original answers (e.g., …, 20).] Nesime Tatbul, Brown University
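A sketch of the decode step under the same simplified boolean encoding used earlier (the actual specification values on the slide carry more information than a single keep/drop flag):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Marked:
    value: Optional[float]               # None for a fake tuple that carries only a window spec
    keep_window: Optional[bool] = None   # decision attached upstream to window-start tuples

def decode_and_average(stream: List[Marked], omega: int, delta: int) -> List[float]:
    """Open a window only when its start tuple is marked keep; skip dropped windows."""
    out, open_windows = [], []           # open_windows: list of [start_index, collected values]
    for i, t in enumerate(stream):
        if i % delta == 0 and t.keep_window:   # decode the mark: open kept windows only
            open_windows.append([i, []])
        for w in open_windows:
            if t.value is not None:            # fake tuples contribute no data
                w[1].append(t.value)
        for start, vals in [w for w in open_windows if i - w[0] + 1 == omega]:
            if vals:
                out.append(sum(vals) / len(vals))
        open_windows = [w for w in open_windows if i - w[0] + 1 < omega]
    return out

# The slide's stream with the first window kept and the second one dropped:
marked = [Marked(30, True), Marked(15), Marked(30),
          Marked(20, False), Marked(10), Marked(30)]
print(decode_and_average(marked, omega=3, delta=3))    # [25.0] -- a subset of {25.0, 20.0}
```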

  15. Fake Tuples • Fake tuples carry no data, only window specs. • They arise when: • the tuple content is not needed because the tuple only indicates a window drop, or • the tuple to be marked is originally missing or was filtered out during query processing. • Overhead is low. Nesime Tatbul, Brown University

  16. Early Drops • Sliding windows may overlap. • The window drop can discard a tuple if and only if all of its windows are going to be dropped. • At steady state, each tuple belongs to ~ ω/δ windows (i.e., for early drops, B ≥ ω/δ). • If B < ω/δ, there is no need to push the window drop further upstream than the first aggregate in the pipeline. Nesime Tatbul, Brown University
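The early-drop condition in two small helpers (a sketch of the rule stated above; the helper names are mine):

```python
# A tuple may be discarded at the window drop only if every window it belongs
# to is dropped. With size omega and slide delta, a tuple overlaps about
# omega/delta windows, so batches of at least ceil(omega/delta) consecutively
# dropped windows are needed before any early drop becomes possible.
from math import ceil

def windows_per_tuple(omega, delta):
    return ceil(omega / delta)

def can_early_drop(tuple_window_decisions):
    """tuple_window_decisions: keep/drop decision for each window containing the tuple."""
    return not any(tuple_window_decisions)

print(windows_per_tuple(omega=100, delta=10))    # 10: need batches of >= 10 dropped windows
print(can_early_drop([False, False, False]))     # True: all of its windows are dropped
print(can_early_drop([False, True, False]))      # False: one window still needs this tuple
```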

  17. Talk Outline • Background • Load shedding in Aurora • Windowed aggregation queries • Window-aware load shedding • Experimental results • Related work • Conclusions and Future directions Nesime Tatbul, Brown University

  18. Experiments • Setup • Single-node Borealis • Streams: (time, value) pairs • Aggregation queries with “delays” • Basic measure: % total loss • Goals • Performance on nested and shared query plans • Effect of window parameters • Processing overhead Nesime Tatbul, Brown University

  19. Performance for Nested Plans [Figure: % total loss for Window Drop vs. Random Drop on a nested query plan containing Aggregate(10, 10) and Aggregate(100, 100) interleaved with delay operators of 5 ms, 50 ms, and 500 ms] Nesime Tatbul, Brown University

  20. Performance for Shared Plans [Figure: % total loss for Window Drop vs. Random Drop on a shared (fan-out) query plan containing Aggregate(10, 10) and Aggregate(20, 20) with delay operators of 5 ms, 50 ms, and 100 ms] Nesime Tatbul, Brown University

  21. Effect of Window Slide [Figure: % total loss as the window slide δ varies, for a query with Aggregate(ω = 100, δ) preceded by Window Drop(ω = 100, δ) and 5 ms delay operators] Nesime Tatbul, Brown University

  22. Processing Overhead [Figure: throughput ratio (Window Drop with p = 0 / no Window Drop) for a plan of Window Drop(ω, δ = ω) → Filter → Delay 10 ms → Aggregate(ω, δ = ω)] Nesime Tatbul, Brown University

  23. Related Work • Load shedding for aggregation queries • Statistical approach of Babcock et al [ICDE’04] • Punctuation-based stream processing • Tucker et al [TKDE’03], Li et al [SIGMOD’05] • Other load shedding work (examples) • General: Aurora [VLDB’03], Data Triage [ICDE’05] • On joins: Das et al [SIGMOD’03], STREAM [VLDB’04] • With optimization: NiagaraCQ [ICDE’03, SIGMOD’04] • Approximate query processing on aggregates • Synopsis-based approaches, online aggregation [SIGMOD’97] Nesime Tatbul, Brown University

  24. Conclusions • We provide a Window-aware Load Shedding technique for a general class of aggregation queries over data streams with: • sliding windows • user-defined aggregates • nesting, sharing, grouping • Window Drop preserves window integrity, enabling: • easy control of error propagation • subset guarantee for query results • early load reduction that minimizes error Nesime Tatbul, Brown University

  25. Future Directions • Prediction-based load shedding • Can we guess the missing values? • Statistical approaches vs. Subset-based approaches • Window-awareness on joins • Memory-constrained environments • Distributed environments Nesime Tatbul, Brown University

  26. Questions? Nesime Tatbul, Brown University

  27. Decoding Window Specifications Nesime Tatbul, Brown University
