
Window-aware Load Shedding for Aggregation Queries over Data Streams



Presentation Transcript


  1. Window-aware Load Shedding for Aggregation Queries over Data Streams Nesime Tatbul Stan Zdonik

  2. Talk Outline • Background • Load shedding in Aurora • Windowed aggregation queries • Window-aware load shedding • Experimental results • Related work • Conclusions and Future directions Nesime Tatbul, Brown University

  3. Load Shedding in Aurora [Figure: an Aurora query network with QoS functions attached to its outputs] • Problem: When load > capacity, latency QoS degrades. • Solution: Insert drop operators into the query plan. • Result: Deliver “approximate answers” with low latency. Nesime Tatbul, Brown University
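To make the drop idea concrete, here is a minimal sketch of a probabilistic drop operator of the kind the slide refers to (illustrative only; Aurora's load shedder also decides where to place drops and at what rate):

```python
# A minimal sketch of a probabilistic drop operator: when load exceeds capacity,
# such an operator is inserted into the query plan to shed tuples and keep
# latency low, at the cost of an approximate answer.
import random

def random_drop(stream, p, seed=None):
    """Yield each input tuple with probability 1 - p."""
    rng = random.Random(seed)
    for t in stream:
        if rng.random() >= p:
            yield t

print(list(random_drop(range(10), p=0.5, seed=42)))
```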

  4. Subset-based Approximation • For all queries, the delivered tuple set must be a subset of the original query answer set. • Maximum subset measure (e.g., [SIGMOD’03, VLDB’04]) • Loss-tolerance QoS of Aurora [VLDB’03] • For each query, the number of consecutive result tuples missed (i.e., gap tolerance) must be below a certain threshold. Nesime Tatbul, Brown University
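As a small illustration of the loss-tolerance idea, here is a sketch of the gap check only (the QoS model in the VLDB'03 paper is richer than this; the function name is mine):

```python
# Check that no run of consecutive missing result tuples exceeds the
# application's gap tolerance, given the original result ids and the ids of
# the subset actually delivered.
def within_gap_tolerance(original_ids, delivered_ids, max_gap):
    delivered = set(delivered_ids)
    gap = 0
    for rid in original_ids:
        gap = 0 if rid in delivered else gap + 1
        if gap > max_gap:
            return False
    return True

print(within_gap_tolerance(range(10), [0, 1, 4, 5, 8, 9], max_gap=2))  # True
```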

  5. Why Subset Results? • The application may expect to get correct values. • Missing tuples get updated with more recent ones. • Preserving the subset guarantee anywhere in the query plan makes it easier to understand how error propagates. • Ultimately, it depends on the application. Nesime Tatbul, Brown University

  6. Windows • Finite excerpts of a data stream • Two parameters: size (ω) and slide (δ) • Example: StockQuote(symbol, time, price), with size = 10 min, sliding by 5 min: (“IBM”, 10:00, 20) (“INTC”, 10:00, 15) (“MSFT”, 10:00, 22) (“IBM”, 10:05, 18) (“MSFT”, 10:05, 21) (“IBM”, 10:10, 18) (“MSFT”, 10:10, 20) (“IBM”, 10:15, 20) (“INTC”, 10:15, 20) (“MSFT”, 10:15, 20) … Nesime Tatbul, Brown University
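The window model can be made concrete with a small sketch (timestamps in minutes, half-open windows assumed; the helper name is mine):

```python
# Which sliding windows contain a given timestamp, for windows of size omega
# that slide by delta (window [s, s + omega) for s = origin, origin + delta, ...).
def window_starts_containing(ts, omega, delta, origin=0):
    first = origin + ((ts - origin) // delta) * delta   # latest start at or before ts
    starts, s = [], first
    while s > ts - omega and s >= origin:
        starts.append(s)
        s -= delta
    return sorted(starts)

# size = 10 min, slide = 5 min: the 10:15 quotes fall into the windows that
# start at 10:10 and 10:15 (minutes 10 and 15 relative to 10:00).
print(window_starts_containing(15, omega=10, delta=5))   # [10, 15]
```

Note that a tuple falls into roughly ω/δ windows, which is what the early-drop condition later in the talk relies on.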

  7. Windowed Aggregation • Apply an aggregate function on the window • Average, Sum, Count, Min, Max • User-defined function • Can have grouping, nesting, and sharing • Example: [Figure: Filter(symbol = “IBM”) → Aggregate(ω = 5 min, δ = 5 min, diff = high - low) → Filter(diff > 5) → Aggregate(ω = 60 min, δ = 60 min, count) → Filter(count > 0)] Nesime Tatbul, Brown University
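A sketch of the example query chain, using tumbling windows over plain lists rather than an incremental stream engine (the function names and the exact reading of the figure are my reconstruction):

```python
# Input: StockQuote(symbol, time_min, price); times are minutes.
from collections import defaultdict

def diff_per_5min(quotes, symbol="IBM"):
    """Filter(symbol) -> Aggregate(omega = 5 min, delta = 5 min, diff = high - low)."""
    buckets = defaultdict(list)
    for sym, t, price in quotes:
        if sym == symbol:
            buckets[t // 5].append(price)
    return {w * 5: max(ps) - min(ps) for w, ps in buckets.items()}

def volatile_count_per_hour(diffs, threshold=5):
    """Filter(diff > threshold) -> Aggregate(omega = 60 min, delta = 60 min, count) -> Filter(count > 0)."""
    counts = defaultdict(int)
    for window_start, d in diffs.items():
        if d > threshold:
            counts[window_start // 60] += 1
    return {h * 60: c for h, c in counts.items() if c > 0}

quotes = [("IBM", 0, 20), ("IBM", 3, 27), ("INTC", 3, 15), ("IBM", 7, 21)]
print(volatile_count_per_hour(diff_per_5min(quotes)))   # {0: 1}
```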

  8. Dropping from an Aggregation Query: Tuple-based Approach • Drop before the aggregate: non-subset result of nearly the same size • Drop after the aggregate: subset result of smaller size [Figure: the input …, 30, 15, 30, 20, 10, 30 through Average(ω = 3, δ = 3) yields the answers …, 25, 20; with a tuple-based Drop(p = 0.5) placed before the aggregate, the surviving tuples regroup and produce values (e.g., 15) that are not in the original answer set, whereas Drop(p = 0.5) placed after the aggregate delivers a smaller subset of …, 25, 20] Nesime Tatbul, Brown University
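The contrast can be reproduced in a few lines; the tuples to drop are picked by hand here so the effect is easy to see:

```python
# Dropping tuples before vs. after a tumbling average over the slide's stream.
def tumbling_average(values, omega=3):
    return [sum(values[i:i + omega]) / omega
            for i in range(0, len(values) - omega + 1, omega)]

stream = [30, 15, 30, 20, 10, 30]
print(tumbling_average(stream))          # [25.0, 20.0] -- the original answers

# Drop before the aggregate (tuple-based): the survivors regroup, so the
# delivered value need not appear in the original answer set.
survivors = [15, 20, 10]                 # one possible outcome of a 50% tuple drop
print(tumbling_average(survivors))       # [15.0] -- not a subset of {25.0, 20.0}

# Drop after the aggregate: whole answers are removed, the rest stay exact.
answers = tumbling_average(stream)
print(answers[:1])                       # [25.0] -- a subset, just smaller
```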

  9. Dropping from an Aggregation Query: Window-based Approach • Drop before the aggregate: subset result of smaller size → “window-aware load shedding” [Figure: the same input …, 30, 15, 30, 20, 10, 30 passes through Window Drop(ω = 3, δ = 3, p = 0.5) before Average(ω = 3, δ = 3); entire windows are either kept or shed, so the delivered values are a subset of the original answers …, 25, 20] Nesime Tatbul, Brown University

  10. Window Drop Operator • Functionality: • Attach window keep/drop specifications to tuples. • Discard tuples that can be early-dropped. • Key parameters: • ω: window size • δ: window slide • p: drop probability (one per group) • B: batch size (one per group) Nesime Tatbul, Brown University
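A minimal sketch of such an operator, assuming a simplified encoding in which the keep/drop decision is attached as a boolean to window-start tuples (the Aurora/Borealis operator and its specification values are richer than this):

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Marked:
    value: float
    keep_window: Optional[bool] = None    # window spec attached to window-start tuples

def window_drop(values: List[float], omega: int, delta: int,
                p: float, B: int = 1, seed: int = 0) -> List[Marked]:
    rng = random.Random(seed)
    n_windows = max(0, (len(values) - omega) // delta + 1)
    # One keep/drop decision per batch of B consecutive windows.
    keep: List[bool] = []
    for b in range(0, n_windows, B):
        decision = rng.random() >= p
        keep.extend([decision] * min(B, n_windows - b))

    out = []
    for i, v in enumerate(values):
        # Windows containing position i start at s <= i < s + omega, s a multiple of delta.
        starts = [s for s in range(0, n_windows * delta, delta) if s <= i < s + omega]
        if starts and not any(keep[s // delta] for s in starts):
            # Early drop: every window this tuple belongs to is dropped.
            # (In the paper a fake tuple may still carry the spec downstream; omitted here.)
            continue
        m = Marked(v)
        if i % delta == 0 and i // delta < n_windows:
            m.keep_window = keep[i // delta]   # attach the keep/drop spec to the start tuple
        out.append(m)
    return out

# Tumbling windows of three over the slide's stream: each window is either kept
# intact or shed as a whole, so a downstream average returns a subset.
print(window_drop([30, 15, 30, 20, 10, 30], omega=3, delta=3, p=0.5))
```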

  11. Window Drop for Nested Aggregation (Pipeline Arrangements) [Figure: a Window Drop(ω, δ, B) placed upstream of a pipeline Aggregate1(ω1, δ1) → … → Aggregaten(ωn, δn). In the example, an inner aggregate with ω1 = 3 feeding Aggregate2(ω2 = 3, δ2 = 3) over the input 1, 2, 3, …, 11, … behaves like a single window specification with ω = 5, δ = 3, whose windows start at 1, 4, 7, 10, …] Nesime Tatbul, Brown University
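The composed specification in the example can be sanity-checked with a small sketch that tracks which raw input positions each top-level window covers (illustrative code under assumed standard window semantics, not the paper's derivation rule; the inner slide δ1 = 1 is an assumption consistent with the ω = 5, δ = 3 result):

```python
from typing import List, Tuple

def compose_pipeline(specs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """specs = [(omega_1, delta_1), ..., (omega_n, delta_n)], innermost aggregate first.
    Returns an (omega, delta) over the raw input covering the same spans as the pipeline."""
    # Represent each tuple by the half-open interval of raw input positions it
    # summarizes; raw tuples are the unit intervals [i, i + 1).
    level = [(i, i + 1) for i in range(200)]
    for omega, delta in specs:
        nxt, start = [], 0
        while start + omega <= len(level):
            group = level[start:start + omega]          # one window of this aggregate
            nxt.append((group[0][0], group[-1][1]))     # raw-input span it covers
            start += delta
        level = nxt
    omega = level[0][1] - level[0][0]                   # raw extent of one top-level window
    delta = level[1][0] - level[0][0]                   # distance between window starts
    return omega, delta

# The slide's pipeline example: an inner aggregate with omega1 = 3 (delta1 assumed
# to be 1) feeding an outer aggregate with omega2 = 3, delta2 = 3.
print(compose_pipeline([(3, 1), (3, 3)]))               # (5, 3)
```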

  12. Window Drop for Shared Aggregation (Fan-out Arrangements) [Figure: a single Window Drop(ω, δ, B) feeding aggregates Aggregate1(ω1, δ1, B1), …, Aggregaten(ωn, δn, Bn) that share the same input. In the example, Aggregate1(ω1 = 3, δ1 = 3) and Aggregate2(ω2 = 3, δ2 = 2) have a common window start every 6 units; their extents beyond that period are 0 and 1 (max extent = 1), so the window drop uses ω = 7, δ = 6.] Nesime Tatbul, Brown University
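My reading of the fan-out example as code (a sketch reconstructed from the numbers on the slide, not code from the paper): the window drop slides by the period after which the shared aggregates' window starts realign, and its window size additionally covers how far the latest-starting window in a period spills past it:

```python
from math import lcm

def compose_fanout(specs):
    """specs = [(omega_i, delta_i), ...] for the aggregates sharing one input.
    Returns (omega, delta) for the upstream window drop."""
    period = lcm(*(delta for _, delta in specs))        # common window start every `period` units
    max_extent = 0
    for omega_i, delta_i in specs:
        last_start = delta_i * ((period - 1) // delta_i)   # last window started inside the period
        extent = max(0, last_start + omega_i - period)     # how far it spills past the period
        max_extent = max(max_extent, extent)
    return period + max_extent, period

# The slide's example: Aggregate1(omega=3, delta=3) and Aggregate2(omega=3, delta=2).
print(compose_fanout([(3, 3), (3, 2)]))   # (7, 6) -> omega = 7, delta = 6
```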

  13. Window Drop for Nesting + Sharing (Composite Arrangements) • Apply the two basic rules in any order. [Figure: a Window Drop(ω, δ, B) upstream of Aggregate0(ω0 = 4, δ0 = 1), which fans out to Aggregate1(ω1 = 3, δ1 = 3, B1) and Aggregate2(ω2 = 3, δ2 = 2, B2). Fan-out then pipeline gives ω = 7, δ = 6 and then ω = 10, δ = 6; pipeline then fan-out gives ω = 6, δ = 3 and ω = 6, δ = 2 for the two branches and then the same ω = 10, δ = 6.] Nesime Tatbul, Brown University

  14. Window Drop in Action • Mark tuples & apply early drops at the Window Drop. • Decode the marks at the Aggregate. [Figure: the input …, 30, 15, 30, 20, 10, 30 enters Window Drop(ω = 3, δ = 3, p = 0.5), which attaches window specifications (keep/drop marks such as …, 0, -1, -1, 4) and early-drops tuples whose windows are all shed; the downstream Average(ω = 3, δ = 3) decodes the marks, skipping dropped windows (signalled by fake tuples when needed) and opening kept ones, and outputs a subset of the original answers (e.g., …, 20).] Nesime Tatbul, Brown University
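A sketch of the decode step under the same simplified boolean encoding used earlier (the actual specification values on the slide carry more information than a single keep/drop flag):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Marked:
    value: Optional[float]               # None for a fake tuple that carries only a window spec
    keep_window: Optional[bool] = None   # decision attached upstream to window-start tuples

def decode_and_average(stream: List[Marked], omega: int, delta: int) -> List[float]:
    """Open a window only when its start tuple is marked keep; skip dropped windows."""
    out, open_windows = [], []           # open_windows: list of [start_index, collected values]
    for i, t in enumerate(stream):
        if i % delta == 0 and t.keep_window:   # decode the mark: open kept windows only
            open_windows.append([i, []])
        for w in open_windows:
            if t.value is not None:            # fake tuples contribute no data
                w[1].append(t.value)
        for start, vals in [w for w in open_windows if i - w[0] + 1 == omega]:
            if vals:
                out.append(sum(vals) / len(vals))
        open_windows = [w for w in open_windows if i - w[0] + 1 < omega]
    return out

# The slide's stream with the first window kept and the second one dropped:
marked = [Marked(30, True), Marked(15), Marked(30),
          Marked(20, False), Marked(10), Marked(30)]
print(decode_and_average(marked, omega=3, delta=3))    # [25.0] -- a subset of {25.0, 20.0}
```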

  15. Fake Tuples • Fake tuples carry no data, only window specs. • They arise when: • the tuple content is not needed because the tuple only indicates a window drop, or • the tuple to be marked is originally missing or was filtered out during query processing. • Overhead is low. Nesime Tatbul, Brown University

  16. Early Drops • Sliding windows may overlap. • The window drop can discard a tuple if and only if all of its windows are going to be dropped. • At steady state, each tuple belongs to ~ ω/δ windows (i.e., for early drops, B ≥ ω/δ). • If B < ω/δ, there is no need to push the window drop further upstream than the first aggregate in the pipeline. Nesime Tatbul, Brown University
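The early-drop condition in two small helpers (a sketch of the rule stated above; the helper names are mine):

```python
# A tuple may be discarded at the window drop only if every window it belongs
# to is dropped. With size omega and slide delta, a tuple overlaps about
# omega/delta windows, so batches of at least ceil(omega/delta) consecutively
# dropped windows are needed before any early drop becomes possible.
from math import ceil

def windows_per_tuple(omega, delta):
    return ceil(omega / delta)

def can_early_drop(tuple_window_decisions):
    """tuple_window_decisions: keep/drop decision for each window containing the tuple."""
    return not any(tuple_window_decisions)

print(windows_per_tuple(omega=100, delta=10))    # 10: need batches of >= 10 dropped windows
print(can_early_drop([False, False, False]))     # True: all of its windows are dropped
print(can_early_drop([False, True, False]))      # False: one window still needs this tuple
```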

  17. Talk Outline • Background • Load shedding in Aurora • Windowed aggregation queries • Window-aware load shedding • Experimental results • Related work • Conclusions and Future directions Nesime Tatbul, Brown University

  18. Experiments • Setup • Single-node Borealis • Streams: (time, value) pairs • Aggregation queries with “delays” • Basic measure: % total loss • Goals • Performance on nested and shared query plans • Effect of window parameters • Processing overhead Nesime Tatbul, Brown University

  19. Performance for Nested Plans [Figure: % total loss for Window Drop vs. Random Drop on a nested query plan containing Aggregate(10, 10) and Aggregate(100, 100) interleaved with delay operators of 5 ms, 50 ms, and 500 ms] Nesime Tatbul, Brown University

  20. Performance for Shared Plans [Figure: % total loss for Window Drop vs. Random Drop on a shared (fan-out) query plan containing Aggregate(10, 10) and Aggregate(20, 20) with delay operators of 5 ms, 50 ms, and 100 ms] Nesime Tatbul, Brown University

  21. Effect of Window Slide [Figure: % total loss as the window slide δ varies, for a query with Aggregate(ω = 100, δ) preceded by Window Drop(ω = 100, δ) and 5 ms delay operators] Nesime Tatbul, Brown University

  22. Processing Overhead [Figure: throughput ratio (Window Drop with p = 0 / no Window Drop) for a plan of Window Drop(ω, δ = ω) → Filter → Delay 10 ms → Aggregate(ω, δ = ω)] Nesime Tatbul, Brown University

  23. Related Work • Load shedding for aggregation queries • Statistical approach of Babcock et al [ICDE’04] • Punctuation-based stream processing • Tucker et al [TKDE’03], Li et al [SIGMOD’05] • Other load shedding work (examples) • General: Aurora [VLDB’03], Data Triage [ICDE’05] • On joins: Das et al [SIGMOD’03], STREAM [VLDB’04] • With optimization: NiagaraCQ [ICDE’03, SIGMOD’04] • Approximate query processing on aggregates • Synopsis-based approaches, online aggregation [SIGMOD’97] Nesime Tatbul, Brown University

  24. Conclusions • We provide a Window-aware Load Shedding technique for a general class of aggregation queries over data streams with: • sliding windows • user-defined aggregates • nesting, sharing, grouping • Window Drop preserves window integrity, enabling: • easy control of error propagation • subset guarantee for query results • early load reduction that minimizes error Nesime Tatbul, Brown University

  25. Future Directions • Prediction-based load shedding • Can we guess the missing values? • Statistical approaches vs. Subset-based approaches • Window-awareness on joins • Memory-constrained environments • Distributed environments Nesime Tatbul, Brown University

  26. Questions? Nesime Tatbul, Brown University

  27. Decoding Window Specifications Nesime Tatbul, Brown University
