
Update-Pattern-Aware Modeling and Processing of Continuous Queries

Lukasz Golab, University of Waterloo, Canada, lgolab@uwaterloo.ca. Joint work with M. Tamer Özsu.


Presentation Transcript


  1. Update-Pattern-Aware Modeling and Processing of Continuous Queries • Lukasz Golab, University of Waterloo, Canada, lgolab@uwaterloo.ca • Joint work with M. Tamer Özsu

  2. Introduction • Relational algebra and queries • Each operator consumes one or more relation instances and outputs a relation instance • Blocking computations • Some operators have non-blocking variants, e.g., aggregation and join

  3. What is a continuous query? • Expression composed of non-blocking "relational" operators that operate on streams • Streams may be bounded by sliding windows • Q(t) = answer of a continuous query Q at time t = output of the corresponding one-time relational query Q' whose inputs are the current states of the streams, windows, and tables referenced in Q
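
The definition of Q(t) on slide 3 can be illustrated with a minimal Python sketch. The `Quote` tuple, the sample stream, the per-symbol SUM query, and the window boundary convention (a tuple is in the window at time t when t - window_size < ts <= t) are assumptions made for illustration and do not come from the slides.

```python
from collections import namedtuple

# Hypothetical stream element: arrival timestamp, stock symbol, price.
Quote = namedtuple("Quote", ["ts", "symbol", "price"])

def window_state(stream, t, window_size):
    """Current state of a time-based sliding window over an append-only stream."""
    return [q for q in stream if t - window_size < q.ts <= t]

def one_time_query(quotes):
    """Q': an ordinary (one-time) query over a finite relation, here SUM(price) per symbol."""
    totals = {}
    for q in quotes:
        totals[q.symbol] = totals.get(q.symbol, 0) + q.price
    return totals

def continuous_answer(stream, t, window_size):
    """Q(t): the output of Q' evaluated on the window state at time t."""
    return one_time_query(window_state(stream, t, window_size))

stream = [Quote(1, "X", 10), Quote(2, "Y", 5), Quote(4, "X", 20)]
for t in (2, 4, 5):
    print(t, continuous_answer(stream, t, window_size=3))
```

Re-running Q' from scratch at every t is only a way to state the semantics; an actual engine would maintain the answer incrementally.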

  4. Example of a continuous query • [Figure: query plan with inputs, two operators labeled "s" (likely selection, σ), a SUM operator, and the output]

  5. What is an update pattern? • An update pattern does not refer to updates of individual tuples (a stream is append-only) • An update pattern refers to changes in the answer of a continuous query (insertions/deletions) • Deletions? Aren’t streams append-only? • Queries over an append-only database don’t necessarily produce append-only output

  6. Non-append-only output • Select stocks whose price this hour is greater than their price in the previous hour • Example: Company X 8am $1.00, Company X 9am $1.50, Company X 10am $1.25 • Update pattern? • Select all stock prices reported in the last 5 minutes • FIFO update pattern • [Figure: three clock faces numbered 1 to 12]
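
The FIFO update pattern of the "last 5 minutes" query can be seen by logging how its answer changes over time. The sketch below is illustrative only: the sample quotes, the minute-granularity timestamps, and the helper names `answer` and `update_log` are hypothetical.

```python
def answer(stream, t, window=5):
    """All quotes reported in the last `window` minutes as of time t."""
    return {(ts, price) for ts, price in stream if t - window < ts <= t}

def update_log(stream, times, window=5):
    """Record insertions ('+') and deletions ('-') between successive answers."""
    previous, log = set(), []
    for t in times:
        current = answer(stream, t, window)
        log += [(t, "+", tup) for tup in sorted(current - previous)]
        log += [(t, "-", tup) for tup in sorted(previous - current)]
        previous = current
    return log

# Hypothetical quotes for one company: (minute of arrival, price).
quotes = [(1, 100), (2, 101), (3, 102)]
log = update_log(quotes, range(1, 9))
inserted = [tup for _, op, tup in log if op == "+"]
deleted = [tup for _, op, tup in log if op == "-"]
print(inserted == deleted)  # deletions follow the insertion order
```

Although the input stream is append-only, the answer shrinks as the window slides, and results leave in the order they entered.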

  7. Monotonic queries • Query Q is monotonic (over an append-only database) if Q(t) ⊆ Q(t') for all t ≤ t' • Queries over sliding windows are non-monotonic because all of their results eventually expire as the windows slide forward • Some queries are non-monotonic even over an append-only database (stream) • Example: stock quotes whose price is higher than last hour • Others become non-monotonic only because of windowing • Example: select all stock quotes (monotonic over the unbounded stream)
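
The containment condition for monotonicity can be checked mechanically on a finite trace of answers. This is a small sketch under assumed data: the sample stream, the window size of 2, and the function names are made up for illustration.

```python
def is_monotonic(answers):
    """answers: sets Q(t) listed in increasing order of t.

    Checking consecutive pairs is enough because set containment is transitive.
    """
    return all(a <= b for a, b in zip(answers, answers[1:]))

# Hypothetical append-only stream of (timestamp, symbol, price).
stream = [(1, "X", 100), (2, "Y", 50), (3, "X", 101)]

def select_all(t):
    """Select all quotes seen so far (no window)."""
    return {q for q in stream if q[0] <= t}

def select_all_windowed(t, window=2):
    """Select all quotes inside a time-based window of the given size."""
    return {q for q in stream if t - window < q[0] <= t}

times = range(1, 6)
unbounded = [select_all(t) for t in times]
windowed = [select_all_windowed(t) for t in times]
print(is_monotonic(unbounded), is_monotonic(windowed))
```

The same selection is monotonic over the unbounded stream and non-monotonic once a window is applied, which is the second source of non-monotonic behaviour named on the slide.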

  8. Problem definition • Motivation • Two possible reasons for non-monotonic behaviour of continuous queries: the query itself, or windowing • Problem statement • Divide non-monotonic queries into classes • Analyze the update patterns of each class • Use the knowledge of update patterns in query processing and optimization

  9. Outline • Update patterns of sliding window queries • Classification • Advantages of update-pattern awareness • Modeling (query semantics) • Processing (query execution)

  10. Sliding window operators • When a tuple falls out of its window, it also expires from the output and from operator state • [Figure: a DISTINCT operator over a window of x, y, z tuples, and two windows S1 and S2 with the oldest tuple marked and an "undo" step]

  11. Calculating expiration times • Time-based windows: predictable expiration times • Assign a timestamp, ts, upon arrival • Expiration time = ts + window_size → FIFO • For joins: min(expiration times of the joined tuples) • Predictable, but is it still FIFO? • Count-based windows and non-monotonic queries over infinite streams: unpredictable expiration times • Expiration time depends on stream arrival rates or on the data arriving on the stream → need negative tuples
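
The two expiration rules for time-based windows (ts + window_size for base tuples, the minimum over the joined tuples for join results) can be sketched as follows. The `WTuple` class, the window size of 10, the sample arrivals, and the choice to stamp a join result with the later of its inputs' timestamps are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WTuple:
    value: object
    ts: int   # time at which the tuple (or join result) was produced
    exp: int  # time at which it expires

def arrive(value, ts, window_size):
    """Base tuple: expiration time = ts + window_size."""
    return WTuple(value, ts, ts + window_size)

def join(left, right):
    """Join result: expires as soon as either input expires.

    Assumption: the result is produced when the later input arrives.
    """
    return WTuple((left.value, right.value),
                  max(left.ts, right.ts),
                  min(left.exp, right.exp))

WINDOW = 10
l1 = arrive("l1", 1, WINDOW)  # expires at 11
l2 = arrive("l2", 4, WINDOW)  # expires at 14
r1 = arrive("r1", 5, WINDOW)  # expires at 15
r2 = arrive("r2", 6, WINDOW)  # expires at 16

first = join(l2, r1)   # produced at 5, expires at 14
second = join(l1, r2)  # produced at 6, expires at 11
print(first.exp, second.exp)
```

Here the result produced second expires first, so join output has predictable expiration times but is not FIFO.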

  12. Classification of update patterns • Monotonic: answers never expire • selection, join, duplicate elimination over infinite streams • Weakest non-monotonic: answers expire in FIFO order, negative tuples are not needed • operators over time-based windows that don’t reorder incoming tuples during processing • Weak non-monotonic: expiration order is not FIFO, but negative tuples are not needed • time-based window join, duplicate elimination • Strict non-monotonic: unpredictable expiration order, negative tuples are needed • negation, queries over count-based windows
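
The four classes can be told apart by two questions: are negative tuples needed, and if not, do results expire in the order they were produced? The sketch below encodes that decision. The `UpdatePattern` enum, the `classify` function, and its input format are hypothetical and only mirror the definitions on the slide.

```python
from enum import Enum

class UpdatePattern(Enum):
    MONOTONIC = "monotonic"
    WEAKEST = "weakest non-monotonic"
    WEAK = "weak non-monotonic"
    STRICT = "strict non-monotonic"

def classify(results, needs_negative_tuples=False):
    """Classify a trace of results.

    results: (generated_at, expires_at) pairs listed in generation order;
             expires_at is None for results that never expire.
    needs_negative_tuples: True when expirations cannot be predicted.
    """
    if needs_negative_tuples:
        return UpdatePattern.STRICT
    expirations = [exp for _, exp in results]
    if all(exp is None for exp in expirations):
        return UpdatePattern.MONOTONIC
    if any(exp is None for exp in expirations):
        raise ValueError("mixed expiring and non-expiring results")
    fifo = all(a <= b for a, b in zip(expirations, expirations[1:]))
    return UpdatePattern.WEAKEST if fifo else UpdatePattern.WEAK

print(classify([(1, None), (2, None)]))           # selection over an infinite stream
print(classify([(1, 11), (2, 12)]))               # selection over a time-based window
print(classify([(5, 14), (6, 11)]))               # time-based window join
print(classify([], needs_negative_tuples=True))   # e.g. negation
```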

  13. Outline • Update patterns of sliding window queries • Classification • Advantages of update-pattern awareness • Modeling (query semantics) • Processing (query execution)

  14. Update-pattern-aware semantics of continuous queries • How are updates of relational tables different from insertions and deletions caused by the movement of the windows? • Join of two infinite streams is monotonic • Join of two windows is weak non-monotonic • Join of a window and a table: weakest (easier), weak (same), or strict non-monotonic (harder)?

  15. Update-pattern-aware modeling of continuous queries, cont. • Harder: allow arbitrary table updates • Strict non-monotonic because we can’t predict when and how the table will be changed • Easier: don’t allow retroactive updates • Non-retroactive relation (NRR): table updates don’t affect previously arrived stream tuples • Weakest non-monotonic
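
The non-retroactive behaviour can be made concrete with a join between a time-based window of stock quotes and a symbol-to-name table. This is a minimal sketch, not the authors' implementation: the class `NrrWindowJoin`, its method names, the sample company names, and the assumption that quotes arrive in timestamp order are all hypothetical.

```python
from collections import deque

class NrrWindowJoin:
    """Join of a time-based window with a non-retroactive relation (NRR)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.names = {}         # table: symbol -> company name
        self.results = deque()  # (expiration time, name, price) in arrival order

    def update_table(self, symbol, name):
        """Insert, rename, or (with name=None) delete; past results are untouched."""
        if name is None:
            self.names.pop(symbol, None)
        else:
            self.names[symbol] = name

    def on_quote(self, ts, symbol, price):
        """Join an arriving quote with the table as it is right now."""
        name = self.names.get(symbol)
        if name is not None:
            self.results.append((ts + self.window_size, name, price))

    def answer(self, now):
        """Drop expired results from the front (FIFO) and return the rest."""
        while self.results and self.results[0][0] <= now:
            self.results.popleft()
        return [(name, price) for _, name, price in self.results]

j = NrrWindowJoin(window_size=10)
j.update_table("X", "Acme")
j.on_quote(1, "X", 100)
j.update_table("X", "Apex")   # rename: the earlier result keeps the old name
j.on_quote(3, "X", 101)
print(j.answer(5))
```

Because table updates never touch earlier results, results leave only when their window tuple expires, in arrival order, so a plain queue is enough to hold them.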

  16. Example • Stream: stock quotes • Table: mapping between stock symbols and company names • Query: select company name and price over a (time-based) window • Company goes bankrupt: delete its previous quotes (relation) or not (NRR) • Company changes name: update the name in previous quotes (relation) or not (NRR) • New company: no prior stock quotes

  17. Update-pattern-aware query processing • Annotate query plan with update patterns of each sub-query • Use appropriate data structures for storing state • Use appropriate physical operators • [Figure: two DISTINCT operators, one with strict non-monotonic input and one with weakest or weak non-monotonic input, showing insert and delete paths and state partitioned by expiration time]
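
One way to read "partition by expiration time" is a state buffer that groups tuples by their known expiration time, so that expiration removes whole partitions instead of scanning the state. The sketch below follows that reading; the `PartitionedState` class and its use of a heap over partition keys are assumptions, not details from the slides.

```python
import heapq
from collections import defaultdict

class PartitionedState:
    """Operator state for weak non-monotonic input, partitioned by expiration time."""

    def __init__(self):
        self._buckets = defaultdict(list)  # expiration time -> tuples
        self._heap = []                    # min-heap of expiration times present

    def insert(self, item, exp):
        """Add a tuple whose expiration time is known on arrival."""
        if exp not in self._buckets:
            heapq.heappush(self._heap, exp)
        self._buckets[exp].append(item)

    def expire(self, now):
        """Remove and return every tuple with expiration time <= now."""
        expired = []
        while self._heap and self._heap[0] <= now:
            exp = heapq.heappop(self._heap)
            expired.extend(self._buckets.pop(exp))
        return expired

    def __len__(self):
        return sum(len(bucket) for bucket in self._buckets.values())

state = PartitionedState()
state.insert("a", 14)
state.insert("b", 11)   # arrives later but expires earlier (not FIFO)
state.insert("c", 14)
print(state.expire(11), len(state))
```

For weakest non-monotonic input a plain FIFO queue would suffice, while strict non-monotonic input needs a structure that can locate an arbitrary tuple when its negative tuple arrives.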

  18. Update-pattern-aware query optimization • Cost model • Per-unit-time cost of executing operators, maintaining state, and processing negative tuples • Update-pattern-aware heuristic • Strict non-monotonic (NM) pull-up, weakest NM push-down • Operator and state implementations are simpler with weakest and weak NM

  19. Update-pattern-aware query optimization, cont. • [Figure: two alternative query plans over Stream 1, Stream 2, and Stream 3, with edges annotated by update pattern: WKS (weakest), WK (weak), STR (strict)]

  20. Summary • Monotonic vs. non-monotonic classification is not precise enough • Fails to distinguish between predictable (due to windowing) and unpredictable update patterns • Our update-pattern classification • Clarifies the semantics of continuous queries that reference tables alongside streams/windows • Forms the basis of our update-pattern-aware query processor

  21. Future work • Extend update-pattern-aware query optimization • Investigate the update patterns of periodically re-executed queries • Sub-divide queries over count-based windows • For now, strict non-monotonic
