Efficiently Correlating Complex Events over Live and Archived Data Streams

Nihal Dindar, Peter M. Fischer, Merve Soner, Nesime Tatbul. ETH Zurich, Switzerland.

Presentation Transcript


  1. Nihal Dindar, Peter M. Fischer, Merve Soner, Nesime Tatbul. ETH Zurich, Switzerland. Efficiently Correlating Complex Events over Live and Archived Data Streams

  2. What is a Pattern Correlation Query (PCQ)? • Upon detecting a fall in the current price of stock X on the live stream, • look for a tick-shaped pattern for X within the recent archive. [Figure: timeline showing the price fall pattern (live match), the recency region, and the tick-shaped patterns (archive matches)]

  3. PCQ = Live ⋈ Archive • Fall pattern on live stream: • PATTERN(A+) • DEFINE A AS A.Price < PREV(A.Price) • Tick-shaped pattern on archive stream: • PATTERN(A+B+) • DEFINE A AS A.Price < PREV(A.Price), B AS B.Price > PREV(B.Price) AND LAST(B.Price) > FIRST(A.Price) • Correlation criteria: • WHERE symbol_l = symbol_a • RECENCY = 10 minutes
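
The two pattern definitions on this slide can be illustrated with a small Python sketch over a plain list of prices. This is not the authors' matcher: the function names are hypothetical, matches are assumed to be greedy and non-overlapping, and the condition LAST(B.Price) > FIRST(A.Price) is assumed to be checked once on the completed match.

```python
from typing import List, Tuple

def fall_matches(prices: List[float]) -> List[Tuple[int, int]]:
    """PATTERN(A+), A.Price < PREV(A.Price): maximal runs of falling prices.
    Returns inclusive (first, last) indices of the tuples bound to A."""
    matches, n, i = [], len(prices), 1
    while i < n:
        if prices[i] < prices[i - 1]:
            start = i
            while i + 1 < n and prices[i + 1] < prices[i]:
                i += 1
            matches.append((start, i))
        i += 1
    return matches

def tick_matches(prices: List[float]) -> List[Tuple[int, int]]:
    """PATTERN(A+B+): a falling run followed by a rising run whose last
    price exceeds the first price of the falling run (assumed final check)."""
    matches, n, i = [], len(prices), 1
    while i < n:
        if prices[i] < prices[i - 1]:
            a_start = i
            while i + 1 < n and prices[i + 1] < prices[i]:
                i += 1
            j = i  # i is now the last tuple bound to A
            while j + 1 < n and prices[j + 1] > prices[j]:
                j += 1
            if j > i and prices[j] > prices[a_start]:
                matches.append((a_start, j))
            i = j  # continue scanning after the rising run
        i += 1
    return matches

falls = fall_matches([10, 9, 8, 9, 7])
ticks = tick_matches([10, 8, 7, 9, 11, 10])
```

In the second call, the falling run 8, 7 is followed by the rising run 9, 11, and 11 exceeds 8, so the whole span counts as one tick-shaped match.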

  4. Challenges • Clean, useful, optimizable semantics for PCQs • Needed definitions, e.g., the archive of an event and recency • Efficient access to and processing of fast-growing archive data • Optimized processing of high-cost complex pattern matching queries, to scale with potentially high live stream rates

  5. Related Work • Pattern matching systems for live streams • Academic: Cayuga, SASE+, ZStream • Commercial: Coral8, ESPER, Oracle CEP, StreamBase • Systems which combine live and historical data • Moirae, NiagaraST/Latte, TelegraphCQ • Summary: either live pattern matching or combined processing of live and historical data, but not both

  6. Outline • Introduction • Modeling PCQs • Processing PCQs • Optimizing PCQs • Experimental Results • Conclusions and Future Work

  7. Modeling PCQs • An event has a start time and an end time. • Event a happens before (->) event b if a starts before b starts and ends before b ends. • A stream is totally ordered by the start time and then the end time of its events. • An event b has a recency correlation with an event a if a -> b and a's start time is inside b's recency region (of size P). [Figure: timeline showing the price fall pattern (live match), the tick-shaped patterns (archive matches), and a recency region of size P]
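
The definitions on this slide can be written down as a minimal Python sketch. The `Event` class and function names are hypothetical, and the slide does not say where the recency region is anchored, so the sketch assumes it is the interval of length P that ends at b's start time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    start: float
    end: float

def happens_before(a: Event, b: Event) -> bool:
    """a -> b: a starts before b starts and ends before b ends."""
    return a.start < b.start and a.end < b.end

def stream_order_key(e: Event):
    """Total order of a stream: by start time, then by end time."""
    return (e.start, e.end)

def recency_correlated(a: Event, b: Event, p: float) -> bool:
    """b has a recency correlation with a.
    Assumption: b's recency region is [b.start - p, b.start]."""
    return happens_before(a, b) and b.start - p <= a.start

a, b = Event(0, 5), Event(3, 8)
ordered = sorted([Event(3, 8), Event(0, 5), Event(0, 2)], key=stream_order_key)
```

With these values, `recency_correlated(a, b, 10)` holds, while shrinking the region to `p = 2` excludes `a` because it starts too early.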

  9. Baseline PCQ Processing Strategy: The Lazy Approach • Step 1: Look for live matches • Step 2: Calculate the recency region • Step 3: Look for archive matches • Step 4: Apply the join condition and join the live and archive matches [Figure: timeline showing the price fall pattern (live match), the recency region, and the tick-shaped patterns (archive matches)]
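
The four steps of the lazy approach can be sketched as a nested loop in Python. Everything here is illustrative rather than the authors' implementation: the tuple layouts, the `lazy_pcq` and `steps` names, and the toy single-tuple matchers are assumptions, and the recency region is assumed to end at the start time of the live match.

```python
from typing import Callable, List, Tuple

Tick = Tuple[float, str, float]   # (timestamp, symbol, price)
Match = Tuple[float, float, str]  # (start time, end time, symbol)
Matcher = Callable[[List[Tick]], List[Match]]

def lazy_pcq(live: List[Tick], archive: List[Tick],
             match_live: Matcher, match_archive: Matcher,
             recency: float) -> List[Tuple[Match, Match]]:
    results = []
    for lm in match_live(live):                            # Step 1
        lo, hi = lm[0] - recency, lm[0]                    # Step 2
        region = [t for t in archive if lo <= t[0] < hi]   # Step 3
        for am in match_archive(region):
            if am[2] == lm[2]:                             # Step 4: symbol join
                results.append((lm, am))
    return results

def steps(ticks: List[Tick], rising: bool) -> List[Match]:
    """Toy matcher: one match per single price rise (or fall) per symbol."""
    last, out = {}, []
    for ts, sym, price in ticks:
        if sym in last and price != last[sym] and (price > last[sym]) == rising:
            out.append((ts, ts, sym))
        last[sym] = price
    return out

archive = [(5, "X", 1), (6, "X", 2), (22, "X", 10),
           (24, "X", 11), (25, "Y", 5), (26, "Y", 6)]
live = [(30, "X", 12), (31, "X", 11)]
joined = lazy_pcq(live, archive,
                  lambda t: steps(t, rising=False),
                  lambda t: steps(t, rising=True),
                  recency=10)
```

Note that the archive matcher is rerun from scratch for every live match, even when consecutive recency regions overlap heavily.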

  11. Optimizing PCQs - Recent Input Buffer • An in-memory data structure that mediates between the live and archived event stores • Caches the most recent stream tuples for efficient access • Provides bulk inserts into the stream archive
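
One way to picture such a buffer is the Python sketch below. The class layout, the `capacity` and `batch_size` parameters, and the use of a plain list as a stand-in for the archive store are all assumptions made for illustration; the slides do not describe the actual data structure.

```python
from collections import deque
from typing import List, Tuple

Tick = Tuple[float, str, float]  # (timestamp, symbol, price)

class RecentInputBuffer:
    def __init__(self, capacity: int, batch_size: int, archive: List[Tick]):
        self.capacity = capacity
        self.batch_size = batch_size
        self.archive = archive          # stand-in for the on-disk archive
        self._buf = deque()             # most recent tuples, oldest first

    def append(self, tick: Tick) -> None:
        """Accept a live tuple; spill the oldest tuples in bulk when full."""
        self._buf.append(tick)
        if len(self._buf) > self.capacity:
            self._flush_oldest()

    def _flush_oldest(self) -> None:
        count = min(self.batch_size, len(self._buf))
        batch = [self._buf.popleft() for _ in range(count)]
        self.archive.extend(batch)      # one bulk insert instead of many

    def read(self, lo: float, hi: float) -> List[Tick]:
        """Tuples with lo <= timestamp <= hi, from the archive and the buffer."""
        return [t for t in self.archive + list(self._buf) if lo <= t[0] <= hi]

store: List[Tick] = []
rib = RecentInputBuffer(capacity=3, batch_size=2, archive=store)
for ts in range(1, 6):
    rib.append((ts, "X", 100.0 + ts))
```

After five appends, the two oldest tuples have been moved to the archive in a single batch and the three newest remain in memory, while `read` hides that split from the caller.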

  12. Optimizing PCQs - Query Result Caching • Caches archive matches to avoid recomputing them for overlapping recency regions • Archive matches are retrieved from the Query Result Cache [Figure: live stream events 1 to 3, archive stream events 1 to 5, a recency region of size P, and the Query Result Cache holding the archive matches]
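
The caching idea can be sketched in Python as follows. This is a simplified illustration with hypothetical names: it assumes recency regions only move forward in time, and that `match_fn(lo, hi)` returns the archive matches whose start time lies in [lo, hi), ordered by start time.

```python
from typing import Callable, List, Optional, Tuple

Match = Tuple[float, float]  # (start time, end time)

class QueryResultCache:
    def __init__(self, match_fn: Callable[[float, float], List[Match]]):
        self.match_fn = match_fn
        self.matches: List[Match] = []
        self.covered_until: Optional[float] = None

    def matches_in(self, lo: float, hi: float) -> List[Match]:
        """Archive matches for the region [lo, hi), reusing cached results."""
        new_lo = lo if self.covered_until is None else max(lo, self.covered_until)
        if new_lo < hi:
            # Only the part of the region not seen before is evaluated.
            self.matches.extend(self.match_fn(new_lo, hi))
            self.covered_until = hi
        # Drop matches that have slid out of the recency region.
        self.matches = [m for m in self.matches if m[0] >= lo]
        return list(self.matches)

all_matches = [(1, 2), (4, 6), (8, 9), (12, 13)]
calls = []

def match_fn(lo: float, hi: float) -> List[Match]:
    calls.append((lo, hi))
    return [m for m in all_matches if lo <= m[0] < hi]

cache = QueryResultCache(match_fn)
first = cache.matches_in(0, 10)
second = cache.matches_in(3, 13)
```

For the second, overlapping region, the pattern matcher is invoked only on the uncovered suffix from 10 to 13; the matches between 3 and 10 come from the cache.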

  13. Optimizing PCQs - Join Source Ordering • Selectivity criterion: process the more selective pattern first • Processing cost criterion: avoid processing hot spots [Figure: recency regions on the live and archive streams for the archive-first and live-first orderings]

  14. Optimizing PCQs: Architectural Overview

  16. Experimental Results • Data: NYSE stock-market data from January 26 to 31, 2006 • Query: (live pattern: fall), (archive pattern: tick-shaped) • Stock: Exxon Mobil (XOM); P covers several hours [Figure: results compared against the baseline]

  17. Summary of Experimental Results • PCQs are expensive • Optimization pays off • Our optimizations provide a big improvement over the baseline

  19. Conclusions • We have investigated the problem of efficiently correlating complex events over live and archived data streams, providing: • an optimizable semantics for Pattern Correlation Queries • recent input buffering to deal with the different access speeds of live and archive data • a query result cache and join source ordering to reduce the quadratic complexity of PCQ processing, for scaling with high stream rates

  20. Future Work • Optimizations for response time • Indexes on the result cache • Introduction of other correlation criteria, such as context similarity and temporal periodicity
