
On Load Shedding in Complex Event Processing

On Load Shedding in Complex Event Processing. Authors: Yeye He (Microsoft Research), Siddharth Barman (California Institute of Technology), Jeffrey F. Naughton (University of Wisconsin-Madison). Presenter (non-author): Arvind Arasu (Microsoft Research).


Presentation Transcript


  1. On Load Shedding in Complex Event Processing • Authors: Yeye He (Microsoft Research), Siddharth Barman (California Institute of Technology), Jeffrey F. Naughton (University of Wisconsin-Madison) • Presenter (non-author): Arvind Arasu (Microsoft Research)

  2. Overview • Background: Complex Event Processing (CEP) • A different stream processing model • Problem: Load shedding in CEP • Maximize utility under resource constraints • Focus of this work • A problem taxonomy, hardness, and approximations

  3. Overview • Background: Complex Event Processing (CEP) • A different stream processing model • Problem: Load shedding in CEP • Maximize utility under resource constraints • Focus of this work • A problem taxonomy, hardness, and approximations

  4. Background: CEP Data Model • CEP event data • Event stream S = (e_1, e_2, …) • Each event e_i is associated with an event type E_j • Each event e_i has a time-stamp t(e_i) • Stream S is temporally ordered: t(e_i) < t(e_{i+1}) for all i • Example stream: a1 b2 c3 d4 a5 b6 c7 d8 • The number attached to each event (a superscript on the slide) denotes its time-stamp, e.g. t(a1) = 1 • Each event is associated with a type, e.g. event a1 is of type A • The set of four event types is {A, B, C, D}

  5. Background: CEP Query Model • CEP sequence query • Q = SEQ(E_1, E_2, …, E_m), where the E_k are event types • A time-based query window T(Q) • Only conjunctive queries are considered in this work • An event sequence (e_{i_1}, …, e_{i_m}) is a query match of Q if • Types match: e_{i_k} is of type E_k for all k ∈ [m] • In query window: t(e_{i_m}) - t(e_{i_1}) ≤ T(Q) • Example stream: a1 b2 c3 d4 a5 b6 c7 d8 • Q1 = SEQ4(A, B) in 4 min • Q2 = SEQ4(B, C) in 4 min • Q3 = SEQ4(C, D) in 4 min • [Figure: matches of Q1, Q2, and Q3 marked on the stream; one candidate pairing for Q1 is labeled as outside the time window]
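
The match definition on this slide can be illustrated with a small Python sketch. It is a brute-force enumeration written for this transcript, not the evaluation strategy of any CEP engine; the names `STREAM` and `seq_matches` are hypothetical, and the time-stamps are assumed to be in minutes.

```python
from itertools import product

# Toy stream from the slide as (event type, time-stamp) pairs, temporally ordered.
STREAM = [("A", 1), ("B", 2), ("C", 3), ("D", 4),
          ("A", 5), ("B", 6), ("C", 7), ("D", 8)]


def seq_matches(stream, types, window):
    """Return every match of SEQ(types) whose time span is at most `window`."""
    # Candidate events for each position of the sequence pattern.
    candidates = [[e for e in stream if e[0] == t] for t in types]
    matches = []
    for combo in product(*candidates):
        times = [t for _, t in combo]
        # Events must appear in strictly increasing time order.
        in_order = all(a < b for a, b in zip(times, times[1:]))
        # The last and first events must fall within the query window.
        if in_order and times[-1] - times[0] <= window:
            matches.append(combo)
    return matches


if __name__ == "__main__":
    for name, types in [("Q1", ["A", "B"]), ("Q2", ["B", "C"]), ("Q3", ["C", "D"])]:
        print(name, seq_matches(STREAM, types, window=4))
```

Under these assumptions each of the three queries has two matches on the toy stream, and a pairing such as (a1, b6) is rejected because its span of 5 exceeds the window of 4.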

  6. The Load Shedding Problem • Event streams are often bursty • Not all events can be processed in time • Given resource constraints (CPU/memory) • Problem: selectively “shed” data/processing • To preserve the most useful query results

  7. Query Utility in CEP • Use query utility to quantify usefulness • Utility weight w(Qi) of query Qi models its importance • Example stream: a1 b2 c3 d4 a5 b6 c7 d8 • Q1 = SEQ4(A, B), W(Q1)=3 • Q2 = SEQ4(B, C), W(Q2)=2 • Q3 = SEQ4(C, D), W(Q3)=4 • [Figure: each match on the stream labeled with its query and weight: two matches each for Q1 (W=3), Q2 (W=2), and Q3 (W=4)]

  8. Utility Maximizing Load Shedding • Given a set of queries {Qi} • Given the expected number of query matches in a unit time interval • Estimated using event arrival statistics • Find a type-level, global shedding strategy that • Maximizes the expected utility • Respects resource constraints (Memory/CPU/Dual) • Integral: discard all events/queries of certain types • Fractional: discard randomly sampled events/queries of certain types

  9. Why Expected Utility? • Online algorithms with competitive ratio? • Hopeless! • No algorithm can have a competitive ratio better than a bound that depends on the length of the event sequence • Proved using an adversarial scenario

  10. An Adversarial Scenario • event types: • unit-weight queries: SEQ(), • Event sequence: () • is of type , • drawn from with equal probability • Memory budget = 2 events • Offline optimal: utility = 1 • pick one from based on X • Online optimal: expected utility = • Competitive ratio: • Instead, we optimize utility in the expected sense

  11. Resource Constraint: Limited CPU • Not all queries can be processed by the CPU • E.g., the CPU needs to process 3 unit-cost queries (per 4 time units) • Unit cost is for simplicity; queries can have arbitrary costs • Suppose the CPU can only process 2 queries • Best strategy: discard Q2, keep Q1 and Q3 (the highest-gain queries) • Example stream: a1 b2 c3 d4 a5 b6 c7 d8 • Q1 = SEQ4(A, B), W(Q1)=3 • Q2 = SEQ4(B, C), W(Q2)=2 • Q3 = SEQ4(C, D), W(Q3)=4 • [Figure: matches on the stream labeled with query and weight]
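
The CPU example on this slide can be checked with a short exhaustive search in Python. This is only a sketch for the toy instance: `best_query_subset` is a hypothetical helper, every query is assumed to cost one CPU unit, and the weights are taken per match as on the slide.

```python
from itertools import combinations


def best_query_subset(weights, costs, cpu_budget):
    """Try every subset of queries and keep the feasible one with highest utility."""
    best_utility, best_subset = 0, ()
    names = sorted(weights)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(costs[q] for q in subset)
            utility = sum(weights[q] for q in subset)
            # Feasible means the total CPU cost fits in the budget.
            if cost <= cpu_budget and utility > best_utility:
                best_utility, best_subset = utility, subset
    return best_utility, best_subset


if __name__ == "__main__":
    weights = {"Q1": 3, "Q2": 2, "Q3": 4}
    costs = {"Q1": 1, "Q2": 1, "Q3": 1}  # unit cost, as assumed on the slide
    print(best_query_subset(weights, costs, cpu_budget=2))
```

With a budget of two unit-cost queries the search keeps Q1 and Q3 and drops Q2, matching the strategy stated on the slide. Exhaustive search is exponential in the number of queries, so it is useful only for checking small examples.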

  12. Resource Constraint: Limited Memory • Not all events can be kept in memory • E.g., need to keep 4 events in memory (per 4 time units) • Because query window = 4 • Suppose memory = 3 (per 4 time units) • Best strategy: keep B, C, D and discard A. U = W(Q2) + W(Q3) = 2 + 4 = 6 • Discard D? U = W(Q1) + W(Q2) = 3 + 2 = 5 • Discard B? U = W(Q3) = 4; Discard C? U = W(Q1) = 3 • Example stream: a1 b2 c3 d4 a5 b6 c7 d8 • Q1 = SEQ4(A, B), W(Q1)=3 • Q2 = SEQ4(B, C), W(Q2)=2 • Q3 = SEQ4(C, D), W(Q3)=4 • [Figure: matches on the stream labeled with query and weight]
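
The memory example can be reproduced the same way. The sketch below assumes that each event type occupies one unit of memory per window and that a query yields utility only if all of its event types are kept; `QUERIES`, `utility`, and `best_kept_types` are hypothetical names introduced for illustration.

```python
from itertools import combinations

# Query name -> (event types it needs, utility weight), from the slide.
QUERIES = {
    "Q1": (("A", "B"), 3),
    "Q2": (("B", "C"), 2),
    "Q3": (("C", "D"), 4),
}


def utility(kept, queries):
    """Sum the weights of queries whose event types are all kept in memory."""
    return sum(w for types, w in queries.values() if set(types) <= kept)


def best_kept_types(event_types, queries, memory_budget):
    """Exhaustively choose at most `memory_budget` event types to keep."""
    best = (0, frozenset())
    for r in range(memory_budget + 1):
        for kept in combinations(sorted(event_types), r):
            u = utility(set(kept), queries)
            if u > best[0]:
                best = (u, frozenset(kept))
    return best


if __name__ == "__main__":
    print(best_kept_types("ABCD", QUERIES, memory_budget=3))
```

Discarding one type affects every query that uses it, which is what separates the memory-bound problem from the CPU-bound one: the choices interact through shared event types rather than being independent items.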

  13. Integral Memory-bound LS (IMLS) • Negative results • NP-hard • Unlikely to be approximated within • Unless 3SAT • Reduction from Densest k-Sub-Hypergraph [1] • [1] Hajiaghayi, et al. The minimum k-colored subgraph problem in haplotyping and DNA primer selection. Bioinformatics Research and Applications, 2006

  14. Integral Memory-bound LS (IMLS) • Positive results • A general bi-criteria approximation for utility loss minimization • optimal loss with budget • () bi-criteria approximation: utility loss is at most using memory • LP-rounding based algorithm

  15. Integral Memory-bound LS (IMLS) • Positive results (cont’d) • Another approximate special case: • If the memory can hold at least 1/f number of queries • memory capacity is reasonably large • An event can be in at most number of queries • A -approximation algorithm • For utility gain maximization • Use Knapsack-like approach

  16. Integral Memory-bound LS (IMLS) • Positive results (cont’d) • Pseudo-polynomial-time solvable special case • Multi-tenant CEP applications, co-locating on same server • Disjoint events for each application • Each application has no more than events • IMLS can be solved in time O() • : total # of events • : total # of queries • M: memory budget

  17. Fractional Memory-bound LS (FMLS) • Negative result: • NP-hard even if each query has exactly two events • Positive result: • relative-approximation for utility gain maximization • If memory requirement of each event type exceeds total budget • controls precision (, ) • max number of events in a query • Use a grid-based approach on the simplex [2] • [2] de Klerk, et al. A PTAS for the minimization of polynomials of fixed degree over the simplex. Theoretical Computer Science, 2006

  18. Integral CPU-bound LS (ICLS) • Negative result • NP-complete • Positive result: • Admits an FPTAS: rounding off least significant bits • Use knapsack results • ICLS is an easy load shedding variant
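
The slide gives no formulation, so the following Python sketch assumes ICLS can be modeled as a 0/1 knapsack in which each query is an item with an expected utility (value) and a CPU cost. It shows the textbook value-scaling FPTAS for knapsack, which is one reading of "rounding off least significant bits"; `knapsack_fptas` and its parameters are hypothetical and not taken from the paper.

```python
def knapsack_fptas(values, costs, budget, eps):
    """Return (total value, chosen indices) with value >= (1 - eps) * optimum."""
    # Items that cannot fit alone, or are worthless, are never useful.
    items = [i for i in range(len(values)) if costs[i] <= budget and values[i] > 0]
    if not items:
        return 0, []
    vmax = max(values[i] for i in items)
    # Scaling factor: dividing by it discards the low-order part of each value.
    scale = eps * vmax / len(items)
    scaled = {i: int(values[i] // scale) for i in items}
    total = sum(scaled.values())
    inf = float("inf")
    # min_cost[v] is the cheapest cost that reaches scaled value exactly v.
    min_cost = [inf] * (total + 1)
    choice = [None] * (total + 1)
    min_cost[0], choice[0] = 0, []
    for i in items:
        # Iterate downward so that each item is used at most once.
        for v in range(total, scaled[i] - 1, -1):
            prev = v - scaled[i]
            if min_cost[prev] + costs[i] < min_cost[v]:
                min_cost[v] = min_cost[prev] + costs[i]
                choice[v] = choice[prev] + [i]
    best_v = max(v for v in range(total + 1) if min_cost[v] <= budget)
    chosen = choice[best_v]
    return sum(values[i] for i in chosen), chosen


if __name__ == "__main__":
    # Toy instance from the earlier slides: weights 3, 2, 4 and unit costs.
    print(knapsack_fptas([3, 2, 4], [1, 1, 1], budget=2, eps=0.1))
```

The table size is bounded by the sum of the scaled values, which is at most n²/ε for n items, so the running time is polynomial in both n and 1/ε.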

  19. Fractional CPU-bound LS (FCLS) • Positive result • Can be written as a simple Linear Program • Polynomial time solvable • FCLS is the easiest load shedding variant
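
The slide does not spell out the linear program. As an illustration only, suppose it has the single-budget form: maximize Σ u_i x_i subject to Σ c_i x_i ≤ C and 0 ≤ x_i ≤ 1, where x_i is the fraction of query Q_i that is kept, u_i its expected utility, and c_i its CPU cost. An LP of that shape is a fractional knapsack and is solved exactly by a greedy pass in order of utility per unit cost. The function `fractional_cpu_shedding` below is a hypothetical sketch under that assumption and requires strictly positive costs.

```python
def fractional_cpu_shedding(utilities, costs, cpu_budget):
    """Greedy solution of the assumed fractional-knapsack LP.

    Returns (total utility, {query: fraction kept}).
    """
    # Process queries in decreasing order of utility per unit of CPU cost.
    order = sorted(utilities, key=lambda q: utilities[q] / costs[q], reverse=True)
    keep = {q: 0.0 for q in utilities}
    remaining = cpu_budget
    for q in order:
        if remaining <= 0:
            break
        # Keep as much of this query as the remaining budget allows.
        frac = min(1.0, remaining / costs[q])
        keep[q] = frac
        remaining -= frac * costs[q]
    total = sum(utilities[q] * keep[q] for q in utilities)
    return total, keep


if __name__ == "__main__":
    utilities = {"Q1": 3, "Q2": 2, "Q3": 4}
    costs = {"Q1": 1, "Q2": 1, "Q3": 1}
    print(fractional_cpu_shedding(utilities, costs, cpu_budget=2.5))
```

On the toy instance with a budget of 2.5 units, Q3 and Q1 are kept in full and half of Q2 is sampled. If the paper's LP carries additional constraints, a general LP solver would be needed instead of this greedy pass.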

  20. Integral Dual-bound LS (IDLS) • Negative result: • NP-hard & inapproximable • Same as IMLS • Positive result: • A tri-criteria approximation • optimal loss with memory budget & CPU budget • At most utility loss using memory & CPU • LP-rounding based algorithm

  21. Fractional Dual-bound LS (FDLS) • Negative result: • NP-hard even if each query has exactly two events • Same as FMLS, since FMLS is a special case of FDLS • Approximation: open problem • Non-convex optimization subject to non-convex constraints • We didn’t find good techniques for this

  22. Conclusion and Future Work • Study the old problem of load shedding in the new context of CEP • Investigate six problem variants • Hardness & approximation (more results in the paper) • A rich problem with more to study • Delayed variants: instance-level optimization • Query language beyond positive event occurrence

  23. Thank you! Questions?
