On Load Shedding in Complex Event Processing

Authors: Yeye He (Microsoft Research), Siddharth Barman (California Institute of Technology), Jeffrey F. Naughton (University of Wisconsin-Madison). Presenter (non-author): Arvind Arasu (Microsoft Research).

Overview
  • Background: Complex Event Processing (CEP)
    • A different stream processing model
  • Problem: Load shedding in CEP
    • Maximize utility under resource constraints
  • Focus of this work
    • A problem taxonomy, hardness, and approximations
Background: CEP Data Model
  • CEP event data
    • Event stream $S = (e_1, e_2, \ldots)$
    • Each event $e_i$ is associated with an event type $E_j$
    • Each event $e_i$ has a time-stamp $t(e_i)$
    • Stream S is temporally ordered: $t(e_i) < t(e_{i+1})$ for all i

Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

The superscript of an event denotes its time-stamp, e.g., $t(a^1) = 1$

Each event is associated with a type, e.g., event $a^1$ is of type A

The set of event types is {A, B, C, D}

Background: CEP Query Model
  • CEP sequence query
    • $Q = \mathrm{SEQ}(E_1, E_2, \ldots, E_m)$, where the $E_k$ are event types
    • A time-based query window T(Q)
    • Only conjunctive queries are considered in this work
  • An event sequence $(e_{i_1}, \ldots, e_{i_m})$ is a query match of Q if
    • Types match: $e_{i_k}$ is of type $E_k$ for all $k \in [m]$
    • In query window: $t(e_{i_m}) - t(e_{i_1}) \le T(Q)$

Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • Q1 = SEQ(A, B) within 4 min
  • Q2 = SEQ(B, C) within 4 min
  • Q3 = SEQ(C, D) within 4 min

[Figure: matches in the example stream; Q1, Q2, and Q3 each match twice, and one candidate Q1 pair lies outside the time window]
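Here is a minimal brute-force sketch of the match semantics above, applied to the example stream. It assumes the window test is inclusive ($\le T(Q)$) and encodes events as (type, time-stamp) pairs. `seq_matches` and `STREAM` are hypothetical names, not from the paper. Enumerating all combinations is exponential in the query length, so this only suits checking small examples.

```python
from itertools import combinations

# The slides' example stream a^1, b^2, ..., d^8 as (type, time-stamp) pairs.
STREAM = [("A", 1), ("B", 2), ("C", 3), ("D", 4),
          ("A", 5), ("B", 6), ("C", 7), ("D", 8)]


def seq_matches(stream, types, window):
    """Return every match of SEQ(*types) in a temporally ordered stream.

    A match takes one event per query type, in query order, with the last
    time-stamp minus the first at most `window` (assumed inclusive).
    """
    matches = []
    # combinations() keeps the stream's order, so time-stamps increase.
    for combo in combinations(stream, len(types)):
        types_ok = all(ev_type == t for (ev_type, _), t in zip(combo, types))
        if types_ok and combo[-1][1] - combo[0][1] <= window:
            matches.append(combo)
    return matches


if __name__ == "__main__":
    queries = {"Q1": ("A", "B"), "Q2": ("B", "C"), "Q3": ("C", "D")}
    for name, types in queries.items():
        found = seq_matches(STREAM, types, window=4)
        print(name, [tuple(f"{t.lower()}{ts}" for t, ts in m) for m in found])
```

Run as a script, it prints two matches per query, e.g. `Q1 [('a1', 'b2'), ('a5', 'b6')]`. The candidate (a1, b6) is rejected because 6 - 1 = 5 exceeds the window of 4.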

The Load Shedding Problem
  • Event streams are often bursty
  • Not all events can be processed in a timely manner
    • Given resource constraints (CPU/memory)
  • Problem: Selectively “shed” data/processing
    • To preserve the most useful query results
Query Utility in CEP
  • Use query utility to quantify usefulness
    • A utility weight $w(Q_i)$ of query $Q_i$ models its importance

Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • Q1 = SEQ(A, B), w(Q1) = 3
  • Q2 = SEQ(B, C), w(Q2) = 2
  • Q3 = SEQ(C, D), w(Q3) = 4

[Figure: the example matches of Q1, Q2, and Q3, annotated with weights 3, 2, and 4, respectively]

Utility Maximizing Load Shedding
  • Given a set of queries $\{Q_i\}$
  • Given the expected number of query matches per unit time interval
    • Estimated using event arrival statistics
  • Find a type-level, global shedding strategy that
    • Maximizes the expected utility
    • Respects resource constraints (Memory/CPU/Dual)
    • Integral: discards all events/queries of certain types
    • Fractional: discards randomly sampled events/queries of certain types
Why Expected Utility?
  • Online algorithms with competitive ratio?
    • Hopeless!
  • No algorithm can achieve a competitive ratio better than , where is the length of the event sequence
    • Proved using an adversarial scenario
An Adversarial Scenario
  • event types:
  • unit-weight queries: SEQ(),
  • Event sequence: ()
    • is of type ,
    • drawn from with equal probability
  • Memory budget = 2 events
  • Offline optimal: utility = 1
    • pick one from based on X
  • Online optimal: expected utility =
  • Competitive ratio:

Instead, we optimize utility in the expected sense

Resource Constraint: Limited CPU
  • The CPU cannot process all queries
  • E.g., the CPU needs to process 3 unit-cost queries (per 4 time units)
    • Unit costs are for simplicity; queries can have arbitrary costs
  • Suppose the CPU can process only 2 queries
    • Best strategy: discard Q2, keep Q1 and Q3 (the highest-gain queries)

Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • Q1 = SEQ(A, B), w(Q1) = 3
  • Q2 = SEQ(B, C), w(Q2) = 2
  • Q3 = SEQ(C, D), w(Q3) = 4

[Figure: the example matches of Q1, Q2, and Q3, annotated with weights 3, 2, and 4, respectively]
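For the CPU example above, integral shedding reduces to choosing which queries to keep. Below is a brute-force sketch. It assumes, as the slide's arithmetic suggests, that each query yields one match and costs one CPU unit per 4-time-unit window. `best_queries`, `WEIGHT`, and `COST` are hypothetical names. The function enumerates every query subset, so it only suits toy inputs.

```python
from itertools import combinations

# Slide example, assumed per 4-time-unit window: one match per query,
# one CPU unit per query.
WEIGHT = {"Q1": 3, "Q2": 2, "Q3": 4}
COST = {"Q1": 1, "Q2": 1, "Q3": 1}


def best_queries(weight, cost, cpu_budget):
    """Brute-force integral CPU-bound shedding: return the query subset with
    the largest total weight whose total CPU cost fits the budget."""
    best, best_utility = (), 0
    names = sorted(weight)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            if sum(cost[q] for q in subset) > cpu_budget:
                continue  # too expensive for the CPU budget
            utility = sum(weight[q] for q in subset)
            if utility > best_utility:
                best, best_utility = subset, utility
    return best, best_utility


print(best_queries(WEIGHT, COST, cpu_budget=2))  # (('Q1', 'Q3'), 7)
```

This reproduces the slide's choice: keeping Q1 and Q3 gives 3 + 4 = 7 per window, versus 6 for {Q2, Q3} and 5 for {Q1, Q2}.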

Resource Constraint: Limited Memory
  • Not all events can be kept in memory
  • E.g., 4 events need to be kept in memory (in 4 time units)
    • Because the query window is 4
  • Suppose memory = 3 (per 4 time units)
    • Best strategy: keep B, C, D and discard A. U = w(Q2) + w(Q3) = 2 + 4 = 6
    • Discard D? U = w(Q1) + w(Q2) = 3 + 2 = 5
    • Discard B? U = w(Q3) = 4; discard C? U = w(Q1) = 3

Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • Q1 = SEQ(A, B), w(Q1) = 3
  • Q2 = SEQ(B, C), w(Q2) = 2
  • Q3 = SEQ(C, D), w(Q3) = 4

[Figure: the example matches of Q1, Q2, and Q3, annotated with weights 3, 2, and 4, respectively]
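The memory example can be verified by enumerating which event types to keep. Under integral shedding, a query contributes its weight only if all of its event types are retained. This sketch assumes one memory slot per event type per window, matching the slide's numbers. `best_types`, `utility`, `QUERIES`, and `MEMORY` are hypothetical names. The search is exponential in the number of event types.

```python
from itertools import combinations

# Slide example: each query is (event types, weight); each event type is
# assumed to need one memory slot per 4-time-unit window.
QUERIES = {"Q1": (("A", "B"), 3), "Q2": (("B", "C"), 2), "Q3": (("C", "D"), 4)}
MEMORY = {"A": 1, "B": 1, "C": 1, "D": 1}


def utility(kept, queries):
    """Integral shedding: a query keeps its match only if all its types are kept."""
    return sum(w for types, w in queries.values() if set(types) <= kept)


def best_types(queries, memory, budget):
    """Brute-force integral memory-bound shedding over all sets of kept types."""
    best, best_u = frozenset(), 0
    all_types = sorted(memory)
    for k in range(len(all_types) + 1):
        for subset in combinations(all_types, k):
            kept = frozenset(subset)
            if sum(memory[t] for t in kept) > budget:
                continue  # violates the memory budget
            u = utility(kept, queries)
            if u > best_u:
                best, best_u = kept, u
    return sorted(best), best_u


print(best_types(QUERIES, MEMORY, budget=3))  # (['B', 'C', 'D'], 6)
```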

Integral Memory-bound LS (IMLS)
  • Negative results
    • NP-hard
    • Unlikely to be approximated within
      • Unless 3SAT
    • Reduction from Densest k-Sub-Hypergraph

[1] Hajiaghayi, et al. The minimum k-colored subgraph problem in haplotyping and DNA primer selection. Bioinformatics Research and Applications, 2006

Integral Memory-bound LS (IMLS)
  • Positive results
    • A general bi-criteria approximation for utility loss minimization
      • optimal loss with budget
      • () bi-criteria approximation: utility loss is at most using memory
    • LP-rounding based algorithm
Integral Memory-bound LS (IMLS)
  • Positive results (cont’d)
  • Another special case that admits an approximation:
    • If the memory can hold at least 1/f number of queries
      • memory capacity is reasonably large
    • An event can be in at most number of queries
  • A -approximation algorithm
    • For utility gain maximization
    • Uses a knapsack-like approach
Integral Memory-bound LS (IMLS)
  • Positive results (cont’d)
  • Pseudo-polynomial-time solvable special case
    • Multi-tenant CEP applications co-located on the same server
    • Disjoint events for each application
    • Each application has no more than events
  • IMLS can be solved in time O()
    • : total # of events
    • : total # of queries
    • M: memory budget
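One way to see why this case is pseudo-polynomial: the applications have disjoint events, so they interact only through the shared memory budget M. That allows a multiple-choice knapsack dynamic program over integer memory. The sketch below assumes integer memory costs per event type and few enough event types per application to enumerate all of its kept-type subsets. It illustrates the technique and is not necessarily the paper's algorithm. Its running time is linear in M and exponential only in the per-application number of event types. `solve_multi_tenant` and the second tenant are hypothetical.

```python
from itertools import combinations


def solve_multi_tenant(apps, budget):
    """Multiple-choice knapsack DP for co-located applications.

    apps: list of (memory_per_type, queries) pairs, where memory_per_type maps
    each of the application's event types to an integer memory cost and
    queries is a list of (event_types, weight). Applications are assumed to
    have disjoint event types, so they interact only through `budget`.
    """
    # best[m]: max utility of the applications seen so far using at most m memory
    best = [0] * (budget + 1)
    for memory_per_type, queries in apps:
        types = sorted(memory_per_type)
        options = []  # (memory cost, utility) for every kept-type subset of this app
        for k in range(len(types) + 1):
            for subset in combinations(types, k):
                kept = set(subset)
                cost = sum(memory_per_type[t] for t in kept)
                gain = sum(w for q_types, w in queries if set(q_types) <= kept)
                options.append((cost, gain))
        # Exactly one option per application; the empty subset (cost 0) always fits.
        best = [max(best[m - c] + g for c, g in options if c <= m)
                for m in range(budget + 1)]
    return best[budget]


apps = [
    ({"A": 1, "B": 1, "C": 1, "D": 1},
     [(("A", "B"), 3), (("B", "C"), 2), (("C", "D"), 4)]),
    ({"E": 1, "F": 1}, [(("E", "F"), 5)]),  # hypothetical second tenant
]
print(solve_multi_tenant(apps, budget=5))  # 11
```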
Fractional Memory-bound LS (FMLS)
  • Negative result:
    • NP-hard even if each query has exactly two events
  • Positive result:
    • relative-approximation for utility gain maximization
      • If the memory requirement of each event type exceeds the total budget
      • controls precision (, )
      • max number of events in a query
      • Uses a grid-based approach on the simplex [2]

[2] de Klerk, et al. A PTAS for the minimization of polynomials of fixed degree over the simplex. Theoretical Computer Science, 2006
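Here is a rough illustration of fractional memory-bound shedding. It rests on a modeling assumption not spelled out on the slides: events of type t are kept independently with probability x[t]. Under that assumption, a query match survives with the product of its types' keep rates, and memory is charged in expectation. The sketch is a plain exhaustive search over a uniform grid of keep rates. It is not the PTAS of [2], only a toy stand-in in the same grid-based spirit, and it ignores the slide's condition on per-type memory requirements. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Slide example (hypothetical encoding): (event types, weight) per query and
# memory needed per event type per window.
QUERIES = [(("A", "B"), 3), (("B", "C"), 2), (("C", "D"), 4)]
MEMORY = {"A": 1, "B": 1, "C": 1, "D": 1}


def expected_utility(x, queries):
    """Assumes events of type t are kept independently with probability x[t],
    so a match survives with the product of its types' keep rates."""
    return sum(w * prod(x[t] for t in types) for types, w in queries)


def grid_search(queries, memory, budget, r):
    """Evaluate every keep-rate vector on the grid {0, 1/r, ..., 1}^k that fits
    the (expected) memory budget and return the best one."""
    types = sorted(memory)
    grid = [Fraction(i, r) for i in range(r + 1)]
    best_x, best_u = None, Fraction(-1)
    for point in product(grid, repeat=len(types)):
        x = dict(zip(types, point))
        if sum(memory[t] * x[t] for t in types) <= budget:
            u = expected_utility(x, queries)
            if u > best_u:
                best_x, best_u = x, u
    return best_x, best_u


if __name__ == "__main__":
    x, u = grid_search(QUERIES, MEMORY, budget=3, r=6)
    print({t: str(v) for t, v in x.items()}, u)
```

On the slides' example with a budget of 3, the grid includes the keep rates A = 1/6, B = 5/6, C = D = 1. Their expected utility, 73/12 (about 6.08), exceeds the best integral value of 6. This hints at why the fractional variant behaves differently from the integral one.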

Integral CPU-bound LS (ICLS)
  • Negative result
    • NP-complete
  • Positive result:
    • Admits an FPTAS: rounding off least significant bits
      • Uses knapsack results
  • ICLS is an easy load shedding variant
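With a single CPU budget, ICLS has the shape of a 0/1 knapsack over queries, so the textbook profit-scaling FPTAS applies. The sketch below shows that generic technique, assumed here as an illustration rather than taken from the paper. It scales utilities by $k = \epsilon \cdot u_{\max} / n$, rounds down (dropping low-order bits), then runs an exact DP over scaled utility. Each query's utility would be its weight times its expected matches. `knapsack_fptas` is a hypothetical name.

```python
def knapsack_fptas(utilities, costs, budget, eps):
    """Textbook profit-scaling FPTAS for 0/1 knapsack.

    Items are queries: utilities[i] is the (positive) utility of keeping
    query i and costs[i] its CPU cost. Returns (chosen indices, utility) with
    utility >= (1 - eps) * optimum, in time polynomial in n and 1/eps.
    """
    items = [i for i in range(len(utilities)) if costs[i] <= budget]
    if not items or max(utilities[i] for i in items) <= 0:
        return set(), 0
    # Scale utilities down and round: this drops their low-order bits.
    k = eps * max(utilities[i] for i in items) / len(items)
    scaled = {i: int(utilities[i] // k) for i in items}
    total = sum(scaled.values())
    INF = float("inf")
    min_cost = [0] + [INF] * total        # min_cost[v]: cheapest way to reach scaled utility v
    chosen = [frozenset()] * (total + 1)  # a query subset achieving min_cost[v]
    for i in items:
        s = scaled[i]
        for v in range(total, s - 1, -1):  # descending, so each query is used at most once
            if min_cost[v - s] + costs[i] < min_cost[v]:
                min_cost[v] = min_cost[v - s] + costs[i]
                chosen[v] = chosen[v - s] | {i}
    best_v = max(v for v in range(total + 1) if min_cost[v] <= budget)
    best = set(chosen[best_v])
    return best, sum(utilities[i] for i in best)


print(knapsack_fptas([3, 2, 4], [1, 1, 1], budget=2, eps=0.1))  # ({0, 2}, 7)
```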
Fractional CPU-bound LS (FCLS)
  • Positive result
    • Can be written as a simple Linear Program
    • Polynomial time solvable
  • FCLS is the easiest load shedding variant
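Assume each query's CPU cost and expected utility both scale linearly with the fraction $y_i$ of it that is processed. The LP then has a single budget constraint plus bounds $0 \le y_i \le 1$. Its optimum is the greedy fractional-knapsack solution, so no general LP solver is needed for this special form. The sketch below uses the hypothetical name `fractional_cpu_shedding` and assumes all costs are positive.

```python
def fractional_cpu_shedding(utilities, costs, budget):
    """Solve  max sum(u_i * y_i)  s.t.  sum(c_i * y_i) <= budget, 0 <= y_i <= 1.

    With a single budget constraint this LP is a fractional knapsack, whose
    optimum is obtained greedily by utility per unit of CPU cost.
    Assumes every cost is positive.
    """
    order = sorted(range(len(utilities)),
                   key=lambda i: utilities[i] / costs[i], reverse=True)
    y = [0.0] * len(utilities)  # fraction of each query that is processed
    remaining = budget
    for i in order:
        if remaining <= 0:
            break
        y[i] = min(1.0, remaining / costs[i])
        remaining -= y[i] * costs[i]
    return y, sum(u * f for u, f in zip(utilities, y))


print(fractional_cpu_shedding([3, 2, 4], [1, 1, 1], budget=2.5))  # ([1.0, 0.5, 1.0], 8.0)
```

With the slides' weights and a budget of 2.5 unit-cost queries, Q3 and Q1 run fully and half of Q2 is processed.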
Integral Dual-bound LS (IDLS)
  • Negative result:
    • NP-hard & inapproximable
      • same as IMLS
  • Positive result:
    • A tri-criteria approximation
    • optimal loss with memory budget & CPU budget
    • At mostutility loss using memory & CPU
    • LP-rounding based algorithm
Fractional Dual-bound LS (FDLS)
  • Negative result:
    • NP-hard even if each query has exactly two events
    • Same as FMLS, since FMLS is a special case of FDLS
  • Approximation: open problem
    • Non-convex optimization subject to non-convex constraints
    • We didn’t find good techniques for this
Conclusion and Future Work
  • Study the old problem of load shedding in the new context of CEP
  • Investigate six problem variants
    • Hardness & approximation (more results in the paper)
  • A rich problem with more to study
    • Delayed variants: instance-level optimization
    • Query language beyond positive event occurrence

Thank you!

Questions?
