On Load Shedding in Complex Event Processing

Authors:

Yeye He Microsoft Research

Siddharth Barman California Institute of Technology

Jeffrey F. Naughton University of Wisconsin-Madison

Presenter (non-author): Arvind Arasu Microsoft Research

Overview
  • Background: Complex Event Processing (CEP)
    • A different stream processing model
  • Problem: Load shedding in CEP
    • Maximize utility under resource constraints
  • Focus of this work
    • A problem taxonomy, hardness, and approximations
Background: CEP Data Model
  • CEP event data
    • Event stream $S = (e_1, e_2, \dots)$
    • Each event $e_i$ is associated with an event type $E_j$
    • Each event $e_i$ has a time-stamp $t(e_i)$
    • Stream $S$ is temporally ordered: $t(e_i) < t(e_{i+1})$ for all $i$

Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

The superscript of an event denotes its time-stamp, e.g., $t(a^1) = 1$

Each event is associated with a type, e.g., event $a^1$ is of type A

The set of four event types is {A, B, C, D}

Background: CEP Query Model
  • CEP sequence query
    • $Q = \mathrm{SEQ}(E_1, E_2, \dots, E_m)$, where the $E_k$ are event types
    • A time-based query window $T(Q)$
    • Only conjunctive queries are considered in this work
  • An event sequence $(e_{i_1}, \dots, e_{i_m})$ is a query match of $Q$ if
    • Types match: $e_{i_k}$ is of type $E_k$ for all $k \in [m]$
    • In query window: $t(e_{i_m}) - t(e_{i_1}) \le T(Q)$

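Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • $Q_1 = \mathrm{SEQ}(A, B)$ in 4 min
  • $Q_2 = \mathrm{SEQ}(B, C)$ in 4 min
  • $Q_3 = \mathrm{SEQ}(C, D)$ in 4 min

[Figure: query matches in the example stream. $Q_1$: $(a^1, b^2)$, $(a^5, b^6)$; $Q_2$: $(b^2, c^3)$, $(b^6, c^7)$; $Q_3$: $(c^3, d^4)$, $(c^7, d^8)$; the pair $(a^1, b^6)$ falls outside the time window.]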
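To make the match semantics concrete, here is a minimal Python sketch that enumerates matches of a sequence query by brute force. It assumes events are (type, timestamp) pairs, that the events of a match must have strictly increasing timestamps, and that the window is inclusive ($t(e_{i_m}) - t(e_{i_1}) \le T(Q)$); the names `STREAM` and `seq_matches` are illustrative, not taken from the paper.

```python
from itertools import product

# Running example: events as (type, timestamp) pairs, a^1 b^2 ... d^8.
STREAM = [("A", 1), ("B", 2), ("C", 3), ("D", 4),
          ("A", 5), ("B", 6), ("C", 7), ("D", 8)]


def seq_matches(stream, types, window):
    """Brute-force matches of SEQ(types) within `window` time units.

    A match takes one event per position with the required type,
    strictly increasing timestamps, and last minus first <= window
    (an inclusive window is an assumption of this sketch).
    """
    # Events eligible for each position of the pattern.
    candidates = [[e for e in stream if e[0] == t] for t in types]
    matches = []
    for combo in product(*candidates):
        times = [ts for _, ts in combo]
        ordered = all(a < b for a, b in zip(times, times[1:]))
        if ordered and times[-1] - times[0] <= window:
            matches.append(combo)
    return matches


print(seq_matches(STREAM, ["A", "B"], 4))
```

On the example stream, `seq_matches(STREAM, ["A", "B"], 4)` returns the two $Q_1$ matches $(a^1, b^2)$ and $(a^5, b^6)$; $(a^1, b^6)$ is rejected because 6 - 1 > 4. The exhaustive product is exponential in the query length; production CEP engines typically evaluate sequence queries incrementally, for example with automata.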

The Load Shedding Problem
  • Event streams are often bursty
  • Not all events can be processed in a timely manner
    • Due to resource constraints (CPU/memory)
  • Problem: Selectively “shed” data/processing
    • To preserve the most useful query results
Query Utility in CEP
  • Use query utility to quantify usefulness
    • Utility weight $w(Q_i)$ of query $Q_i$ models its importance

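Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • $Q_1 = \mathrm{SEQ}(A, B)$, $w(Q_1) = 3$
  • $Q_2 = \mathrm{SEQ}(B, C)$, $w(Q_2) = 2$
  • $Q_3 = \mathrm{SEQ}(C, D)$, $w(Q_3) = 4$

[Figure: the query matches of the example stream, each labeled with its query and weight: $Q_1$ ($w = 3$), $Q_2$ ($w = 2$), $Q_3$ ($w = 4$).]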

Utility Maximizing Load Shedding
  • Given a set of queries $\{Q_i\}$
  • Given the expected number of query matches per unit time interval
    • Estimated using event arrival statistics
  • Find a type-level, global shedding strategy that
    • Maximizes the expected utility
    • Respects resource constraints (memory/CPU/dual)
    • Integral: discards all events/queries of certain types
    • Fractional: discards randomly sampled events/queries of certain types
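The objective can be sketched as a small Python function. This is a hedged illustration, not the paper's formulation: it assumes a type-level strategy given as a keep rate per event type (0 or 1 for integral, anything in [0, 1] for fractional), that types are sampled independently so a match survives with the product of its types' rates, and, as hypothetical inputs, one expected match of each running-example query per 4-time-unit window.

```python
from math import prod

# Hypothetical inputs from the running example:
# query -> (event types, weight w(Q), expected matches per 4-unit window).
QUERIES = {
    "Q1": (("A", "B"), 3, 1.0),
    "Q2": (("B", "C"), 2, 1.0),
    "Q3": (("C", "D"), 4, 1.0),
}


def expected_utility(keep_rate, queries=QUERIES):
    """Expected utility of a type-level shedding strategy.

    keep_rate[E] is the fraction of type-E events kept: 0 or 1 for an
    integral strategy, any value in [0, 1] for a fractional one. Types
    are assumed to be sampled independently, so a match survives with
    the product of its types' keep rates.
    """
    return sum(
        weight * matches * prod(keep_rate[e] for e in types)
        for types, weight, matches in queries.values()
    )


print(expected_utility({"A": 0, "B": 1, "C": 1, "D": 1}))          # integral
print(expected_utility({"A": 0.5, "B": 0.5, "C": 0.5, "D": 0.5}))  # fractional
```

Keeping B, C and D gives 2 + 4 = 6; keeping every type at rate 0.5 gives (3 + 2 + 4) × 0.25 = 2.25.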
Why Expected Utility?
  • Online algorithms with competitive ratio?
    • Hopeless!
  • No algorithm can have competitive ratio better than , where is the length of the event sequence
    • Proved using an adversarial scenario
An Adversarial Scenario
  • event types:
  • unit-weight queries: SEQ(),
  • Event sequence: ()
    • is of type ,
    • drawn from with equal probability
  • Memory budget = 2 events
  • Offline optimal: utility = 1
    • pick one from based on X
  • Online optimal: expected utility =
  • Competitive ratio:

Instead, we optimize utility in the expected sense

Resource Constraint: Limited CPU
  • Not all queries can be processed by the CPU
  • E.g., the CPU needs to process 3 unit-cost queries (per 4 time units)
    • Unit cost is assumed for simplicity; in general, queries can have arbitrary costs
  • Suppose the CPU can only process 2 queries
    • Best strategy: discard $Q_2$, keep $Q_1$ and $Q_3$ (the highest-gain queries)

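Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • $Q_1 = \mathrm{SEQ}(A, B)$, $w(Q_1) = 3$
  • $Q_2 = \mathrm{SEQ}(B, C)$, $w(Q_2) = 2$
  • $Q_3 = \mathrm{SEQ}(C, D)$, $w(Q_3) = 4$

[Figure: the query matches of the example stream, each labeled with its query and weight: $Q_1$ ($w = 3$), $Q_2$ ($w = 2$), $Q_3$ ($w = 4$).]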
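For a handful of queries the integral CPU-bound choice can be checked exhaustively. The Python sketch below assumes each query has a CPU cost and a utility equal to its weight times its expected matches (one per window in the running example); `best_cpu_plan` is a hypothetical helper, and brute force is only meant to confirm the example, not to scale.

```python
from itertools import combinations

# Running example (assumed): query -> (CPU cost, weight x expected matches).
QUERIES = {"Q1": (1, 3), "Q2": (1, 2), "Q3": (1, 4)}


def best_cpu_plan(queries, cpu_budget):
    """Exhaustively pick the query subset with maximum utility whose
    total CPU cost fits the budget (integral CPU-bound shedding)."""
    best_set, best_utility = frozenset(), 0
    names = list(queries)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(queries[q][0] for q in subset)
            utility = sum(queries[q][1] for q in subset)
            if cost <= cpu_budget and utility > best_utility:
                best_set, best_utility = frozenset(subset), utility
    return best_set, best_utility


print(best_cpu_plan(QUERIES, 2))
```

With a budget of 2 it keeps $Q_1$ and $Q_3$ for a utility of 7, matching the strategy above.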

Resource Constraint: Limited Memory
  • Not all events can be kept in memory
  • E.g., 4 events must be kept in memory (per 4 time units)
    • Because the query window is 4
  • Suppose memory = 3 (per 4 time units)
    • Best strategy: keep B, C, D and discard A: $U = w(Q_2) + w(Q_3) = 2 + 4 = 6$
    • Discard D? $U = w(Q_1) + w(Q_2) = 3 + 2 = 5$
    • Discard B? $U = w(Q_3) = 4$; discard C? $U = w(Q_1) = 3$

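Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • $Q_1 = \mathrm{SEQ}(A, B)$, $w(Q_1) = 3$
  • $Q_2 = \mathrm{SEQ}(B, C)$, $w(Q_2) = 2$
  • $Q_3 = \mathrm{SEQ}(C, D)$, $w(Q_3) = 4$

[Figure: the query matches of the example stream, each labeled with its query and weight: $Q_1$ ($w = 3$), $Q_2$ ($w = 2$), $Q_3$ ($w = 4$).]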
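The memory example can be verified the same way. This Python sketch assumes each kept event type occupies one memory slot per 4-time-unit window and that a query contributes its weight only if all of its types are kept; `utility_of` and `best_memory_plan` are illustrative names, not from the paper.

```python
from itertools import combinations

# Running example (assumed): query -> (event types it needs, weight).
QUERIES = {"Q1": ({"A", "B"}, 3), "Q2": ({"B", "C"}, 2), "Q3": ({"C", "D"}, 4)}
TYPES = ["A", "B", "C", "D"]


def utility_of(kept, queries=QUERIES):
    """Utility when only the event types in `kept` are stored."""
    return sum(w for types, w in queries.values() if types <= kept)


def best_memory_plan(memory_slots):
    """Try every set of kept types that fits in memory; return the best."""
    candidates = (set(c) for r in range(memory_slots + 1)
                  for c in combinations(TYPES, r))
    best = max(candidates, key=utility_of)
    return best, utility_of(best)


print(best_memory_plan(3))
```

`best_memory_plan(3)` returns ({B, C, D}, 6), and `utility_of` reproduces the 5, 4 and 3 obtained by discarding D, B or C instead.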

Integral Memory-bound LS (IMLS)
  • Negative results
    • NP-hard
    • Unlikely to be approximated within
      • Unless 3SAT
    • Reduction from Densest k-Sub-Hypergraph [1]

[1] Hajiaghayi et al. The minimum k-colored subgraph problem in Haplotyping and DNA primer selection. Bioinformatics Research and Applications, 2006

Integral Memory-bound LS (IMLS)
  • Positive results
    • A general bi-criteria approximation for utility loss minimization
      • optimal loss with budget
      • () bi-criteria approximation: utility loss is at most using memory
    • LP-rounding based algorithm
Integral Memory-bound LS (IMLS)
  • Positive results (cont’d)
  • Another approximate special case:
    • If the memory can hold at least 1/f number of queries
      • memory capacity is reasonably large
    • An event can be in at most number of queries
  • A -approximation algorithm
    • For utility gain maximization
    • Uses a knapsack-like approach
Integral Memory-bound LS (IMLS)
  • Positive results (cont’d)
  • Pseudo-polynomial-time solvable special case
    • Multi-tenant CEP applications co-located on the same server
    • Disjoint events for each application
    • Each application has no more than events
  • IMLS can be solved in time O()
    • : total # of events
    • : total # of queries
    • M: memory budget
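One way to see why this case is pseudo-polynomial is a grouped knapsack: enumerate the few possible kept-type sets inside each application, then combine applications with a dynamic program over the memory budget. The Python sketch below is our own illustration under those assumptions (integer memory costs, disjoint events per application) and may differ from the algorithm in the paper; `app_options`, `multi_tenant_ls`, and the second tenant `OTHER` are hypothetical.

```python
from itertools import combinations


def app_options(app):
    """All (memory, utility) choices for one application.

    app = (type_costs, queries): type_costs maps each event type to an
    integer memory cost; queries is a list of (set_of_types, utility).
    A query contributes only if all of its event types are kept.
    """
    type_costs, queries = app
    types = list(type_costs)
    options = []
    for r in range(len(types) + 1):
        for combo in combinations(types, r):
            kept = set(combo)
            memory = sum(type_costs[t] for t in kept)
            utility = sum(u for q_types, u in queries if q_types <= kept)
            options.append((memory, utility))
    return options


def multi_tenant_ls(apps, budget):
    """Grouped-knapsack DP over applications with disjoint events.

    best[m] is the maximum utility of the applications processed so far
    using at most m memory units; exactly one option is chosen per
    application (the empty option (0, 0) is always available).
    """
    best = [0] * (budget + 1)
    for app in apps:
        options = app_options(app)
        best = [max(best[m - mem] + util for mem, util in options if mem <= m)
                for m in range(budget + 1)]
    return best[budget]


# Hypothetical tenants: the running example plus a second application.
EXAMPLE = ({"A": 1, "B": 1, "C": 1, "D": 1},
           [({"A", "B"}, 3), ({"B", "C"}, 2), ({"C", "D"}, 4)])
OTHER = ({"X": 2, "Y": 1}, [({"X", "Y"}, 5)])
print(multi_tenant_ls([EXAMPLE, OTHER], 7))
```

An application with $k$ event types contributes at most $2^k$ options, so the DP runs in time proportional to the total number of options times M, which is linear in the numeric value of M and hence pseudo-polynomial.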
Fractional Memory-bound LS (FMLS)
  • Negative result:
    • NP-hard even if each query has exactly two events
  • Positive result:
    • relative-approximation for utility gain maximization
      • If the memory requirement of each event type exceeds the total budget
      • controls precision (, )
      • max number of events in a query
      • Uses a grid-based approach on the simplex [2]

[2] de Klerk et al. A PTAS for the minimization of polynomials of fixed degree over the simplex. Theoretical Computer Science, 2006
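A fractional strategy assigns each event type a sampling rate, so expected utility becomes a polynomial in those rates. The Python sketch below evaluates that polynomial on a coarse grid and keeps the best feasible point. It is only in the spirit of the simplex-grid PTAS of [2]: it uses a plain box grid and exhaustive search, assumes independent sampling and one memory unit per fully kept type, and uses the running example as hypothetical input.

```python
from itertools import product
from math import prod

# Hypothetical inputs from the running example:
# (event types, weight x expected matches per window) for each query.
QUERIES = [(("A", "B"), 3), (("B", "C"), 2), (("C", "D"), 4)]
TYPES = ["A", "B", "C", "D"]
MEM_COST = {"A": 1, "B": 1, "C": 1, "D": 1}  # memory if a type is fully kept


def expected_utility(rate):
    """Polynomial objective: independent sampling at the given rates."""
    return sum(w * prod(rate[t] for t in types) for types, w in QUERIES)


def grid_search(budget, steps=4):
    """Try every rate vector on a grid with spacing 1/steps and return
    the best one whose expected memory use fits within the budget."""
    grid = [i / steps for i in range(steps + 1)]
    best_rate, best_util = None, float("-inf")
    for point in product(grid, repeat=len(TYPES)):
        rate = dict(zip(TYPES, point))
        if sum(MEM_COST[t] * rate[t] for t in TYPES) <= budget:
            util = expected_utility(rate)
            if util > best_util:
                best_rate, best_util = rate, util
    return best_rate, best_util


print(grid_search(3))
```

Fractional shedding can beat the integral optimum of 6 here: rates (A, B, C, D) = (0.25, 0.75, 1, 1) use exactly 3 memory units and give 3(0.25)(0.75) + 2(0.75) + 4 = 6.0625. The grid has $(\text{steps}+1)^{|\text{types}|}$ points, so this brute force is only practical for tiny instances.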

Integral CPU-bound LS (ICLS)
  • Negative result
    • NP-complete
  • Positive result:
    • Admits an FPTAS: rounding off least significant bits
      • Uses knapsack results
  • ICLS is an easy load shedding variant
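The idea of rounding off least significant bits can be sketched with the textbook profit-scaling FPTAS for 0/1 knapsack, treating each query as an item with a CPU cost and a utility. This Python sketch is the generic knapsack scheme, not necessarily the exact reduction used in the paper; `knapsack_fptas` is a hypothetical name.

```python
import math


def knapsack_fptas(items, budget, eps):
    """(1 - eps)-approximate 0/1 knapsack via profit scaling.

    items: list of (name, cost, profit) with positive profits. Profits
    are divided by K = eps * max_profit / n and floored, which discards
    low-order precision; a DP then finds the cheapest subset for each
    scaled profit.
    """
    items = [it for it in items if it[1] <= budget]  # drop items that never fit
    if not items or max(p for _, _, p in items) <= 0:
        return set(), 0
    n = len(items)
    scale = eps * max(p for _, _, p in items) / n
    scaled = [math.floor(p / scale) for _, _, p in items]
    total = sum(scaled)
    # min_cost[v]: cheapest cost reaching scaled profit exactly v.
    min_cost = [0] + [math.inf] * total
    chosen = [frozenset()] + [None] * total
    for i, (_, cost, _) in enumerate(items):
        for v in range(total, scaled[i] - 1, -1):  # downward: each item once
            candidate = min_cost[v - scaled[i]] + cost
            if candidate < min_cost[v]:
                min_cost[v] = candidate
                chosen[v] = chosen[v - scaled[i]] | {i}
    best_v = max(v for v in range(total + 1) if min_cost[v] <= budget)
    picked = chosen[best_v]
    return {items[i][0] for i in picked}, sum(items[i][2] for i in picked)


QUERIES = [("Q1", 1, 3), ("Q2", 1, 2), ("Q3", 1, 4)]  # (name, CPU cost, utility)
print(knapsack_fptas(QUERIES, 2, 0.1))
```

Flooring loses less than K per item, so at most $nK = \varepsilon P_{\max} \le \varepsilon \cdot \mathrm{OPT}$ in total; the scaled profits sum to at most $n^2/\varepsilon$, giving $O(n^3/\varepsilon)$ time.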
Fractional CPU-bound LS (FCLS)
  • Positive result
    • Can be written as a simple Linear Program
    • Polynomial-time solvable
  • FCLS is the easiest load shedding variant
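Under the assumption that keeping a fraction $x_i$ of query $Q_i$'s processing yields $x_i$ times its utility at $x_i$ times its CPU cost, FCLS reduces to an LP with a single budget constraint, i.e. a fractional knapsack, which a greedy pass by utility per unit cost solves exactly. The Python sketch below (hypothetical name `fractional_cpu_ls`) illustrates only that special case; a general LP solver would be needed if the formulation carries more constraints.

```python
# Running example (assumed): query -> (CPU cost, utility per window).
QUERIES = {"Q1": (1, 3), "Q2": (1, 2), "Q3": (1, 4)}


def fractional_cpu_ls(queries, cpu_budget):
    """Solve max sum(u_i x_i) s.t. sum(c_i x_i) <= budget, 0 <= x_i <= 1.

    With a single linear constraint this LP is a fractional knapsack,
    so filling queries in decreasing utility/cost order is optimal.
    Costs are assumed to be positive.
    """
    fractions = {q: 0.0 for q in queries}
    remaining = cpu_budget
    for q, (cost, util) in sorted(queries.items(),
                                  key=lambda kv: kv[1][1] / kv[1][0],
                                  reverse=True):
        if remaining <= 0:
            break
        fractions[q] = min(1.0, remaining / cost)
        remaining -= fractions[q] * cost
    total = sum(fractions[q] * queries[q][1] for q in queries)
    return fractions, total


print(fractional_cpu_ls(QUERIES, 2.5))
```

With a budget of 2.5 it keeps $Q_3$ and $Q_1$ fully and half of $Q_2$, for a utility of 8.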
Integral Dual-bound LS (IDLS)
  • Negative result:
    • NP-hard & inapproximable
      • same as IMLS
  • Positive result:
    • A tri-criteria approximation
    • optimal loss with memory budget & CPU budget
    • At most utility loss using memory & CPU
    • LP-rounding based algorithm
Fractional Dual-bound LS (FDLS)
  • Negative result:
    • NP-hard even if each query has exactly two events
    • Same as FMLS, since FMLS is a special case of FDLS
  • Approximation: open problem
    • Non-convex optimization subject to non-convex constraints
    • We didn’t find good techniques for this
Conclusion and Future Work
  • Study the old problem of load shedding in the new context of CEP
  • Investigate six problem variants
    • Hardness & approximation (more results in the paper)
  • A rich problem with more to study
    • Delayed variants: instance-level optimization
    • Query language beyond positive event occurrence

Thank you!

Questions?