On load shedding in complex event processing

On Load Shedding in Complex Event Processing. Authors: Yeye He Microsoft Research Siddharth Barman California Institute of Technology Jeffrey F. Naughton University of Wisconsin-Madison. Presenter (non-author): Arvind Arasu Microsoft Research .


On Load Shedding in Complex Event Processing

Authors:

Yeye He Microsoft Research

Siddharth Barman California Institute of Technology

Jeffrey F. Naughton University of Wisconsin-Madison

Presenter (non-author): Arvind Arasu Microsoft Research


Overview

  • Background: Complex Event Processing (CEP)

    • A different stream processing model

  • Problem: Load shedding in CEP

    • Maximize utility under resource constraints

  • Focus of this work

    • A problem taxonomy, hardness, and approximations



Background: CEP Data Model

  • CEP event data

    • Event stream $S = (e_1, e_2, \ldots)$

    • Each event $e_i$ is associated with an event type $E_j$

    • Each event $e_i$ has a time-stamp, $t(e_i)$

    • Stream $S$ is temporally ordered: $t(e_i) < t(e_{i+1})$ for all $i$

Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

The superscript of each event denotes its time-stamp, e.g. $t(a^1) = 1$

Each event is associated with a type, e.g. event $a^1$ is of type A

The set of four event types is {A, B, C, D}


Background: CEP Query Model

  • CEP sequence query

    • $Q = \mathrm{SEQ}(E_1, E_2, \ldots, E_m)$, where each $E_k$ is an event type

    • A time-based query window $T(Q)$

    • Only conjunctive queries are considered in this work

  • An event sequence $(e_{i_1}, e_{i_2}, \ldots, e_{i_m})$ is a query match of $Q$ if

    • Types match: $e_{i_k}$ is of type $E_k$ for all $k \in [m]$

    • In query window: $t(e_{i_m}) - t(e_{i_1}) \le T(Q)$

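Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • Q1 = SEQ4(A, B) in 4 min

  • Q2 = SEQ4(B, C) in 4 min

  • Q3 = SEQ4(C, D) in 4 min

[Figure: the example stream with query matches marked: two matches each for Q1, Q2, and Q3, plus one Q1 candidate pair marked "Outside time-window".]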
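
The match definition on this slide can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `Event` class and the `seq_matches` function are hypothetical names, and the window test assumes that a match may span at most T(Q) time units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    etype: str  # event type, e.g. "A"
    ts: int     # time-stamp t(e)


def seq_matches(stream, types, window):
    """Return every match of SEQ(types) whose span is at most `window`.

    `stream` must be ordered by time-stamp, as the data model requires.
    """
    matches = []

    def extend(prefix, start):
        k = len(prefix)
        if k == len(types):
            matches.append(tuple(prefix))
            return
        for i in range(start, len(stream)):
            e = stream[i]
            if prefix and e.ts - prefix[0].ts > window:
                break  # ordered stream: later events are even further away
            if e.etype == types[k]:
                extend(prefix + [e], i + 1)

    extend([], 0)
    return matches


# The example stream a1, b2, c3, d4, a5, b6, c7, d8
STREAM = [Event(t, i + 1) for i, t in enumerate("ABCDABCD")]
```

On the example stream, `seq_matches(STREAM, ["A", "B"], 4)` returns the pairs (a1, b2) and (a5, b6). The pair (a1, b6) spans 5 time units and is rejected, which corresponds to the "Outside time-window" label in the figure.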


The Load Shedding Problem

  • Event streams are often bursty

  • Not all events can be processed in time

    • Given resource constraints (CPU/memory)

  • Problem: Selectively “shed” data/processing

    • To preserve the most useful query results


Query Utility in CEP

  • Use query utility to quantify usefulness

    • Utility weight w(Qi) of query Qi to model importance

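Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • Q1 = SEQ4(A, B), W(Q1)=3

  • Q2 = SEQ4(B, C), W(Q2)=2

  • Q3 = SEQ4(C, D), W(Q3)=4

[Figure: the example stream with each match labeled by its query and weight: Q1 matches carry W=3, Q2 matches W=2, and Q3 matches W=4.]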
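
As a rough sketch of how these weights turn into a utility figure, the snippet below counts the matches of each query on the example stream and sums weight times match count. The names `count_matches` and `total_utility` are hypothetical, and treating total utility as the weighted sum of matches is an assumption based on the weights shown on the slide.

```python
def count_matches(stream, types, window):
    """Count matches of SEQ(types) in a time-ordered list of (type, ts) pairs."""

    def extend(k, start, first_ts):
        if k == len(types):
            return 1
        total = 0
        for i in range(start, len(stream)):
            etype, ts = stream[i]
            if first_ts is not None and ts - first_ts > window:
                break  # ordered stream: nothing later can fit the window
            if etype == types[k]:
                total += extend(k + 1, i + 1, ts if first_ts is None else first_ts)
        return total

    return extend(0, 0, None)


# Example stream a1 .. d8 and the three weighted queries from the slide
STREAM = [(t, i + 1) for i, t in enumerate("ABCDABCD")]
QUERIES = {
    "Q1": (("A", "B"), 3),
    "Q2": (("B", "C"), 2),
    "Q3": (("C", "D"), 4),
}


def total_utility(stream, queries, window=4):
    """Sum of weight * number of matches over all queries."""
    return sum(w * count_matches(stream, types, window)
               for types, w in queries.values())
```

Each query has two matches on the eight-event stream, so `total_utility(STREAM, QUERIES)` evaluates to 2·3 + 2·2 + 2·4 = 18.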


Utility Maximizing Load Shedding

  • Given a set of queries {Qi}

  • Given the expected number of query matches per unit time interval

    • Estimated using event arrival statistics

  • Find a type-level, global shedding strategy that

    • Maximizes the expected utility

    • Respects resource constraints (Memory/CPU/Dual)

  • The strategy is either

    • Integral: discard all events/queries of certain types

    • Fractional: discard randomly sampled events/queries of certain types
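
The difference between integral and fractional strategies can be illustrated with a small sketch. It assumes that events of each type are sampled independently, so a match survives with probability equal to the product of the keep-rates of its event types. That independence model, the function name `expected_utility`, and the per-window match counts are assumptions of this sketch; the slide only states that sampled events are discarded.

```python
from math import prod


def expected_utility(queries, keep):
    """Expected utility of a type-level shedding strategy.

    queries: name -> (event types, weight, expected matches per unit time)
    keep:    event type -> probability in [0, 1] that an event is kept;
             types missing from `keep` are kept with probability 1.
    """
    return sum(w * n * prod(keep.get(t, 1.0) for t in types)
               for types, w, n in queries.values())


# One expected match per query per window, weights taken from the example
QUERIES = {
    "Q1": (("A", "B"), 3, 1),
    "Q2": (("B", "C"), 2, 1),
    "Q3": (("C", "D"), 4, 1),
}

integral = {"A": 0.0}                                 # drop every A event
fractional = {"A": 0.5, "B": 0.5, "C": 0.5, "D": 0.5}  # sample half of each type
```

An integral strategy is the special case in which every keep-rate is 0 or 1. Here `expected_utility(QUERIES, integral)` gives 6, while the uniform 50% sample gives 0.25 · (3 + 2 + 4) = 2.25.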


Why Expected Utility?

  • Online algorithms with competitive ratio?

    • Hopeless!

  • No algorithm can have a competitive ratio better than […], where […] is the length of the event sequence

    • Proved using an adversarial scenario


An Adversarial Scenario

  • […] event types: […]

  • […] unit-weight queries: SEQ([…]), […]

  • Event sequence: ([…])

    • […] is of type […], […]

    • […] drawn from […] with equal probability

  • Memory budget = 2 events

  • Offline optimal: utility = 1

    • Pick one from […] based on X

  • Online optimal: expected utility = […]

  • Competitive ratio: […]

Instead, we optimize utility in the expected sense


Resource Constraint: Limited CPU

  • Not all queries can be processed by the CPU

  • E.g., the CPU needs to process 3 unit-cost queries (per 4 time units)

    • Unit cost is used for simplicity; queries can have arbitrary costs

  • Suppose the CPU can only process 2 queries

    • Best strategy: discard Q2, keep Q1 and Q3 (the highest-gain queries)

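Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • Q1 = SEQ4(A, B), W(Q1)=3

  • Q2 = SEQ4(B, C), W(Q2)=2

  • Q3 = SEQ4(C, D), W(Q3)=4

[Figure: the example stream with each match labeled by its query and weight: Q1 matches carry W=3, Q2 matches W=2, and Q3 matches W=4.]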
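
The CPU-limited example can be reproduced with a small 0/1 knapsack sketch over queries. The function name `select_queries` is hypothetical, costs are assumed to be positive integers, and each query's utility is taken to be its weight (one match per window, as in the example).

```python
def select_queries(queries, cpu_budget):
    """Pick the set of queries with the highest total utility that fits the budget.

    queries: name -> (utility, integer CPU cost)
    Returns (best utility, tuple of kept query names).
    """
    # dp[c] = (best utility, kept names) using at most c CPU units
    dp = [(0, ())] * (cpu_budget + 1)
    for name, (utility, cost) in queries.items():
        # Iterate capacities downwards so each query is used at most once
        for c in range(cpu_budget, cost - 1, -1):
            candidate = (dp[c - cost][0] + utility, dp[c - cost][1] + (name,))
            if candidate[0] > dp[c][0]:
                dp[c] = candidate
    return dp[cpu_budget]


# Three unit-cost queries with the weights from the slide
QUERIES = {"Q1": (3, 1), "Q2": (2, 1), "Q3": (4, 1)}
```

With a budget of 2 queries, `select_queries(QUERIES, 2)` keeps Q1 and Q3 for a utility of 7 and discards Q2, as stated on the slide.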


Resource Constraint: Limited Memory

  • Not all events can be kept in memory

  • E.g., need to keep 4 events in memory (in 4 time units)

    • Because query window = 4

  • Suppose memory = 3 (per 4 time units)

    • Best strategy: keep B, C, D and discard A. U = W(Q2) + W(Q3) = 2 + 4 = 6

    • Discard D? U = W(Q1) + W(Q2) = 3 + 2 = 5

    • Discard B? U = W(Q3) = 4; Discard C? U = W(Q1) = 3

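Example stream: $a^1, b^2, c^3, d^4, a^5, b^6, c^7, d^8$

  • Q1 = SEQ4(A, B), W(Q1)=3

  • Q2 = SEQ4(B, C), W(Q2)=2

  • Q3 = SEQ4(C, D), W(Q3)=4

[Figure: the example stream with each match labeled by its query and weight: Q1 matches carry W=3, Q2 matches W=2, and Q3 matches W=4.]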
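
The memory-limited example is small enough to check by brute force. The sketch below assumes, as in the example, that each event type contributes one event per window, so keeping a type costs one unit of memory, and that a query yields its weight only if all of its event types are kept. The function names are hypothetical.

```python
from itertools import combinations

QUERIES = {
    "Q1": (("A", "B"), 3),
    "Q2": (("B", "C"), 2),
    "Q3": (("C", "D"), 4),
}


def utility(kept_types, queries):
    """Total weight of the queries whose event types are all kept."""
    kept = set(kept_types)
    return sum(w for types, w in queries.values() if set(types) <= kept)


def best_types_to_keep(all_types, queries, memory_budget):
    """Try every subset of `memory_budget` types and return the best one."""
    size = min(memory_budget, len(all_types))
    return max(combinations(all_types, size),
               key=lambda kept: utility(kept, queries))
```

For a budget of 3, `best_types_to_keep("ABCD", QUERIES, 3)` returns B, C, D with utility 6; the other three choices give 5 (discard D), 4 (discard B), and 3 (discard C). Exhaustive search is exponential in the number of event types, which is consistent with the hardness results for the integral memory-bound variant.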


Integral Memory-bound LS (IMLS)

  • Negative results

    • NP-hard

    • Unlikely to be approximable within […]

      • Unless 3SAT […]

    • Reduction from Densest k-Sub-Hypergraph

[1] Hajiaghayi, et al. The minimum k-colored subgraph problem in haplotyping and DNA primer selection. Bioinformatics Research and Applications, 2006


Integral Memory-bound LS (IMLS)

  • Positive results

    • A general bi-criteria approximation for utility loss minimization

      • […]: optimal loss with budget […]

      • ([…]) bi-criteria approximation: utility loss is at most […] using memory […]

    • LP-rounding based algorithm


Integral Memory-bound LS (IMLS)

  • Positive results (cont’d)

  • Another approximable special case:

    • If the memory can hold at least 1/f number of queries

      • memory capacity is reasonably large

    • An event can be in at most […] number of queries

  • A […]-approximation algorithm

    • For utility gain maximization

    • Use Knapsack-like approach


Integral Memory-bound LS (IMLS)

  • Positive results (cont’d)

  • Pseudo-polynomial-time solvable special case

    • Multi-tenant CEP applications, co-locating on same server

    • Disjoint events for each application

    • Each application has no more than […] events

  • IMLS can be solved in time O([…])

    • […]: total # of events

    • […]: total # of queries

    • M: memory budget


Fractional Memory-bound LS (FMLS)

  • Negative result:

    • NP-hard even if each query has exactly two events

  • Positive result:

    • […] relative-approximation for utility gain maximization

      • If memory requirement of each event type exceeds total budget

      • […] controls precision ([…], […])

      • […]: max number of events in a query

      • Use a grid-based approach on Simplex [2]

[2] de Klerk, et al. A PTAS for the minimization of polynomials of fixed degree over the simplex. Theoretical Computer Science, 2006


Integral CPU-bound LS (ICLS)

  • Negative result

    • NP-complete

  • Positive result:

    • Admits an FPTAS: rounding off least significant bits

      • Use knapsack results

  • ICLS is an easy load shedding variant
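
The FPTAS mentioned above follows the classical knapsack recipe of rounding utilities down to a coarser scale and running an exact dynamic program on the rounded values. The sketch below is the textbook knapsack scheme, not code from the paper; the function name `knapsack_fptas` and the input format (one `(utility, cost)` pair per query) are assumptions.

```python
def knapsack_fptas(items, budget, eps):
    """Textbook knapsack FPTAS: returns (utility, chosen indices).

    items: list of (utility, cost) pairs; eps in (0, 1].
    The returned utility is at least (1 - eps) times the optimum.
    """
    usable = [(i, u, c) for i, (u, c) in enumerate(items)
              if u > 0 and c <= budget]
    if not usable:
        return 0, []
    # Rounding step: drop the least significant part of every utility
    scale = eps * max(u for _, u, _ in usable) / len(usable)
    # min_cost[v] = (cheapest cost reaching scaled utility v, chosen indices)
    min_cost = {0: (0, [])}
    for i, u, c in usable:
        v = int(u // scale)
        # Iterate over a snapshot so each item is added at most once
        for total, (cost, chosen) in list(min_cost.items()):
            new_total, new_cost = total + v, cost + c
            if new_cost > budget:
                continue
            if new_total not in min_cost or new_cost < min_cost[new_total][0]:
                min_cost[new_total] = (new_cost, chosen + [i])
    best = max(min_cost)
    chosen = min_cost[best][1]
    return sum(items[i][0] for i in chosen), chosen
```

On the running example with unit costs, `knapsack_fptas([(3, 1), (2, 1), (4, 1)], 2, 0.1)` selects the first and third queries (Q1 and Q3) for a utility of 7. The number of table entries is bounded by a polynomial in the number of queries and 1/eps, which is what makes the scheme fully polynomial.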


Fractional CPU-bound LS (FCLS)

  • Positive result

    • Can be written as a simple Linear Program

    • Polynomial time solvable

  • FCLS is the easiest load shedding variant
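
To see why the fractional CPU-bound case is easy, assume the linear program has one variable per query (the fraction of its matches that is processed), a utility that is linear in those fractions, and a single CPU budget constraint. Under those assumptions it is a fractional knapsack, and a greedy pass by utility per unit of cost is optimal. The function name and input format below are hypothetical, and the exact LP in the paper may differ.

```python
def fractional_cpu_shedding(queries, cpu_budget):
    """Greedy solution of a single-constraint fractional knapsack.

    queries: name -> (utility, CPU cost) when the query is fully processed;
             costs must be positive.
    Returns name -> fraction of the query's matches that is processed.
    """
    keep = {name: 0.0 for name in queries}
    remaining = cpu_budget
    # Serve queries in decreasing order of utility per unit of CPU
    by_density = sorted(queries.items(),
                        key=lambda kv: kv[1][0] / kv[1][1],
                        reverse=True)
    for name, (utility, cost) in by_density:
        if remaining <= 0:
            break
        fraction = min(1.0, remaining / cost)
        keep[name] = fraction
        remaining -= fraction * cost
    return keep


# Unit-cost queries with the weights from the running example
QUERIES = {"Q1": (3, 1), "Q2": (2, 1), "Q3": (4, 1)}
```

With a budget of 2.5 CPU units, the sketch processes Q3 and Q1 in full and half of Q2, for an expected utility of 4 + 3 + 1 = 8.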


Integral Dual-bound LS (IDLS)

  • Negative result:

    • NP-hard & inapproximable

      • same as IMLS

  • Positive result:

    • A tri-criteria approximation

    • […]: optimal loss with memory budget […] & CPU budget […]

    • At most […] utility loss using memory […] & CPU […]

    • LP-rounding based algorithm


Fractional Dual-bound LS (FDLS)

  • Negative result:

    • NP-hard even if each query has exactly two events

    • Same as FMLS since FDLS is a special case

  • Approximation: open problem

    • Non-convex optimization subject to non-convex constraints

    • We didn’t find good techniques for this


Conclusion and Future Work

  • Study the old problem of load shedding in the new context of CEP

  • Investigate six problem variants

    • Hardness & approximation (more results in the paper)

  • A rich problem with more to study

    • Delayed variants: instance-level optimization

    • Query language beyond positive event occurrence


Thank you!

Questions?

