
### On Load Shedding in Complex Event Processing

Authors:

Yeye He, Microsoft Research

Siddharth Barman, California Institute of Technology

Jeffrey F. Naughton, University of Wisconsin-Madison

Presenter (non-author): Arvind Arasu, Microsoft Research

Overview

- Background: Complex Event Processing (CEP)
- A different stream processing model
- Problem: Load shedding in CEP
- Maximize utility under resource constraints
- Focus of this work
- A problem taxonomy, hardness, and approximations

Background: CEP Data Model

- CEP event data
- Event stream $S = (e_1, e_2, \dots)$
- Each event $e_i$ is associated with an event type $E_j$
- Each event $e_i$ has a time-stamp, $t(e_i)$
- Stream S is temporally ordered: $t(e_i) < t(e_{i+1})$, for all i

Example stream: a1, b2, c3, d4, a5, b6, c7, d8

The number attached to each event denotes its time-stamp, e.g. t(a1) = 1

Each event is associated with a type, e.g. event a1 is of type A

The set of four event types is {A, B, C, D}

Background: CEP Query Model

- CEP sequence query
- $Q = \mathrm{SEQ}(E_1, E_2, \dots, E_m)$, where the $E_k$ are event types
- A time-based query window $T(Q)$
- Only consider conjunctive queries in this work
- An event sequence $(e_{i_1}, \dots, e_{i_m})$ is a query match of $Q$ if
- Types match: $e_{i_k}$ is of type $E_k$ for all $k \in [m]$
- In query window: $t(e_{i_m}) - t(e_{i_1}) \le T(Q)$

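Example stream: a1, b2, c3, d4, a5, b6, c7, d8

- Q1 = SEQ4(A, B) in 4 min
- Q2 = SEQ4(B, C) in 4 min
- Q3 = SEQ4(C, D) in 4 min

[Figure: matches marked on the stream: Q1 twice, Q2 twice, Q3 twice, and one further Q1 candidate labeled "Outside time-window"]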
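The match definition on this slide can be checked against the example stream with a small brute-force sketch. It is only an illustration, not how a CEP engine evaluates queries: the names `STREAM` and `find_matches` are hypothetical, and the window test assumes the condition "last time-stamp minus first time-stamp is at most T(Q)".

```python
from itertools import combinations

# Example stream from the slide as (type, time-stamp) pairs, in temporal order.
STREAM = [("A", 1), ("B", 2), ("C", 3), ("D", 4),
          ("A", 5), ("B", 6), ("C", 7), ("D", 8)]


def find_matches(stream, types, window):
    """Return every subsequence whose event types equal `types` in order
    and whose first-to-last time span is at most `window`."""
    matches = []
    # combinations() keeps stream order, so each candidate is a subsequence.
    for cand in combinations(stream, len(types)):
        types_ok = all(ev[0] == t for ev, t in zip(cand, types))
        in_window = cand[-1][1] - cand[0][1] <= window
        if types_ok and in_window:
            matches.append(cand)
    return matches


# Q1 = SEQ(A, B) with a window of 4 time units.
print(find_matches(STREAM, ("A", "B"), 4))
# [(('A', 1), ('B', 2)), (('A', 5), ('B', 6))]
```

The pair (a1, b6) has matching types but spans 5 time units, so it is the candidate that falls outside the time window.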

The Load Shedding Problem

- Event streams are often bursty
- Not all events can be processed in a timely manner
- Given resource constraints (CPU/memory)
- Problem: Selectively “shed” data/processing
- To preserve the most useful query results

Query Utility in CEP

- Use query utility to quantify usefulness
- A utility weight W(Qi) models the importance of query Qi

Example stream: a1, b2, c3, d4, a5, b6, c7, d8

- Q1 = SEQ4(A, B), W(Q1)=3
- Q2 = SEQ4(B, C), W(Q2)=2
- Q3 = SEQ4(C, D), W(Q3)=4

[Figure: each match on the stream labeled with its query's weight: Q1 (W=3) twice, Q2 (W=2) twice, Q3 (W=4) twice]

Utility Maximizing Load Shedding

- Given a set of queries {Qi}
- Given expected query matches per unit time interval
- Estimated using event arrival statistics
- Find a type-level, global shedding strategy that
- Maximizes the expected utility
- Respects resource constraints (Memory/CPU/Dual)
- Integral: discard all events/queries of certain types
- Fractional: discard randomly sampled events/queries of certain types

Why Expected Utility?

- Online algorithms with competitive ratio?
- Hopeless!
- No algorithm can have competitive ratio better than , where is the length of the event sequence
- Prove by using an adversarial scenario

An Adversarial Scenario

- event types:
- unit-weight queries: SEQ(),
- Event sequence: ()
- is of type ,
- drawn from with equal probability
- Memory budget = 2 events
- Offline optimal: utility = 1
- pick one from based on X
- Online optimal: expected utility =
- Competitive ratio:

Instead, we optimize utility in the expected sense

Resource Constraint: Limited CPU

- Not all queries can be processed by CPU
- E.g., the CPU needs to process 3 unit-cost queries (per 4 time units)
- Unit cost is used for simplicity; queries can have arbitrary costs
- Suppose CPU can only process 2 queries
- Best strategy: discard Q2, keep Q1 and Q3 (highest gain queries)

Example stream: a1, b2, c3, d4, a5, b6, c7, d8

- Q1 = SEQ4(A, B), W(Q1)=3
- Q2 = SEQ4(B, C), W(Q2)=2
- Q3 = SEQ4(C, D), W(Q3)=4

[Figure: each match on the stream labeled with its query's weight: Q1 (W=3) twice, Q2 (W=2) twice, Q3 (W=4) twice]
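The CPU-bound example can be reproduced with an exhaustive search over query subsets. This is a minimal sketch for a handful of queries, not the algorithm from the paper; `WEIGHTS`, `COSTS`, and `best_query_subset` are hypothetical names, and the unit costs and budget of 2 are taken from the slide's example.

```python
from itertools import combinations

# Weights from the slide; unit costs and a CPU budget of 2 as in the example.
WEIGHTS = {"Q1": 3, "Q2": 2, "Q3": 4}
COSTS = {"Q1": 1, "Q2": 1, "Q3": 1}


def best_query_subset(weights, costs, budget):
    """Try every subset of queries and return the one with the largest
    total weight whose total cost fits within the budget."""
    best, best_utility = frozenset(), 0
    names = sorted(weights)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(costs[q] for q in subset)
            utility = sum(weights[q] for q in subset)
            if cost <= budget and utility > best_utility:
                best, best_utility = frozenset(subset), utility
    return best, best_utility


kept, utility = best_query_subset(WEIGHTS, COSTS, budget=2)
print(sorted(kept), utility)  # ['Q1', 'Q3'] 7
```

Keeping Q1 and Q3 gives utility 3 + 4 = 7 per window, which is why Q2 is the query to discard.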

Resource Constraint: Limited Memory

- Not all events can be kept in memory
- E.g., need to keep 4 events in memory (in 4 time units)
- Because query window = 4
- Suppose memory = 3 (per 4 time units)
- Best strategy: keep B, C, D and discard A. U = W(Q2) + W(Q3) = 6
- Discard D? U = W(Q1) + W(Q2) = 5
- Discard B? U = W(Q3) = 4; Discard C? U = W(Q1) = 3

Example stream: a1, b2, c3, d4, a5, b6, c7, d8

- Q1 = SEQ4(A, B), W(Q1)=3
- Q2 = SEQ4(B, C), W(Q2)=2
- Q3 = SEQ4(C, D), W(Q3)=4

[Figure: each match on the stream labeled with its query's weight: Q1 (W=3) twice, Q2 (W=2) twice, Q3 (W=4) twice]
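The memory-bound example can be verified by trying every set of event types that fits in memory. The sketch below assumes, as in the slide's example, that each event type occupies one unit of memory per window and that a query contributes its weight only if all of its event types are kept. The names `QUERIES`, `utility_of`, and `best_types_to_keep` are hypothetical.

```python
from itertools import combinations

# Query name -> (event types in the sequence, utility weight), from the slide.
QUERIES = {
    "Q1": (("A", "B"), 3),
    "Q2": (("B", "C"), 2),
    "Q3": (("C", "D"), 4),
}
EVENT_TYPES = ("A", "B", "C", "D")


def utility_of(kept_types, queries):
    """Sum the weights of the queries whose event types are all kept."""
    kept = set(kept_types)
    return sum(w for types, w in queries.values() if set(types) <= kept)


def best_types_to_keep(event_types, queries, memory):
    """Try every set of `memory` event types and return the best one."""
    return max(combinations(event_types, memory),
               key=lambda kept: utility_of(kept, queries))


best = best_types_to_keep(EVENT_TYPES, QUERIES, memory=3)
print(best, utility_of(best, QUERIES))  # ('B', 'C', 'D') 6
```

Unlike the CPU-bound case, dropping one event type can disable several queries at once (dropping B removes both Q1 and Q2), which is what makes the memory-bound variants harder.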

Integral Memory-bound LS (IMLS)

- Negative results
- NP-hard
- Unlikely to be approximated within
- Unless 3SAT
- Reduction from Densest k-Sub-Hypergraph

[1] Hajiaghayi, et al. The minimum k-colored subgraph problem in haplotyping and DNA primer selection. Bioinformatics Research and Applications, 2006

Integral Memory-bound LS (IMLS)

- Positive results
- A general bi-criteria approximation for utility loss minimization
- optimal loss with budget
- () bi-criteria approximation: utility loss is at most using memory
- LP-rounding based algorithm

Integral Memory-bound LS (IMLS)

- Positive results (cont’d)
- Another approximate special case:
- If the memory can hold at least 1/f number of queries
- memory capacity is reasonably large
- An event can be in at most number of queries
- A -approximation algorithm
- For utility gain maximization
- Use Knapsack-like approach

Integral Memory-bound LS (IMLS)

- Positive results (cont’d)
- Pseudo-polynomial-time solvable special case
- Multi-tenant CEP applications, co-locating on same server
- Disjoint events for each application
- Each application has no more than events
- IMLS can be solved in time O()
- : total # of events
- : total # of queries
- M: memory budget

Fractional Memory-bound LS (FMLS)

- Negative result:
- NP-hard even if each query has exactly two events
- Positive result:
- relative-approximation for utility gain maximization
- If memory requirement of each event type exceeds total budget
- controls precision (, )
- max number of event in a query
- Use a grid-based approach on Simplex [2]

[2] de Klerk, et al. A PTAS for the minimization of polynomials of fixed degree over the simplex. Theoretical Computer Science, 2006

Integral CPU-bound LS (ICLS)

- Negative result
- NP-complete
- Positive result:
- Admits an FPTAS: rounding off least significant bits
- Use knapsack results
- ICLS is an easy load shedding variant
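The "rounding off least significant bits" idea is the standard value-scaling FPTAS for 0/1 knapsack. The sketch below shows that textbook technique, not necessarily the exact procedure in the paper. It assumes each query is a knapsack item whose value is its utility weight and whose cost is its CPU cost; `knapsack_fptas` and its parameters are hypothetical names.

```python
def knapsack_fptas(values, costs, budget, eps):
    """Textbook value-scaling FPTAS for 0/1 knapsack.

    Returns (chosen indices, total value). The total value is intended to be
    at least (1 - eps) times the optimum, up to floating-point rounding.
    """
    # Items that cannot fit on their own, or are worthless, are ignored.
    items = [i for i in range(len(values))
             if costs[i] <= budget and values[i] > 0]
    if not items:
        return [], 0
    vmax = max(values[i] for i in items)
    scale = eps * vmax / len(items)  # granularity of the rounding
    scaled = {i: int(values[i] // scale) for i in items}

    # min_cost[v] = (cheapest cost reaching scaled value v, items used)
    min_cost = {0: (0, [])}
    for i in items:
        # Iterate over a snapshot so each item is used at most once.
        for v, (c, chosen) in list(min_cost.items()):
            nv, nc = v + scaled[i], c + costs[i]
            if nc <= budget and (nv not in min_cost or nc < min_cost[nv][0]):
                min_cost[nv] = (nc, chosen + [i])

    chosen = min_cost[max(min_cost)][1]
    return chosen, sum(values[i] for i in chosen)


# Slide example: weights 3, 2, 4, unit costs, CPU budget 2.
print(knapsack_fptas([3, 2, 4], [1, 1, 1], budget=2, eps=0.1))  # ([0, 2], 7)
```

A smaller `eps` means finer rounding and a larger dynamic-programming table, which is the usual accuracy versus running-time trade-off of an FPTAS.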

Fractional CPU-bound LS (FCLS)

- Positive result
- Can be written as a simple Linear Program
- Polynomial time solvable
- FCLS is the easiest load shedding variant
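As an illustration of why the fractional CPU-bound case is easy, assume the linear program has the form: maximize the sum of w_i x_i subject to the sum of c_i x_i being at most the CPU budget, with each keep-fraction x_i between 0 and 1. That form is an assumption here; the slide does not spell out the program. With a single budget constraint it is a fractional knapsack, which a greedy pass by weight-to-cost ratio solves exactly. `fractional_shedding` is a hypothetical name.

```python
def fractional_shedding(weights, costs, budget):
    """Return keep-fractions x[i] in [0, 1] that maximize sum(w[i] * x[i])
    subject to sum(c[i] * x[i]) <= budget. Costs are assumed positive."""
    x = [0.0] * len(weights)
    remaining = budget
    # Serve queries in decreasing order of utility per unit of CPU cost.
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] / costs[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        x[i] = min(1.0, remaining / costs[i])
        remaining -= x[i] * costs[i]
    return x


# Slide weights with unit costs and a hypothetical CPU budget of 2.5:
# Q3 and Q1 are kept in full, and half of Q2's matches are sampled.
print(fractional_shedding([3, 2, 4], [1, 1, 1], budget=2.5))  # [1.0, 0.5, 1.0]
```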

Integral Dual-bound LS (IDLS)

- Negative result:
- NP-hard & inapproximable
- same as IMLS
- Positive result:
- A tri-criteria approximation
- optimal loss with memory budget & CPU budget
- At most utility loss using memory & CPU
- LP-rounding based algorithm

Fractional Dual-bound LS (FDLS)

- Negative result:
- NP-hard even if each query has exactly two events
- Same as FMLS since FDLS is a special case
- Approximation: open problem
- Non-convex optimization subject to non-convex constraints
- We didn’t find good techniques for this

Conclusion and Future Work

- Study the old problem of load shedding in the new context of CEP
- Investigate six problem variants
- Hardness & approximation (more results in the paper)
- A rich problem with more to study
- Delayed variants: instance-level optimization
- Query language beyond positive event occurrence

Questions?
