


Chris Re, Julie Letchner,

Magdalena Balazinska and Dan Suciu

University of Washington

Extracting Events from Probabilistic Streams


One Slide Overview

  • Motivating App: RFID Ecosystem

    • Tagged people, cups, books, keys, laptops, etc.

  • Event queries [Cayuga, SASE, Snoop]

    • Alert when anyone enters the coffee room

  • Two problems

    • Missed readings: read rates in practice are low

    • Granularity mismatch, e.g. Office v. Antenna 41

  • Instead, infer location from sensors

  • Proposal: keep the probabilities and query them with PEEX+

PEEX+ (Probabilistic Event EXtraction) keeps data probabilistic to get higher precision and recall, and is still efficient.


Motivating Apps

  • RFID apps

    • Diary and Active Calendar Application.

      • Alert if I go to a database meeting.

    • Supply chain

      • Alert if Mach 3 razors are being stolen

  • Many independent HMMs

    • Elder care [Intel,Patterson]

      • Alert if elder takes their medicine with water

    • Financial applications on predictive HMM

      • Alert if the market shows a head-and-shoulders pattern


Outline

  • RFID to Probabilities via Particle Filters

  • PEEX+ query language

  • Extended Regular Query Algorithm

  • Experiments


The source of probabilities

[Figure: connectivity diagram of the 6th floor in PAC, with antennas marked. The blue ring is ground truth; each orange particle is a guess of the true location.]


PFs to a (prob) DB person

At(tag,loc)

To query Particle Filter output, query At
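One way to picture how particle filter output could become an At(tag,loc) relation is to collapse the particles for a tag into per-location marginals. This is a minimal sketch, not the authors' pipeline: the function name `particles_to_at`, the weighted (location, weight) particle format, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict

def particles_to_at(tag, particles):
    """Collapse weighted particles into At(tag, loc) marginal probabilities.

    `particles` is a list of (location, weight) pairs (hypothetical format);
    the weights do not need to sum to 1.
    """
    totals = defaultdict(float)
    for loc, weight in particles:
        totals[loc] += weight
    norm = sum(totals.values())
    return {(tag, loc): w / norm for loc, w in totals.items()}

# Four equally weighted guesses for one tag at one timestep (made-up data).
at = particles_to_at("joe", [("H2", 1.0), ("H2", 1.0), ("O2", 1.0), ("H3", 1.0)])
```

Here half of the particles sit in H2, so At("joe", "H2") gets probability 0.5 and the other two locations get 0.25 each.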


Semantics of the Model

Possible streams (worlds)

Prob = 0.4 * 0.6 * …

NB: Markovian correlations OK

At(tag,loc)

“Joe enters O2 at t=8”

Query semantics: sum the weights of all worlds where Q is true at time t

Probability outside O2 (in H2, H3)
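The possible-worlds semantics can be spelled out by brute force on a tiny stream. The sketch below assumes timesteps are independent (the slide notes that Markovian correlations are also allowed, which this sketch does not model), and the three-step stream, the probabilities, and the helper names are invented for illustration.

```python
from itertools import product

def query_probability(stream, query):
    """Sum the weight of every possible world in which `query` holds.

    `stream` is a list of per-timestep distributions {location: prob},
    assumed independent across timesteps for this sketch.
    """
    total = 0.0
    for world in product(*(dist.items() for dist in stream)):
        locations = [loc for loc, _ in world]
        weight = 1.0
        for _, p in world:
            weight *= p  # a world's weight is the product of its choices
        if query(locations):
            total += weight
    return total

def enters_o2_at_last_step(locations):
    """True if the tag is outside O2 at the previous step and inside at the last."""
    return locations[-2] != "O2" and locations[-1] == "O2"

# Made-up distributions for one tag over three timesteps.
stream = [
    {"H2": 0.4, "H3": 0.6},
    {"H3": 0.4, "O2": 0.6},
    {"H3": 0.3, "O2": 0.7},
]
p = query_probability(stream, enters_o2_at_last_step)
```

Enumeration is exponential in the stream length, so it only serves to define the answer (about 0.28 here, from 0.4 outside O2 times 0.7 inside); the streaming algorithms later in the talk avoid it.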


Outline

  • RFID to Probabilities via Particle Filters

  • PEEX+ query language

  • Extended Regular Query Algorithm

  • Experiments


A hierarchy of PEEX+ queries

  • Regular Queries

    • Alert me when Joe goes to the coffee room

  • Extended Regular

    • Alert when anyone goes to the coffee room

  • Safe

    • Alert when anyone goes to the coffee room and a DB member follows them.

  • Hard: Others (Simulation)

    • This line is sharp for some queries


PEEX+ Queries

  • Fragment of Cayuga, queries define events.

p in some location

Same p in both

Technical Point: Left-to-right eval,


Regulars and Extended Regular

  • Query is regular if no variable is shared between subgoals

  • Query is extended regular if any variable shared by two subgoals is shared by all subgoals, i.e. it is a templated regular query

p is shared between subgoals
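The two definitions can be checked mechanically from the variables each subgoal mentions. This is a literal reading of the slide, sketched under the assumption that a query is given as one set of variable names per subgoal; the function name `classify` and the "other" label are illustrative only.

```python
def classify(subgoals):
    """Classify a query from the variable sets of its subgoals.

    `subgoals` is a list of sets of variable names, one set per subgoal
    (a hypothetical representation chosen for this sketch).
    """
    shared = set()
    for i, a in enumerate(subgoals):
        for b in subgoals[i + 1:]:
            shared |= a & b  # variables appearing in at least two subgoals
    if not shared:
        return "regular"
    if all(shared <= vars_ for vars_ in subgoals):
        return "extended regular"
    return "other"

joe_query = classify([set(), set()])             # only constants such as Joe
anyone_query = classify([{"p"}, {"p"}])          # same p in both subgoals
chain_query = classify([{"p"}, {"p", "q"}, {"q"}])
```

The first query shares no variable and is regular, the second shares p across every subgoal and is extended regular, and the third shares p and q only pairwise, so it falls outside both classes.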


Wrinkle in the language: Filter v. Selection

“Alert next time Joe is in 502 after he is in 501”

Yes

“Alert if the next place Joe is in after 501 is 502”

No

[Figure: At events along a time axis]
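The difference between the two alerts can be made concrete on a deterministic trace. The sketch below is one reading of the slide: a filter decides which events are candidates for "next", and a selection tests the candidate that was picked. The trace (Joe in 501, then 503, then 502) is an assumption, chosen because it reproduces the Yes/No answers above; the helper `next_after` is hypothetical.

```python
def next_after(events, start, matches_filter):
    """Return the first event after index `start` that passes the filter, or None."""
    for event in events[start + 1:]:
        if matches_filter(event):
            return event
    return None

# Hypothetical trace of (person, room) readings in time order.
trace = [("joe", "501"), ("joe", "503"), ("joe", "502")]
start = trace.index(("joe", "501"))

# "Next time Joe is in 502 after 501": room 502 is part of the filter,
# so the 503 reading is skipped.
q1 = next_after(trace, start, lambda e: e == ("joe", "502")) is not None

# "The next place Joe is in after 501 is 502": the filter only asks for Joe,
# and the selection then tests the room of the event that was picked.
hit = next_after(trace, start, lambda e: e[0] == "joe")
q2 = hit is not None and hit[1] == "502"
```

On this trace the first query fires and the second does not, because the next Joe reading after 501 is 503.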


Outline

  • RFID to Probabilities via Particle Filters

  • PEEX+ query language

  • Extended Regular Query Algorithm

  • Experiments


Why are ER queries hard?

  • Regular Queries ~ Regular Expressions

    • Mapping is non-trivial

      • similar to Cayuga [Demers et al. 06]

    • Queries have #P-combined complexity

      • Can encode mDNF as regular expression

    • Intuition: n-sized automaton leads to

  • Extended regular ~ 1 NFA per person

    • k persons implies O(k)-size automaton

    • Exponential cost

When ER, can avoid blowup


Algorithm for Regular Queries Overview

Deterministic Algorithm

  • Compile a query q

    • NFA-like thing in a language

    • Mapping events to subsets of

  • At runtime, at time t have events E

    • Create set of symbols at time t:

    • Process NFA on

Focus on the compilation


Compile Select and Filter

  • Intuition: goal maps to two letters:

    • match (m) : matches filter

    • accept (a) : accepted by select

[Figure labels: “Does not contain”, “Final”, “Does contain”]

The language and automaton are the same for both queries.


The difference is the mapping

[Figure labels: “Does not contain”, “Final”, “Does contain”]


Regular Queries w. Probabilities

State at t+1 only depends on state at t and input at t+1

Probabilistic Algorithm

  • Compile a query q

    • NFA with transition in a language

    • Mapping events to subsets of

  • At time t have events E with probs

    • Create set of symbols at time t:

    • Process NFA on

Stays the same

Algorithm is constant in data, exponential in |Q|

distribution on inputs

distribution on states
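The step from a distribution on inputs to a distribution on states can be sketched with a small deterministic automaton. Assumptions in this sketch: timesteps are independent, the automaton is already deterministic (an NFA would first need a subset construction, which is one way the cost can become exponential in |Q|), and the query "in 501, then in 502 at the next step", the state names, and the probabilities are invented for illustration.

```python
def step(state_dist, symbol_dist, delta):
    """One timestep: push the state distribution through the transition function."""
    new_dist = {}
    for state, p_state in state_dist.items():
        for symbol, p_symbol in symbol_dist.items():
            nxt = delta(state, symbol)
            new_dist[nxt] = new_dist.get(nxt, 0.0) + p_state * p_symbol
    return new_dist

def delta(state, symbol):
    """Transitions for the toy query '501, then 502 at the next step'."""
    if symbol == "501":
        return "saw501"
    if state == "saw501" and symbol == "502":
        return "accept"
    return "start"

# Made-up per-timestep distributions over input symbols.
stream = [
    {"501": 0.5, "other": 0.5},
    {"502": 0.8, "other": 0.2},
]
dist = {"start": 1.0}
for symbol_dist in stream:
    dist = step(dist, symbol_dist, delta)
```

Only the current state distribution is kept between timesteps, so the work per step depends on the automaton and not on how much of the stream has been seen. After the two steps, the mass on "accept" is about 0.4 (0.5 for 501 times 0.8 for 502).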


Extension to Extended Regular

  • “Alert when anyone is in 501 and at the next step is in 502”

  • If we substitute a constant for p, the result is regular

  • Bindings use disjoint sets of tuples.

  • Algorithm: independent copies, multiply

Depends on # distinct values (shared vars), not # of timesteps – can stream
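A minimal sketch of "independent copies, multiply", assuming each binding of p has already been evaluated as its own regular query and that bindings are independent because they use disjoint tuples. Combining them as one minus the product of the complements is an interpretation of the slide; the function name and the numbers are hypothetical.

```python
from math import prod

def prob_anyone(per_person_probs):
    """P(at least one binding satisfies the query), assuming independent bindings."""
    return 1.0 - prod(1.0 - p for p in per_person_probs.values())

# Per-person probabilities that the regular query fires at the current time (made up).
p = prob_anyone({"joe": 0.4, "sue": 0.5})
```

With these values the query fires for at least one person with probability about 0.7. The work grows with the number of distinct people, not with the number of timesteps.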


Recap of Algorithms

  • Regular Queries

    • Compiled them to an NFA, then used image

    • Data complexity O(1)

  • Extended regular

    • Several regulars multiplied together

    • Depends on number of distinct people in the data, not number of time steps.

  • Markov Correlations: more arithmetic & state


PEEX+ Algorithms and Analysis

  • Compilation procedures

  • Safe plans.

    • More complicated based on algebra

    • cost grows with data (useful for archives)

  • Aggregates

  • Complexity: Can we do better?

    • For a restricted class, draw a crisp line

    • Minor variants of safe result in hardness


Outline

  • RFID to Probabilities via Particle Filters

  • PEEX+ query language

  • Extended Regular Query Algorithm

  • Experiments


Experimental Setup

  • Quality Experiment

    • 52 objects, 352 locations, 10k sq. ft.

      • Two 30-minute traces with a 10-minute break in between

    • Participants marked down true locations

    • “Alert when anyone enters the Coffee Room”

  • Consider two Scenarios

    • Realtime (No correlations) v. MLE

    • Archived (Smoothing) v. Viterbi

In practice, can smooth in a short time


Quality: Realtime

  • Declare an event “true” if its probability exceeds a threshold

    • Vary threshold

10% improvement in F1

[Plots: precision, recall, and F1 as the threshold varies]
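The thresholding step and the three quality measures can be written down directly. This is a sketch with made-up event ids, probabilities, and ground truth; it is not the evaluation code used for the experiments.

```python
def precision_recall_f1(event_probs, truth, threshold):
    """Score thresholded events against ground truth.

    `event_probs` maps an event id to its probability; `truth` is the set of
    event ids that really happened.
    """
    declared = {e for e, p in event_probs.items() if p > threshold}
    true_pos = len(declared & truth)
    precision = true_pos / len(declared) if declared else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical query output and ground truth.
probs = {"e1": 0.9, "e2": 0.6, "e3": 0.2}
truth = {"e1", "e3"}
scores = precision_recall_f1(probs, truth, threshold=0.5)
```

At threshold 0.5 the events e1 and e2 are declared, one of which is real, so precision, recall, and F1 all come out at 0.5. Sweeping the threshold traces out curves like the ones on this slide.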


Quality: Archived

  • Smoothing v. Viterbi

    • PEEX keeps track of Markovian Correlations

Approximately 30% gain in F1

[Plots: precision, recall, and F1]



Conclusion

  • Showed PEEX+

    • Processed output of several inference tasks

      • Applies more generally than just RFID

  • Quality (F1) gains by keeping probability

    • 50% from probs, 50% from correlations

  • Performance was usable in real-time

    • No indexing!

  • Preprint available on request


Future Work

  • Implementing archived stream indexing.

    • Aggregations in time

    • Aggressive indexing

    • Ranking? Top-K?

  • Sharper lines for complexity

    • Are there more streamable queries?

  • Richer language

    • Similar to linear style plans

    • What do people need?

  • Temporal Models!

    • Consistency



Sequencing by example

  • Sequencing is parameterized [Cayuga]

Semicolon means “the next event among those that match next goal”

Semicolon is not “after”



Compilation by example

  • Each goal “corresponds” to two letters:

    • move (m) – the query should advance

    • accept (a) – the next subgoal accepts

[Figure labels: “Does not contain”, “Final”, “Does contain”]

Any other maps to empty set


Subtle example

  • What about:

[Figure labels: “Does not contain”, “Final”, “Does contain”]

Any other maps to empty set

