LAHAR: Extracting Events from Probabilistic Streams

LAHAR: Extracting Events from Probabilistic Streams. Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu, University of Washington. What is a Lahar? This is a Lahar. It’s a massive, fast stream of dirt(y data).

Presentation Transcript


  1. LAHAR: Extracting Events from Probabilistic Streams Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu, University of Washington

  2. What is a Lahar? This is a Lahar [photos: May 18, 1980, ~8:27am; … a few minutes later]. It’s a massive, fast stream of dirt(y data). Our system, Lahar, processes queries on massive, dirty streams of data. Lahar -- SIGMOD 2008 -- Christopher Re

  3. Event Queries • Motivating app: RFID • Event queries as in Cayuga, SASE, and Snoop • Complex sequences using projections, predicates, … • Query: “Alert when Joe enters 422”, i.e., Joe is outside 422, then inside 422 • Answer: Joe entered office 422 at t=8

  4. Challenges: Tracking Joe’s Location [Figure: 6th floor of the CS building with antennas; the blue ring is Joe’s location]

  5. Challenges: Tracking Joe’s Location • Two problems: missed readings and granularity mismatch • Proposal: infer location, keep probabilities, and query with Lahar • Model-based view [Deshpande et al.] of an HMM [Figure: 6th floor of the CS building with antennas; the blue ring is Joe’s location] Lahar retains probabilities, achieves higher quality (precision/recall), and is still efficient.

  6. Outline • RFID streams to probabilistic streams • Lahar queries on probabilistic streams • Query algorithms: Regular and Extended Regular • Experiments

  7. Tracking Joe’s Location [Figure: 6th floor of the CS building with antennas; the blue ring is the ground truth]

  8. Probabilities via a particle filter [Figure: 6th floor of the CS building with antennas; the blue ring is the ground truth] • Each orange particle is a guess of Joe’s location • Particles guess many locations per timestep, so the data are uncertain

  9. From particles to a probabilistic stream • Query the particle filter output via At(tag, loc), a model-based view
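The step from particles to At(tag, loc) can be sketched as follows, assuming equally weighted particles and treating the fraction of particles that guess a location as the probability of At(tag, loc) at that timestep. The particle lists are invented for illustration (chosen so the t=7 and t=8 values line up with the numbers on slide 10), and `at_view` is a hypothetical helper, not part of Lahar.

```python
from collections import Counter, defaultdict

# Hypothetical particle-filter output: for each (timestep, tag) pair, a list of
# equally weighted particles, each one a guessed location for that tag.
particles = {
    (7, "Joe"): ["Hall3", "Hall4", "422", "Hall3", "422"],
    (8, "Joe"): ["422", "Hall4", "422", "422", "Hall4"],
}


def at_view(particles):
    """Build {t: {(tag, loc): probability}} from particle counts.

    Each location's probability is the fraction of particles that guess it.
    A weighted particle filter would sum particle weights instead of counting.
    """
    view = defaultdict(dict)
    for (t, tag), guesses in particles.items():
        counts = Counter(guesses)
        for loc, count in counts.items():
            view[t][(tag, loc)] = count / len(guesses)
    return dict(view)


at = at_view(particles)
print(at[7])  # {('Joe', 'Hall3'): 0.4, ('Joe', 'Hall4'): 0.2, ('Joe', '422'): 0.4}
print(at[8])  # {('Joe', '422'): 0.6, ('Joe', 'Hall4'): 0.4}
```

Counting particles keeps the view small: its size per timestep is bounded by the number of distinct guessed locations, not by the number of particles.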

  10. Semantics of the Model • Each possible stream (world) has a probability, e.g., Prob = 0.2 * 0.6 * … • At(tag, loc) • A query q returns the probability that q is true at each time t • “Joe enters 422” @ t=8: (0.4 + 0.2) * 0.6 = 0.36, where 0.4 + 0.2 is the probability that Joe is outside 422 (in Hall3 or Hall4) just before t=8 and 0.6 is the probability that he is in 422 at t=8
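The possible-worlds semantics can be checked by brute force. The approach is to enumerate every combination of locations, weight each world by the product of its per-timestep probabilities, and add up the worlds where the query holds. This assumes timesteps are independent, as the product on the slide suggests. The 0.4 + 0.2 outside 422 and the 0.6 factor follow the slide. Assigning 0.4 to Hall3 and 0.2 to Hall4, and the leftover mass at each timestep, are assumptions made so each distribution sums to 1.

```python
import itertools
import math

# Assumed location distributions for Joe; timesteps are treated as independent.
dist = {
    7: {"Hall3": 0.4, "Hall4": 0.2, "422": 0.4},
    8: {"422": 0.6, "Hall4": 0.4},
}
times = sorted(dist)


def world_prob(world):
    """Probability of one possible world: product over its timesteps."""
    return math.prod(dist[t][loc] for t, loc in zip(times, world))


def enters_422_at_8(world):
    """The query holds if Joe is outside 422 at t=7 and inside 422 at t=8."""
    loc = dict(zip(times, world))
    return loc[7] != "422" and loc[8] == "422"


# Every possible world is one choice of location per timestep.
worlds = list(itertools.product(*(dist[t] for t in times)))
p = sum(world_prob(w) for w in worlds if enters_422_at_8(w))
print(len(worlds), round(p, 2))  # 6 0.36
```

The enumeration reproduces the slide's (0.4 + 0.2) * 0.6 = 0.36, but the number of worlds multiplies with every timestep, which is why Lahar avoids it.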

  11. Outline • RFID streams to probabilistic streams • Lahar queries on probabilistic streams • Query algorithms: Regular and Extended Regular • Experiments

  12. Inspired by Cayuga [Demers et al. 2006, White et al. 2007] Lahar Queries by Example • Alert when Joe is in hallway 4 and later in office 422

  13. Inspired by Cayuga [Demers et al. 2006, White et al. 2007] Lahar Queries by Example • Alert when Joe is in hallway 4 and later in office 422 [Figure: Joe in Hall4, then Joe in 422]

  14. Inspired by Cayuga [Demers et al. 2006, White et al. 2007] Lahar Queries by Example • Alert when Joe is in hallway 4 and later in office 422 [Figure: Joe in Hall4, then Joe in 422] • Alert when Joe is in hallway 4 and immediately afterwards in office 422

  15. Inspired by Cayuga [Demers et al. 2006, White et al. 2007] Lahar Queries by Example • Alert when Joe is in hallway 4 and later in office 422 [Figure: Joe in Hall4, then Joe in 422] • Alert when Joe is in hallway 4 and immediately afterwards in office 422 [Figure: Joe in Hall4, then Joe in 422] • Challenge with probabilities: the naïve approach is exponential, and in general this is unavoidable (#P)
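Here is a minimal sketch of the naïve approach for the "later" query. It enumerates every possible world and sums the probabilities of those in which a Hall4 reading is followed, at any later step, by 422. The three distributions are invented, timesteps are assumed independent, and `naive_prob` is a hypothetical helper. The point is the world count, which grows as (#locations)^(#timesteps), here 3^3 = 27.

```python
import itertools
import math

# Hypothetical independent location distributions for Joe, one per timestep.
dists = [
    {"Hall4": 0.5, "422": 0.2, "Hall3": 0.3},
    {"Hall4": 0.1, "422": 0.3, "Hall3": 0.6},
    {"Hall4": 0.2, "422": 0.6, "Hall3": 0.2},
]


def hall4_then_later_422(world):
    """True if some Hall4 reading is followed, at any later step, by 422."""
    seen_hall4 = False
    for loc in world:
        if seen_hall4 and loc == "422":
            return True
        if loc == "Hall4":
            seen_hall4 = True
    return False


def naive_prob(query, dists):
    """Sum the probabilities of all worlds that satisfy `query`.

    The loop visits every world: (#locations) ** (#timesteps) of them.
    """
    total, n_worlds = 0.0, 0
    for world in itertools.product(*dists):
        n_worlds += 1
        if query(world):
            total += math.prod(d[loc] for d, loc in zip(dists, world))
    return total, n_worlds


p, n = naive_prob(hall4_then_later_422, dists)
print(n, round(p, 2))  # 27 0.39
```

Adding one more timestep triples the work here, which is the blowup the regular-query algorithms avoid.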

  16. A hierarchy of Lahar queries • Regular Queries (Efficient, streamable) • Alert when Joe enters 422 • Extended Regular (Efficient, streamable) • Alert when anyone enters 422

  17. A hierarchy of Lahar queries • Regular Queries (Efficient, streamable) • Alert when Joe enters 422 • Extended Regular (Efficient, streamable) • Alert when anyone enters 422 • Safe (Efficient, but not streamable) • Unsafe (Inefficient)

  18. Outline • RFID streams to probabilistic streams • Lahar queries on probabilistic streams • Query algorithms: Regular and Extended Regular • Experiments

  19. Review: A non-probabilistic example • Alert me when Joe enters 422 [Figure: automaton with state 1 = “Joe in Hall4” and final state 2 = “Joe in 422”; the set of active states evolves {}, {1}, {2}, and the query accepts at t = 8]

  20. … now with probabilities • Alert me when Joe enters 422 • Distribution on state sets: {} 1.0; then {} 0.5, {1} 0.5; then {} 0.65, {1} 0.05, {2} 0.3 • Accept at t=8 with p = 0.3 [Figure: automaton with state 1 = “Joe in Hall4” and final state 2 = “Joe in 422”]
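The slide's numbers can be reproduced with a small sketch of the idea. For each possible set of active states and each possible location, the automaton is applied and the probability mass is added to the resulting state set. The automaton encoding, the state numbering, and the input distributions are assumptions. The distributions were chosen so that the output matches the slide. Point-mass inputs recover the deterministic run of slide 19. The distribution ranges over sets of automaton states, so its size depends on the query, not on the length of the stream.

```python
from collections import defaultdict

FINAL = 2  # state 2 ("Joe in 422") is the accepting state


def step(states, loc):
    """Deterministic transition on the set of active automaton states.

    State 1 = "Joe in Hall4"; state 2 = "Joe in 422" right after Hall4.
    A new run can start at any timestep, so Hall4 always activates state 1.
    """
    nxt = set()
    if loc == "Hall4":
        nxt.add(1)
    if loc == "422" and 1 in states:
        nxt.add(FINAL)
    return frozenset(nxt)


def run(stream):
    """Yield (distribution over state sets, Pr[accept]) for each timestep.

    Assumes each timestep's location distribution is independent of the others.
    """
    dist = {frozenset(): 1.0}
    for loc_dist in stream:
        new = defaultdict(float)
        for states, p_states in dist.items():
            for loc, p_loc in loc_dist.items():
                new[step(states, loc)] += p_states * p_loc
        dist = dict(new)
        yield dist, sum(p for s, p in dist.items() if FINAL in s)


# Point masses give the deterministic run of slide 19: {1}, then {2} (accept).
deterministic = [{"Hall4": 1.0}, {"422": 1.0}]
# Assumed distributions for t=7 and t=8, chosen to reproduce slide 20.
slide_20 = [{"Hall4": 0.5, "Hall3": 0.5},
            {"Hall4": 0.05, "422": 0.6, "Hall3": 0.35}]

for t, (dist, p_accept) in zip((7, 8), run(slide_20)):
    # t=8 gives {} 0.65, {1} 0.05, {2} 0.3, so Pr[accept] = 0.3
    print(t, {tuple(sorted(s)): round(p, 2) for s, p in dist.items()}, round(p_accept, 2))
```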

  21. Lies in the preceding slides… (technical details) • Richer predication: “Alert when Joe enters any office” • Translate the query and the input into an alphabet • Key technical detail: • The alphabet is small in the data • Streamable • See the paper for compilation [Figure: automaton with state 1 = “Joe in Hall4” and final state 2 = “Joe in 422”]

  22. Extension to Extended regular “Alert when anyone enters 422”

  23. Extension to Extended regular: “Alert when anyone enters 422” • (Obs 1) For each person, the query is regular • (Obs 2) The queries for different persons read disjoint sets of events, hence they are probabilistically independent • Algorithm: (Obs 1) suggests running the automaton for each person; (Obs 2) suggests multiplying to get the probability that any of them is true, P(any) = 1 − (1 − p1)(1 − p2)…(1 − pm) • Space = O(# persons), not O(# timesteps), so the query can be streamed
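A minimal sketch of the two observations: run one probabilistic automaton per person (Obs 1), then combine the per-person acceptance probabilities as independent events (Obs 2). The single-person automaton encoding, the input format, and the second person "Ann" are hypothetical. The sketch also assumes every tracked person appears in every timestep's snapshot.

```python
from collections import defaultdict

FINAL = 2


def step(states, loc):
    """One person's automaton for "enters 422": Hall4 (state 1), then 422 (state 2)."""
    nxt = set()
    if loc == "Hall4":
        nxt.add(1)
    if loc == "422" and 1 in states:
        nxt.add(FINAL)
    return frozenset(nxt)


class PersonRun:
    """Distribution over one person's active state sets (Obs 1: a regular query)."""

    def __init__(self):
        self.dist = {frozenset(): 1.0}

    def advance(self, loc_dist):
        """Consume one timestep's location distribution; return Pr[accept]."""
        new = defaultdict(float)
        for states, p_states in self.dist.items():
            for loc, p_loc in loc_dist.items():
                new[step(states, loc)] += p_states * p_loc
        self.dist = dict(new)
        return sum(p for s, p in self.dist.items() if FINAL in s)


def anyone_enters_422(stream):
    """Yield Pr[someone enters 422] per timestep.

    `stream` yields {person: {location: probability}}. By Obs 2 the per-person
    events are independent, so Pr[any] = 1 - prod(1 - p_i).
    """
    runs = {}  # one automaton run per person: O(#persons) space
    for snapshot in stream:
        p_none = 1.0
        for person, loc_dist in snapshot.items():
            if person not in runs:
                runs[person] = PersonRun()
            p_none *= 1.0 - runs[person].advance(loc_dist)
        yield 1.0 - p_none


stream = [
    {"Joe": {"Hall4": 0.5, "Hall3": 0.5}, "Ann": {"Hall4": 1.0}},
    {"Joe": {"Hall4": 0.05, "422": 0.6, "Hall3": 0.35}, "Ann": {"422": 0.5, "Hall3": 0.5}},
]
print([round(p, 2) for p in anyone_enters_422(stream)])  # [0.0, 0.65]
```

At the second timestep Joe accepts with probability 0.3 and Ann with 0.5, so the combined probability is 1 − 0.7 × 0.5 = 0.65. State is kept per person, never per timestep.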

  24. Summary of Contributions • Regular Queries (Efficient, streamable) • Compiled to an automaton; streaming, O(1) space • Extended regular (Efficient, streamable) • Streaming with O(m) space, where m is the number of persons • See the paper for Markovian correlations, more sophisticated predication, complete compilation, and static analysis algorithms • Safe (Efficient, but not streamable) • Unsafe (Inefficient; most are #P-hard)

  25. Outline • RFID streams to probabilistic streams • Lahar queries on probabilistic streams • Query algorithms: Regular and Extended Regular • Experiments

  26. Experimental Setup • Quality: How is P/R affected by keeping probabilities? • 52 objects, 352 locations, 10k sq. ft. • Two 30-minute traces with a 10-minute break in between • Participants marked down their true locations

  27. Experimental Setup • Quality: How is P/R affected by keeping probabilities? • 52 objects, 352 locations, 10k sq. ft. • Two 30-minute traces with a 10-minute break in between • Participants marked down their true locations • Query: “Alert when anyone enters a coffee room” • Baseline: Most Likely Estimate (MLE) • For each timestep and each person, take the most likely location

  28. Quality: Real-time – Improve over MLE? • Declare an event “true” if its probability exceeds a threshold • Vary the threshold • Result: 10% improvement in F1 [Plot: precision, recall, and F1]
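The evaluation on this slide can be sketched as follows. An event is declared true when its probability exceeds a threshold, the threshold is varied, and each setting is scored against ground truth. The event identifiers, probabilities, and ground truth are invented, and `precision_recall_f1` is a hypothetical helper. F1 is the harmonic mean of precision and recall. Scoring precision as 0 when nothing is predicted is a convention chosen here.

```python
def precision_recall_f1(probs, truth, threshold):
    """Score thresholded event probabilities against ground truth.

    probs: {event_id: probability}; truth: set of event ids that really occurred.
    An event is declared "true" when its probability exceeds `threshold`.
    """
    predicted = {e for e, p in probs.items() if p > threshold}
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0  # convention: 0 if nothing predicted
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical event probabilities and ground truth (not the experiment's data).
probs = {"e1": 0.9, "e2": 0.6, "e3": 0.3, "e4": 0.1}
truth = {"e1", "e3"}

for threshold in (0.2, 0.5, 0.8):
    p, r, f = precision_recall_f1(probs, truth, threshold)
    print(f"threshold={threshold}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Lowering the threshold trades precision for recall. The MLE baseline, by contrast, yields one fixed set of declared events and so a single precision/recall point.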

  29. Performance: Is the cost too high? [Plot: synthetic data, same query]

  30. Related Work • Event Queries – Deterministic • Cayuga, SASE, SnoopIB • Model-Based Views • BBQ; recently, Kanagal et al., ICDE 08 • Probabilistic Databases • Mystiq, Trio, MayBMS, Maryland, Purdue, MCDB • Particle Filters on HMMs • Doucet, Godsill

  31. Conclusion • Showed Lahar • Processed the output of several inference tasks (HMMs) • Applies more generally than just RFID • Quality (F1) gains from keeping probabilities • Performance usable in real time • Lots of concurrent tags • No indexing!

  33. Overview of Regular Query Algorithm (NB: example to follow) • Compile an event query q into • an automaton (A) over a language L • a mapping (M) from events to subsets of L • Runtime: the input is a set of events E • Map E into subsets of L via M • Maintain the set of possible states of A • Deterministic vs. probabilistic: compilation and mapping stay the same; the probabilistic version maintains a distribution over sets of states instead of a single set • The size of the distribution depends only on the query, q. For details, see the paper.

  34. Why are ER queries hard? • Regular queries ~ regular expressions • The mapping is non-trivial • Inspired by Cayuga [Demers et al. 06] • Queries have #P combined complexity • Encode mDNF as a regular expression • Intuition: • Extended regular ~ 1 NFA per person • k persons imply an O(k)-size automaton • An n-sized automaton leads to exponential cost • When the query is ER, the blowup can be avoided

  35. Regular and Extended Regular • A query is regular if no variable is shared between subgoals • A query is extended regular if any variable shared by two subgoals is shared by all subgoals [Example: p is shared between subgoals]
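The two syntactic definitions above can be sketched as checks on the variables of each subgoal. This encodes only the definitions on this slide. Representing a subgoal as its set of variable names, and the At(...) subgoals in the comments, are assumptions for illustration; the paper's query language has more structure. Every regular query is also extended regular, since it has no shared variables to check.

```python
from itertools import combinations


def shared_variables(subgoals):
    """Variables that occur in at least two subgoals."""
    shared = set()
    for a, b in combinations(subgoals, 2):
        shared |= a & b
    return shared


def is_regular(subgoals):
    """Regular: no variable is shared between subgoals."""
    return not shared_variables(subgoals)


def is_extended_regular(subgoals):
    """Extended regular: any variable shared by two subgoals is shared by all."""
    return all(all(v in g for g in subgoals) for v in shared_variables(subgoals))


# Each subgoal is represented only by its set of variables (constants dropped).
joe_enters_422 = [set(), set()]          # At('Joe', 'Hall4'); At('Joe', '422')
anyone_enters_422 = [{"p"}, {"p"}]       # At(p, 'Hall4'); At(p, '422')
mixed = [{"p", "l"}, {"p"}, {"q", "l"}]  # p and l are each shared by only two subgoals

for name, q in [("joe", joe_enters_422), ("anyone", anyone_enters_422), ("mixed", mixed)]:
    print(name, is_regular(q), is_extended_regular(q))
```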

  36. Correlations

  37. Sequencing by example • Sequencing is parameterized [Cayuga] • The semicolon means “the next event among those that match the next goal” • The semicolon is not “after” [Figure: timeline]

  38. Compilation by example • Each goal “corresponds” to two letters: • move (m): the query should advance • accept (a): the next subgoal accepts • Any other event maps to the empty set [Figure: automaton with a final state; transitions labeled “does contain” and “does not contain”]

  39. Subtle example • What about: • Any other event maps to the empty set [Figure: automaton with a final state; transitions labeled “does contain” and “does not contain”]

  40. CUT II

  41. Motivating Apps • RFID apps • Diary and active calendar application • Alert if I go to a database meeting • Supply chain • Alert if Mach 3 razors are being stolen • Many independent HMMs • Elder care [Intel/UW] • Alert if an elder takes their medicine with water • Activity recognition • Financial applications on predictive HMMs • Alert on a head-and-shoulders market pattern

  42. Compile Select and Filter • Intuition: each goal maps to two letters: • match (m): matches the filter • accept (a): accepted by the select • The language and automaton are the same for both queries [Figure: automaton with a final state; transitions labeled “does contain” and “does not contain”]

  43. Wrinkle in the language: Filter vs. Selection • “Alert the next time Joe is in 502 after he is in 501”: Yes • “Alert if the next place Joe is in after 501 is 502”: No [Figure: At over time]
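The filter/selection difference can be sketched on a deterministic trace. Selection skips over non-matching places until the goal matches, in the spirit of the semicolon semantics of slide 37. Filter requires the very next place to match. The trace is invented so that the answers match the slide's Yes/No, and both functions are hypothetical. The trace is assumed to list places in the order Joe enters them, with no repeated consecutive readings.

```python
def selection(trace, first, then):
    """'Alert the next time Joe is in `then` after he is in `first`':
    after a `first` reading, skip non-matching places until `then` appears."""
    return any(loc == first and then in trace[i + 1:] for i, loc in enumerate(trace))


def filter_(trace, first, then):
    """'Alert if the next place Joe is in after `first` is `then`':
    the very next place must match, otherwise the pattern fails."""
    return any(a == first and b == then for a, b in zip(trace, trace[1:]))


# Hypothetical sequence of places Joe visits, in order.
trace = ["501", "503", "502"]
print(selection(trace, "501", "502"), filter_(trace, "501", "502"))  # True False
```

On a trace where 502 directly follows 501, both queries would answer Yes; they diverge only when another place intervenes.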

  44. Recap of Algorithms • Regular queries • Compiled to an NFA, then used the image • Data complexity O(1) • Extended regular • Several regular queries multiplied together • Depends on the number of distinct people in the data, not the number of time steps

  47. Quality: Archived – Improve over Viterbi? • Smoothing vs. Viterbi (MAP) • Lahar tracks Markovian correlations • Viterbi leverages correlations for the MAP estimate • Result: approx. 30% gain in F1 [Plot: precision, recall, and F1]
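The two estimators compared on this slide can be sketched for a small discrete HMM. Forward-backward smoothing yields a probability distribution over states at each timestep, which is probabilistic output an event query can consume. Viterbi commits to the single most likely (MAP) state sequence. These are the textbook algorithms written from scratch, not Lahar's implementation. The two-state HMM (0 = Hall4, 1 = 422) and its probabilities are hypothetical. Unscaled probabilities are used for brevity; long streams would need scaling or log-space arithmetic to avoid underflow.

```python
def smooth(init, trans, emit, obs):
    """Forward-backward smoothing: P(state at t | all observations), per t."""
    n, T = len(init), len(obs)
    alpha = [[init[s] * emit[s][obs[0]] for s in range(n)]]
    for t in range(1, T):
        alpha.append([emit[s][obs[t]] * sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
                      for s in range(n)])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r] for r in range(n))
                   for s in range(n)]
    marginals = []
    for t in range(T):
        w = [alpha[t][s] * beta[t][s] for s in range(n)]
        z = sum(w)
        marginals.append([x / z for x in w])
    return marginals


def viterbi(init, trans, emit, obs):
    """Most likely (MAP) state sequence given all observations."""
    n = len(init)
    score = [init[s] * emit[s][obs[0]] for s in range(n)]
    back = []  # back[k][s]: best predecessor of state s at step k + 1
    for o in obs[1:]:
        prev, ptrs, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] * trans[r][s])
            ptrs.append(best)
            score.append(prev[best] * trans[best][s] * emit[s][o])
        back.append(ptrs)
    path = [max(range(n), key=lambda s: score[s])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]


# Hypothetical HMM: states 0 = Hall4, 1 = 422; observations 0/1 = which antenna fired.
init = [0.5, 0.5]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 1, 1]
print(viterbi(init, trans, emit, obs))
print(smooth(init, trans, emit, obs))
```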
