Multivariate Event Detection

Multivariate Event Detection. Manu Shukla 3/23/2013. Basics. Use fast subset scan (Neill ‘12, J.R. Stat. Soc.) to do multivariate event detection


Presentation Transcript


  1. Multivariate Event Detection Manu Shukla 3/23/2013

  2. Basics
     • Use fast subset scan (Neill '12, J. R. Stat. Soc.) for multivariate event detection
     • Here, multivariate event detection essentially means finding the keyword combinations in tweets that are most likely to signify an event (in this scenario, social unrest)
     • Reduce the problem to filtering out combinations that have a low probability of forming clusters, using a score function F(S) that satisfies the Linear Time Subset Scanning (LTSS) property
     • Find keyword-combination clusters with the fast subset scan technique after applying the filtering

  3. Filtering
     • Filtering follows two principles:
       • By location
       • By probability of forming clusters, based on F(S) and LTSS
     • Use the kd-tree and fp-tree data structures to aid in filtering

  4. Theorems
     • Two branch-and-bound algorithms are used:
     • Theorem 1: Given a spatial region $R$ and a set of itemsets $\{A_1,\dots,A_K\}$, some of which may overlap, for any superset $B \supseteq \{A_1,\dots,A_K\}$ we have the following upper-bound property:
       $$\mathrm{FS}\Big(\min\{R.A_i.\mathrm{LTSS.count}\}_{i=1}^{K},\ \max\big\{\min\{R.A_i.\mathrm{LTSS.count}\}_{i=1}^{K},\ \max\{R.A_i.\mathrm{LTSS.minbase}\}_{i=1}^{K}\big\}\Big) > R.\{B\}.\mathrm{LTSS.FS}$$
     • Theorem 2: Given a spatial region $R$ and a set of itemsets $\{A_1,\dots,A_K\}$, some of which may overlap, for any superset $B \supseteq \{A_1,\dots,A_K\}$ we have the following upper-bound property:
       $$\mathrm{FS}\Big(\min\{R.A_i.\mathrm{LTSS.count}\}_{i=1}^{K},\ \max\big\{\min\{R.A_i.\mathrm{LTSS.count}\}_{i=1}^{K},\ \max\{R.A_i.\mathrm{LTSS.minbase}\}_{i=1}^{K}\big\},\ C_{\mathrm{all}} = \min\{R.A_i.\mathrm{LTSS.count}\}_{i=1}^{K},\ B_{\mathrm{all}}\Big) > R.\{B\}.\mathrm{LTSS.FS}$$
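As a rough illustration of how the Theorem 1 bound could drive pruning, the sketch below computes the two arguments of the bound from per-itemset LTSS counts and minimum baselines, then compares the bound with the best score found so far. Everything here is an assumption for illustration: the function names, the plain-list inputs, and the two-argument callable `score` standing in for FS are hypothetical and not taken from the slides.

```python
from typing import Callable, Sequence, Tuple


def theorem1_bound_args(counts: Sequence[float],
                        minbases: Sequence[float]) -> Tuple[float, float]:
    """Return the (count, base) pair that Theorem 1 feeds into FS.

    counts[i]   plays the role of R.A_i.LTSS.count
    minbases[i] plays the role of R.A_i.LTSS.minbase
    """
    if not counts or len(counts) != len(minbases):
        raise ValueError("counts and minbases must be non-empty and equal length")
    c = min(counts)                # min over i of R.A_i.LTSS.count
    b = max(c, max(minbases))      # max{min count, max over i of minbase}
    return c, b


def theorem1_upper_bound(counts: Sequence[float],
                         minbases: Sequence[float],
                         score: Callable[[float, float], float]) -> float:
    """Evaluate the bound; `score` is a hypothetical stand-in for FS(count, base)."""
    c, b = theorem1_bound_args(counts, minbases)
    return score(c, b)


def can_prune(upper_bound: float, best_score_so_far: float) -> bool:
    """A superset whose bound cannot beat the current best need not be expanded."""
    return upper_bound <= best_score_so_far
```

In a branch-and-bound search over keyword combinations, a check like `can_prune` would be applied before expanding any superset of the itemsets, so that whole branches are skipped without computing their actual scores.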

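  5. Details
     • The score function F(S) is the Kulldorff statistic:
       $$F(S; C, B, C_{\mathrm{all}}, B_{\mathrm{all}}) = C \log\frac{C}{B} + (C_{\mathrm{all}} - C)\log\frac{C_{\mathrm{all}} - C}{B_{\mathrm{all}} - B} - C_{\mathrm{all}} \log\frac{C_{\mathrm{all}}}{B_{\mathrm{all}}}$$
     • $C$ and $B$ are, respectively, the aggregate count $\sum c_i^t$ and the aggregate baseline $\sum b_i^t$ in region $S$ for the given time interval
     • $C_{\mathrm{all}}$ and $B_{\mathrm{all}}$ are the total aggregate count $\sum c_i^t$ and baseline $\sum b_i^t$ over all spatial locations $s_i$
     • $R.A.\mathrm{LTSS.count}$ and $R.A.\mathrm{LTSS.base}$ are defined as the LTSS subset count and base in the region $R$
     • $R.A.\mathrm{LTSS.minbase} = \min\{R.A.p.\mathrm{base} \mid p \in R.A.\mathrm{LTSS}\}$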
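The score function above translates directly into code. The following is a minimal sketch of the statistic exactly as written on the slide; the function name, the argument checks, and the convention that a term with a zero count contributes 0 are assumptions added for illustration.

```python
import math


def kulldorff_score(c: float, b: float, c_all: float, b_all: float) -> float:
    """F(S; C, B, Call, Ball) =
         C log(C/B) + (Call - C) log((Call - C)/(Ball - B)) - Call log(Call/Ball)
    """
    if not (0 < b < b_all):
        raise ValueError("need 0 < B < Ball")
    if not (0 <= c <= c_all and c_all > 0):
        raise ValueError("need 0 <= C <= Call and Call > 0")

    def x_log_ratio(x: float, ratio: float) -> float:
        # Assumed convention: a term with zero count contributes 0 (0 * log 0 := 0).
        return 0.0 if x == 0 else x * math.log(ratio)

    inside = x_log_ratio(c, c / b)
    outside = x_log_ratio(c_all - c, (c_all - c) / (b_all - b))
    return inside + outside - c_all * math.log(c_all / b_all)
```

For example, `kulldorff_score(10, 5, 20, 20)` scores a region holding half of the total count on a quarter of the total baseline, while a region whose rate C/B equals the overall rate Call/Ball scores zero.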

  6. Steps
     • Build candidate clusters of single keyword terms using any technique (e.g., graph partitioning)
     • Filter single keyword terms spatially by applying the two theorems with a kd-tree
     • Build an fp-tree of keyword combinations
     • Filter the fp-tree using the two theorems
     • Cluster using fast subset scan
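The final clustering step relies on the LTSS property: for a score function such as the Kulldorff statistic, the highest-scoring subset of locations is one of the "top-j" subsets obtained by sorting locations by the priority c_i / b_i, so only N subsets need to be scored instead of 2^N. The sketch below shows this for a single keyword combination over a handful of locations. It is a simplified illustration only: the function names and the toy data layout are hypothetical, the one-sided form of the statistic (score 0 unless the rate inside exceeds the overall rate) is an assumption, and the kd-tree and fp-tree filtering steps are not modeled.

```python
import math
from typing import List, Sequence, Tuple


def kulldorff_one_sided(c: float, b: float, c_all: float, b_all: float) -> float:
    """Kulldorff statistic, set to 0 unless the subset's rate is elevated (assumption)."""
    if c <= 0 or b <= 0 or b >= b_all or c * b_all <= c_all * b:
        return 0.0
    out_c, out_b = c_all - c, b_all - b
    outside = out_c * math.log(out_c / out_b) if out_c > 0 else 0.0
    return c * math.log(c / b) + outside - c_all * math.log(c_all / b_all)


def fast_subset_scan(counts: Sequence[float],
                     baselines: Sequence[float]) -> Tuple[float, List[int]]:
    """Return (best score, sorted location indices) over all top-j subsets."""
    c_all, b_all = sum(counts), sum(baselines)
    # LTSS priority for this statistic: observed count over baseline, highest first.
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_subset = 0.0, []
    c = b = 0.0
    for j, i in enumerate(order, start=1):
        c += counts[i]
        b += baselines[i]
        score = kulldorff_one_sided(c, b, c_all, b_all)
        if score > best_score:
            best_score, best_subset = score, order[:j]
    return best_score, sorted(best_subset)


# Toy data: five locations with equal baselines; locations 0 and 2 are elevated.
toy_counts = [20, 3, 18, 2, 4]
toy_baselines = [5, 5, 5, 5, 5]
toy_score, toy_subset = fast_subset_scan(toy_counts, toy_baselines)
```

On the toy data the scan selects locations 0 and 2, the two with clearly elevated counts. The cost is dominated by the sort, which is what makes the scan practical once the candidate keyword combinations have been filtered.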

  7. Issues
     • Scaling, since the number of keyword combinations grows exponentially (distributed computation?)
     • Verifying the quality of the clusters
