
PET: A Statistical Model for Popular Events Tracking in Social Communities

PET: A Statistical Model for Popular Events Tracking in Social Communities. Cindy Xide Lin (1), Bo Zhao (1), Qiaozhu Mei (2), Jiawei Han (1). (1) University of Illinois at Urbana-Champaign, (2) University of Michigan. KDD 2010. 2010. 09. 16.


Presentation Transcript


  1. PET: A Statistical Model for Popular Events Tracking in Social Communities • Cindy Xide Lin (1), Bo Zhao (1), Qiaozhu Mei (2), Jiawei Han (1) • (1) University of Illinois at Urbana-Champaign, (2) University of Michigan • KDD 2010 • 2010. 09. 16. • Summarized and presented by Sang-il Song, IDS Lab., Seoul National University

  2. Contents • Introduction • Concept Definition • Problem Definition • Model • Interest model • Topic model • Experiment • Data Collection • Baseline and Gold standard • Analysis on Popularity Trend • Analysis on Content Evolution • Conclusions & Discussions

  3. Introduction • Boom of online communities • e.g., Facebook, Blogger, Twitter, … • They facilitate the creation, sharing, and diffusion of information. • A popular topic or event can spread much faster. • Need to track the diffusion and evolution of a popular event • Hot topics emerge, prevail, and die • It is desirable to monitor who is interested, what they like, and how their interests change over time • e.g., Who is still interested in watching Avatar 50 days after its release date?

  4. Introduction • Tracking the evolution of a popular topic is challenging • The diffusion of an event is vague • e.g., you don’t know whether I am interested in an event • e.g., and even if you do, you don’t know from whom I got this interest. • Fortunately, a large volume of text data is generated by social communities. • Besides communicating with friends, a web user also constantly generates text content such as blog posts. • The result is a network structure and a text collection that evolve simultaneously and interdependently.

  5. Goal • Tracking a popular event in a time-variant social community • A stream of text information • A stream of network structures • Modeling the interest of users • Modeling the change of the topic

  6. Concept Definition: Network Stream • [Figure: example network snapshot with six nodes v1, …, v6] • Gk: the snapshot of the network at time tk • G = {G1, G2, …, Gn}

  7. Concept Definition: Document Stream • [Figure: each node vi in the network snapshot carries a document dk,i, shown as a sequence of words such as w1, w2, w3, …] • Document collection stream D = {D1, D2, …, DT} • Document collection Dk = {dk,1, dk,2, …, dk,N}

  8. Concept Definition: Topic and Event • Topic • A topic θ is a multinomial distribution over words, {p(w|θ)}w∈W • A topic has different versions over time; the version at time tk is denoted θk • Event • A stream of topics ΘE = {θ0E, θ1E, θ2E, …, θTE} • θ0E is the primitive topic of the event • θkE corresponds to the version of θ0E at time tk • It indicates the major aspects of the event in network Gk

  9. Concept Definition: Interest • Interest • hk(i): the level of interest that node vi in Gk has in the particular event at time tk • A real value between 0 and 1 • Hk = {hk(1), hk(2), …, hk(N)}

  10. Problem: Popular Event Tracking • Inputs • Network stream G • Document stream D • Primitive topic of an event θ0 • Task 1: Popularity Tracking • Inferring the latent stream of interests (Hk) • providing much richer information about how the interest e • Task 2: Topic Tracking • Inferring the latent stream of topics about the event, ΘE • Keeping track of new developments in the event • Understanding event evolution

  11. Intuitions • Observation 1. Interest and Connections • The behavior of each individual is usually influenced by its friends. • Observation 2. Interest and History • The behavior of each individual should be generally consistent over time. • Events should not change dramatically. • Observation 3. Content and Interest • When an individual has a higher level of interest in an event, the content she generates should be more likely to be related to the event.

  12. The General Model • Current interest and topic depend on • Current network • Current documents • Previous history (Markovian simplification) • Formal representation • P(Hk, Θk | Gk, Dk, Hk-1)

  13. Assumption • How to model P(Hk, Θk | Gk, Dk, Hk-1)? • Assumption 1. • Given the current network structure Gk and the previous interest status Hk-1, • the current interest status Hk is independent of the document collection Dk • Hk ⊥ Dk | Gk, Hk-1 • People first become interested in the event and therefore generate discussion about it • Assumption 2. • Given the current interest status Hk and the document collection Dk, • the current topic model Θk is independent of Gk and Hk-1 • Θk ⊥ Gk, Hk-1 | Hk, Dk • Once the author has developed an interest in the event, the content she writes depends only on the event itself and the level of interest • P(Hk, Θk | Gk, Dk, Hk-1) = P(Hk | Gk, Hk-1) P(Θk | Hk, Dk)
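The factorization at the end of slide 13 follows from the chain rule plus the two independence assumptions. The intermediate step is not on the slide; one way to spell it out, using the slide's own symbols, is:

```latex
\begin{align*}
P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})
  &= P(H_k \mid G_k, D_k, H_{k-1}) \; P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(chain rule)} \\
  &= P(H_k \mid G_k, H_{k-1}) \; P(\Theta_k \mid H_k, D_k)
     && \text{(Assumptions 1 and 2)}
\end{align*}
```

Assumption 1 removes D_k from the first factor, and Assumption 2 removes G_k and H_{k-1} from the second.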

  14. Interest Model • [Figure: a node and its neighbors, labeled with the values 0.3, 0.2, 0.8, 0.1, 0.2, 1] • Example: h’ = 1*0.2 + 0.3*0.8 + 0.2*0.1 = 0.46 • Gibbs random field • Widely used in studying natural processes (Gibbs distribution) • cf. the Gaussian distribution is a special member of the Gibbs distribution family • P(Hk | Gk, Hk-1) • h’(k) is the weighted sum of friends’ interest • The first part is the transition energy of node i • The last part represents the neighbors’ expectation
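The worked number on slide 14 can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `neighbor_expectation` and the pairing of each number as (edge weight, neighbor interest) are assumptions, since the slide's figure is not preserved. Only the weighted sum is shown, not the full Gibbs random field.

```python
def neighbor_expectation(neighbors):
    """Weighted sum of neighbor interests.

    `neighbors` is a list of (weight, interest) pairs. The pairing is an
    assumption read off the slide's arithmetic; the product is symmetric,
    so swapping the two roles gives the same sum.
    """
    return sum(weight * interest for weight, interest in neighbors)


# Values taken from the slide's example: 1*0.2 + 0.3*0.8 + 0.2*0.1
neighbors = [(1.0, 0.2), (0.3, 0.8), (0.2, 0.1)]
h_prime = neighbor_expectation(neighbors)
print(round(h_prime, 2))  # 0.46
```

Note that the weights in the slide's example sum to 1.5, so this is a plain weighted sum rather than a normalized average.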

  15. Topic Model • Each document is considered to be generated by a two-component multinomial mixture model • Background model: θkB • Models common words • Latent event topic model: θkE • Models discriminative and meaningful words • The probability of generating a word • P(Θk | Hk, Dk)
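The word-generation formula from slide 15 is not preserved in the transcript. As a rough illustration of what a two-component multinomial mixture looks like in general, the sketch below mixes a background distribution and an event distribution with a fixed weight. The function `word_probability`, the parameter `lam_background`, and the toy distributions are all hypothetical; how PET actually sets the mixing proportion (for instance, whether it depends on the interest hk(i)) is not stated here and is not assumed.

```python
def word_probability(word, theta_background, theta_event, lam_background):
    """p(w) = lam * p(w | background) + (1 - lam) * p(w | event).

    Generic two-component mixture; `lam_background` is an assumed,
    hand-set mixing weight, not a value from the PET model.
    """
    p_bg = theta_background.get(word, 0.0)
    p_ev = theta_event.get(word, 0.0)
    return lam_background * p_bg + (1.0 - lam_background) * p_ev


# Toy multinomial distributions (made up for illustration only).
theta_b = {"the": 0.5, "movie": 0.3, "avatar": 0.2}
theta_e = {"avatar": 0.7, "movie": 0.3}

p_avatar = word_probability("avatar", theta_b, theta_e, lam_background=0.6)
vocabulary = set(theta_b) | set(theta_e)
total = sum(word_probability(w, theta_b, theta_e, 0.6) for w in vocabulary)
```

Because both components are proper distributions over the same vocabulary, the mixture also sums to one, which is a quick sanity check on any implementation.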

  16. Twitter Data Collection • Selecting 5000 users with follower-followee relationships • Considering each day as a time point (tk: the kth day) • Document dk,i is obtained by concatenating the tweets displayed by user i on day k • The weight of the relationship between users equals the number of tweets displayed by user i by following user j during the period from tk-30 to tk.
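The document construction on slide 16 can be sketched as a simple grouping step. This is an illustrative sketch under stated assumptions: the record layout `(user, day, text)`, the sample tweets, and both function names are hypothetical. The helper `tweets_in_window` only counts one author's tweets in a 30-day window; how such counts map onto the slide's edge weight between users i and j is left open, because the slide's wording is ambiguous.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical input records: (user_id, day, text). Not from the original data set.
tweets = [
    ("u1", date(2010, 1, 1), "avatar was great"),
    ("u1", date(2010, 1, 1), "seeing it again"),
    ("u2", date(2010, 1, 1), "lunch time"),
    ("u1", date(2010, 1, 2), "still thinking about avatar"),
]


def build_documents(records):
    """Return {day: {user: document}}; each document concatenates the
    user's tweets for that day, in input order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for user, day, text in records:
        grouped[day][user].append(text)
    return {
        day: {user: " ".join(texts) for user, texts in users.items()}
        for day, users in grouped.items()
    }


def tweets_in_window(records, author, end_day, window_days=30):
    """Count tweets by `author` with end_day - window_days <= day <= end_day."""
    start = end_day - timedelta(days=window_days)
    return sum(
        1 for user, day, _ in records if user == author and start <= day <= end_day
    )


docs = build_documents(tweets)
```

Treating one day as one time point keeps the document stream and the network stream aligned on the same index k.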

  17. Baseline and Gold Standard • BOM: daily box office figures extracted from Box Office Mojo • Box office earnings are a trustworthy criterion for reflecting a movie’s popularity • GInt: Google Insight • PET • PET-: a special version of PET with the network structure removed • JonK / Cont

  18. Analysis on Popularity Trend

  19. Analysis on Popularity Trend

  20. Analysis on Popularity Trend • PET always has the best performance • Historical, textual, and structural information is reflected well • PET- cannot respond sufficiently to sudden changes

  21. Analysis on Content Evolution

  22. Conclusion & Discussion • Proposes the novel problem of Popular Event Tracking • Proposes a popular event tracking model, PET • A unified probabilistic framework to model different factors • Covers classical models • Experimental studies show that PET outperforms existing models • PET is not a good framework for tracking interest • More accurate data sources exist, such as Google Insight. • Tracking topic change is a novel problem. • PET detects and tracks topic evolution well.
