
Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms


Presentation Transcript


  1. Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms. Amr Ahmed, Thesis Proposal.

  2. This thesis is about document collections: they are everywhere, and they cover many domains.

  3. Research publications: ArXiv, PubMed Central, conference proceedings, journal transactions. News: Yahoo! News, CNN, Google News, BBC. Social media and blogs: Red State, Daily Kos.

  4. Two views of a collection. Temporal dynamics: how fields (physics, biology, CS) and events evolve over time, e.g. the drill explosion followed by BP's "We will make this right" and "BP wasn't prepared for an oil spill at such depths." Structural correspondence: how the same issue is framed across communities, e.g. "Choice is a fundamental, constitutional right" vs. "Ban abortion with a constitutional amendment."

  5. Thesis Question: how to build a structured representation of document collections that reveals • Temporal dynamics: how ideas and events evolve over time • Structural correspondence: how ideas are addressed across modalities and communities.

  6. Thesis Approach. Models: probabilistic graphical models • topic models and non-parametric Bayes • principled, expressive, and modular. Algorithms: distributed, to deal with large-scale datasets • online, to update the representation as new data arrives.

  7. Outline • Background • Temporal Dynamics • Timelines for research publications • Storylines from news streams • User interest-lines • Structural Correspondence • Across modalities • Across ideologies

  8. What is a Good Model for Documents? • Clustering: the mixture-of-unigrams model • How to specify a model? Generative process: assume some hidden variables and use them to generate the documents. Inference: invert the process, going from observed documents back to the hidden variables.

  9. Mixture of Unigrams. Generative process: for each document w_i, sample a cluster c_i ~ Multi(π), then sample every word of w_i ~ Multi(φ_{c_i}). When is this a good model for documents? When documents are single-topic, which is not true in our settings.
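A minimal sketch of this single-topic generative process (the vocabulary size, cluster count, and symmetric Dirichlet priors here are illustrative choices, not from the slides):

```python
import numpy as np

def mixture_of_unigrams(n_docs, doc_len, n_clusters, vocab_size,
                        rng=np.random.default_rng(0)):
    """Generate documents from a mixture-of-unigrams model (one topic per document)."""
    pi = rng.dirichlet(np.ones(n_clusters))                      # cluster proportions pi
    phi = rng.dirichlet(np.ones(vocab_size), size=n_clusters)    # one unigram distribution per cluster
    docs, clusters = [], []
    for _ in range(n_docs):
        c = rng.choice(n_clusters, p=pi)                         # c_i ~ Multi(pi)
        words = rng.choice(vocab_size, size=doc_len, p=phi[c])   # every word ~ Multi(phi_{c_i})
        docs.append(words)
        clusters.append(c)
    return docs, clusters, pi, phi
```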

  10. What Do We Need to Model? Q: What is the paper about? A: Mainly MT, with some syntax and some learning; e.g. mixing proportions 0.6 MT (source, target, SMT, alignment, score, BLEU), 0.3 syntax (parse tree, noun phrase, grammar, CFG), 0.1 learning (likelihood, EM, hidden, parameters, estimation, argmax). Example abstract: "A Hierarchical Phrase-Based Model for Statistical Machine Translation. We present a statistical phrase-based translation model that uses hierarchical phrases, phrases that contain sub-phrases. The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system." In a topic model, each topic is a unigram distribution over the vocabulary and each document has a mixing proportion over topics.

  11. Mixed-Membership Models. Generative process: for each document d, sample θ_d from the prior; for each word w in d, sample a topic z ~ Multi(θ_d), then the word w ~ Multi(φ_z). Each document now mixes several topics, as illustrated on the machine-translation abstract above.

  12. Topic Models: design choices. Prior over the topic vector: Latent Dirichlet Allocation (LDA), correlated priors (CTM), hierarchical priors. Topics: unigrams, bigrams, etc. Document structure: bag of words, multi-modal, side information.

  13. Outline • Background • Temporal Dynamics • Timelines for research publications • Storylines from news streams • User interest-lines • Structural Correspondence • Across modalities • Across ideologies

  14. Problem Statement. Given research papers from 1900 to 2009, discover topics (physics, biology, CS, ...) with • a potentially infinite number of topics • time-varying trends • time-varying distributions • variable durations: topics can die and new topics can be born.

  15. The Big Picture. The models can be arranged along two axes: the model dimension (fixed vs. growing number of topics) and time (static vs. dynamic). LDA and dynamic clustering are starting points; dynamic LDA adds temporal dynamics to LDA; HDPM lets the number of topics grow; infinite dynamic topic models combine both.

  16. LDA: The Generative Process. For each document d, sample θ_d ~ Dirichlet(α); for each word w in d, sample a topic z ~ Multi(θ_d), then the word w ~ Multi(φ_z). Checklist: do topics' trends evolve over time? Do topics' distributions evolve over time? Does the number of topics grow with the data? None of these hold for LDA.
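A minimal sketch of the LDA generative process just described (the hyperparameter values and sizes are illustrative assumptions):

```python
import numpy as np

def lda_generate(n_docs, doc_len, n_topics, vocab_size, alpha=0.1, beta=0.01,
                 rng=np.random.default_rng(0)):
    """Generate documents from LDA: per-document topic mixtures, per-word topic draws."""
    phi = rng.dirichlet(beta * np.ones(vocab_size), size=n_topics)      # topics phi_k
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha * np.ones(n_topics))                # theta_d ~ Dirichlet(alpha)
        z = rng.choice(n_topics, size=doc_len, p=theta)                 # z ~ Multi(theta_d)
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z]) # w ~ Multi(phi_z)
        docs.append(words)
    return docs, phi
```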

  17. The Big Picture, revisited: moving along the time axis from LDA to dynamic LDA.

  18. Dynamic LDA: The Generative Process. Replace the Dirichlet prior with a logistic normal, which is necessary to evolve trends: for each document d, sample θ_d ~ Normal(α, λI); for each word w in d, sample z ~ Multi(L(θ_d)) and w ~ Multi(L(φ_z)). The logistic transformation maps a real vector onto the simplex: L(x)_i = exp(x_i) / Σ_j exp(x_j).

  19. Dynamic LDA: The Generative Process across epochs. At epoch t: α_t ~ Normal(α_{t-1}, σ) and φ_{k,t} ~ Normal(φ_{k,t-1}, ρ) for each topic k. For each document d at epoch t, sample θ_d ~ Normal(α_t, λI); for each word, sample z_{d,i} ~ Multi(L(θ_d)) and w_{d,i} ~ Multi(L(φ_{z_{d,i}, t})).
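A minimal sketch of these dynamics (the noise scales and the zero initialization of α and φ_{k,1} are illustrative assumptions; L is implemented as a softmax):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_lda_generate(n_epochs, docs_per_epoch, doc_len, n_topics, vocab_size,
                         sigma=0.1, rho=0.05, lam=1.0, rng=np.random.default_rng(0)):
    """Random-walk dynamics on alpha_t and phi_{k,t}; logistic-normal document mixtures."""
    alpha = np.zeros(n_topics)                                  # alpha_1
    phi = rng.normal(0.0, 1.0, size=(n_topics, vocab_size))    # phi_{k,1}
    corpus = []
    for t in range(n_epochs):
        if t > 0:
            alpha = rng.normal(alpha, sigma)                    # alpha_t ~ N(alpha_{t-1}, sigma)
            phi = rng.normal(phi, rho)                          # phi_{k,t} ~ N(phi_{k,t-1}, rho)
        epoch_docs = []
        for _ in range(docs_per_epoch):
            theta = rng.normal(alpha, lam)                      # theta_d ~ N(alpha_t, lambda I)
            z = rng.choice(n_topics, size=doc_len, p=softmax(theta))            # z ~ Multi(L(theta_d))
            words = np.array([rng.choice(vocab_size, p=softmax(phi[k])) for k in z])  # w ~ Multi(L(phi_z))
            epoch_docs.append(words)
        corpus.append(epoch_docs)
    return corpus
```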

  20. Dynamic LDA: The Generative Process. (Plate diagram unrolled over epochs 1..T: α_1, ..., α_T and topic sets φ_1, ..., φ_T, with each epoch's documents generated as above; research papers from 1900 to 2009.)

  21. Dynamic LDA: The Generative Process. Checklist: topics' trends evolve over time? Yes. Topics' distributions evolve over time? Yes. Number of topics grows with the data? No.

  22. The Big Picture, revisited: moving along the model dimension from LDA to HDPM, where the number of topics grows with the data.

  23. The Chinese Restaurant Franchise Process. The HDP mixture (HDPM) automatically determines the number of topics in LDA. We focus on its Chinese restaurant franchise construction: a set of restaurants that share a global menu. Metaphor: restaurant = document, customer = word, dish = topic, global menu = set of topics.

  24. The Chinese Restaurant Franchise Process. (Figure: a global menu of dishes φ_1, ..., φ_4, where m_k is the number of tables serving dish k and φ_k is the word distribution of topic k; each restaurant has tables, and all customers at a table share the dish served there.)

  25. The Chinese Restaurant Franchise Process. Generative process for a new customer w in restaurant 3: choose an existing table j with probability ∝ N_j, or open a new table with probability ∝ α; a new table then needs a dish sampled for it.

  26. The Chinese Restaurant Franchise Process. If the customer sits at an existing table, the word is generated from that table's dish, e.g. w ~ Multi(L(φ_3)).

  27. The Chinese Restaurant Franchise Process. If the customer opens a new table (∝ α), its dish is still unknown and must be sampled from the global menu.

  28. The Chinese Restaurant Franchise Process. Sampling a dish for a new table: choose an existing dish k with probability ∝ m_k (its number of tables across all restaurants), or a new dish with probability ∝ γ.

  29. The Chinese Restaurant Franchise Process. If an existing dish is chosen, say φ_3, the word is generated as w ~ Multi(L(φ_3)).

  30. The Chinese Restaurant Franchise Process. If a new dish is chosen (∝ γ), it is drawn from the base measure, φ_5 ~ H, added to the global menu, and the word is generated as w ~ Multi(L(φ_5)).
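A minimal sketch of the table and dish choices just described, for one customer in one restaurant (the data structures and parameter values are illustrative, not the thesis implementation):

```python
import numpy as np

def crf_seat_customer(table_dishes, table_counts, dish_counts,
                      alpha=1.0, gamma=1.0, rng=np.random.default_rng(0)):
    """Seat one customer in one restaurant of the Chinese restaurant franchise.

    table_dishes: dish index served at each existing table in this restaurant.
    table_counts: number of customers at each existing table (the N_j).
    dish_counts:  number of tables serving each dish across all restaurants (the m_k).
    Returns (table, dish); table == len(table_counts) means a new table was opened,
    and dish == len(dish_counts) means a brand-new dish should be drawn from H.
    """
    # Table choice: existing table j with prob ∝ N_j, new table with prob ∝ alpha.
    w = np.array(list(table_counts) + [alpha], dtype=float)
    j = rng.choice(len(w), p=w / w.sum())
    if j < len(table_counts):
        return j, table_dishes[j]            # sit at an existing table and share its dish
    # New table: existing dish k with prob ∝ m_k, new dish with prob ∝ gamma.
    w = np.array(list(dish_counts) + [gamma], dtype=float)
    k = rng.choice(len(w), p=w / w.sum())
    return j, k
```

For example, `crf_seat_customer([0, 2], [3, 1], [4, 1, 2])` seats a customer in a hypothetical restaurant with two tables serving dishes 0 and 2, against a global menu of three dishes.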

  31. The Chinese Restaurant Franchise Process. Checklist: topics' trends evolve over time? No. Topics' distributions evolve over time? No. Number of topics grows with the data? Yes.

  32. The Big Picture, revisited: combining both axes gives infinite dynamic topic models.

  33. Recurrent Chinese Restaurant Franchise Process. Documents in epoch 1 are generated as before. At the end of epoch 1, each topic k has a popularity m_{k,1} (the number of tables serving it) and a distribution φ_{k,1}; these are carried into epoch 2 as decayed pseudo counts. Observations: popular topics at epoch 1 are likely to be popular at epoch 2, and φ_{k,2} is likely to evolve smoothly from φ_{k,1}.

  34. Recurrent Chinese Restaurant Franchise Process. At epoch 2, the menu starts with the inherited (but not yet used) dishes from epoch 1. When an inherited dish is served for the first time in epoch 2, its distribution evolves, e.g. φ_{3,2} ~ Normal(φ_{3,1}, ρ).

  35. Recurrent Chinese Restaurant Franchise Process. Generative process for a customer w in restaurant 1 at epoch 2: as in the static case, choose an existing table j ∝ N_j or a new table ∝ α. For a new table, sample a dish: a dish already used at epoch 2 and inherited from epoch 1, with probability ∝ m′_{k,2} + m_{k,2}; a dish inherited but not yet used at epoch 2, with probability ∝ m′_{k,2}, then φ_{k,2} ~ Normal(φ_{k,1}, ρ); or a new dish with probability ∝ γ, then φ_new ~ H.

  36. Recurrent Chinese Restaurant Franchise Process. (Same generative process as the previous slide; the walkthrough continues.)

  37. Recurrent Chinese Restaurant Franchise Process. Walkthrough: an inherited dish is used for the first time at epoch 2, so φ_{1,2} ~ Normal(φ_{1,1}, ρ). (Generative process as on slide 35.)

  38. Recurrent Chinese Restaurant Franchise Process. Walkthrough: a brand-new dish is born at epoch 2, φ_{6,2} ~ H. (Generative process as on slide 35.)
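A minimal sketch of the epoch-level dish choice under inherited pseudo counts (the simple multiplicative decay of the previous epoch's counts is an assumption; the slides only mention a decay factor):

```python
import numpy as np

def rcrf_choose_dish(m_prev, m_curr, gamma=1.0, decay=0.5, rng=np.random.default_rng(0)):
    """Choose a dish for a new table at the current epoch of the recurrent CRF.

    m_prev: dict dish -> number of tables at the previous epoch (source of pseudo counts).
    m_curr: dict dish -> number of tables already serving the dish at the current epoch.
    Returns a dish id, or "new" for a brand-new dish (whose phi is then drawn from H).
    """
    m_pseudo = {k: decay * v for k, v in m_prev.items()}        # decayed pseudo counts m'_{k,t}
    dishes = sorted(set(m_pseudo) | set(m_curr))
    weights = [m_pseudo.get(k, 0.0) + m_curr.get(k, 0.0) for k in dishes]  # ∝ m'_{k,t} + m_{k,t}
    weights.append(gamma)                                        # new dish ∝ gamma
    w = np.array(weights, dtype=float)
    i = rng.choice(len(w), p=w / w.sum())
    return dishes[i] if i < len(dishes) else "new"
```

A dish that was inherited but not yet used at the current epoch has m_curr = 0, so its weight reduces to the pseudo count alone, as on the slide.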

  39. Recurrent Chinese Restaurant Franchise Process. Across epochs, the global menu changes: some topics die out (they stop being inherited) and new topics are born (e.g. φ_{6,2}), while surviving topics evolve smoothly (φ_{1,1} to φ_{1,2}, and so on).

  40. Recurrent Chinese Restaurant Franchise Process. Checklist: topics' trends evolve over time? Yes. Topics' distributions evolve over time? Yes. Number of topics grows with the data? Yes.

  41. Recurrent Chinese Restaurant Franchise Process. We just described a first-order RCRF process; the same construction extends to a general D-order process, in which pseudo counts are inherited from the previous D epochs.

  42. Inference. Gibbs sampling: sample a table for each word, a topic (dish) for each table, the topic parameters over time, and the hyper-parameters. Non-conjugacy is handled with Neal's Algorithm 8 (1998) plus Metropolis-Hastings. Efficiency: the Markov blanket of an epoch contains only the previous and following D epochs.

  43. Sampling a Topic for a Table. The conditional for a table's dish combines three terms: the past (counts inherited from earlier epochs), the emission (the likelihood of the table's words under the dish), and the future (counts in later epochs). Two difficulties arise: non-conjugacy and efficiency.

  44. Sampling a Topic for a Table: non-conjugacy. Following Neal's Algorithm 8, auxiliary candidate dishes are drawn from the base measure H = N(0, σI), and the new-dish mass γ is split evenly among them (γ/3 each when three auxiliaries are used).
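A single-epoch sketch in the spirit of Neal's Algorithm 8 for this auxiliary-dish step, ignoring the past and future count terms from neighboring epochs that the full sampler includes (parameter names and values are illustrative assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_dish_algorithm8(table_words, dishes, dish_table_counts, vocab_size,
                           gamma=1.0, n_aux=3, sigma=1.0, rng=np.random.default_rng(0)):
    """Sample a dish (topic) for one table under a non-conjugate base measure.

    Augments the menu with n_aux auxiliary dishes drawn from H = N(0, sigma I),
    each carrying prior mass gamma / n_aux.
    table_words: array of word ids sitting at this table.
    dishes: list of existing dish parameter vectors phi_k (length vocab_size).
    dish_table_counts: number of tables currently serving each existing dish (the m_k).
    Returns (index, phi); index < len(dishes) means an existing dish was chosen.
    """
    aux = [rng.normal(0.0, sigma, size=vocab_size) for _ in range(n_aux)]   # auxiliary dishes ~ H
    candidates = list(dishes) + aux
    prior = np.array(list(dish_table_counts) + [gamma / n_aux] * n_aux, dtype=float)
    # Emission term: likelihood of the table's words under each candidate dish.
    loglik = np.array([np.log(softmax(phi))[table_words].sum() for phi in candidates])
    logp = np.log(prior) + loglik
    p = np.exp(logp - logp.max())
    i = rng.choice(len(candidates), p=p / p.sum())
    return i, candidates[i]
```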

  45. Sampling a Topic for a Table: efficiency. The past and future count terms are pre-computed and updated incrementally rather than recomputed for every table.

  46. Sampling Topic Parameters. The chain φ_1, ..., φ_T with emissions v_t | φ_t ~ Mult(Logistic(φ_t)) is a linear state-space model with non-Gaussian emission. Apply a Laplace approximation inside the forward-backward algorithm and use the resulting distribution as a proposal.
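A single-time-step sketch of a Laplace approximation for this logistic-normal emission; the thesis applies it inside forward-backward over the whole chain, whereas this only finds the mode and curvature for one step (the gradient-ascent step size and iteration count are illustrative assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def laplace_logistic_normal(v, prior_mean, prior_var, n_iters=100, lr=0.05):
    """Laplace approximation to p(phi | v) with v | phi ~ Mult(softmax(phi))
    and phi ~ N(prior_mean, prior_var * I). Returns (mode, covariance)."""
    n = v.sum()
    phi = prior_mean.copy()
    for _ in range(n_iters):                        # gradient ascent to the posterior mode
        p = softmax(phi)
        grad = v - n * p - (phi - prior_mean) / prior_var
        phi = phi + lr * grad
    p = softmax(phi)
    hess = -n * (np.diag(p) - np.outer(p, p)) - np.eye(len(phi)) / prior_var
    cov = np.linalg.inv(-hess)                      # curvature at the mode
    return phi, cov
```

The returned mean and covariance define the Gaussian that can serve as a Metropolis-Hastings proposal for φ_t.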

  47. Experiments. Simulated data: 20 epochs with 100 data points in each epoch. Timeline of the NIPS conference: 13 years, 1,740 documents, 950 words per document, and a vocabulary of roughly 3,500 terms.

  48. Simulation Experiment. (Figure: sample simulated documents.)

  49. Simulation Experiment. (Figure: ground-truth topic trends vs. the recovered trends.)

  50. NIPS Timeline. (Figure: recovered topic timeline over 1987-1996, with topics such as SOM, ICA, boosting, speech, reinforcement learning, memory, neuroscience, Bayesian methods, kernels, mixtures, neural-network generalization, classification, clustering methods, control, probabilistic models, and image.)
