
Probabilistic Graphical Models & LDA





  1. Probabilistic Graphical Models & LDA Yilun Wang, Chu Kochen Honors College, Zhejiang University

  2. Outline

  3. What does a probabilistic model do? • What are the mechanisms underlying gene expression data? • Colon cancer research. • How to predict prices of stocks and bonds from historical data? • Hedge fund dynamics. • Given a list of movies that a particular user likes, what other movies would she like? • Netflix Prize. • How to identify aspects of a patient’s health that are indicative of disease? • Heart disease classification. • Which documents from a collection are relevant to a search query? • Google Research.

  4. How • Steps: • Formulate questions about the data. • Clarify what we want to do, what we want to solve for, and what the parameters are. • Design an appropriate joint distribution. • Model the problem: determine the structure of the data, the latent variables, and the conjugate priors (i.e., specify the graphical model). • Cast our questions as computations on the joint. • Decompose the probabilities of interest, via integration and conditional independence, into separately computable pieces. • Develop efficient algorithms to perform or approximate the computations on the joint. • Solve using methods such as Gibbs sampling or variational inference.

  5. Probability Review • R1. Joint Distributions • R2. Marginal Probabilities • R3. Conditional Probabilities (R1+R2): conditional = joint / marginal • R4. Independence

  6. Probability Review • Bayes' rule (R2+R3): posterior = likelihood × prior / evidence, i.e., p(θ | x) = p(x | θ) p(θ) / p(x) • Probability estimation • “Bayesian Inference with Tears”
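
As a concrete illustration of Bayes' rule, the following sketch computes a posterior for a hypothetical diagnostic test (all numbers are illustrative assumptions, not from the slides):

```python
# Minimal numeric illustration of Bayes' rule (all numbers hypothetical):
# posterior = likelihood * prior / evidence.
prior = 0.01            # p(disease)
sensitivity = 0.95      # p(positive | disease)
false_positive = 0.05   # p(positive | no disease)

# Evidence: total probability of a positive test (marginalize over disease).
evidence = sensitivity * prior + false_positive * (1 - prior)

# Posterior probability of disease given a positive test.
posterior = sensitivity * prior / evidence
print(f"p(disease | positive) = {posterior:.3f}")  # ~0.161
```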

  7. Graphical Models • A family of probability distributions defined in terms of a directed graph (DGM/DAG/Bayesian network), an undirected graph (Markov network), or a mix of the two (chain graph)

  8. Graphical Models • A more economical representation of the joint • A graphical model is a graph describing the relationships among random variables: nodes represent random variables, and (missing) edges represent conditional independence assumptions. It therefore gives a compact representation of the joint distribution. • Advantages of GMs • They allow us to articulate structural assumptions about collections of random variables. • They provide general algorithms to compute conditionals, marginals, expectations, independencies, etc. • They provide control over the complexity of these operations. • They decouple the factorization of the joint from its particular functional form.

  9. Conditional Independence • Independence: p(x, y) = p(x) p(y) • Conditional independence: p(x, y | z) = p(x | z) p(y | z), equivalently p(x | y, z) = p(x | z)

  10. Conditional Independence • Take the graphical model of LDA as an example:

  11. Conditional Independence • Sometimes we want to evaluate a conditional independence (CI) query of the following form: given the observed variables, is one set of variables independent of another?

  12. Probabilistic Graphical Models • Graphical modeling is the study of probabilistic models • Just because there are nodes and edges doesn’t mean it’s a graphical model • These are not graphical models: • Xiaojin Zhu, Tutorial on Graphic Models at KDD-2012, http://pages.cs.wisc.edu/~jerryzhu/

  13. Directed Graphical Models

  14. Example: Alarm • Binary variables • Compute P(B, ~E, A, J, ~M)
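
The requested joint probability factorizes along the directed graph. A minimal sketch, assuming the standard textbook CPT values for the burglary/earthquake alarm network (the slide's actual numbers may differ):

```python
# The classic alarm network factorizes the joint as
#   P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A).
# CPT values below are the standard textbook ones (Russell & Norvig);
# the slide's numbers may differ.
p_B = 0.001                                         # P(Burglary)
p_E = 0.002                                         # P(Earthquake)
p_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm | B, E)
p_J = {True: 0.90, False: 0.05}                     # P(JohnCalls | A)
p_M = {True: 0.70, False: 0.01}                     # P(MaryCalls | A)

# P(B, ~E, A, J, ~M): multiply the relevant factors.
joint = p_B * (1 - p_E) * p_A[(True, False)] * p_J[True] * (1 - p_M[True])
print(f"P(B, ~E, A, J, ~M) = {joint:.3e}")          # ~2.5e-04
```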

  15. Example: Naïve Bayes • Used extensively in natural language processing • Plate representation on the right
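
A minimal naïve Bayes text classifier using scikit-learn; the library choice and the toy documents are illustrative assumptions, not from the slide:

```python
# A minimal naive Bayes text classifier (toy data is illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["great movie loved it", "terrible boring film",
        "loved the acting", "boring and terrible"]
labels = ["pos", "neg", "pos", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)   # bag-of-words counts

clf = MultinomialNB()                # word counts assumed conditionally
clf.fit(X, labels)                   # independent given the class
print(clf.predict(vectorizer.transform(["loved this great film"])))
```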

  16. Example: Probabilistic LSI • Eric Xing, Topic Models, Latent Space Models, Sparse Coding, and All That

  17. Example: Latent Dirichlet Allocation

  18. Example: Latent Dirichlet Allocation • Generative model • Models each word in a document as a sample from a mixture model. • Each word is generated from a single topic; different words in the document may be generated from different topics. • A topic is characterized by a distribution over words. • Each document is represented as a list of admixing proportions for the components (i.e., a topic vector). • The topic vectors and the word rates each follow a Dirichlet prior --- essentially a Bayesian pLSI.
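
The generative process just described can be sketched directly; the topic count, vocabulary size, document length, and hyperparameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 1000, 50          # topics, vocabulary size, words per document (assumed)
alpha, beta = 0.1, 0.01        # Dirichlet hyperparameters (assumed)

# Topic-word distributions: one categorical over the vocabulary per topic.
phi = rng.dirichlet(np.full(V, beta), size=K)        # shape (K, V)

def generate_document():
    theta = rng.dirichlet(np.full(K, alpha))         # per-document topic proportions
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)                   # pick a topic for this word
        w = rng.choice(V, p=phi[z])                  # pick a word from that topic
        words.append(w)
    return words

doc = generate_document()
```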

  19. Example: Latent Dirichlet Allocation

  20. Example: Latent Dirichlet Allocation

  21. Conditional Independence

  22. d-Separation Case 1: Tail-to-Tail

  23. d-Separation Case 2: Head-to-Tail

  24. d-Separation Case 3: Head-to-Head

  25. d-Separation
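
The three canonical cases above can be summarized in a small sketch (hand-rolled for the three-node structures only, not a general d-separation algorithm):

```python
# The three canonical three-node cases of d-separation.
def blocked(structure, c_observed):
    """Is the path a - c - b blocked, given whether c is observed?"""
    if structure == "tail-to-tail":   # a <- c -> b: blocked iff c is observed
        return c_observed
    if structure == "head-to-tail":   # a -> c -> b: blocked iff c is observed
        return c_observed
    if structure == "head-to-head":   # a -> c <- b: blocked iff c (and its
        return not c_observed        # descendants) are NOT observed
    raise ValueError(structure)

for s in ("tail-to-tail", "head-to-tail", "head-to-head"):
    print(s, "| blocked given c:", blocked(s, True),
          "| blocked given nothing:", blocked(s, False))
```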

  26. Undirected graphical models

  27. Factor Graph

  28. Where does a complicated model such as LDA come from?

  29. The origin of LDA • Dice Model • Is the Dice Model a generative model? • [Plate diagrams: the Dice Model (probabilities x_i over a vocabulary) and the Unigram/Language Model (word distribution φ, words w, N words per document, D documents in the corpus)]

  30. The evolution process • E1: Add a conjugate prior • Why a conjugate prior? • E2: Sampling with repeated choice of dice • [Plate diagrams: the Bayesian (completed) Dice Model and the Language Model, with Dirichlet hyperparameter α, parameters φ, draws x_i / words w, plates N and D]
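
Why a conjugate prior? Because the posterior stays in the same family, so the update reduces to adding counts. A minimal Dirichlet-multinomial sketch (the die counts are illustrative assumptions):

```python
import numpy as np

# The Dirichlet is conjugate to the multinomial, so the posterior is
# again a Dirichlet: just add the observed counts to the prior.
alpha = np.ones(6)                      # uniform Dirichlet prior over a 6-sided die
counts = np.array([3, 1, 4, 1, 5, 9])   # observed rolls (illustrative)

posterior = alpha + counts              # parameters of Dirichlet(alpha + counts)
posterior_mean = posterior / posterior.sum()  # expected face probabilities
print(posterior_mean)
```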

  31. The evolution process • E3: Turn DM-E2 into a Bayesian mixture model • [Plate diagram: mixture of unigrams, with hyperparameter α, topic proportions Π, per-document topic indicator z_d, words w_di, and K topic-word distributions ψ with hyperparameter β]

  32. The evolution process • Mixture of unigrams • [Plate diagram as above, illustrated with Topics 1-3 drawn from the corpus]

  33. The evolution process • Finally: we reach pLSA/LDA • [Plate diagram: topic proportions now drawn per document, illustrated with Topics 1-3 and the corpus]

  34. LDA variations

  35. Revisiting k-means: New Algorithms via Bayesian Nonparametrics • Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. • Revisiting the k-means clustering algorithm from a Bayesian nonparametric viewpoint

  36. Recall • Gaussian mixture model
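
For reference, the Gaussian mixture density the slide recalls, in standard notation:

```latex
% Gaussian mixture density with K components:
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```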

  37. Recall • Hjort, N., Holmes, C., Mueller, P., and Walker, S. Bayesian Nonparametrics: Principles and Practice. Cambridge University Press, Cambridge, UK, 2010. • Dirichlet process mixture: infinite mixture

  38. DP-means
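
A minimal sketch of the DP-means algorithm (Kulis & Jordan, 2012) that this slide refers to: k-means, except that a new cluster is opened whenever a point is farther than a penalty λ from every existing center. Initialization and convergence checks are simplified here:

```python
import numpy as np

def dp_means(X, lam, n_iters=20):
    """DP-means: k-means that opens a new cluster whenever a point's
    squared distance to every existing center exceeds lam.
    A minimal sketch; lam replaces a fixed number of clusters k."""
    centers = [X[0].copy()]
    for _ in range(n_iters):
        assignments = []
        for x in X:
            d = [np.sum((x - c) ** 2) for c in centers]
            if min(d) > lam:                 # too far: open a new cluster
                centers.append(x.copy())
                assignments.append(len(centers) - 1)
            else:
                assignments.append(int(np.argmin(d)))
        # Re-estimate each center as the mean of its assigned points.
        labels = np.array(assignments)
        for k in range(len(centers)):
            pts = X[labels == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)
    return np.array(centers), labels
```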

  39. The Contextual Focused Topic Model (cFTM) • The cFTM infers a sparse (“focused”) set of topics for each document, while also leveraging contextual information about the author(s) and document venue. • Hierarchical beta process • Xu Chen, Mingyuan Zhou, Lawrence Carin, Duke University, The Contextual Focused Topic Model

  40. [Graphical models compared side by side: LDA vs. cFTM + HBP (hierarchical beta process)]

  41. Pros • (1) It automatically infers the number of topics by combining properties from the Dirichlet process and hierarchical beta process, allowing an unbounded number of topics for the entire corpus, while inferring a focused (sparse) set of topics for each individual document.

  42. Pros • (2) The cFTM nonparametrically clusters the authors and venues, thereby increasing statistical strength while also inferring useful relational information. • (3) Instead of pre-specifying the importance of author/venue information (as was done in [6]), the cFTM automatically infers the document-dependent, probabilistic importance of the author/venue information on word assignment. • Data: DBLP + NSF

  43. TM-LDA: Efficient Online Modeling of Latent Topic Transitions in Social Media • Much of the textual content on the web, and especially social media, is temporally sequenced and comes in short fragments, including microblog posts on sites such as Twitter and Weibo, status updates on social networking sites such as Facebook and LinkedIn, and comments on content-sharing sites such as YouTube • Yu Wang, Eugene Agichtein, Michele Benzi, Emory University, TM-LDA: Efficient Online Modeling of Latent Topic Transitions in Social Media

  44. TM-LDA efficiently mines text streams, such as a sequence of posts from the same author, by modeling the topic transitions that naturally arise in these data. • TM-LDA learns the transition parameters among topics by minimizing the prediction error on the topic distribution in subsequent postings. After training, TM-LDA is thus able to accurately predict the expected topic distribution in future posts.

  45. Space of topic distributions • Given the topic distribution vector x of a historical document, the estimated topic distribution ŷ of a new document is given by ŷ = f(x)
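
A minimal sketch of the idea: treat f as a linear map given by a topic transition matrix T, fitted by least squares on pairs of consecutive posts. The actual TM-LDA adds normalization constraints and an efficient online update for T, both omitted here:

```python
import numpy as np

def fit_transition_matrix(X, Y):
    """Least-squares fit of a topic transition matrix T so that x @ T ~ y,
    where row i of X and Y are the topic distributions of consecutive
    posts. A sketch of the TM-LDA idea; the paper adds normalization
    constraints and an efficient online update."""
    T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return T

def predict_next(x, T):
    """Predict the next post's topic distribution from the current one."""
    y = np.clip(x @ T, 0, None)   # keep the prediction non-negative
    return y / y.sum()            # renormalize to a valid distribution
```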

  46. Experiment

  47. Experiment

  48. ComSoc: Adaptive Transfer of User Behaviors over Composite Social Network • Accurate prediction of user behaviors is important for many social media applications, including social marketing, personalization, recommendation, etc. • 1. Alleviate the data sparsity problem • 2. Enhance the predictive performance of user modeling • Erheng Zhong, Wei Fan, Junwei Wang, Lei Xiao, and Yong Li, HKUST, IBM Research Center, Tencent, ComSoc: Adaptive Transfer of User Behaviors over Composite Social Network
