
Discovering Latent Structure in Multiple Modalities

Discovering Latent Structure in Multiple Modalities. Andrew McCallum Computer Science Department University of Massachusetts Amherst. Joint work with  Xuerui Wang, Natasha Mohanty, Andres Corrada, Chris Pal, Wei Li, Greg Druck. Social Network in an Email Dataset.


Presentation Transcript


  1. Discovering Latent Structure in Multiple Modalities Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Xuerui Wang, Natasha Mohanty, Andres Corrada, Chris Pal, Wei Li, Greg Druck.

  2. Social Network in an Email Dataset

  3. Social Network in Political Data: Vote similarity in U.S. Senate [Jakulin & Buntine 2005]

  4. Groups and Topics • Input: • Observed relations between people • Attributes on those relations (text, or categorical) • Output: • Attributes clustered into “topics” • Groups of people, varying depending on topic

  5. Discovering Groups from Observed Set of Relations Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking. Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C). Admiration relations among six high school students.

  6. Adjacency Matrix Representing Relations Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking. Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C).
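
The adjacency matrix on slide 6 did not survive in the transcript. As a minimal sketch, it can be rebuilt from the Acad(x, y) pairs listed on the slide. The names `adjacency` and `index`, and the choice of rows as "admirer" and columns as "admired", are assumptions made here for illustration.

```python
import numpy as np

# Roster and relations copied from the slide; letters are the students' initials.
students = ["Adams", "Bennett", "Carter", "Davis", "Edwards", "Frederking"]
index = {name[0]: i for i, name in enumerate(students)}

acad = [("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
        ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
        ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C")]


def adjacency(relations, index):
    """Return a 0/1 matrix with m[i, j] = 1 when relation (i, j) is observed."""
    m = np.zeros((len(index), len(index)), dtype=int)
    for src, dst in relations:
        m[index[src], index[dst]] = 1
    return m


acad_matrix = adjacency(acad, index)
print(acad_matrix)
```

In this matrix the rows for Adams and Carter are identical, as are those for Bennett and Davis and for Edwards and Frederking, which is the block structure a group model is meant to recover.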

  7. Group Model: Partitioning Entities into Groups. Stochastic Blockstructures for Relations [Nowicki, Snijders 2001] S: number of entities; G: number of groups. [Graphical model; distribution labels: Beta, Multinomial, Dirichlet, Binomial.] Enhanced with arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004]

  8. Two Relations with Different Attributes Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking. Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C). Social Admiration: Soci(A, B) Soci(A, D) Soci(A, F) Soci(B, A) Soci(B, C) Soci(B, E) Soci(C, B) Soci(C, D) Soci(C, F) Soci(D, A) Soci(D, C) Soci(D, E) Soci(E, B) Soci(E, D) Soci(E, F) Soci(F, A) Soci(F, C) Soci(F, E).

  9. The Group-Topic Model: Discovering Groups and Topics Simultaneously [Wang, Mohanty, McCallum 2006] [Graphical model; distribution labels: Beta, Uniform, Multinomial, Dirichlet, Dirichlet, Binomial, Multinomial.]
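
The graphical model on slide 9 is lost in the transcript, so the following is only a rough sketch of a generative process consistent with the distribution labels that survive (Uniform, Dirichlet, Multinomial, Beta, Binomial). The function name, all parameter names, the hyperparameter values, and the exact wiring of the variables are assumptions, not the authors' specification.

```python
import numpy as np


def sample_group_topic(num_entities, num_groups, num_topics, vocab_size,
                       num_events, words_per_event,
                       alpha=1.0, beta=0.1, seed=0):
    """Sample synthetic events (text + pairwise agreement) from a GT-style model."""
    rng = np.random.default_rng(seed)
    # Dirichlet -> per-topic multinomial over words.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    # Dirichlet -> per-topic multinomial over groups.
    theta = rng.dirichlet(np.full(num_groups, alpha), size=num_topics)
    # Each entity gets one group assignment per topic.
    groups = np.stack([rng.choice(num_groups, size=num_entities, p=theta[t])
                       for t in range(num_topics)])
    events = []
    for _ in range(num_events):
        t = rng.integers(num_topics)                      # Uniform over topics
        words = rng.choice(vocab_size, size=words_per_event, p=phi[t])
        # Beta -> agreement probability for each pair of groups (symmetric).
        gamma = rng.beta(1.0, 1.0, size=(num_groups, num_groups))
        gamma = np.triu(gamma) + np.triu(gamma, 1).T
        g = groups[t]
        # Binomial (single trial): do entities i and j behave the same way?
        agree = rng.binomial(1, gamma[g[:, None], g[None, :]])
        agree = np.triu(agree, 1)
        agree = agree + agree.T                           # symmetric relation
        events.append((t, words, agree))
    return groups, events


groups, events = sample_group_topic(num_entities=6, num_groups=2, num_topics=3,
                                    vocab_size=20, num_events=4,
                                    words_per_event=5)
```

The point of the sketch is the coupling: the topic drawn for an event selects both the word distribution and which grouping of entities governs the relation.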

  10. Inference and Estimation • Gibbs Sampling: • Many r.v.s can be integrated out • Easy to implement • Reasonably fast We assume the relationship is symmetric.

  11. Dataset #1: U.S. Senate • 16 years of voting records in the US Senate (1989 – 2005) • a Senator may respond Yea or Nay to a resolution • 3423 resolutions with text attributes (index terms) • 191 Senators in total across 16 years. Example: S.543 Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes. Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991) Cosponsors (2) Latest Major Action: 12/19/1991 Became Public Law No: 102-242. Index terms: Banks and banking, Accounting, Administrative fees, Cost control, Credit, Deposit insurance, Depressed areas, and 110 other terms. Votes: Adams (D-WA), Nay; Akaka (D-HI), Yea; Bentsen (D-TX), Yea; Biden (D-DE), Yea; Bond (R-MO), Yea; Bradley (D-NJ), Nay; Conrad (D-ND), Nay; ……

  12. Topics Discovered (U.S. Senate) Mixture of Unigrams Group-Topic Model

  13. Groups Discovered (US Senate) Groups from topic Education + Domestic

  14. Senators Who Change Coalition the Most Depending on Topic. E.g., Senator Shelby (D-AL) votes with the Republicans on Economic, with the Democrats on Education + Domestic, and with a small group of maverick Republicans on Social Security + Medicaid.

  15. Dataset #2: The UN General Assembly • Voting records of the UN General Assembly (1990 - 2003) • A country may choose to vote Yes, No or Abstain • 931 resolutions with text attributes (titles) • 192 countries in total • Also experiments later with resolutions from 1960-2003. Example: Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting. The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions: In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and 126 other countries. Against: Israel, Marshall Islands, United States. Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.

  16. Topics Discovered (UN) Mixture of Unigrams Group-Topic Model

  17. Groups Discovered (UN) The country lists for each group are ordered by 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members.

  18. Outline: Discovering Latent Structure in Multiple Modalities • Groups & Text (Group-Topic Model, GT) • Nested Correlations (Pachinko Allocation, PAM) • Time & Text (Topics-over-Time Model, TOT) • Time & Text with Nested Correlations (PAM-TOT) • Multi-Conditional Mixtures

  19. Latent Dirichlet Allocation [Blei, Ng, Jordan, 2003] • LDA 20 (“images, motion, eyes”): visual model motion field object image images objects fields receptive eye position spatial direction target vision multiple figure orientation location • LDA 100 (“motion, some junk”): motion detection field optical flow sensitive moving functional detect contrast light dimensional intensity computer mt measures occlusion temporal edge real [Graphical model: variables α, θ, z, w, φ, β; plates N, n, T.]

  20. Correlated Topic Model [Blei, Lafferty, 2005] [Graphical model: logistic normal prior; variables z, w, φ, β; plates N, n, T.] Square matrix of pairwise correlations.

  21. Topic Correlation Representation • 7 topics: {A, B, C, D, E, F, G} • Correlations: {A, B, C, D, E} and {C, D, E, F, G} [Figure: CTM and Mixture Model representations of these correlations; labels “21 parameters” and “14 parameters”.]

  22. Pachinko Machine

  23. Pachinko Allocation Model [Li, McCallum, 2005, 2006] (Thanks to Michael Jordan for suggesting the name.) Given: a directed acyclic graph (DAG); at each interior node, a Dirichlet over its children; words at the leaves. For each document: sample a multinomial from each Dirichlet. For each word in this document: starting from the root, sample a child from successive nodes, down to a leaf; generate the word at the leaf. Like a Polya tree, but DAG-shaped, with an arbitrary number of children. [Figure: model structure, not the graphical model; interior nodes 11; 21, 22; 31, 32, 33; 41 to 45; leaves word1 to word8.]
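
The generative process on slide 23 can be sketched as below. The edges of the slide's DAG are not recoverable from the transcript, so the `dag` dictionary, its node names, and the uniform Dirichlet parameters in `alphas` are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical DAG: each interior node maps to its children.
# Any name that is not a key is a leaf, i.e. a vocabulary word.
dag = {
    "root": ["s1", "s2"],
    "s1": ["t1", "t2"],
    "s2": ["t2", "t3"],          # t2 has two parents: a DAG, not a tree
    "t1": ["word1", "word2", "word3"],
    "t2": ["word3", "word4", "word5"],
    "t3": ["word6", "word7", "word8"],
}
# One Dirichlet per interior node, over that node's children (assumed uniform).
alphas = {node: np.ones(len(children)) for node, children in dag.items()}


def generate_document(dag, alphas, num_words, rng, root="root"):
    """Generate one document by repeated root-to-leaf walks through the DAG."""
    # Per document: sample a multinomial from each interior node's Dirichlet.
    theta = {node: rng.dirichlet(alphas[node]) for node in dag}
    words = []
    for _ in range(num_words):
        node = root
        while node in dag:       # descend until a leaf is reached
            children = dag[node]
            node = children[rng.choice(len(children), p=theta[node])]
        words.append(node)       # the leaf is the generated word
    return words


rng = np.random.default_rng(0)
doc = generate_document(dag, alphas, num_words=10, rng=rng)
print(doc)
```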

  24. Pachinko Allocation Model [Li, McCallum, 2005] DAG may have arbitrary structure • arbitrary depth • any number of children per node • sparse connectivity • edges may skip layers [Figure: model structure, not the graphical model; interior nodes 11; 21, 22; 31, 32, 33; 41 to 45; leaves word1 to word8.]

  25. Pachinko Allocation Model [Li, McCallum, 2005] Layer annotations, top to bottom: • Distributions over distributions over topics... • Distributions over topics; mixtures, representing topic correlations • Distributions over words (like “LDA topics”). Some interior nodes could contain one multinomial, used for all documents (i.e. a very peaked Dirichlet). [Figure: model structure, not the graphical model; interior nodes 11; 21, 22; 31, 32, 33; 41 to 45; leaves word1 to word8.]

  26. Pachinko Allocation Model [Li, McCallum, 2005] Estimate all these Dirichlets from data. Estimate model structure from data (number of nodes, and connectivity). [Figure: model structure, not the graphical model; interior nodes 11; 21, 22; 31, 32, 33; 41 to 45; leaves word1 to word8.]

  27. Pachinko Allocation Special Cases: Latent Dirichlet Allocation [Figure: node 32; nodes 41 to 45; leaves word1 to word8.]

  28. Pachinko Allocation Model ... with two layers, no skipping layers, fully connected from one layer to the next. [Figure: root 11; “super-topics” 21, 22, 23; “sub-topics” 31 to 35 (fixed multinomials); leaves word1 to word8.] Another special case would select only one super-topic per document.

  29. Graphical Models • LDA • Four-level PAM (with fixed multinomials for sub-topics) [Figure: two plate diagrams. LDA: variables α, θ, z, w, φ, β; plates N, n, T. Four-level PAM: Dirichlet parameters α, variables θ2, θ3, z2, z3, w, φ, β; plates N, n, T, T’.]

  30. Inference – Gibbs Sampling • z2 and z3 are jointly sampled • Dirichlet parameters α are estimated with moment matching [Figure: plate diagram of four-level PAM with α2, α3, θ2, θ3, z2, z3, w, φ, β; plates N, n, T, T’.]
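
Slide 30 names moment matching but gives no formula. As an illustration only, here is the textbook moment-matching estimator for a Dirichlet fitted to observed proportion vectors; the authors' exact estimator may differ. It uses the Dirichlet identities E[p_k] = α_k / s and Var[p_k] = m_k (1 − m_k) / (s + 1) with s = Σ α_k. The function name and the geometric-mean pooling of the per-component precision estimates are choices made here.

```python
import numpy as np


def dirichlet_moment_match(proportions, eps=1e-12):
    """Estimate Dirichlet parameters from rows of proportions (each row sums to 1)."""
    p = np.asarray(proportions, dtype=float)
    mean = p.mean(axis=0)
    var = p.var(axis=0)
    # Each component yields its own estimate of the precision s = sum(alpha),
    # from Var[p_k] = m_k (1 - m_k) / (s + 1).
    s_k = mean * (1.0 - mean) / np.maximum(var, eps) - 1.0
    # Pool the per-component estimates with a geometric mean (an assumption).
    s = np.exp(np.mean(np.log(np.maximum(s_k, eps))))
    return mean * s


rng = np.random.default_rng(0)
true_alpha = np.array([2.0, 3.0, 5.0])
samples = rng.dirichlet(true_alpha, size=20000)
estimate = dirichlet_moment_match(samples)
print(estimate)
```

Moment matching avoids the iterative optimization a maximum-likelihood fit would need, which is convenient when the parameters are re-estimated repeatedly inside a sampler.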

  31. Experimental Results • Topic clarity by human judgement • Likelihood on held-out data • Document classification

  32. Datasets • Rexa (http://rexa.info/) • 4000 documents, 278438 word tokens and 25597 unique words. • NIPS • 1647 documents, 114142 word tokens and 11708 unique words. • 20 newsgroup comp5 subset • 4836 documents, 35567 unique words.

  33. Topic Correlations

  34. Example Topics • LDA 20 (“images, motion, eyes”): visual model motion field object image images objects fields receptive eye position spatial direction target vision multiple figure orientation location • LDA 100 (“motion” + some generic): motion detection field optical flow sensitive moving functional detect contrast light dimensional intensity computer mt measures occlusion temporal edge real • PAM 100 (“motion”): motion video surface surfaces figure scene camera noisy sequence activation generated analytical pixels measurements assigne advance lated shown closed perceptual • PAM 100 (“eyes”): eye head vor vestibulo oculomotor vestibular vary reflex vi pan rapid semicircular canals responds streams cholinergic rotation topographically detectors ning • PAM 100 (“images”): image digit faces pixel surface interpolation scene people viewing neighboring sensors patches manifold dataset magnitude transparency rich dynamical amounts tor

  35. Blind Topic Evaluation • Randomly select 25 similar pairs of topics generated from PAM and LDA • 5 people • Each asked to “select the topic in each pair that you find more semantically coherent.” Topic counts

  36. Examples 5 votes 0 vote 4 votes 1 vote

  37. Examples 4 votes 1 vote 1 vote 4 votes

  38. Likelihood Comparison • Dataset: NIPS • Two sets of experiments: • Varying number of topics • Different proportions of training data

  39. Likelihood Comparison • Varying number of topics

  40. Likelihood Comparison • Different proportions of training data

  41. Document Classification • 20 newsgroup comp5 subset • 5-way classification (accuracy in %) Statistically significant with a p-value < 0.05.

  42. Outline: Discovering Latent Structure in Multiple Modalities • Groups & Text (Group-Topic Model, GT) • Nested Correlations (Pachinko Allocation, PAM) • Time & Text (Topics-over-Time Model, TOT) • Time & Text with Nested Correlations (PAM-TOT) • Multi-Conditional Mixtures

  43. Want to Model Trends over Time • Is prevalence of topic growing or waning? • Pattern appears only briefly • Capture its statistics in focused way • Don’t confuse it with patterns elsewhere in time • How do roles, groups, influence shift over time?

  44. Topics over Time (TOT) [Wang, McCallum 2006] [Figure: two plate diagrams. Labels: Dirichlet prior; multinomial over topics; topic index z; word w; time stamp t; multinomial over words; Uniform prior; Beta over time (distribution on time stamps); plates T, Nd, D.]
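
The labels that survive from slide 44 suggest the following generative sketch: each topic has a multinomial over words and a Beta distribution over time stamps, with time normalized to (0, 1). The function name, the parameter names `phi` and `psi`, the example values, and the choice to draw one time stamp per word token are assumptions made here.

```python
import numpy as np


def generate_tot_document(num_words, alpha, phi, psi, rng):
    """Sample topics, words, and normalized time stamps for one document.

    alpha: Dirichlet prior over topics, shape (T,)
    phi:   per-topic multinomial over words, shape (T, V)
    psi:   per-topic Beta parameters (a, b) over time, shape (T, 2)
    """
    theta = rng.dirichlet(alpha)                         # multinomial over topics
    topics = rng.choice(len(alpha), size=num_words, p=theta)
    words = np.array([rng.choice(phi.shape[1], p=phi[z]) for z in topics])
    # Continuous time: no discretization and no Markov transitions.
    times = rng.beta(psi[topics, 0], psi[topics, 1])
    return topics, words, times


rng = np.random.default_rng(0)
alpha = np.ones(2)
phi = np.array([[0.7, 0.2, 0.1, 0.0],     # topic 0 favors early vocabulary
                [0.0, 0.1, 0.2, 0.7]])    # topic 1 favors late vocabulary
psi = np.array([[2.0, 8.0],               # topic 0 concentrated early in time
                [8.0, 2.0]])              # topic 1 concentrated late in time
topics, words, times = generate_tot_document(50, alpha, phi, psi, rng)
```

Because the time stamp is just another observed variable generated by the topic, the same component can be attached to other topic models without changing their structure.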

  45. Attributes of this Approach to Modeling Time • Not a Markov model • No state transitions, or Markov assumption • Continuous Time • Time not discretized • Easily incorporated into other more complex models with additional modalities.

  46. State of the Union Address 208 Addresses delivered between January 8, 1790 and January 29, 2002. • To increase the number of documents, we split the addresses into paragraphs and treated them as ‘documents’. One-line paragraphs were excluded. Stopping was applied. • 17156 ‘documents’ • 21534 words • 669,425 tokens Our scheme of taxation, by means of which this needless surplus is taken from the people and put into the public Treasury, consists of a tariff or duty levied upon importations from abroad and internal-revenue taxes levied upon the consumption of tobacco and spirituous and malt liquors. It must be conceded that none of the things subjected to internal-revenue taxation are, strictly speaking, necessaries. There appears to be no just complaint of this taxation by the consumers of these articles, and there seems to be nothing so well able to bear the burden without hardship to any portion of the people. 1910

  47. Comparing TOT against LDA

  48. TOT on 17 years of NIPS proceedings

  49. Topic Distributions Conditioned on Time topic mass (in vertical height) time

  50. TOT on 17 years of NIPS proceedings TOT LDA
