
Statistical Relational Learning for Knowledge Extraction from the Web


Presentation Transcript


  1. Statistical Relational Learning for Knowledge Extraction from the Web Hoifung Poon Dept. of Computer Science & Eng. University of Washington 1

  2. “Drowning in Information, Starved for Knowledge” WWW 2

  3. Great Vision: Knowledge Extraction from the Web Craven et al., “Learning to Construct Knowledge Bases from the World Wide Web,” Artificial Intelligence, 1999. • Also need: • Knowledge representation and reasoning • Close the loop: Apply knowledge to extraction • Machine reading [Etzioni et al., 2007] 3

  4. Machine Reading: Text → Knowledge …… 4

  5. Rapidly Growing Interest • AAAI-07 Spring Symposium on Machine Reading • DARPA Machine Reading Program (2009-2014) • NAACL-10 Workshop on Learning By Reading • Etc. 5

  6. Great Impact • Scientific inquiry and commercial applications • Literature-based discovery, robot scientists • Question answering, semantic search • Drug design, medical diagnosis • Break the knowledge acquisition bottleneck for AI and natural language understanding • Automatically semantify the Web • Etc. 6

  7. This Talk • Statistical relational learning offers promising solutions to machine reading • Markov logic is a leading unifying framework • A success story: USP • Unsupervised, end-to-end machine reading • Extracts five times as many correct answers as the state of the art, with the highest accuracy of 91% 7

  8. USP: Question-Answer Example Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells. Q: What does IL-2 control? A: The DEX-mediated IkappaBalpha induction 8

  9. Overview Machine reading: Challenges Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions 9

  10. Key Challenges • Complexity • Uncertainty • Pipeline accumulates errors • Supervision is scarce 10

  11. Languages Are Structural governments lm$pxtm (Hebrew: according to their families) IL-4 induces CD11B Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 …… George Walker Bush was the 43rd President of the United States. …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush. …… In November 1977, he met Laura Welch at a barbecue. 11

  12. Languages Are Structural (same examples, now with their structure) govern-ment-s / l-m$px-t-m (Hebrew: according to their families): morphological segmentation. IL-4 induces CD11B: syntactic parse (S → NP VP, VP → V NP). Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41: event structure with involvement(Theme: up-regulation, Cause: activation), up-regulation(Theme: IL-10, Cause: gp41, Site: human monocytes), activation(Theme: p70(S6)-kinase). George Walker Bush was the 43rd President of the United States. …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush. …… In November 1977, he met Laura Welch at a barbecue. 12

  13. Knowledge Is Heterogeneous • Individuals E.g.: Socrates is a man • Types E.g.: Man is mortal • Inference rules E.g.: Syllogism • Ontological relations E.g.: HUMAN ISA MAMMAL, EYE ISPART FACE • Etc. 13

  14. Complexity • Can handle using first-order logic • Trees, graphs, dependencies, hierarchies, etc. easily expressed • Inference algorithms (satisfiability testing, theorem proving, etc.) • But … logic is brittle with uncertainty 14 14

  15. Languages Are Ambiguous Microsoft buys Powerset Microsoft acquires Powerset Powerset is acquired by Microsoft Corporation The Redmond software giant buys Powerset Microsoft’s purchase of Powerset, … …… I saw the man with the telescope (does “with the telescope” attach to “the man” or to “saw”?) Here in London, Frances Deek is a retired teacher … In the Israeli town …, Karen London says … Now London says … G. W. Bush …… …… Laura Bush …… Mrs. Bush …… Which one? London: PERSON or LOCATION? 15

  16. Knowledge Has Uncertainty • We need to model correlations • Our information is always incomplete • Our predictions are uncertain 16

  17. Uncertainty • Statistics provides the tools to handle this • Mixture models • Hidden Markov models • Bayesian networks • Markov random fields • Maximum entropy models • Conditional random fields • Etc. • But … statistical models assume i.i.d. (independent and identically distributed) data: objects → feature vectors

  18. Pipeline is Suboptimal • E.g., NLP pipeline: Tokenization → Morphology → Chunking → Syntax → … • Accumulates and propagates errors • Wanted: Joint inference • Across all processing stages • Among all interdependent objects 18

  19. Supervision is Scarce Tons of text … but most is not annotated Labeling is expensive (cf. the Penn Treebank) Need to leverage indirect supervision 19

  20. Redundancy • Key source of indirect supervision • State-of-the-art systems depend on it, e.g., TextRunner [Banko et al., 2007] • But … the Web is heterogeneous: long tail • Redundancy is only present in the head regime

  21. Overview Machine reading: Challenges Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions 21

  22. Statistical Relational Learning Burgeoning field in machine learning Offers promising solutions for machine reading Unify statistical and logical approaches Replace pipeline with joint inference Principled framework to leverage both direct and indirect supervision 22

  23. Machine Reading: A Vision Challenge: Long tail 23

  24. Machine Reading: A Vision 24

  25. Challenges in Applying Statistical Relational Learning Learning is much harder Inference becomes a crucial issue Greater complexity for the user 25

  26. Progress to Date Probabilistic logic [Nilsson, 1986] Statistics and beliefs [Halpern, 1990] Knowledge-based model construction [Wellman et al., 1992] Stochastic logic programs [Muggleton, 1996] Probabilistic relational models [Friedman et al., 1999] Relational Markov networks [Taskar et al., 2002] Markov logic [Domingos & Lowd, 2009] Etc. 26

  27. Progress to Date Probabilistic logic [Nilsson, 1986] Statistics and beliefs [Halpern, 1990] Knowledge-based model construction [Wellman et al., 1992] Stochastic logic programs [Muggleton, 1996] Probabilistic relational models [Friedman et al., 1999] Relational Markov networks [Taskar et al., 2002] Markov logic [Domingos & Lowd, 2009] (leading unifying framework) Etc. 27

  28. Overview Machine reading Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions 28

  29. Markov Networks Undirected graphical models (example network: Smoking, Cancer, Asthma, Cough) • Log-linear model: P(x) = (1/Z) exp(Σ_i w_i f_i(x)), where w_i is the weight of feature i and f_i(x) is feature i 29

  30. First-Order Logic Constants, variables, functions, predicates E.g.: Anna, x, MotherOf(x), Friends(x,y) Grounding: Replace all variables by constants E.g.: Friends(Anna, Bob) World (model, interpretation): Assignment of truth values to all ground predicates 30
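(Not part of the slides.) A minimal Python sketch of what grounding means operationally; the representation (ground atoms as tuples, a world as a truth assignment over them) and all helper names are assumptions made purely for illustration:

    from itertools import product

    constants = ["Anna", "Bob"]
    predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}   # predicate name -> arity

    def ground_atoms(predicates, constants):
        """Grounding: replace every variable with every constant."""
        atoms = []
        for name, arity in predicates.items():
            for args in product(constants, repeat=arity):
                atoms.append((name,) + args)
        return atoms

    atoms = ground_atoms(predicates, constants)
    # A world (model, interpretation) assigns a truth value to every ground atom.
    world = {atom: False for atom in atoms}
    world[("Friends", "Anna", "Bob")] = True
    print(len(atoms), "ground atoms")        # 2 + 2 + 4 = 8 for two constants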

  31. Markov Logic • Intuition: Soften logical constraints • Syntax: Weighted first-order formulas • Semantics: Feature templates for Markov networks • A Markov Logic Network (MLN) is a set of pairs (Fi, wi) where • Fi is a formula in first-order logic • wi is a real number • Together with a set of constants, an MLN defines the distribution P(X = x) = (1/Z) exp(Σ_i w_i n_i(x)), where n_i(x) is the number of true groundings of Fi in x 31

  32. Example: Friends & Smokers 32

  33. Example: Friends & Smokers 33

  34. Example: Friends & Smokers 34

  35. Example: Friends & Smokers Probabilistic graphical models andfirst-order logic are special cases Two constants: Anna (A) and Bob (B) Friends(A,B) Friends(A,A) Smokes(A) Smokes(B) Friends(B,B) Cancer(A) Cancer(B) Friends(B,A) 35

  36. MLN Algorithms: The First Three Generations 36

  37. Efficient Inference • Logical or statistical inference alone is already hard • But … we can do approximate inference, which suffices to perform well in most cases • Combine ideas from both camps • E.g., MC-SAT = MCMC + SAT solver • Can also leverage sparsity in relational domains More: Poon & Domingos, “Sound and Efficient Inference with Probabilistic and Deterministic Dependencies”, in Proc. AAAI-2006. More: Poon, Domingos & Sumner, “A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC”, in Proc. AAAI-2008. 37
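(Not on the slide.) A deliberately simplified sketch of the MC-SAT idea: each clause that the current state satisfies is selected with probability 1 - exp(-w), and the next state is drawn uniformly from the states satisfying all selected clauses. The real algorithm uses a SampleSAT-style solver for that last step; this toy version, with names and weights invented for illustration, just enumerates a small state space:

    import math, random

    def mcsat_step(state, clauses, weights, all_states):
        """One toy MC-SAT step. clauses: functions state -> bool; weights: positive floats."""
        # A clause the current state satisfies is selected with probability 1 - exp(-w).
        selected = [c for c, w in zip(clauses, weights)
                    if c(state) and random.random() < 1.0 - math.exp(-w)]
        # Next state: (near-)uniform sample from states satisfying every selected clause.
        # Brute-force enumeration here; the real algorithm uses a SampleSAT-style solver.
        candidates = [s for s in all_states if all(c(s) for c in selected)]
        return random.choice(candidates)

    # Toy usage: a state is a (Smokes, Cancer) pair, one soft clause Smokes => Cancer (weight 1.5).
    states = [(s, c) for s in (False, True) for c in (False, True)]
    clauses, weights = [lambda st: (not st[0]) or st[1]], [1.5]
    state = random.choice(states)
    cancer_count = 0
    for _ in range(2000):
        state = mcsat_step(state, clauses, weights, states)
        cancer_count += state[1]
    print("Estimated P(Cancer) =", cancer_count / 2000)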

  38. Weight Learning • Probability model P(X) • X: Observable in training data • Maximize likelihood of observed data • Regularization to prevent overfitting

  39. Weight Learning Gradient descent: ∂/∂w_i log P(x) = n_i(x) - E[n_i(x)], i.e., the no. of times clause i is true in the data minus the expected no. of times clause i is true according to the MLN The expectation term requires inference: use MC-SAT Can also leverage second-order information [Lowd & Domingos, 2007] 39
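(Not on the slide.) The gradient above as a brute-force sketch of my own: the derivative for each clause weight is the clause's true-grounding count in the data minus its expected count under the current model; the exact enumeration of the expectation here is the part MC-SAT approximates in practice, and the function names are assumptions:

    import math

    def log_likelihood_gradient(weights, count_fns, data_world, all_worlds):
        """d/dw_i log P(data) = n_i(data) - E_w[n_i]  (one training world, exact expectation)."""
        scores = [math.exp(sum(w * n(x) for w, n in zip(weights, count_fns))) for x in all_worlds]
        Z = sum(scores)
        expected = [sum(s / Z * n(x) for s, x in zip(scores, all_worlds)) for n in count_fns]
        return [n(data_world) - e for n, e in zip(count_fns, expected)]

    def learn(weights, count_fns, data_world, all_worlds, lr=0.1, steps=100, l2=0.01):
        # Gradient ascent on the log-likelihood; the L2 penalty acts as regularization (a Gaussian prior).
        for _ in range(steps):
            grad = log_likelihood_gradient(weights, count_fns, data_world, all_worlds)
            weights = [w + lr * (g - l2 * w) for w, g in zip(weights, grad)]
        return weights

    # count_fns map a world to the number of true groundings of each clause,
    # e.g. the per-clause counts from the Friends & Smokers sketch above.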

  40. Unsupervised Learning: How? I.I.D. learning: Sophisticated model requires more labeled data Statistical relational learning: Sophisticated model may require less labeled data Ambiguities vary among objects Joint inference → Propagate information from unambiguous objects to ambiguous ones One formula is worth a thousand labels Small amount of domain knowledge → large-scale joint inference 40

  41. Unsupervised Weight Learning • Probability model P(X,Z) • X: Observed in training data • Z: Hidden variables • E.g., clustering with mixture models • Z: Cluster assignment • X: Observed features • Maximize likelihood of observed data by summing out hidden variables Z

  42. Unsupervised Weight Learning Gradient descent: ∂/∂w_i log P(x) = E_{z|x}[n_i] - E_{x,z}[n_i], i.e., a sum over z conditioned on the observed x, minus a sum over both x and z Use MC-SAT to compute both expectations May also combine with contrastive estimation More: Poon, Cherry, & Toutanova, “Unsupervised Morphological Segmentation with Log-Linear Models”, in Proc. NAACL-2009 (Best Paper Award). 42
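(Not on the slide.) The hidden-variable version of the same gradient, again brute force and toy-sized by assumption: the clamped expectation conditions on the observed x and sums over the hidden z, the free expectation sums over both, and their difference is the gradient.

    import math

    def unsup_gradient(weights, count_fns, observed_x, all_x, all_z):
        """d/dw_i log P(x) = E_{z|x}[n_i] - E_{x,z}[n_i]  with x observed and z hidden."""
        score = lambda x, z: math.exp(sum(w * n(x, z) for w, n in zip(weights, count_fns)))
        clamped = [(z, score(observed_x, z)) for z in all_z]           # x fixed to the data
        free = [((x, z), score(x, z)) for x in all_x for z in all_z]   # both x and z vary
        Zc, Zf = sum(s for _, s in clamped), sum(s for _, s in free)
        grad = []
        for n in count_fns:
            e_clamped = sum(s / Zc * n(observed_x, z) for z, s in clamped)
            e_free = sum(s / Zf * n(x, z) for (x, z), s in free)
            grad.append(e_clamped - e_free)
        return grad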

  43. Markov Logic Unified inference and learning algorithms → Can handle millions of variables, billions of features, tens of thousands of parameters Easy-to-use software: Alchemy Many successful applications E.g.: Information extraction, coreference resolution, semantic parsing, ontology induction 43

  44. Pipeline → Joint Inference Combine segmentation and entity resolution for information extraction Extract complex and nested bio-events from PubMed abstracts More: Poon & Domingos, “Joint Inference for Information Extraction”, in Proc. AAAI-2007. More: Poon & Vanderwende, “Joint Inference for Knowledge Extraction from Biomedical Literature”, in Proc. NAACL-2010. 44

  45. Unsupervised Learning: Example Coreference resolution: Accuracy comparable to previous supervised state of the art More: Poon & Domingos, “Joint Unsupervised Coreference Resolution with Markov Logic”, in Proc. EMNLP-2008. 45

  46. Overview Machine reading: Challenges Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions 46

  47. Unsupervised Semantic Parsing USP [Poon & Domingos, EMNLP-09] First unsupervised approach for semantic parsing End-to-end machine reading system: read text, answer questions OntoUSP = USP + Ontology Induction [Poon & Domingos, ACL-10] Encoded in a few Markov logic formulas Best Paper Award 47

  48. Semantic Parsing Goal • Microsoft buys Powerset • BUY(MICROSOFT,POWERSET) Challenge Microsoft buys Powerset Microsoft acquires semantic search engine Powerset Powerset is acquired by Microsoft Corporation The Redmond software giant buys Powerset Microsoft’s purchase of Powerset, … 48

  49. Limitations of Existing Approaches • Manual grammar or supervised learning • Applicable to restricted domains only • For general text • Not clear what predicates and objects to use • Hard to produce consistent meaning annotation • Also, often learn both syntax and semantics • Fail to leverage advanced syntactic parsers • Make semantic parsing harder

  50. USP: Key Idea #1 Target predicates and objects can be learned Viewed as clusters of syntactic or lexical variations of the same meaning BUY(-,-) = {buys, acquires, ’s purchase of, …}: cluster of various expressions for acquisition MICROSOFT = {Microsoft, the Redmond software giant, …}: cluster of various mentions of Microsoft (toy illustration below) 50
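(Not on the slide.) A hand-coded toy of the clustering idea, only to show how shared clusters collapse surface variation into one logical form; the dictionaries and the simple subject/relation/object interface are invented here, whereas USP induces the clusters from text without supervision:

    # Illustrative clusters only; USP learns these, it does not read them from a table.
    predicate_cluster = {"buys": "BUY", "acquires": "BUY",
                         "is acquired by": "BUY_PASSIVE", "'s purchase of": "BUY"}
    entity_cluster = {"Microsoft": "MICROSOFT", "Microsoft Corporation": "MICROSOFT",
                      "the Redmond software giant": "MICROSOFT", "Powerset": "POWERSET"}

    def parse(subject, relation, obj):
        pred = predicate_cluster[relation]
        a, b = entity_cluster[subject], entity_cluster[obj]
        if pred.endswith("_PASSIVE"):          # passive voice: swap the arguments
            pred, a, b = pred[: -len("_PASSIVE")], b, a
        return f"{pred}({a},{b})"

    for s, r, o in [("Microsoft", "buys", "Powerset"),
                    ("the Redmond software giant", "acquires", "Powerset"),
                    ("Powerset", "is acquired by", "Microsoft Corporation")]:
        print(parse(s, r, o))                  # each prints BUY(MICROSOFT,POWERSET)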
