
Markov Logic in Natural Language Processing





Presentation Transcript


  1. Markov Logic in Natural Language Processing. Hoifung Poon, Dept. of Computer Science & Eng., University of Washington

  2. Overview • Motivation • Foundational areas • Markov logic • NLP applications • Basics • Supervised learning • Unsupervised learning

  3. Holy Grail of NLP: Automatic Language Understanding • Natural language search • Answer questions • Knowledge discovery • … [Figure: mapping Text to Meaning]

  4. Reality: Increasingly Fragmented [Figure: the field split into separate subtasks: Parsing, Semantics, Tagging, Information Extraction, Morphology]

  5. Time for a New Synthesis? • Speed up progress • New opportunities to improve performance • But we need a new tool for this …

  6. Languages Are Structural • governments • lm$pxtm (Hebrew: "according to their families")

  7. Languages Are Structural [Figure: structure at several levels of analysis: morphological segmentation (govern-ment-s; l-m$px-t-m, "according to their families"); a syntactic parse (S, VP, NP, V, NP) of "IL-4 induces CD11B"; a nested biomedical event structure for "Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 …", with Theme, Cause, and Site roles linking involvement, activation, up-regulation, p70(S6)-kinase, IL-10, gp41, and human monocyte; and a coreference chain: "George Walker Bush was the 43rd President of the United States. … Bush was the eldest son of President G. H. W. Bush and Barbara Bush. … In November 1977, he met Laura Welch at a barbecue."]

  8. Languages Are Structural [Same figure and examples as slide 7.]

  9. Processing Is Complex [Figure: the many processing stages: Morphology, POS Tagging, Chunking, Semantic Role Labeling, Syntactic Parsing, Coreference Resolution, Information Extraction, …]

  10. Pipeline Is Suboptimal [Figure: the same stages as slide 9, arranged as a pipeline; errors made early in the pipeline propagate to every later stage.]

  11. First-Order Logic • Main theoretical foundation of computer science • General language for describing complex structures and knowledge • Trees, graphs, dependencies, hierarchies, etc. easily expressed • Inference algorithms (satisfiability testing, theorem proving, etc.)

  12. Languages Are Statistical • Paraphrase: Microsoft buys Powerset / Microsoft acquires Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft’s purchase of Powerset, … • Attachment ambiguity: I saw the man with the telescope (the PP can attach to the verb or to the noun) • Named-entity ambiguity: Here in London, Frances Deek is a retired teacher … / In the Israeli town …, Karen London says … / Now London says … (London: PERSON or LOCATION?) • Coreference ambiguity: G. W. Bush … Laura Bush … Mrs. Bush … (Which one?)

  13. Languages Are Statistical • Languages are ambiguous • Our information is always incomplete • We need to model correlations • Our predictions are uncertain • Statistics provides the tools to handle this

  14. Probabilistic Graphical Models • Mixture models • Hidden Markov models • Bayesian networks • Markov random fields • Maximum entropy models • Conditional random fields • Etc.

  15. The Problem • Logic is deterministic, requires manual coding • Statistical models assume i.i.d. data, objects = feature vectors • Historically, statistical and logical NLP have been pursued separately • We need to unify the two!

  16. Also, Supervision Is Scarce • Supervised learning needs training examples • Tons of texts … but most are not annotated • Labeling is expensive (cf. the Penn Treebank) • Need to leverage indirect supervision

  17. A Promising Solution: Statistical Relational Learning • Emerging direction in machine learning • Unifies logical and statistical approaches • Principal way to leverage direct and indirect supervision

  18. Key: Joint Inference • Models complex interdependencies • Propagates information from more certain decisions to resolve ambiguities in others • Advantages: • Better and more intuitive models • Improve predictive accuracy • Compensate for lack of training examples • SRL can have even greater impact when direct supervision is scarce

  19. Challenges in ApplyingStatistical Relational Learning • Learning is much harder • Inference becomes a crucial issue • Greater complexity for user

  20. Progress to Date • Probabilistic logic [Nilsson, 1986] • Statistics and beliefs [Halpern, 1990] • Knowledge-based model construction[Wellman et al., 1992] • Stochastic logic programs [Muggleton, 1996] • Probabilistic relational models [Friedman et al., 1999] • Relational Markov networks [Taskar et al., 2002] • Etc. • This talk: Markov logic [Domingos & Lowd, 2009]

  21. Markov Logic: A Unifying Framework • Probabilistic graphical models and first-order logic are special cases • Unified inference and learning algorithms • Easy-to-use software: Alchemy • Broad applicability • Goal of this tutorial: Quickly learn how to use Markov logic and Alchemy for a broad spectrum of NLP applications

  22. Overview • Motivation • Foundational areas • Probabilistic inference • Statistical learning • Logical inference • Inductive logic programming • Markov logic • NLP applications • Basics • Supervised learning • Unsupervised learning

  23. Markov Networks • Undirected graphical models [Example: a network over Smoking, Cancer, Asthma, Cough] • Potential functions defined over cliques: P(x) = (1/Z) ∏_c Φ_c(x_c), where the partition function Z = Σ_x ∏_c Φ_c(x_c) sums over all possible worlds

  24. Markov Networks • Undirected graphical models [same example network] • Log-linear model: P(x) = (1/Z) exp(Σ_i w_i f_i(x)), where w_i is the weight of feature i and f_i(x) ∈ {0, 1} is feature i
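The following is a minimal Python sketch of the log-linear model above, computing P(x) by brute-force enumeration of all possible worlds (feasible only for tiny networks). The two features and their weights are illustrative assumptions, not taken from the slides.

import itertools
import math

# Two binary variables, e.g. (Smoking, Cancer). A feature is any Boolean function of a world.
features = [
    lambda x: x[0] == x[1],   # f_0: Smoking and Cancer agree
    lambda x: x[1] == 1,      # f_1: Cancer is true
]
weights = [1.5, -1.0]         # w_i: one weight per feature (illustrative values)

def score(x):
    # Unnormalized log-probability: sum_i w_i * f_i(x)
    return sum(w * f(x) for w, f in zip(weights, features))

worlds = list(itertools.product([0, 1], repeat=2))
Z = sum(math.exp(score(x)) for x in worlds)   # partition function

for x in worlds:
    print(x, math.exp(score(x)) / Z)          # P(x)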

  25. Markov Nets vs. Bayes Nets

  26. Inference in Markov Networks • Goal: compute marginals & conditionals of P(X) • Exact inference is #P-complete • Conditioning on the Markov blanket is easy: P(x | MB(x)) = exp(Σ_i w_i f_i(x)) / (exp(Σ_i w_i f_i(x with x = 0)) + exp(Σ_i w_i f_i(x with x = 1))) • Gibbs sampling exploits this

  27. MCMC: Gibbs Sampling
  state ← random truth assignment
  for i ← 1 to num-samples do
      for each variable x
          sample x according to P(x | neighbors(x))
          state ← state with new value of x
  P(F) ← fraction of states in which F is true
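Below is a sketch of this Gibbs sampler in Python, reusing the toy score function from the slide 24 sketch; the per-variable resampling follows the Markov-blanket formula from slide 26 (here conditioning on all other variables, since the toy model is so small). All names are illustrative.

import math
import random

def score(x):
    # Toy unnormalized log-probability from the slide 24 sketch
    return 1.5 * (x[0] == x[1]) - 1.0 * (x[1] == 1)

def gibbs(score, n_vars, num_samples, query):
    state = [random.randint(0, 1) for _ in range(n_vars)]   # random truth assignment
    hits = 0
    for _ in range(num_samples):
        for i in range(n_vars):
            s0 = score(tuple(state[:i] + [0] + state[i + 1:]))
            s1 = score(tuple(state[:i] + [1] + state[i + 1:]))
            p1 = math.exp(s1) / (math.exp(s0) + math.exp(s1))  # P(x_i = 1 | rest)
            state[i] = 1 if random.random() < p1 else 0
        hits += query(tuple(state))
    return hits / num_samples    # fraction of samples in which the query F is true

# Estimate P(Cancer) in the toy model
print(gibbs(score, n_vars=2, num_samples=10000, query=lambda x: x[1]))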

  28. Other Inference Methods • Belief propagation (sum-product) • Mean field / Variational approximations

  29. MAP/MPE Inference • Goal: Find the most likely state of the world given evidence: argmax_y P(y | x), where y is the query and x is the evidence

  30. MAP Inference Algorithms • Iterated conditional modes • Simulated annealing • Graph cuts • Belief propagation (max-product) • LP relaxation

  31. Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Markov logic NLP applications Basics Supervised learning Unsupervised learning

  32. Generative Weight Learning • Maximize likelihood: ∂/∂w_i log P(x) = n_i(x) − E_w[n_i(x)], the no. of times feature i is true in the data minus the expected no. of times feature i is true according to the model • Use gradient ascent or L-BFGS • No local maxima • Requires inference at each step (slow!)
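A minimal sketch of this learning loop follows, with the model expectation E_w[n_i] computed by brute-force enumeration; in realistic networks this expectation is what requires inference (e.g., Gibbs sampling). The features and data are illustrative assumptions.

import itertools
import math

def learn_weights(features, data, steps=200, lr=0.1):
    n_vars = len(data[0])
    worlds = list(itertools.product([0, 1], repeat=n_vars))
    w = [0.0] * len(features)
    for _ in range(steps):
        # Unnormalized probabilities and partition function under current weights
        expw = {x: math.exp(sum(wi * f(x) for wi, f in zip(w, features))) for x in worlds}
        Z = sum(expw.values())
        grad = []
        for f in features:
            data_count = sum(f(x) for x in data) / len(data)       # n_i in the data
            model_exp = sum(f(x) * expw[x] for x in worlds) / Z    # E_w[n_i]
            grad.append(data_count - model_exp)
        w = [wi + lr * g for wi, g in zip(w, grad)]                # gradient ascent step
    return w

# Toy data over (Smoking, Cancer)
data = [(1, 1), (1, 1), (0, 0), (1, 0)]
features = [lambda x: x[0] == x[1], lambda x: x[1] == 1]
print(learn_weights(features, data))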

  33. Pseudo-Likelihood • PL(x) = ∏_i P(x_i | neighbors(x_i)): likelihood of each variable given its neighbors in the data • Does not require inference at each step • Widely used in vision, spatial statistics, etc. • But PL parameters may not work well for long inference chains
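A sketch of the pseudo-likelihood computation under the same toy model: each variable is conditioned on the rest of the data point, so no global partition function is needed. The score function is the illustrative one from the earlier sketches.

import math

def score(x):
    # Toy unnormalized log-probability (see the slide 24 sketch)
    return 1.5 * (x[0] == x[1]) - 1.0 * (x[1] == 1)

def pseudo_log_likelihood(x):
    pll = 0.0
    for i in range(len(x)):
        s0 = score(tuple(0 if j == i else v for j, v in enumerate(x)))
        s1 = score(tuple(1 if j == i else v for j, v in enumerate(x)))
        s_obs = s1 if x[i] else s0
        # log P(x_i | all other variables): a two-term local normalization
        pll += s_obs - math.log(math.exp(s0) + math.exp(s1))
    return pll

print(pseudo_log_likelihood((1, 1)))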

  34. Discriminative Weight Learning • Maximize conditional likelihood of query (y) given evidence (x): ∂/∂w_i log P(y | x) = n_i(x, y) − E_w[n_i(x, y)], the no. of true groundings of clause i in the data minus the expected no. of true groundings according to the model • Approximate expected counts by counts in the MAP state of y given x

  35. Voted Perceptron • Originally proposed for training HMMs discriminatively • Assumes network is a linear chain • Can be generalized to arbitrary networks
  w_i ← 0
  for t ← 1 to T do
      y_MAP ← Viterbi(x)
      w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
  return w_i / T
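Here is a sketch of the voted-perceptron update in Python. The MAP inference step (Viterbi in the linear-chain case) and the feature-count function are assumed to be supplied by the caller; averaging the accumulated weights plays the role of "return w_i / T".

def voted_perceptron(x, y_data, counts, map_state, n_feats, T=10, eta=0.1):
    # counts(y): vector of feature counts n_i(y)
    # map_state(x, w): MAP assignment under current weights (e.g., Viterbi)
    w = [0.0] * n_feats
    w_sum = [0.0] * n_feats
    for _ in range(T):
        y_map = map_state(x, w)                    # current best guess
        c_data, c_map = counts(y_data), counts(y_map)
        w = [wi + eta * (cd - cm) for wi, cd, cm in zip(w, c_data, c_map)]
        w_sum = [ws + wi for ws, wi in zip(w_sum, w)]
    return [ws / T for ws in w_sum]                # averaged ("voted") weights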

  36. Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Markov logic NLP applications Basics Supervised learning Unsupervised learning

  37. First-Order Logic • Constants, variables, functions, predicates. E.g.: Anna, x, MotherOf(x), Friends(x, y) • Literal: Predicate or its negation • Clause: Disjunction of literals • Grounding: Replace all variables by constants. E.g.: Friends(Anna, Bob) • World (model, interpretation): Assignment of truth values to all ground predicates

  38. Inference in First-Order Logic • Traditionally done by theorem proving (e.g., Prolog) • Propositionalization followed by model checking turns out to be faster (often by a lot) • Propositionalization: Create all ground atoms and clauses • Model checking: Satisfiability testing • Two main approaches: • Backtracking (e.g., DPLL) • Stochastic local search (e.g., WalkSAT)

  39. Satisfiability • Input: Set of clauses (convert the KB to conjunctive normal form, CNF) • Output: Truth assignment that satisfies all clauses, or failure • The paradigmatic NP-complete problem • Solution: Search • Key point: Most SAT problems are actually easy • Hard region: Narrow range of #Clauses / #Variables

  40. Stochastic Local Search • Uses complete assignments instead of partial • Start with random state • Flip variables in unsatisfied clauses • Hill-climbing: Minimize # unsatisfied clauses • Avoid local minima: Random flips • Multiple restarts

  41. The WalkSAT Algorithm
  for i ← 1 to max-tries do
      solution ← random truth assignment
      for j ← 1 to max-flips do
          if all clauses satisfied then
              return solution
          c ← random unsatisfied clause
          with probability p
              flip a random variable in c
          else
              flip the variable in c that maximizes # satisfied clauses
  return failure
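A compact Python sketch of WalkSAT follows, with CNF clauses encoded DIMACS-style as lists of signed integers (3 means variable 3 is true, -3 means it is false); the parameter names mirror the pseudocode above.

import random

def walksat(clauses, n_vars, max_tries=10, max_flips=1000, p=0.5):
    def satisfied(clause, a):
        return any((lit > 0) == a[abs(lit)] for lit in clause)
    for _ in range(max_tries):
        a = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}  # random assignment
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, a)]
            if not unsat:
                return a                          # all clauses satisfied
            c = random.choice(unsat)              # random unsatisfied clause
            if random.random() < p:
                v = abs(random.choice(c))         # random-walk flip
            else:
                def n_sat_after_flip(v):
                    a[v] = not a[v]
                    n = sum(satisfied(cl, a) for cl in clauses)
                    a[v] = not a[v]
                    return n
                v = max((abs(lit) for lit in c), key=n_sat_after_flip)  # greedy flip
            a[v] = not a[v]
    return None                                   # failure

# Example: (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
print(walksat([[1, -2], [2, 3], [-1, -3]], n_vars=3))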

  42. Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Markov logic NLP applications Basics Supervised learning Unsupervised learning

  43. Rule Induction • Given: Set of positive and negative examples of some concept • Example: (x1, x2, …, xn, y) • y: concept (Boolean) • x1, x2, …, xn: attributes (assume Boolean) • Goal: Induce a set of rules that cover all positive examples and no negative ones • Rule: xa ^ xb ^ … ⇒ y (xa: literal, i.e., xi or its negation) • Same as Horn clause: Body ⇒ Head • Rule r covers example x iff x satisfies the body of r • Eval(r): Accuracy, info gain, coverage, support, etc.

  44. Learning a Single Rule
  head ← y
  body ← Ø
  repeat
      for each literal x
          r_x ← r with x added to body
          Eval(r_x)
      body ← body ^ best x
  until no x improves Eval(r)
  return r
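This greedy literal-by-literal search can be sketched in Python as follows, using accuracy on covered examples as Eval; examples are (attribute-tuple, label) pairs and a literal is an (attribute-index, required-value) pair. All names are illustrative.

def covers(body, x):
    # A rule covers x iff every literal in its body holds of x
    return all(x[i] == val for i, val in body)

def eval_rule(body, examples):
    covered = [y for x, y in examples if covers(body, x)]
    return sum(covered) / len(covered) if covered else 0.0   # accuracy on covered examples

def learn_one_rule(examples, n_attrs):
    body = []
    while True:
        used = {i for i, _ in body}
        candidates = [(i, v) for i in range(n_attrs) for v in (0, 1) if i not in used]
        best = max(candidates, key=lambda lit: eval_rule(body + [lit], examples), default=None)
        if best is None or eval_rule(body + [best], examples) <= eval_rule(body, examples):
            return body                # no literal improves Eval(r)
        body.append(best)              # body ← body ^ best literal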

  45. Learning a Set of Rules
  R ← Ø
  S ← examples
  repeat
      learn a single rule r
      R ← R ∪ { r }
      S ← S − positive examples covered by r
  until S = Ø
  return R
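The covering loop can be sketched on top of the learn_one_rule and covers helpers from the previous sketch (both illustrative): each learned rule removes the positive examples it covers.

def learn_rule_set(examples, n_attrs):
    rules = []
    remaining = list(examples)
    while any(y for _, y in remaining):          # positive examples still uncovered
        rule = learn_one_rule(remaining, n_attrs)
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining
                     if not (y and covers(rule, x))]   # drop covered positives
    return rules

# Toy usage: learn y = (x0 AND x1)
data = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0), ((1, 1), 1)]
print(learn_rule_set(data, n_attrs=2))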

  46. First-Order Rule Induction • y and the xi are now predicates with arguments. E.g.: y is Ancestor(x,y), xi is Parent(x,y) • Literals to add are predicates or their negations • A literal to add must include at least one variable already appearing in the rule • Adding a literal changes the # of groundings of the rule. E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒ Ancestor(x,y) • Eval(r) must take this into account. E.g.: Multiply by the # of positive groundings of the rule still covered after adding the literal

  47. Overview • Motivation • Foundational areas • Markov logic • NLP applications • Basics • Supervised learning • Unsupervised learning

  48. Markov Logic • Syntax: Weighted first-order formulas • Semantics: Feature templates for Markov networks • Intuition: Soften logical constraints • Give each formula a weight (higher weight ⇒ stronger constraint)
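A minimal sketch of this semantics in Python: a world's probability is proportional to exp(Σ_i w_i n_i(world)), where n_i counts the true groundings of formula i in that world. The single weighted rule used here, Smokes(x) ⇒ Cancer(x) over two constants, is the standard textbook example rather than one taken from these slides.

import itertools
import math

people = ["Anna", "Bob"]
w = 1.5                                  # weight of Smokes(x) => Cancer(x)

def n_true_groundings(world):
    # Count the groundings of Smokes(p) => Cancer(p) that hold in this world
    return sum((not world[("Smokes", p)]) or world[("Cancer", p)] for p in people)

atoms = [(pred, p) for pred in ("Smokes", "Cancer") for p in people]
worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(math.exp(w * n_true_groundings(wd)) for wd in worlds)

# A world violating one grounding is exp(w) times less likely than an otherwise
# identical world violating none: the constraint is softened, not hard.
wd = {("Smokes", "Anna"): True, ("Cancer", "Anna"): False,
      ("Smokes", "Bob"): False, ("Cancer", "Bob"): False}
print(math.exp(w * n_true_groundings(wd)) / Z)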

  49. Example: Coreference Resolution Barack Obama, the 44th President of the United States, is the first African American to hold the office. ……

  50. Example: Coreference Resolution
