1 / 54

Markov Logic: A Representation Language for Natural Language Semantics

Markov Logic: A Representation Language for Natural Language Semantics. Pedro Domingos Dept. Computer Science & Eng. University of Washington (Based on joint work with Stanley Kok, Matt Richardson and Parag Singla). Overview. Motivation Background Representation Inference Learning

kenda
Download Presentation

Markov Logic: A Representation Language for Natural Language Semantics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Markov Logic:A Representation Language forNatural Language Semantics Pedro Domingos Dept. Computer Science & Eng. University of Washington (Based on joint work with Stanley Kok,Matt Richardson and Parag Singla)

  2. Overview • Motivation • Background • Representation • Inference • Learning • Applications • Discussion

  3. Motivation • Natural language is characterized by • Complex relational structure • High uncertainty (ambiguity, imperfect knowledge) • First-order logic handles relational structure • Probability handles uncertainty • Let’s combine the two

  4. Markov Logic[Richardson & Domingos, 2006] • Syntax: First-order logic + Weights • Semantics: Templates for Markov nets • Inference: Weighted satisfiability + MCMC • Learning: Voted perceptron + ILP

  5. Overview • Motivation • Background • Representation • Inference • Learning • Applications • Discussion

  6. Markov Networks A B • Undirected graphical models C D • Potential functions defined over cliques

  7. Markov Networks A B • Undirected graphical models C D • Potential functions defined over cliques Weight of Feature i Feature i

  8. First-Order Logic • Constants, variables, functions, predicatesE.g.: Anna, X, mother_of(X), friends(X, Y) • Grounding: Replace all variables by constantsE.g.: friends (Anna, Bob) • World (model, interpretation):Assignment of truth values to all ground predicates

  9. Overview • Motivation • Background • Representation • Inference • Learning • Applications • Discussion

  10. Markov Logic Networks • A logical KB is a set of hard constraintson the set of possible worlds • Let’s make them soft constraints:When a world violates a formula,It becomes less probable, not impossible • Give each formula a weight(Higher weight  Stronger constraint)

  11. Definition • A Markov Logic Network (MLN) is a set of pairs (F, w) where • F is a formula in first-order logic • w is a real number • Together with a set of constants,it defines a Markov network with • One node for each grounding of each predicate in the MLN • One feature for each grounding of each formula F in the MLN, with the corresponding weight w

  12. Example: Friends & Smokers Suppose we have two constants: Anna (A) and Bob (B) Smokes(A) Smokes(B) Cancer(A) Cancer(B)

  13. Example: Friends & Smokers Suppose we have two constants: Anna (A) and Bob (B) Friends(A,B) Friends(A,A) Smokes(A) Smokes(B) Friends(B,B) Cancer(A) Cancer(B) Friends(B,A)

  14. Example: Friends & Smokers Suppose we have two constants: Anna (A) and Bob (B) Friends(A,B) Friends(A,A) Smokes(A) Smokes(B) Friends(B,B) Cancer(A) Cancer(B) Friends(B,A)

  15. Example: Friends & Smokers Suppose we have two constants: Anna (A) and Bob (B) Friends(A,B) Friends(A,A) Smokes(A) Smokes(B) Friends(B,B) Cancer(A) Cancer(B) Friends(B,A)

  16. More on MLNs • MLN is template for ground Markov nets • Typed variables and constants greatly reduce size of ground Markov net • Functions, existential quantifiers, etc. • MLN without variables = Markov network(subsumes graphical models)

  17. Relation to First-Order Logic • Infinite weights  First-order logic • Satisfiable KB, positive weights Satisfying assignments = Modes of distribution • MLNs allow contradictions between formulas

  18. Overview • Motivation • Background • Representation • Inference • Learning • Applications • Discussion

  19. MPE/MAP Inference • Find most likely truth values of non-evidence ground atoms given evidence • Apply weighted satisfiability solver(maxes sum of weights of satisfied clauses) • MaxWalkSat algorithm [Kautz et al., 1997] • Start with random truth assignment • With prob p, flip atom that maxes weight sum;else flip random atom in unsatisfied clause • Repeat n times • Restart m times

  20. Conditional Inference • P(Formula|MLN,C) = ? • MCMC: Sample worlds, check formula holds • P(Formula1|Formula2,MLN,C) = ? • If Formula2 = Conjunction of ground atoms • First construct min subset of network necessary to answer query (generalization of KBMC) • Then apply MCMC (or other)

  21. Ground Network Construction • Initialize Markov net to contain all query preds • For each node in network • Add node’s Markov blanket to network • Remove any evidence nodes • Repeat until done

  22. Probabilistic Inference • Recall • Exact inference is #P-complete • Conditioning on Markov blanket is easy: • Gibbs sampling exploits this

  23. Markov Chain Monte Carlo • Gibbs Sampler 1. Start with an initial assignment to nodes 2. One node at a time, sample node given others 3. Repeat 4. Use samples to compute P(X) • Apply to ground network • Initialization: MaxWalkSat • Can use multiple chains

  24. Overview • Motivation • Background • Representation • Inference • Learning • Applications • Discussion

  25. Learning • Data is a relational database • Closed world assumption (if not: EM) • Learning parameters (weights) • Generatively: Pseudo-likelihood • Discriminatively: Voted perceptron + MaxWalkSat • Learning structure • Generalization of feature induction in Markov nets • Learn and/or modify clauses • Inductive logic programming with pseudo-likelihood as the objective function

  26. Generative Weight Learning • Maximize likelihood (or posterior) • Use gradient ascent • Requires inference at each step (slow!) Feature count according to data Feature count according to model

  27. Pseudo-Likelihood [Besag, 1975] • Likelihood of each variable given its Markov blanket in the data • Does not require inference at each step • Widely used

  28. Optimization • Parameter tying over groundings of same clause • Maximize using L-BFGS [Liu & Nocedal, 1989] where nsati(x=v) is the number of satisfied groundingsof clause i in the training data when x takes value v • Most terms not affected by changes in weights • After initial setup, each iteration takesO(# ground predicates x # first-order clauses)

  29. Discriminative Weight Learning Gradient of Conditional Log Likelihood # true groundings of formula in DB Expected # of true groundings – slow! Approximate expected count by MAP count

  30. Voted Perceptron[Collins, 2002] • Used for discriminative training of HMMs • Expected count in gradient approximated by count in MAP state • MAP state found using Viterbi algorithm • Weights averaged over all iterations initialize wi=0 • for t=1 to Tdo • find the MAP configuration using Viterbi • wi, =  * (training count – MAP count) • end for

  31. Voted Perceptron for MLNs[Singla & Domingos, 2004] • HMM is special case of MLN • Expected count in gradient approximated by count in MAP state • MAP state found using MaxWalkSat • Weights averaged over all iterations initialize wi=0 • for t=1 to Tdo • find the MAP configuration using MaxWalkSat • wi, =  * (training count – MAP count) • end for

  32. Overview • Motivation • Background • Representation • Inference • Learning • Applications • Discussion

  33. Applications to Date • Entity resolution (Cora, BibServ) • Information extraction for biology(won LLL-2005 competition) • Probabilistic Cyc • Link prediction • Topic propagation in scientific communities • Etc.

  34. Entity Resolution • Most logical systems make unique names assumption • What if we don’t? • Equality predicate:Same(A,B), or A = B • Equality axioms • Reflexivity, symmetry, transitivity • For every unary predicate P: x1 = x2 => (P(x1) <=> P(x2)) • For every binary predicate R:x1 = x2  y1 = y2 => (R(x1,y1) <=> R(x2,y2)) • Etc. • But in Markov logic these are soft and learnable • Can also introduce reverse direction:R(x1,y1)  R(x2,y2)x1 = x2 => y1 = y2 • Surprisingly, this is all that’s needed

  35. Example: Citation Matching

  36. Markov Logic Formulation: Predicates • Are two bibliography records the same?SameBib(b1,b2) • Are two field values the same?SameAuthor(a1,a2)SameTitle(t1,t2)SameVenue(v1,v2) • How similar are two field strings?Predicates for ranges of cosine TF-IDF score:TitleTFIDF.0(t1,t2) is true iff TF-IDF(t1,t2)=0TitleTFIDF.2(a1,a2) is true iff 0 <TF-IDF(a1,a2) < 0.2Etc.

  37. Markov Logic Formulation: Formulas • Unit clauses (defaults):! SameBib(b1,b2) • Two fields are same => Corresponding bib. records are same:Author(b1,a1) Author(b2,a2) SameAuthor(a1,a2) => SameBib(b1,b2) • Two bib. records are same => Corresponding fields are same:Author(b1,a1) Author(b2,a2)  SameBib(b1,b2) => SameAuthor(a1,a2) • High similarity score => Two fields are same:TitleTFIDF.8(t1,t2) =>SameTitle(t1,t2) • Transitive closure (not incorporated in experiments):SameBib(b1,b2)  SameBib(b2,b3) => SameBib(b1,b3) • 25 predicates, 46 first-order clauses

  38. What Does This Buy You? • Objects are matched collectively • Multiple types matched simultaneously • Constraints are soft, and strengths can be learned from data • Easy to add further knowledge • Constraints can be refined from data • Standard approach still embedded

  39. Example Subset of a Bibliography Database

  40. Standard Approach [Fellegi & Sunter, 1969] Title Title Sim(Object Identification using CRFs, Object Identification using CRFs) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b1=b2 ? b3=b4 ? Sim(PKDD 04, 8th PKDD) Sim(PKDD 04, 8th PKDD) Venue Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author record-match node field-similarity node (evidence node)

  41. What’s Missing? Title Title Sim(Object Identification using CRF, Object Identification using CRF) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b1=b2 ? b3=b4 ? Sim(PKDD 04, 8th PKDD) Sim(PKDD 04, 8th PKDD) Venue Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author If from b1=b2, you infer that “PKDD 04” is same as “8th PKDD”, how can you use that to help figure out if b3=b4?

  42. Merging the Evidence Nodes Title Title Sim(Object Identification using CRFs, Object Identification using CRFs) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b3=b4 ? b1=b2 ? Sim(PKDD 04, 8th PKDD) Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author Author Still does not solve the problem. Why?

  43. Introducing Field-Match Nodes Title Title Sim(Object Identification using CRFs, Object Identification using CRFs) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b1=b2 ? b3=b4 ? field-match node b1.T=b2.T? b3.T=b4.T? b1.V=b2.V? b3.V=b4.V? b1.A=b2.A? b3.A=b4.A? Sim(PKDD 04, 8th PKDD) Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author Full representation in Collective Model

  44. Flow of Information Title Title Sim(Object Identification using CRFs, Object Identification using CRFs) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b1=b2 ? b3=b4 ? b1.T=b2.T? b3.T=b4.T? b1.V=b2.V? b3.V=b4.V? b1.A=b2.A? b3.A=b4.A? Sim(PKDD 04, 8th PKDD) Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author

  45. Flow of Information Title Title Sim(Object Identification using CRFs, Object Identification using CRFs) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b1=b2 ? b3=b4 ? b1.T=b2.T? b3.T=b4.T? b1.V=b2.V? b3.V=b4.V? b1.A=b2.A? b3.A=b4.A? Sim(PKDD 04, 8th PKDD) Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author

  46. Flow of Information Title Title Sim(Object Identification using CRFs, Object Identification using CRFs) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b1=b2 ? b3=b4 ? b1.T=b2.T? b3.T=b4.T? b1.V=b2.V? b3.V=b4.V? b1.A=b2.A? b3.A=b4.A? Sim(PKDD 04, 8th PKDD) Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author

  47. Flow of Information Title Title Sim(Object Identification using CRF, Object Identification using CRF) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b1=b2 ? b3=b4 ? b1.T=b2.T? b3.T=b4.T? b1.V=b2.V? b3.V=b4.V? b1.A=b2.A? b3.A=b4.A? Sim(PKDD 04, 8th PKDD) Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author

  48. Flow of Information Title Title Sim(Object Identification using CRFs, Object Identification using CRFs) Sim(Learning Boolean Formulas, Learning of Boolean Expressions) b1=b2 ? b3=b4 ? b1.T=b2.T? b3.T=b4.T? b1.V=b2.V? b3.V=b4.V? b1.A=b2.A? b3.A=b4.A? Sim(PKDD 04, 8th PKDD) Venue Sim(Linda Stewart, Linda Stewart) Sim(Bill Johnson, William Johnson) Author Author

  49. Experiments • Databases: • Cora [McCallum et al., IRJ, 2000]:1295 records, 132 papers • BibServ.org [Richardson & Domingos, ISWC-03]:21,805 records, unknown #papers • Goal: De-duplicate bib.records, authors and venues • Pre-processing: Form canopies [McCallum et al, KDD-00 ] • Compared with naïve Bayes (standard method), etc. • Measured area under precision-recall curve (AUC) • Our approach wins across the board

  50. Results:Matching Venues on Cora

More Related