
Discriminative Training of Markov Logic Networks

Parag Singla & Pedro Domingos



Outline

  • Motivation

  • Review of MLNs

  • Discriminative Training

  • Experiments

    • Link Prediction

    • Object Identification

  • Conclusion and Future Work



Outline

  • Motivation

  • Review of MLNs

  • Discriminative Training

  • Experiments

    • Link Prediction

    • Object Identification

  • Conclusion and Future Work



Markov Logic Networks (MLNs)

  • AI systems must be able to learn, reason logically, and handle uncertainty

  • Markov Logic Networks [Richardson & Domingos, 2004] are an effective way to combine first-order logic and probability

  • Markov networks are used as the underlying representation

  • Features are specified using arbitrary formulas in finite first-order logic



Training of MLNs – Generative Approach

  • Optimize the joint distribution of all the variables

  • Parameters are learnt independently of the specific inference task

  • Maximum-likelihood (ML) training – computing the gradient involves inference – too slow!

  • Use pseudo-likelihood (PL) as an alternative – easy to compute

  • PL is suboptimal: it ignores non-local interactions between variables

  • ML and PL are generative training approaches



Training of MLNs – Discriminative Approach

  • No need to optimize the joint distribution of all the variables

  • Optimize the conditional likelihood (CL) of the non-evidence variables given the evidence variables

  • Parameters are learnt for a specific inference task

  • In general, tends to do better than generative training



Why is Discriminative Better?



Outline

  • Motivation

  • Review of MLNs

  • Discriminative Training

  • Experiments

    • Link Prediction

    • Object Identification

  • Conclusion and Future Work



Markov Logic Networks

  • A Markov Logic Network (MLN) is a set of pairs (F, w) where

    • F is a formula in first-order logic

    • w is a real number

  • Together with a finite set of constants, it defines a Markov network with

    • One node for each grounding of each predicate in the MLN

    • One feature for each grounding of each formula F in the MLN, with the corresponding weight w
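
As a standard illustration (the Friends-and-Smokers example from Richardson & Domingos, not on this slide): given the single formula Friends(x,y) => (Smokes(x) <=> Smokes(y)) with weight w and the constants {Anna, Bob}, the ground network has six nodes (Smokes(Anna), Smokes(Bob), and the four Friends ground atoms) and four features, one per (x,y) grounding of the formula, all sharing the weight w.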


Likelihood

$$P(X=x) \;=\; \frac{1}{Z}\,\exp\Big(\sum_{j} w_j\, f_j(x)\Big) \;=\; \frac{1}{Z}\,\exp\Big(\sum_{i} w_i\, n_i(x)\Big)$$

  • First form: iterate over all ground clauses; f_j(x) = 1 if the jth ground clause is true, 0 otherwise

  • Second form: iterate over all MLN clauses; n_i(x) = # true groundings of the ith clause
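
To make these counts concrete, here is a minimal brute-force sketch (illustrative only; the clause, constants, and weight are invented, and real MLN systems never enumerate all worlds):

```python
import itertools
import math

# Toy MLN: one clause "Smokes(x) => Cancer(x)" with weight w,
# over constants {Anna, Bob}. Ground atoms: Smokes/Cancer per person.
constants = ["Anna", "Bob"]
atoms = [f"{p}({c})" for p in ("Smokes", "Cancer") for c in constants]
w = 1.5

def n_true_groundings(world):
    """n_i(x): # true groundings of Smokes(x) => Cancer(x) in a world,
    where a world maps each ground atom to True/False."""
    return sum((not world[f"Smokes({c})"]) or world[f"Cancer({c})"]
               for c in constants)

# Brute-force partition function Z: sum over all 2^4 possible worlds.
worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(math.exp(w * n_true_groundings(x)) for x in worlds)

x = {"Smokes(Anna)": True, "Cancer(Anna)": True,
     "Smokes(Bob)": False, "Cancer(Bob)": False}
print("n_i(x) =", n_true_groundings(x))   # 2: both groundings are true
print("P(X=x) =", math.exp(w * n_true_groundings(x)) / Z)
```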


Gradient of Log-Likelihood

$$\frac{\partial}{\partial w_i}\,\log P_w(X=x) \;=\; n_i(x) \;-\; E_w\big[n_i(X)\big]$$

  • 1st term: feature count according to the data – # true groundings of the formula in the DB

  • 2nd term: feature count according to the model – inference required (slow!)



Pseudo-Likelihood [Besag, 1975]

  • Likelihood of each ground atom given its Markov blanket in the data

  • Does not require inference at each step

  • Optimized using L-BFGS [Liu & Nocedal, 1989]
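
For reference, the objective being optimized is the standard Besag pseudo-likelihood (stated only in words on the slide): a product over all ground atoms X_l of their probability given the data values of their Markov blanket MB_x(X_l),

$$PL_w(X=x) \;=\; \prod_{l=1}^{n} P_w\big(X_l = x_l \mid MB_x(X_l)\big)$$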


Gradient of Pseudo-Log-Likelihood

$$\frac{\partial}{\partial w_i}\,\log PL_w(X=x) \;=\; \sum_{x_l}\Big[\,nsat_i(x_l{=}v_l) \;-\; \sum_{v\in\{0,1\}} P_w\big(x_l{=}v \mid MB_x(x_l)\big)\, nsat_i(x_l{=}v)\Big]$$

where the outer sum runs over the ground atoms x_l, v_l is the value of x_l in the training data, and nsat_i(x_l=v) is the number of satisfied groundings of clause i in the training data when x_l takes value v

  • Most terms are not affected by changes in the weights

  • After the initial setup, each iteration takes O(# ground predicates × # first-order clauses)



Outline

  • Motivation

  • Review of MLNs

  • Discriminative Training

  • Experiments

    • Link Prediction

    • Object Identification

  • Conclusion and Future Work


Conditional Likelihood (CL)

$$P(y \mid x) \;=\; \frac{1}{Z_x}\,\exp\Big(\sum_{i \in F_Y} w_i\, n_i(x, y)\Big)$$

  • x: the evidence variables; y: the non-evidence (query) variables

  • F_Y: the MLN clauses with at least one grounding containing query variables

  • Z_x normalizes over all possible configurations of the non-evidence variables


Derivative of log CL

$$\frac{\partial}{\partial w_i}\,\log P_w(y \mid x) \;=\; n_i(x, y) \;-\; E_w\big[n_i(x, Y)\big]$$

  • 1st term: # true groundings (involving query variables) of the formula in the DB

  • 2nd term: inference required, as before (slow!)


Derivative of log CL

$$\frac{\partial}{\partial w_i}\,\log P_w(y \mid x) \;\approx\; n_i(x, y) \;-\; n_i\big(x, y^*_w(x)\big)$$

  • Approximate the expected count by the count in the MAP state y*_w(x)



Approximating the Expected Count

  • Use Voted Perceptron Algorithm [Collins, 2002]

    • Approximate the expected count by the count in the most likely (MAP) state

    • Used successfully for linear-chain Markov networks

    • MAP state found using Viterbi


Voted Perceptron Algorithm

  • Initialize wi = 0

  • For t = 1 to T

    • Find the MAP configuration according to the current set of weights

    • wi,t = wi,t−1 + η · (training count − MAP count)

  • wi = Σt wi,t / T (averaging the weights avoids over-fitting)
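
A minimal sketch of this training loop (illustrative, not the authors' implementation; `map_state` stands in for a MAP inference oracle such as MaxWalkSAT, and `db_counts` for precomputed clause counts in the training DB):

```python
def voted_perceptron(n_clauses, db_counts, map_state, T=100, eta=0.001):
    """Average-of-iterates perceptron for MLN weights.

    db_counts: db_counts[i] = # true groundings of clause i in the training DB
    map_state: function(weights) -> per-clause true-grounding counts in the
               MAP configuration under those weights (e.g. via MaxWalkSAT)
    """
    w = [0.0] * n_clauses
    w_sum = [0.0] * n_clauses             # running sum for averaging
    for _ in range(T):
        map_counts = map_state(w)         # inference with current weights
        for i in range(n_clauses):
            # Move toward the data counts, away from the model's MAP counts
            w[i] += eta * (db_counts[i] - map_counts[i])
            w_sum[i] += w[i]
    return [s / T for s in w_sum]         # averaging avoids over-fitting
```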


Generalizing Voted Perceptron

  • Finding the MAP configuration is NP-hard in the general case

  • It can be reduced to a weighted satisfiability (MaxSAT) problem

    • Given a formula in clausal form, e.g. (x1 ∨ ¬x3 ∨ x5) ∧ … ∧ (¬x5 ∨ ¬x7 ∨ x50), with clause i having weight wi

    • Find the assignment maximizing the sum of the weights of the satisfied clauses


MaxWalkSAT

  • [Kautz, Selman & Jiang, 1997]

  • Assumes clauses with positive weights

  • Mixes greedy search with random walks

    • Start with some configuration of the variables

    • Randomly pick an unsatisfied clause

    • With probability p, flip the literal in the clause that gives the maximum gain; with probability 1−p, flip a random literal in the clause

    • Repeat for a pre-decided number of flips, storing the best configuration seen
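
A toy implementation of the procedure just described (a sketch under the stated assumptions; clauses are lists of signed integers, and the real solver's efficiency tricks are omitted):

```python
import random

def maxwalksat(clauses, weights, n_vars, max_flips=10000, p=0.5):
    """clauses: list of clauses, each a list of signed ints
    (+v means variable v, -v means its negation, variables 1..n_vars)."""
    assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}

    def sat(clause):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    def cost():  # total weight of unsatisfied clauses (to be minimized)
        return sum(w for c, w in zip(clauses, weights) if not sat(c))

    best, best_cost = dict(assign), cost()
    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            break
        clause = random.choice(unsat)     # random unsatisfied clause
        if random.random() < p:
            # Greedy move: flip the literal giving the lowest resulting cost
            def cost_if_flipped(l):
                assign[abs(l)] = not assign[abs(l)]
                c = cost()
                assign[abs(l)] = not assign[abs(l)]
                return c
            lit = min(clause, key=cost_if_flipped)
        else:
            lit = random.choice(clause)   # random-walk move
        assign[abs(lit)] = not assign[abs(lit)]
        if cost() < best_cost:
            best, best_cost = dict(assign), cost()
    return best
```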


Handling the Negative Weights

  • MLNs allow formulas with negative weights

  • A formula with weight w can be replaced by its negation with weight −w in the ground Markov network

  • (x1 ∨ x3 ∨ x5) [w] => ¬(x1 ∨ x3 ∨ x5) [−w] => (¬x1 ∧ ¬x3 ∧ ¬x5) [−w]

  • (¬x1 ∧ ¬x3 ∧ ¬x5) [−w] => ¬x1, ¬x3, ¬x5 [−w/3 each]


Weight Initialization and Learning Rate

  • Weights are initialized to the log odds of each clause being true in the data

  • The learning rate is determined using a validation set

    • Learning rate ∝ 1 / #(ground predicates)
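
Concretely, log-odds initialization (the standard form; any smoothing used is not specified on the slide) sets, for each clause i with empirical fraction p_i of true groundings in the data:

$$w_i^{(0)} \;=\; \log\frac{p_i}{1 - p_i}$$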



Outline

  • Motivation

  • Review of MLNs

  • Discriminative Training

  • Experiments

    • Link Prediction

    • Object Identification

  • Conclusion and Future Work





Link Prediction

  • UW-CSE database

    • Used by Richardson & Domingos [2004]

    • Database of people/courses/publications at UW-CSE

    • 22 predicates, e.g. Student(P), Professor(P), AdvisedBy(P1,P2)

    • 1158 constants divided into 10 types

    • 4,055,575 ground atoms

    • 3212 true ground atoms

    • 94 hand-coded rules stating various regularities

      • Student(P) => !Professor(P)

    • Predict AdvisedBy in the absence of information about the predicates Professor and Student



Systems Compared

  • MLN(VP) – MLN trained discriminatively with voted perceptron

  • MLN(ML) – MLN trained generatively by maximum likelihood

  • MLN(PL) – MLN trained generatively by pseudo-likelihood

  • KB – the hand-coded knowledge base alone

  • CL – CLAUDIEN (ILP rule induction)

  • NB – naive Bayes

  • BN – Bayesian network learner


Results on Link Prediction

[Results charts not preserved in the transcript]



Outline

  • Motivation

  • Review of MLNs

  • Discriminative Training

  • Experiments

    • Link Prediction

    • Object Identification

  • Conclusion and Future Work



Object Identification

  • Given a database of various records referring to objects in the real world

  • Each record represented by a set of attribute values

  • Want to find out which of the records refer to the same object

  • Example: A paper may have more than one reference in a bibliography database



Why is it Important?

  • Data cleaning and integration – the first step in the KDD process

  • Merging data from multiple sources results in duplicates

  • Entity resolution: extremely important for any sort of data mining

  • State of the art – far from what is required

  • CiteSeer has 30 different entries for the AI textbook by Russell and Norvig



Standard Approach

  • [Fellegi & Sunter, 1969]

  • Look at each pair of records independently

  • Calculate the similarity score for each attribute value pair based on some metric

  • Find the overall similarity score

  • Merge the records whose similarity is above a threshold

  • Take a transitive closure
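
A schematic of this pipeline (an illustrative sketch; `field_sim` and the threshold are placeholders, not values from the paper), with the transitive closure done by union-find:

```python
from itertools import combinations

def similarity(rec_a, rec_b, field_sim):
    """Overall score: average per-field similarity, each in [0, 1]."""
    fields = rec_a.keys()
    return sum(field_sim(rec_a[f], rec_b[f]) for f in fields) / len(fields)

def match_records(records, field_sim, threshold=0.8):
    """Independent pairwise matching followed by transitive closure."""
    parent = list(range(len(records)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    # Each pair is scored independently of every other pair
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j], field_sim) >= threshold:
            parent[find(i)] = find(j)       # merge: same real-world object
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```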


An Example

Subset of a Bibliography Relation

[Table of sample records not preserved in the transcript]


Graphical Representation in Standard Model

[Diagram: each candidate record pair is a query node (“b1=b2?”, “b3=b4?”) linked only to its own evidence nodes holding the field-similarity scores, e.g. Sim(Linda Stewart, Linda Stewart) and Sim(Bill Johnson, William Johnson) for Author, Sim(Learning Boolean Formulas, Leraning of Boolean Formulas) for Title, and Sim(KDD 2004, SIGKDD 10) for Venue]


What’s Missing?

[Same diagram as on the previous slide]

If from b1=b2 you infer that “KDD 2004” is the same as “SIGKDD 10”, how can you use that to help figure out whether b3=b4?



Collective Model – Basic Idea

  • Perform simultaneous inference for all the candidate pairs

  • Facilitate flow of information through shared attribute values


Representation in Standard Model

[Diagram: b1=b2? and b3=b4? each have their own copy of the Sim(KDD 2004, SIGKDD 10) venue evidence node – no sharing of nodes]


Merging the Evidence Nodes

[Diagram: the two identical Sim(KDD 2004, SIGKDD 10) venue evidence nodes are merged into a single node shared by b1=b2? and b3=b4?]

Still does not solve the problem. Why?


Introducing Information Nodes

[Diagram, the full representation in the Collective Model: between each record-pair node and its field evidence sit information nodes asking whether the underlying field values are the same – b1.T=b2.T?, b1.V=b2.V?, b1.A=b2.A?, and likewise for b3, b4 – with the venue information nodes sharing the Sim(KDD 2004, SIGKDD 10) evidence node]


Flow of Information

[Diagram, shown over several animation steps in the original deck: inferring b1=b2 drives the information node b1.V=b2.V?, which shares the Sim(KDD 2004, SIGKDD 10) evidence node with b3.V=b4.V?, which in turn influences the decision b3=b4]





MLN Predicates for De-Duplicating Citation Databases

  • If two bib entries are the same - SameBib(b1,b2)

  • If two field values are the same - SameAuthor(a1,a2), SameTitle(t1,t2), SameVenue(v1,v2)

  • If the cosine-based TF-IDF score of two field values lies in a particular range ({0}, (0, .2], (.2, .4], etc.) – 6 predicates for each field

    • E.g. AuthorTFIDF.8(a1,a2) is true if the TF-IDF similarity score of a1 and a2 is in the range (.6, .8]
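
A sketch of how such evidence predicates could be generated (illustrative only; it uses scikit-learn's TF-IDF vectorizer and cosine similarity, and naming each predicate by the bucket's upper bound is an assumption based on the slide):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_predicates(field, values, step=0.2):
    """Emit one ground evidence atom per value pair, named by the upper
    bound of its TF-IDF similarity bucket, e.g. AuthorTFIDF0.8(0,1)."""
    tfidf = TfidfVectorizer().fit_transform(values)
    sims = cosine_similarity(tfidf)
    atoms = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            s = sims[i, j]
            # Bucket (.6, .8] maps to upper bound .8, etc.; exact 0 is its own bucket
            upper = 0.0 if s == 0 else min(1.0, step * -(-s // step))
            atoms.append(f"{field}TFIDF{upper:.1f}({i},{j})")
    return atoms

print(tfidf_predicates("Author",
                       ["Bill Johnson", "William Johnson", "Linda Stewart"]))
```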



MLN Rules for De-Duplicating Citation Databases

  • Singleton predicates

    • !SameBib(b1,b2)

  • Two fields are the same => the corresponding bib entries are the same

    • Author(b1,a1) ∧ Author(b2,a2) ∧ SameAuthor(a1,a2) => SameBib(b1,b2)

  • Two papers are the same => the corresponding fields are the same

    • Author(b1,a1) ∧ Author(b2,a2) ∧ SameBib(b1,b2) => SameAuthor(a1,a2)

  • High similarity score => the two fields are the same

    • AuthorTFIDF.8(a1,a2) => SameAuthor(a1,a2)

  • Transitive closure (currently not incorporated)

    • SameBib(b1,b2) ∧ SameBib(b2,b3) => SameBib(b1,b3)

  • 25 first-order predicates, 46 first-order clauses



Cora Database

  • Cleaned up version of McCallum’s Cora database.

  • 1295 citations to 132 different Computer Science research papers, each citation described by author, venue, and title fields

  • 401,552 ground atoms.

  • 82,026 tuples (true ground atoms)

  • Predict SameBib, SameAuthor, SameVenue



Systems Compared

  • MLN(VP)

  • MLN(ML)

  • MLN(PL)

  • KB

  • CL

  • NB

  • BN


Results on Cora

Predicting the Citation, Author, and Venue Matches

[Results charts not preserved in the transcript]



Outline

  • Motivation

  • Review of MLNs

  • Discriminative Training

  • Experiments

    • Link Prediction

    • Object Identification

  • Conclusion and Future Work



Conclusions

  • Markov Logic Networks – a powerful way of combining logic and probability.

  • MLNs can be discriminatively trained using a voted perceptron algorithm

  • Discriminatively trained MLNs perform better than purely logical approaches, purely probabilistic approaches, and generatively trained MLNs.



Future Work

  • Discriminative learning of MLN structure

  • Max-margin type training of MLNs

  • Extensions of MaxWalkSAT

  • Further applications to link prediction, object identification, and possibly other areas

