
Learning to Construct and Reason with a Large KB of Extracted Information

William W. Cohen, Machine Learning Dept and Language Technology Dept

joint work with:

Tom Mitchell, Ni Lao, William Wang, Kathryn Rivard Mazaitis,

Richard Wang, Frank Lin, Ni Lao, Estevam Hruschka, Jr., Burr Settles, Partha Talukdar, Derry Wijaya, Edith Law, Justin Betteridge, Jayant Krishnamurthy, Bryan Kisiel, Andrew Carlson, Weam Abu Zaki, Bhavana Dalvi, Malcolm Greaves,

Lise Getoor, Jay Pujara, Hui Miao, …


Outline

  • Background: information extraction and NELL

  • Key ideas in NELL

    • Coupled learning

    • Multi-view, multi-strategy learning

  • Inference in NELL

    • Inference as another learning strategy

      • Learning in graphs

      • Path Ranking Algorithm

      • ProPPR

    • Promotion as inference

  • Conclusions & summary


But first… some backstory


…and an unrelated project…


…called SimStudent…


SimStudent will learn rules to solve a problem step-by-step, and guide a student through how to solve problems step-by-step


Quinlan’s FOIL


Summary of SimStudent

  • Possible for a human author (e.g., a middle-school teacher) to build an ITS (intelligent tutoring system)

    • by building a GUI, then demonstrating problem solving and having the system learn how from examples

  • The rules learned by SimStudent can be used to construct a “student model”

    • with parameter tuning this can predict how well individual students will learn

    • better than state-of-the-art in some cases!

  • AI problem solving with a cognitively predictive model … and ILP is a key component!


Information Extraction

  • Goal:

    • Extract facts about the world automatically by reading text

    • IE systems are usually based on learning how to recognize facts in text

      • .. and then (sometimes) aggregating the results

      • Latest-generation IE systems need not require large amounts of training

      • … and IE does not necessarily require subtle analysis of any particular piece of text


Never-Ending Language Learning (NELL)

  • NELL is a broad-coverage IE system

    • Simultaneously learning 500-600 concepts and relations (person, celebrity, emotion, acquiredBy, locatedIn, capitalCityOf, …)

    • Starting point: containment/disjointness relations between concepts, types for relations, and O(10) examples per concept/relation

    • Uses 500M web page corpus + live queries

    • Running (almost) continuously for over three years

    • Has learned over 50M beliefs, over 1M high-confidence ones

      • about 85% of high-confidence beliefs are correct


Demo

  • http://rtw.ml.cmu.edu/rtw/


NELL Screenshots

[Three slides of screenshots from the NELL web site]


More examples of what NELL knows


Outline

  • Background: information extraction and NELL

  • Key ideas in NELL

    • Coupled learning

    • Multi-view, multi-strategy learning

  • Inference in NELL

    • Inference as another learning strategy

      • Learning in graphs

      • Path Ranking Algorithm

      • ProPPR

    • Promotion as inference

  • Conclusions & summary


Bootstrapped SSL learning of lexical patterns

it’s underconstrained!!

Extract cities, given four seed examples of the class “city”: Paris, Pittsburgh, Seattle, Cupertino.

Bootstrapping learns patterns such as “mayor of arg1”, “live in arg1”, “arg1 is home of”, and “traits such as arg1”, and extracts new candidates: San Francisco, Austin, Berlin — but also denial, anxiety, selfishness (semantic drift, e.g. from “traits such as arg1”).
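A minimal sketch of the bootstrapping loop behind this slide (illustrative only; a real system scores and filters candidates instead of accepting them all, and the lack of such constraints is exactly why drift like “denial, anxiety, selfishness” happens):

    # Bootstrapped SSL over lexical patterns (sketch).
    def bootstrap(sentences, seeds, rounds=2):
        instances, patterns = set(seeds), set()
        for _ in range(rounds):
            # 1) harvest patterns that co-occur with known instances
            for s in sentences:
                for inst in instances:
                    if inst in s:
                        patterns.add(s.replace(inst, "arg1"))
            # 2) harvest new instances matched by known patterns
            for s in sentences:
                for pat in patterns:
                    pre, _, post = pat.partition("arg1")
                    if (s.startswith(pre) and s.endswith(post)
                            and len(s) > len(pre) + len(post)):
                        instances.add(s[len(pre):len(s) - len(post)])
        return instances, patterns

Running this with seeds {"Paris", "Pittsburgh", "Seattle", "Cupertino"} shows the failure mode above: one ambiguous sentence admits a bad pattern, and the bad pattern floods the instance set.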


One Key to Accurate Semi-Supervised Learning

[Diagram: a fragment of NELL’s coupled ontology. Categories: person, athlete, coach, sport, team, and noun phrases (NP, NP1, NP2). Typed relations: playsSport(a,s), playsForTeam(a,t), teamPlaysSport(t,s), coachesTeam(c,t), coach(NP). The single sentence “Krzyzewski coaches the Blue Devils.” is evidence for several of these at once.]

Learning each task alone is a hard (underconstrained) semi-supervised learning problem; learning them together is a much easier (more constrained) semi-supervised learning problem.

  • Easier to learn many interrelated tasks than one isolated task

  • Also easier to learn using many different types of information
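A toy sketch of how coupling constrains the problem (all names illustrative, not NELL’s actual implementation): a candidate teamPlaysSport fact is kept only if the coupled predicates support it.

    # Coupling as a consistency filter (sketch).
    # teamPlaysSport(t, s) should be supported by some athlete a with
    # playsForTeam(a, t) and playsSport(a, s).
    def filter_by_coupling(candidates, plays_for_team, plays_sport):
        support = {(t, s)
                   for (a1, t) in plays_for_team
                   for (a2, s) in plays_sport
                   if a1 == a2}
        return [(t, s) for (t, s) in candidates if (t, s) in support]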


Outline

  • Background: information extraction and NELL

  • Key ideas in NELL

    • Coupled learning

    • Multi-view, multi-strategy learning

  • Inference in NELL

    • Inference as another learning strategy

      • Learning in graphs

      • Path Ranking Algorithm

      • ProPPR

    • Promotion as inference

  • Conclusions & summary


Another key idea: use multiple types of information

[Architecture diagram: evidence integration sits between the ontology/populated KB and several extractors reading the Web —
    CBL: text extraction patterns
    SEAL: HTML extraction patterns
    Morph: morphology-based extractor
    PRA: learned inference rules]


Outline

  • Background: information extraction and NELL

  • Key ideas in NELL

    • Coupled learning

    • Multi-view, multi-strategy learning

  • Inference in NELL

    • Inference as another learning strategy

      • Background: Learning in graphs

      • Path Ranking Algorithm

      • ProPPR

    • Promotion as inference

  • Conclusions & summary


Background: Personal Info Management as Similarity Queries on a Graph

[SIGIR 2006, EMNLP 2008, TOIS 2010]

[Diagram (work with Einat Minkov, Univ. Haifa): a graph built from email — message, person, date, and term nodes (NSF, graph, proposal, CMU, William, [email protected], 6/17/07, 6/18/07) linked by typed edges such as Term-In-Subject and Sent-To.]


Learning about graph similarity

  • Personalized PageRank aka Random Walk with Restart:

    • Similarity measure for nodes in a graph, analogous to TFIDF for text in a WHIRL database

    • natural extension to PageRank

    • amenable to learning parameters of the walk (gradient search, w/ various optimization metrics):

      • Toutanova, Manning & Ng, ICML 2004; Nie et al., WWW 2005; Xi et al., SIGIR 2005

    • or: reranking, etc

    • queries:

      Given type t* and node x, find y:T(y)=t* and y~x

      Given type t* and nodes X, find y:T(y)=t* and y~X
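A minimal power-iteration sketch of Personalized PageRank / RWR as used here (illustrative; assumes every node appears as a key of `graph`):

    # Personalized PageRank (random walk with restart), power iteration.
    def ppr(graph, seeds, alpha=0.15, iters=50):
        # graph: node -> list of out-neighbors; seeds: the restart set X
        restart = {x: 1.0 / len(seeds) for x in seeds}
        p = dict(restart)
        for _ in range(iters):
            nxt = {v: alpha * restart.get(v, 0.0) for v in graph}
            for u, nbrs in graph.items():
                if nbrs and u in p:
                    share = (1.0 - alpha) * p[u] / len(nbrs)
                    for v in nbrs:
                        nxt[v] += share
            p = nxt
        return p

    # A query "given type t* and nodes X, find y: T(y)=t* and y~X" then
    # amounts to ranking {y : T(y) == t_star} by p[y].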


Many tasks can be reduced to similarity queries

Person name disambiguation

[term “andy”, file msgId]

“person”

Threading

  • What are the adjacent messages in this thread?

  • A proxy for finding “more messages like this one”

[ file msgId ]

“file”

Alias finding

What are the email-addresses of Jason?

[ term Jason ]

“email-address”

Meeting attendees finder

Which email-addresses (persons) should I notify about this meeting?

[ meeting mtgId ]

“email-address”


Learning about graph similarity: the next generation

  • Personalized PageRank aka Random Walk with Restart:

    • Given type t* and nodes X, find y:T(y)=t* and y~X

  • Ni Lao’s thesis (2012): New, better learning methods

    • richer parameterization

    • faster PPR inference

    • structure learning

  • Other tasks:

    • relation-finding in parsed text

    • information management for biologists

    • inference in large noisy knowledge bases


Lao: A learned random-walk strategy is a weighted set of random-walk “experts”, each of which is a walk constrained by a path (i.e., a sequence of relations); a code sketch follows the example below.

Recommending papers to cite in a paper being prepared

1) papers co-cited with on-topic papers

6) approx. standard IR retrieval

7,8) papers cited during the past two years

12-13) papers published during the past two years
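In code, each expert is a relation path whose feature value for a (source, target) pair is the probability that a random walk from the source, following exactly that path, reaches the target; a sketch (names illustrative; weights assumed learned by logistic regression, per Lao’s thesis):

    # Path Ranking Algorithm, scoring step (sketch).
    # kb: relation -> {node: [neighbors]}; a path is a tuple of relations.
    def path_feature(kb, src, path, dst):
        frontier = {src: 1.0}
        for rel in path:
            nxt = {}
            for node, prob in frontier.items():
                nbrs = kb.get(rel, {}).get(node, [])
                for v in nbrs:
                    nxt[v] = nxt.get(v, 0.0) + prob / len(nbrs)
            frontier = nxt
        return frontier.get(dst, 0.0)   # random-walk probability along the path

    def pra_score(kb, src, dst, paths, weights):
        # weighted combination of the path "experts"
        return sum(w * path_feature(kb, src, path, dst)
                   for path, w in zip(paths, weights))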


Another study: learning inference rules for a noisy KB (Lao, Cohen & Mitchell, 2011; Lao et al., 2012)

[Results callouts: the random-walk interpretation is crucial — i.e., 10-15 extra points in MRR; one learned expert finds synonyms of the query team.]


Another key idea: use multiple types of information

[Architecture diagram, as before: evidence integration over the ontology/populated KB and the Web, via CBL (text extraction patterns), SEAL (HTML extraction patterns), Morph (morphology-based extractor), and PRA (learned inference rules).]


Outline

  • Background: information extraction and NELL

  • Key ideas in NELL

  • Inference in NELL

    • Inference as another learning strategy

      • Background: Learning in graphs

      • Path Ranking Algorithm

      • PRA + FOL: ProPPR and joint learning for inference

    • Promotion as inference

  • Conclusions & summary


How can you extend PRA to…

  • Non-binary predicates?

  • Paths that include constants?

  • Recursive rules?

  • …. ?

  • Current direction: using ideas from PRA in a general first-order logic: ProPPR


A limitation

  • Paths are learned separately for each relation type, and one learned rule can’t call another

  • PRA can learn this…

athletePlaysSportViaRule(Athlete,Sport) ←
    onTeamViaKB(Athlete,Team), teamPlaysSportViaKB(Team,Sport).

teamPlaysSportViaRule(Team,Sport) ←
    memberOfViaKB(Team,Conference), hasMemberViaKB(Conference,Team2), playsViaKB(Team2,Sport).

teamPlaysSportViaRule(Team,Sport) ←
    onTeamViaKB(Athlete,Team), athletePlaysSportViaKB(Athlete,Sport).


A limitation

  • Paths are learned separately for each relation type, and one learned rule can’t call another

  • But PRA can’t learn this…

athletePlaysSport(Athlete,Sport) ←
    onTeam(Athlete,Team), teamPlaysSport(Team,Sport).

athletePlaysSport(Athlete,Sport) ← athletePlaysSportViaKB(Athlete,Sport).

teamPlaysSport(Team,Sport) ←
    memberOf(Team,Conference), hasMember(Conference,Team2), plays(Team2,Sport).

teamPlaysSport(Team,Sport) ←
    onTeam(Athlete,Team), athletePlaysSport(Athlete,Sport).

teamPlaysSport(Team,Sport) ← teamPlaysSportViaKB(Team,Sport).


Solution: a major extension of PRA to include a large subset of Prolog

athletePlaysSport(Athlete,Sport) ←
    onTeam(Athlete,Team), teamPlaysSport(Team,Sport).

athletePlaysSport(Athlete,Sport) ← athletePlaysSportViaKB(Athlete,Sport).

teamPlaysSport(Team,Sport) ←
    memberOf(Team,Conference), hasMember(Conference,Team2), plays(Team2,Sport).

teamPlaysSport(Team,Sport) ←
    onTeam(Athlete,Team), athletePlaysSport(Athlete,Sport).

teamPlaysSport(Team,Sport) ← teamPlaysSportViaKB(Team,Sport).


Sample ProPPR program…

[The program itself appeared as an image; annotations: Horn rules, each with rule features, in which variables from the rule head are allowed.]
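A reconstruction of that sample program, following the ProPPR paper (Wang, Mazaitis & Cohen, 2013) — roughly, not verbatim from the slide. Features follow “#”; note that by(W) uses a variable bound in the rule head:

    about(X,Z) :- handLabeled(X,Z)                            # base.
    about(X,Z) :- sim(X,Y), about(Y,Z)                        # prop.
    sim(X,Y) :- links(X,Y)                                    # sim,link.
    sim(X,Y) :- hasWord(X,W), hasWord(Y,W), linkedBy(X,Y,W)   # sim,word.
    linkedBy(X,Y,W) :- true                                   # by(W).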


… and search space…

D’oh! This is a graph!


  • Score for a query solution (e.g., “Z=sport” for “about(a,Z)”) depends on the probability of reaching a ☐ (solution) node*

    • learn transition probabilities based on features of the rules

    • implicit “reset” transitions with (p≥α) back to query node

  • Looking for answers supported by many short proofs

*Exactly as in Stochastic Logic Programs

[Cussens, 2001]

“Grounding” size is O(1/αε), i.e., independent of DB size ⇒ fast approximate incremental inference (Andersen, Chung & Lang, 2008)

Learning: supervised variant of personalized PageRank (Backstrom & Leskovec, 2011)
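The O(1/αε) bound comes from local, push-style approximation of personalized PageRank; a sketch of that style of procedure (illustrative, after PageRank-Nibble; assumes every node is a key of `graph`):

    # Approximate PPR by local "push" operations (sketch).
    # The number of pushes is bounded by O(1/(alpha*eps)), independent
    # of the total graph size -- the basis of the grounding bound above.
    def approx_ppr(graph, seed, alpha=0.1, eps=1e-4):
        p, r = {}, {seed: 1.0}          # estimate p, residual mass r
        while True:
            u = next((v for v, rv in r.items()
                      if rv > eps * max(len(graph[v]), 1)), None)
            if u is None:
                return p
            ru = r.pop(u)
            p[u] = p.get(u, 0.0) + alpha * ru
            nbrs = graph[u]
            if nbrs:                     # push the rest to out-neighbors
                share = (1.0 - alpha) * ru / len(nbrs)
                for v in nbrs:
                    r[v] = r.get(v, 0.0) + share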


Sample Task: Citation Matching

  • Task:

    • citation matching (Alchemy: Poon & Domingos).

  • Dataset:

    • CORA dataset, 1295 citations of 132 distinct papers.

  • Training set: sections 1-4.

  • Test set: section 5.

  • ProPPR program:

    • translated from corresponding Markov logic network (dropping non-Horn clauses)

  • # of rules: 21.


Task: Citation Matching


Time: Citation Matching vs. Alchemy

“Grounding” is independent of DB size


Accuracy: Citation Matching

[Results: AUC scores (0.0 = low, 1.0 = high) comparing our rules vs. the UW rules; w=1 is the model before learning.]


It gets better…

  • Learning uses many example queries

    • e.g., sameCitation(c120,X) with X=c123+, X=c124-, …

  • Each query is grounded to a separate small graph (for its proof)

  • Goal is to tune weights on these edge features to optimize RWR on the query-graphs.

  • Can do SGD and run RWR separately on each query-graph

    • Graphs do share edge features, so there’s some synchronization needed
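A sketch of that training loop (the helpers `ground`, `rwr_scores`, and `gradient` are hypothetical stand-ins for the real ProPPR operations, named only for this sketch):

    # SGD over per-query proof graphs: each query grounds to its own small
    # graph, RWR runs per graph, and gradients update shared feature weights.
    def sgd_train(queries, weights, ground, rwr_scores, gradient,
                  epochs=5, lr=0.1):
        for _ in range(epochs):
            for q in queries:                  # parallelizable per query
                g = ground(q)                  # small query-specific graph
                scores = rwr_scores(g, weights)
                for feat, gval in gradient(g, scores, q).items():
                    # edge features are shared across graphs, so parallel
                    # workers need (loose) synchronization on `weights`
                    weights[feat] -= lr * gval
        return weights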



Back to NELL…

[Architecture diagram, as before: evidence integration over the ontology/populated KB and the Web, via CBL (text extraction patterns), SEAL (HTML extraction patterns), Morph (morphology-based extractor), and PRA (learned inference rules).]


Experiment

  • Take top K paths for each predicate learned by Lao’s PRA

    • (I don’t know how to do structure learning for ProPPR yet)

  • Convert to a mutually recursive ProPPR program

  • Train weights on entire program

athletePlaysSport(Athlete,Sport) ←
    onTeam(Athlete,Team), teamPlaysSport(Team,Sport).

athletePlaysSport(Athlete,Sport) ← athletePlaysSportViaKB(Athlete,Sport).

teamPlaysSport(Team,Sport) ←
    memberOf(Team,Conference), hasMember(Conference,Team2), plays(Team2,Sport).

teamPlaysSport(Team,Sport) ←
    onTeam(Athlete,Team), athletePlaysSport(Athlete,Sport).

teamPlaysSport(Team,Sport) ← teamPlaysSportViaKB(Team,Sport).


More details

  • Train on NELL’s KB as of iteration 713

  • Test on new facts from later iterations

  • Try three “subdomains” of NELL

    • pick a seed entity S

    • pick the top M entity nodes in a (simple, untyped) RWR from S

    • project KB to just these M entities

    • look at three subdomains, six values of M


Outline

  • Background: information extraction and NELL

  • Key ideas in NELL

    • Coupled learning

    • Multi-view, multi-strategy learning

  • Inference in NELL

    • Inference as another learning strategy

      • Learning in graphs

      • Path Ranking Algorithm

      • ProPPR

    • Promotion as inference

  • Conclusions & summary


More detail on NELL

  • For iteration i=1,….,715,…:

    • For each view (lexical patterns, …, PRA):

      • Distantly train that view’s learner using KB_i

      • Propose new “candidate beliefs” based on the learned view-specific classifier

    • Heuristically find the “best” candidate beliefs and “promote” them into KB_{i+1}

Not obvious how to promote in a principled way …
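The loop above, as a Python sketch (method names illustrative, not NELL’s actual code; `promote` is the heuristic step the following slides replace with joint inference):

    # NELL's outer loop, roughly as described above (sketch).
    def nell_loop(kb, views, iterations):
        for i in range(iterations):
            candidates = []
            for view in views:                      # lexical patterns, ..., PRA
                model = view.train(kb)              # distant training on KB_i
                candidates.extend(model.propose())  # new candidate beliefs
            kb = kb | promote(candidates, kb)       # heuristic -> KB_{i+1}
        return kb

    def promote(candidates, kb):
        # placeholder heuristic: keep high-confidence candidates
        return {c for c in candidates if c.confidence > 0.9}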


Promotion: identifying new correct extractions from a pool of noisy extractions

  • Many types of noise are possible:

    • co-referent entities

    • missing or spurious labels

    • missing or spurious relations

    • violations of ontology (e.g., an athlete that is not a person)

  • Identifying true extractions requires joint reasoning, e.g.

    • Pooling information about co-referent entities

    • Enforcing mutual exclusion of labels and relations

  • Problem: How can we integrate extractions from multiple sources in the presence of ontological constraints at the scale of millions of extractions?


An example

A knowledge graph view of NELL’s extractions.

Sample extractions:
    Lbl(Kyrgyzstan, bird)
    Lbl(Kyrgyzstan, country)
    Lbl(Kyrgyz Republic, country)
    Rel(Kyrgyz Republic, Bishkek, hasCapital)

Ontology:
    Dom(hasCapital, country)
    Mut(country, bird)

Entity resolution:
    SameEnt(Kyrgyz Republic, Kyrgyzstan)

What you want: the two co-referent entities merged and the spurious bird label dropped, leaving
    Lbl(Kyrgyzstan, country)
    Rel(Kyrgyzstan, Bishkek, hasCapital)


Knowledge Graph Identification (Lise Getoor, Jay Pujara, and Hui Miao @ UMD)

Representation as a noisy knowledge graph: the extractions above become a graph over Kyrgyzstan, Kyrgyz Republic, Bishkek, country, and bird, with Lbl, SameEnt, and Rel(hasCapital) edges plus Dom and Mut ontology edges.

After knowledge graph identification: Kyrgyzstan and Kyrgyz Republic resolved to one entity, labeled country, with Rel(hasCapital) to Bishkek.


Graph Identification as Joint Reasoning: Probabilistic Soft Logic (PSL)

  • Templating language for hinge-loss MRFs, much more scalable!

  • Model specified as a collection of logical formulas

    • Formulas are grounded by substituting constants for variables

    • Truth values of atoms relaxed to the [0,1] interval

    • Truth values of formulas derived from the Łukasiewicz t-norm

  • Each ground rule r has a weighted potential φ_r, corresponding to its distance to satisfaction

  • PSL defines a probability distribution over atom truth-value assignments I (the formula image did not survive; see the reconstruction after this list)

  • Most probable explanation (MPE) inference is convex

  • Running time scales linearly with grounded rules (|R|)
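The distribution and potentials, reconstructed in LaTeX from the standard PSL formulation (not verbatim from the slide):

    P(I) \propto \exp\Big( -\sum_{r \in R} w_r \,\phi_r(I) \Big),
    \qquad \phi_r(I) = \big( \max\{0,\, d_r(I)\} \big)^{p}, \quad p \in \{1, 2\}

where d_r(I) is ground rule r’s distance to satisfaction under the Łukasiewicz relaxation.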


PSL Representation of Heuristics for Promotion

Promote any candidate

Promote “hints” (old promotion strategy)

Be consistent about labels for duplicate entities
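These heuristics become weighted PSL rules; a sketch of their shape, adapted from the knowledge-graph-identification papers (predicate names illustrative, not verbatim from the slide):

    CandLbl(E,L) → Lbl(E,L)                          (promote any candidate)
    CandLblHint(E,L) → Lbl(E,L)                      (promote “hints”: old strategy)
    SameEnt(E1,E2) ∧ Lbl(E1,L) → Lbl(E2,L)           (consistent labels for duplicates)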


PSL Representation of Ontological Rules

Be consistent with constraints from ontology

Too expressive for ProPPR

Adapted from Jiang et al., ICDM 2012
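A sketch of the shape of these ontological rules, adapted from the same line of work (illustrative, not verbatim from the slide):

    Dom(R,L) ∧ Rel(E1,E2,R) → Lbl(E1,L)              (domain)
    Rng(R,L) ∧ Rel(E1,E2,R) → Lbl(E2,L)              (range)
    Mut(L1,L2) ∧ Lbl(E,L1) → ¬Lbl(E,L2)              (mutual exclusion)
    Sub(L1,L2) ∧ Lbl(E,L1) → Lbl(E,L2)               (subsumption)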


Datasets & Results

  • Evaluation on NELL dataset from iteration 165:

    • 1.7M candidate facts

    • 70K ontological constraints

  • Predictions on 25K facts from a 2-hop neighborhood around test data

  • Beats other methods, runs in just 10 seconds!


Summary

  • Background: information extraction and NELL

  • Key ideas in NELL

    • Coupled learning

    • Multi-view, multi-strategy learning

  • Inference in NELL

    • Inference as another learning strategy

      • Learning in graphs

      • Path Ranking Algorithm

      • ProPPR

    • Promotion as inference

  • Conclusions & summary

