Presentation Transcript
Information Extraction, Data Mining and Joint Inference

Andrew McCallum

Computer Science Department

University of Massachusetts Amherst

Joint work with Charles Sutton, Aron Culotta, Xuerui Wang,

Ben Wellner, David Mimno, Gideon Mann.



Goal:

Mine actionable knowledge from unstructured text.


Extracting Job Openings from the Web

foodscience.com-Job2
JobTitle: Ice Cream Guru
Employer: foodscience.com
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001
Source: www.foodscience.com/jobs_midwest.html
OtherCompanyJobs: foodscience.com-Job1




Job Openings:

Category = High Tech

Keyword = Java

Location = U.S.



IE from Research Papers

[McCallum et al ‘99]



Mining Research Papers

[Rosen-Zvi, Griffiths, Steyvers, Smyth, 2004]

[Giles et al]


IE from Chinese Documents regarding Weather

Department of Terrestrial System, Chinese Academy of Sciences

200k+ documents

several millennia old

- Qing Dynasty Archives

- memos

- newspaper articles

- diaries


What is “Information Extraction”

As a family of techniques:

Information Extraction =

segmentation + classification + clustering + association

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

Extracted segments: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation


What is “Information Extraction”

As a family of techniques:

Information Extraction = segmentation + classification + association + clustering

Applied to the same news passage, the final extracted database:

NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Software Foundation
From Text to Actionable Knowledge

Document collection → Spider → Filter → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links / relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support)


Solution: Uncertainty Info

Document collection → Spider → Filter → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links / relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support), with Uncertainty Info passed forward from IE and Emerging Patterns fed back from mining.


Solution: Unified Model

Discriminatively-trained undirected graphical models:

Conditional Random Fields [Lafferty, McCallum, Pereira]

Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]

Complex inference and learning: just what we researchers like to sink our teeth into!

Document collection → Spider → Filter → IE (Segment, Classify, Associate, Cluster) and Data Mining (Discover patterns: entity types, links / relations, events) sharing a single Probabilistic Model → Actionable knowledge (Prediction, Outlier detection, Decision support)


Scientific Questions

  • What model structures will capture salient dependencies?

  • Will joint inference actually improve accuracy?

  • How to do inference in these large graphical models?

  • How to do parameter estimation efficiently in these models,which are built from multiple large components?

  • How to do structure discovery in these models?



Outline


  • Examples of IE and Data Mining.

  • Motivate Joint Inference

  • Brief introduction to Conditional Random Fields

  • Joint inference: Examples

    • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)

    • Joint Co-reference Resolution (Graph Partitioning)

    • Joint Co-reference with Weighted 1st-order Logic (MCMC)

    • Joint Relation Extraction and Data Mining (Bootstrapping)

  • Ultimate application area: Rexa, a Web portal for researchers


(Linear Chain) Conditional Random Fields

[Lafferty, McCallum, Pereira 2001]

Undirected graphical model, trained to maximize conditional probability of output (sequence) given input (sequence):

p(y|x) = (1/Z_x) ∏_t exp( Σ_k λ_k f_k(y_{t-1}, y_t, x, t) ), where Z_x sums the same product over all label sequences y.

Wide-spread interest, positive experimental results in many applications:

Noun phrase, Named entity [HLT’03], [CoNLL’03]; Protein structure prediction [ICML’04]; IE from Bioinformatics text [Bioinformatics ‘04]; Asian word segmentation [COLING’04], [ACL’04]; IE from Research papers [HLT’04]; Object classification in images [CVPR ‘04]

[Figure: finite state model / graphical model. FSM states y_{t-1}, y_t, y_{t+1}, … form the output sequence (labels such as OTHER, PERSON, ORG, TITLE), one per observation x_{t-1}, x_t, x_{t+1}, … in the input sequence, e.g. “… said Jones a Microsoft VP …”]
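As a concrete illustration of the CRF probability above, here is a minimal brute-force sketch in Python; the labels, feature weights, and helper names (`score`, `prob`) are invented for this example, and Z_x is computed by enumeration rather than by the forward algorithm used in practice.

```python
import math
from itertools import product

# Brute-force sketch of a linear-chain CRF: p(y|x) is the exp of summed
# feature weights, normalized by Z_x over all label sequences.
LABELS = ["OTHER", "PERSON", "ORG", "TITLE"]

def score(y, x, trans, emit):
    # sum_t [ transition(y_{t-1}, y_t) + emission(y_t, x_t) ]
    s, prev = 0.0, "START"
    for label, word in zip(y, x):
        s += trans.get((prev, label), 0.0) + emit.get((label, word), 0.0)
        prev = label
    return s

def prob(y, x, trans, emit):
    # p(y|x) = exp(score(y, x)) / Z_x, with Z_x computed by enumeration.
    z = sum(math.exp(score(cand, x, trans, emit))
            for cand in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x, trans, emit)) / z

# Hypothetical weights nudging "Jones" toward PERSON, "Microsoft" toward ORG.
emit = {("PERSON", "Jones"): 2.0, ("ORG", "Microsoft"): 2.0, ("TITLE", "VP"): 2.0}
trans = {("ORG", "TITLE"): 1.0}

x = ["said", "Jones", "a", "Microsoft", "VP"]
y = ["OTHER", "PERSON", "OTHER", "ORG", "TITLE"]
print(round(prob(y, x, trans, emit), 4))
```

Enumerating 4^5 label sequences is feasible only at toy scale; the forward algorithm computes the same Z_x in linear time.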




Jointly labeling cascaded sequences: Factorial CRFs

[Sutton, Khashayar,

McCallum, ICML 2004]

Named-entity tag

Noun-phrase boundaries

Part-of-speech

English words



But errors cascade--must be perfect at every stage to do well.


Joint prediction of part-of-speech and noun-phrase in newswire, matching accuracy with only 50% of the training data.

Inference:

Loopy Belief Propagation




Joint co-reference among all pairs: Affinity Matrix CRF

“Entity resolution” / “Object correspondence”

[Figure: mentions “… Mr Powell …”, “… Powell …”, “… she …”, with pairwise Y/N coreference decisions and affinity scores (Mr Powell, Powell) = 45, (Powell, she) = 11, (Mr Powell, she) = -99]

~25% reduction in error on co-reference of proper nouns in newswire.

Inference: correlational clustering / graph partitioning

[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]

[Bansal, Blum, Chawla, 2002]
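The graph-partitioning view above can be sketched as a greedy agglomerative merge over the slide's pairwise affinities; this stand-in (`greedy_coref` and friends are invented names) is far simpler than the correlational-clustering inference used in the cited papers.

```python
# Sketch of coreference as graph partitioning: merge clusters of mentions
# while some merge has positive total affinity. Scores are the slide's.
affinity = {
    ("Mr Powell", "Powell"): 45,
    ("Powell", "she"): 11,
    ("Mr Powell", "she"): -99,
}

def pair_score(a, b):
    return affinity.get((a, b), affinity.get((b, a), 0))

def cluster_score(c1, c2):
    # Total affinity gained by merging two clusters of mentions.
    return sum(pair_score(a, b) for a in c1 for b in c2)

def greedy_coref(mentions):
    clusters = [{m} for m in mentions]
    while True:
        best, bi, bj = 0, None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_score(clusters[i], clusters[j])
                if s > best:
                    best, bi, bj = s, i, j
        if bi is None:  # no merge with positive total affinity remains
            return clusters
        clusters[bi] = clusters[bi] | clusters.pop(bj)

print(sorted(sorted(c) for c in greedy_coref(["Mr Powell", "Powell", "she"])))
# → [['Mr Powell', 'Powell'], ['she']]
```

Note how the decision is joint: “she” is kept out of the Powell cluster because the strong negative edge to “Mr Powell” (-99) outweighs the positive edge to “Powell” (11).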




Joint Co-reference for Multiple Entity Types

[Culotta & McCallum 2005]

[Figure: People mentions Stuart Russell, Stuart Russell, S. Russel; Organization mentions University of California at Berkeley, Berkeley, Berkeley; pairwise Y/N coreference decisions within and across the two entity types]

Reduces error by 22%




Sometimes pairwise comparisons are not enough.

  • Entities have multiple attributes (name, email, institution, location); need to measure “compatibility” among them.

  • Having 2 “given names” is common, but not 4.

  • Need to measure size of the clusters of mentions.

  • ∃ a pair of lastname strings that differ > 5?

    We need measures on hypothesized “entities”; we need first-order logic.


Pairwise Co-reference Features

Mentions: Howard Dean, Dean Martin, Howard Martin

Pairwise Features: StringMatch(x1,x2), EditDistance(x1,x2)

SamePerson(Howard Dean, Howard Martin)? SamePerson(Dean Martin, Howard Dean)? SamePerson(Dean Martin, Howard Martin)?


Cluster-wise (higher-order) Representations

Mentions: Howard Dean, Dean Martin, Howard Martin

SamePerson(Howard Dean, Howard Martin, Dean Martin)?

First-Order Features (N = 3): ∀x1,x2 StringMatch(x1,x2); ∃x1,x2 ¬StringMatch(x1,x2); ∃x1,x2 EditDistance>.5(x1,x2); ∃x1,x2,x3 ThreeDistinctStrings(x1,x2,x3)


Cluster-wise (higher-order) Representations

Combinatorial Explosion! SamePerson(x1,x2), SamePerson(x1,x2,x3), SamePerson(x1,x2,x3,x4), SamePerson(x1,x2,x3,x4,x5), SamePerson(x1,x2,x3,x4,x5,x6), . . .

Mentions: Howard Dean, Howard Martin, Howie, Dean Martin, Dino, Martin



Markov Logic models (Weighted 1st-order Logic): Using 1st-order Logic as a Template to Construct a CRF [Richardson & Domingos 2005]

Grounding the Markov network requires space O(n^r), where n = number of constants and r = highest clause arity.
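A quick back-of-the-envelope check of the O(n^r) grounding cost (the function name is invented for illustration):

```python
# Number of groundings of a clause of arity r over n constants: n^r.
# This is why fully grounding the Markov network quickly becomes infeasible.
def num_groundings(n_constants, arity):
    return n_constants ** arity

print(num_groundings(1000, 2))  # pairwise clause over 1000 constants: 1000000
print(num_groundings(1000, 3))  # arity-3 clause: 1000000000
```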


How can we perform inference and learning in models that cannot be grounded?


Inference in First-Order Models: SAT Solvers

  • Weighted SAT solvers [Kautz et al 1997]

    • Requires complete grounding of network

  • LazySAT [Singla & Domingos 2006]

    • Saves memory by only storing clauses that may become unsatisfied

    • Still requires exponential time to visit all ground clauses at initialization.


Inference in First-Order Models: Sampling

  • Gibbs Sampling

    • Difficult to move between high probability configurations by changing single variables

      • Although, consider MC-SAT [Poon & Domingos ‘06]

  • An alternative: Metropolis-Hastings sampling

    • Can be extended to partial configurations

      • Only instantiate relevant variables

    • Successfully used in BLOG models [Milch et al 2005]

    • 2 parts: proposal distribution, acceptance distribution.

[Culotta & McCallum 2006]


Proposal Distribution

y: {Howard Martin, Howie Martin}, {Dean Martin, Dino}

y’: {Dean Martin, Howie Martin}, {Howard Martin, Dino}

A proposed move reassigns mentions between hypothesized entities, e.g. exchanging Dean Martin and Howard Martin between the two clusters.


Inference with Metropolis-Hastings

  • y : configuration

  • p(y’)/p(y) : likelihood ratio

    • Ratio of P(Y|X)

    • Z_X cancels

  • q(y’|y) : proposal distribution

    • probability of proposing move y → y’
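A minimal sketch of the acceptance rule, assuming unnormalized log-scores so that Z_X cancels in the ratio; the toy target and proposal below are invented for illustration.

```python
import math
import random

# Metropolis-Hastings acceptance: alpha = min(1, [p(y')/p(y)] * [q(y|y')/q(y'|y)]),
# computed in log space from unnormalized scores (the normalizer cancels).
def mh_accept(log_score_y, log_score_y2, log_q_fwd=0.0, log_q_back=0.0, rng=random):
    log_alpha = (log_score_y2 - log_score_y) + (log_q_back - log_q_fwd)
    return rng.random() < math.exp(min(0.0, log_alpha))

scores = [0.0, 0.0, 0.0, 4.0]  # unnormalized log-scores of 4 toy configurations

def propose(y, rng):           # symmetric uniform proposal, so q-terms cancel
    return rng.randrange(len(scores))

rng = random.Random(0)
y = 0
for _ in range(2000):
    y2 = propose(y, rng)
    if mh_accept(scores[y], scores[y2], rng=rng):
        y = y2
print(y)  # final configuration after 2000 steps
```

Uphill moves are always accepted; downhill moves are accepted with probability exp(Δ), so the chain spends most of its time in high-scoring configurations without ever computing Z_X.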


Experiments

  • Paper citation coreference

  • Author coreference

  • First-order features

    • All Titles Match

    • Exists Year MisMatch

    • Average String Edit Distance > X

    • Number of mentions
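First-order features like “All Titles Match” and “Exists Year MisMatch” can be sketched as predicates quantified over all mentions in a hypothesized cluster; the helper names and data below are invented for illustration.

```python
# Cluster-wise (first-order) features over a hypothesized entity.
def all_match(values):
    # forall x1,x2 in cluster: Match(x1, x2)
    return len(set(values)) == 1

def exists_mismatch(values):
    # exists x1,x2 in cluster: not Match(x1, x2)
    return len(set(values)) > 1

cluster = [
    {"title": "Learning CRFs", "year": 2004},
    {"title": "Learning CRFs", "year": 2005},
]
print(all_match([m["title"] for m in cluster]),
      exists_mismatch([m["year"] for m in cluster]))  # prints: True True
```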


Results on Citation Data

[Tables: Citeseer paper coreference results (pair F1); author coreference results (pair F1)]




Data

  • 270 Wikipedia articles

  • 1000 paragraphs

  • 4700 relations

  • 52 relation types

    • JobTitle, BirthDay, Friend, Sister, Husband, Employer, Cousin, Competition, Education, …

  • Targeted for density of relations

    • Bush/Kennedy/Manning/Coppola families and friends


[Figure: George W Bush is the son of George HW Bush; John Prescott Ellis is the son of Nancy Ellis Bush; George HW Bush and Nancy Ellis Bush are siblings; so George W Bush (X) and John Prescott Ellis (Y) are cousins]

George W. Bush: “…his father George H. W. Bush…”, “…his cousin John Prescott Ellis…”

George H. W. Bush: “…his sister Nancy Ellis Bush…”

Nancy Ellis Bush: “…her son John Prescott Ellis…”

Cousin = Father’s Sister’s Son
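The relational path Cousin = Father's Sister's Son can be sketched as a composition of extracted relations, here as plain Python dicts keyed on the slide's entities; the `compose` helper is illustrative, not the system's API.

```python
# A relational feature as a composition of functional relations.
father = {"George W Bush": "George HW Bush"}
sister = {"George HW Bush": "Nancy Ellis Bush"}
son = {"Nancy Ellis Bush": "John Prescott Ellis"}

def compose(*relations):
    # Follow a chain of relations, dropping entities with no complete path.
    def follow(x):
        for r in relations:
            x = r.get(x)
            if x is None:
                return None
        return x
    return {x: follow(x) for x in relations[0] if follow(x) is not None}

cousin = compose(father, sister, son)
print(cousin)  # → {'George W Bush': 'John Prescott Ellis'}
```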


[Figure: John Kerry is the son of Rosemary Forbes; Stuart Forbes is the son of James Forbes; Rosemary Forbes and James Forbes are siblings]

John Kerry: “…celebrated with Stuart Forbes…” → likely a cousin


Iterative DB Construction

1. Fill DB with “first-pass” CRF.

2. Use relational features with “second-pass” CRF.

Example: Joseph P. Kennedy, Sr: “… son John F. Kennedy with Rose Fitzgerald” (0.3); relations such as Son and Wife; entities such as Ronald Reagan and George W. Bush.


Results

MemberOf: 0.4211 → 0.6093

ME = maximum entropy; CRF = conditional random field; RCRF = CRF + mined features


Examples of Discovered Relational Features

  • Mother: Father → Wife

  • Cousin: Mother → Husband → Nephew

  • Friend: Education → Student

  • Education: Father → Education

  • Boss: Boss → Son

  • MemberOf: Grandfather → MemberOf

  • Competition: PoliticalParty → Member → Competition




Mining our Research Literature

  • Better understand structure of our own research area.

  • Structure helps us learn a new field.

  • Aid collaboration

  • Map how ideas travel through social networks of researchers.

  • Aids for hiring and finding reviewers!



Previous Systems

[Figure: Research Paper entities linked by a Cites relation]


More Entities and Relations

[Figure: Research Paper, Person, Venue, University, Groups, Grant, and Expertise entities, with relations such as Cites]


Topical Transfer

[Mann, Mimno, McCallum, JCDL 2006]

Citation counts from one topic to another. Map “producers and consumers”


Impact Diversity

Topic Diversity: Entropy of the distribution of citing topics
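The diversity measure above, entropy of the distribution of citing topics, in a short sketch; the citation counts below are invented for illustration.

```python
import math

# Topic diversity as Shannon entropy of the citing-topic distribution.
def entropy(counts):
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

focused = entropy([90, 5, 5])        # most citations come from one topic
diverse = entropy([25, 25, 25, 25])  # citations spread evenly across topics
print(round(focused, 3), round(diverse, 3))  # prints: 0.569 2.0
```

A topic cited evenly by k topics scores log2(k); a topic cited mostly from one place scores near zero.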


Summary

  • Joint inference needed for avoiding cascading errors in information extraction and data mining.

  • Challenge: making inference & learning scale to massive graphical models.

    • Markov-chain Monte Carlo

  • Rexa: New research paper search engine, mining the interactions in our community.