15-505: Lecture 11
Generative Models for Text Classification and Information Extraction

Kamal Nigam

Some slides from William Cohen, Andrew McCallum

How could you build a text classifier?
  • Take some ideas from machine learning
    • Supervised learning setting
    • Examples of each class (a few or thousands)
  • Take some ideas from machine translation
    • Generative models
    • Language models
  • Simplify each and stir thoroughly
Basic Approach of Generative Modeling
  • Pick representation for data
  • Write down probabilistic generative model
  • Estimate model parameters with training data
  • Turn model around to calculate unknown values for new data
Naïve Bayes: Bag of Words Representation

[Figure: a document is mapped to a vector of occurrence counts over all words in the dictionary.]

"Corn prices rose today while corn futures dropped in surprising trading activity. Corn ..."
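As a minimal sketch (illustrative Python, not the lecture's code), the representation is just a word-count dictionary:

from collections import Counter
import re

def bag_of_words(text):
    """Map a document to occurrence counts over the words it contains."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

counts = bag_of_words("Corn prices rose today while corn futures "
                      "dropped in surprising trading activity. Corn ...")
# counts["corn"] == 3; all word order is discarded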

Naïve Bayes: Mixture of Multinomials Model
  • Pick the class: P(class)
  • For every word, pick from the class urn: P(word|class)

[Figure: two word urns, labeled SPORTS and COMPUTERS. The words shown include java, ball, modem, polo, soccer, windows, dropped, web, plus function words such as the, while, in, again, and activity. A document is generated by repeatedly drawing words from the chosen class's urn, e.g. "the the while in in again the the soccer activity windows".]
Word independence assumption!
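A minimal sketch of this generative process, with made-up toy parameters (the numbers here are assumptions for illustration, not the lecture's):

import random

# Toy class priors and per-class word distributions (each row sums to 1)
p_class = {"SPORTS": 0.5, "COMPUTERS": 0.5}
p_word = {
    "SPORTS":    {"soccer": 0.3, "polo": 0.2, "ball": 0.2, "the": 0.3},
    "COMPUTERS": {"java": 0.3, "modem": 0.2, "windows": 0.2, "the": 0.3},
}

def generate(length):
    """Pick the class, then draw every word independently from its urn."""
    c = random.choices(list(p_class), weights=list(p_class.values()))[0]
    words = random.choices(list(p_word[c]),
                           weights=list(p_word[c].values()), k=length)
    return c, words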

Naïve Bayes: Estimating Parameters
  • Just like estimating biased coin flip probabilities
  • Estimate MAP word probabilities:
  • Estimate MAP class priors:
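The slide's equations are images lost from this transcript; a standard Laplace-smoothed (MAP) form consistent with the description is:

\hat{P}(w \mid c) = \frac{1 + N(w, c)}{|V| + \sum_{w'} N(w', c)}
\qquad
\hat{P}(c) = \frac{1 + D(c)}{|C| + D}

where N(w, c) is the number of occurrences of word w in training documents of class c, |V| is the vocabulary size, D(c) is the number of training documents of class c, D is the total number of documents, and |C| is the number of classes.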
Naïve Bayes: Performing Classification
  • Word independence assumption
  • Take the class with the highest probability
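In symbols, with the word independence assumption, this is the standard decision rule:

c^* = \arg\max_{c} \; P(c) \prod_{w \in d} P(w \mid c)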
Classification Tricks of the Trade
  • Stemming
    • run, runs, running, ran → run
    • table, tables, tabled → table
    • computer, compute, computing → compute
  • Stopwords
    • Very frequent function words generally uninformative
    • if, in, the, like, …
  • Information gain feature selection
    • Keep just the most indicative words in the vocabulary (sketched below)
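A minimal sketch of information-gain scoring, assuming docs is a list of (word_set, class_label) pairs; all names here are illustrative:

import math
from collections import Counter

def entropy(counts):
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)

def information_gain(docs, word):
    """IG(word) = H(class) - H(class | word present or absent)."""
    prior = Counter(label for _, label in docs)
    present = Counter(label for words, label in docs if word in words)
    absent = prior - present
    n = len(docs)
    conditional = (sum(present.values()) / n) * entropy(present) \
                + (sum(absent.values()) / n) * entropy(absent)
    return entropy(prior) - conditional

# Keep the top-k scoring words as the vocabulary, e.g.:
# vocab = sorted(all_words, key=lambda w: information_gain(docs, w),
#                reverse=True)[:1000]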
Naïve Bayes Rules of Thumb
  • Need hundreds of labeled examples per class for good performance (~85% accuracy)
  • Stemming and stopwords may or may not help
  • Feature selection may or may not help
  • Predicted probabilities will be very extreme
  • Use sum of logs instead of multiplying probabilities to prevent underflow (see the sketch below)
  • Coding this up is trivial, either as a MapReduce or not
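A minimal log-space classifier, assuming log_prior and log_p_word hold the smoothed log estimates (hypothetical names, not the lecture's code):

def classify(words, classes, log_prior, log_p_word):
    """Return argmax_c [ log P(c) + sum_w log P(w | c) ]."""
    def score(c):
        # summing logs instead of multiplying probabilities avoids underflow
        return log_prior[c] + sum(log_p_word[c][w]
                                  for w in words if w in log_p_word[c])
    return max(classes, key=score)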
Example: A Problem

[Screenshots: keyword searches for "Baker" return mixed senses]

  • Mt. Baker, the school district
  • Baker Hostetler, the company
  • Baker, a job opening
  • Genomics job

[Screenshot: a job-search form with Category = Food Services, Keyword = Baker, Location = Continental U.S.]

Extracting Job Openings from the Web

[Screenshot: a job posting and the structured record extracted from it]

  Title: Ice Cream Guru
  Description: If you dream of cold creamy…
  Contact: susan@foodscience.com
  Category: Travel/Hospitality
  Function: Food Services

What is Information Extraction?
  • Recovering structured data from formatted text
    • Identifying fields (e.g. named entity recognition)
    • Understanding relations between fields (e.g. record association)
    • Normalization and deduplication
  • Today, focus on field identification
IE Posed as a Machine Learning Task
  • Training data: documents marked up with ground truth
  • In contrast to text classification, local features are crucial. Features of:
    • Contents
    • Text just before item
    • Text just after item
    • Begin/end boundaries

00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun

prefix

contents

suffix

Good Features for Information Extraction

Creativity and Domain Knowledge Required!

Example line features: contains-question-mark, contains-question-word, ends-with-question-mark, first-alpha-is-capitalized, indented, indented-1-to-4, indented-5-to-10, more-than-one-third-space, only-punctuation, prev-is-blank, prev-begins-with-ordinal, shorter-than-30, begins-with-number, begins-with-ordinal, begins-with-punctuation, begins-with-question-word, begins-with-subject, blank, contains-alphanum, contains-bracketed-number, contains-http, contains-non-space, contains-number, contains-pipe

Example word features:

  • identity of word
  • is in all caps
  • ends in “-ski”
  • is part of a noun phrase
  • is in a list of city names
  • is under node X in WordNet or Cyc
  • is in bold font
  • is in hyperlink anchor
  • features of past & future
  • last person name was female
  • next two words are “and Associates”
Good Features for Information Extraction

Creativity and Domain Knowledge Required!

Orthographic features: Is Capitalized, Is Mixed Caps, Is All Caps, Initial Cap, Contains Digit, All Lowercase, Is Initial, Punctuation, Period, Comma, Apostrophe, Dash, Preceded by HTML tag

Character n-gram classifier says string is a person name (80% accurate)

Lexicon features:

  • In stopword list (the, of, their, etc.)
  • In honorific list (Mr, Mrs, Dr, Sen, etc.)
  • In person suffix list (Jr, Sr, PhD, etc.)
  • In name particle list (de, la, van, der, etc.)
  • In Census lastname list; segmented by P(name)
  • In Census firstname list; segmented by P(name)
  • In locations lists (states, cities, countries)
  • In company name list ("J. C. Penney")
  • In list of company suffixes (Inc, & Associates, Foundation)

Word Features

  • Lists of job titles
  • Lists of prefixes
  • Lists of suffixes
  • 350 informative phrases

HTML/Formatting Features

  • {begin, end, in} x {<b>, <i>, <a>, <hN>} x {lengths 1, 2, 3, 4, or longer}
  • {begin, end} of line
Landscape of ML Techniques for IE

[Figure: five approaches, each illustrated on the sentence "Abraham Lincoln was born in Kentucky."]

  • Sliding Window: a classifier asks "which class?" for each window of words; try alternate window sizes.
  • Boundary Models: classifiers mark BEGIN and END positions of a field (e.g. PersonName) independently.
  • Finite State Machines: find the most likely state sequence for the whole word sequence.
  • Wrapper Induction: learn and apply a pattern for a website (e.g. <b><i>Abraham Lincoln</i></b> was born in Kentucky.).
  • Classify Candidates: a classifier asks "which class?" for each pre-generated candidate.

Any of these models can be used to capture words, formatting or both.

Information Extraction by Sliding Windows

GRAND CHALLENGES FOR MACHINE LEARNING

Jaime Carbonell

School of Computer Science

Carnegie Mellon University

3:30 pm

7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

CMU UseNet Seminar Announcement


Information Extraction with Sliding Windows

[Freitag 97, 98; Soderland 97; Califf 98]

00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun

prefix: w_{t-m} … w_{t-1}    contents: w_t … w_{t+n}    suffix: w_{t+n+1} … w_{t+n+m}

  • Standard supervised learning setting
    • Positive instances: Windows with real label
    • Negative instances: All other windows
    • Features based on candidate, prefix and suffix
  • Special-purpose rule learning systems work well, e.g. the learned rule below:

courseNumber(X) :-
    tokenLength(X, =, 2),
    every(X, inTitle, false),
    some(X, A, <previousToken>, inTitle, true),
    some(X, B, <>, tripleton, true)
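A minimal sketch of generating those window instances (the window sizes and dict keys here are illustrative assumptions):

def candidate_windows(tokens, max_len=5, context=3):
    """Enumerate every window of up to max_len tokens, with its context."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield {
                "prefix": tokens[max(0, start - context):start],
                "contents": tokens[start:end],
                "suffix": tokens[end:end + context],
            }

# Each window becomes one instance: positive if "contents" exactly matches
# a labeled field in the training document, negative otherwise.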

IE by Boundary Detection

GRAND CHALLENGES FOR MACHINE LEARNING

Jaime Carbonell

School of Computer Science

Carnegie Mellon University

3:30 pm

7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

CMU UseNet Seminar Announcement


BWI: Learning to detect boundaries

[Freitag & Kushmerick, AAAI 2000]

  • Another formulation: learn three probabilistic classifiers:
    • START(i) = Prob(position i starts a field)
    • END(j) = Prob(position j ends a field)
    • LEN(k) = Prob(an extracted field has length k)
  • Then score a possible extraction (i,j) by

START(i) * END(j) * LEN(j-i)

  • LEN(k) is estimated from a histogram
  • START(i) and END(j) learned by boosting over simple boundary patterns and features
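A minimal sketch of the scoring step, assuming start_prob and end_prob are per-position classifier outputs and len_hist is the length histogram (hypothetical names):

def best_extraction(start_prob, end_prob, len_hist, max_len):
    """Score every span (i, j) by START(i) * END(j) * LEN(j - i)."""
    best, best_score = None, 0.0
    for i in range(len(start_prob)):
        for j in range(i, min(i + max_len, len(end_prob))):
            score = start_prob[i] * end_prob[j] * len_hist.get(j - i, 0.0)
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score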
Problems with Sliding Windows and Boundary Finders
  • Decisions in neighboring parts of the input are made independently from each other.
    • Sliding Window may predict a “seminar end time” before the “seminar start time”.
    • It is possible for two overlapping windows to both be above threshold.
    • In a Boundary-Finding system, left boundaries are laid down independently from right boundaries, and their pairing happens as a separate step.
Citation Parsing
  • Fahlman, Scott & Lebiere, Christian (1989). The cascade-correlation learning architecture. Advances in Neural Information Processing Systems, pp. 524-532.
  • Fahlman, S.E. and Lebiere, C., "The Cascade Correlation Learning Architecture," Neural Information Processing Systems, pp. 524-532, 1990.
  • Fahlman, S. E. (1991) The recurrent cascade-correlation learning architecture. NIPS 3, 190-205.
Can we do this with probabilistic generative models?
  • Could have classes for {author, title, journal, year, pages}
  • Could classify every word or sequence?
    • Which sequences?
  • Something interesting in the sequence of fields that we’d like to capture
    • Authors come first
    • Title comes before journal
    • Page numbers come near the end
Hidden Markov Models: The Representation
  • A document is a sequence of words
  • Each word is tagged by its class
  • fahlman s e and lebiere c the cascade correlation learning architecture neural information processing systems pp 524 532 1990
HMM: Generative Model (1)

[Figure: HMM state-transition graph over the states Author, Title, Journal, Year, Pages]

HMM: Generative Model (3)
  • States: x_i
  • State transitions: P(x_i | x_j) = a[x_i | x_j]
  • Output probabilities: P(o_i | x_j) = b[o_i | x_j]
  • Markov independence assumption
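Under that assumption, the joint probability of a state sequence x and output sequence o factors in the standard way (stated here for completeness, with x_0 a designated start state):

P(o, x) = \prod_{i=1}^{n} a[x_i \mid x_{i-1}] \, b[o_i \mid x_i]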
HMMs: Estimating Parameters
  • With fully-labeled data, just like naïve Bayes
  • Estimate MAP output probabilities:
  • Estimate MAP state transitions:
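The slide's equations are not in the transcript; the analogous Laplace-smoothed counts would be:

\hat{b}[o \mid x] = \frac{1 + N(o, x)}{|O| + \sum_{o'} N(o', x)}
\qquad
\hat{a}[x' \mid x] = \frac{1 + N(x \to x')}{|X| + \sum_{x''} N(x \to x'')}

where N(o, x) counts emissions of output o from state x and N(x → x') counts transitions from x to x'.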
HMMs: Performing Extraction
  • Given output words:
    • fahlman s e 1991 the recurrent cascade correlation learning architecture nips 3 190 205
  • Find state sequence that maximizes:
  • Lots of possible state sequences to test (5^14: five possible states for each of the 14 words)

Hmm…
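Dynamic programming (the Viterbi algorithm) finds the best sequence in O(n |states|^2) time instead of enumerating all 5^14. A minimal sketch, assuming log_pi, log_a, log_b are dicts of smoothed log probabilities (hypothetical names, not the lecture's code):

def viterbi(words, states, log_pi, log_a, log_b):
    """Find the state sequence maximizing P(states, words)."""
    # delta[x] = best log-probability of any path ending in state x
    # (a real system would smooth unseen words instead of using -inf)
    delta = {x: log_pi[x] + log_b[x].get(words[0], float("-inf"))
             for x in states}
    backptr = []
    for w in words[1:]:
        prev, step = delta, {}
        delta = {}
        for x in states:
            best = max(states, key=lambda xp: prev[xp] + log_a[xp][x])
            step[x] = best
            delta[x] = prev[best] + log_a[best][x] \
                       + log_b[x].get(w, float("-inf"))
        backptr.append(step)
    # trace back from the best final state
    path = [max(states, key=lambda x: delta[x])]
    for step in reversed(backptr):
        path.append(step[path[-1]])
    return list(reversed(path))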

HMM Example: Nymble

[Bikel, et al 97]

Task: Named Entity Extraction

  • Bigram within classes
  • Backoff to unigram
  • Special capitalization and number features…

[Figure: HMM state diagram with states Person, Org, (five other name classes), and Other, connected between start-of-sentence and end-of-sentence]

Train on 450k words of newswire text.

Results:

  Case    Language   F1
  Mixed   English    93%
  Upper   English    91%
  Mixed   Spanish    90%

HMMs: A Plethora of Applications
  • Information extraction
  • Part of speech tagging
  • Word segmentation
  • Gene finding
  • Protein structure prediction
  • Speech recognition
  • Economics, Climatology, Robotics, …