Presentation Transcript

Splitting Words

a.k.a. “Word Sense Disambiguation”

600.465 - Intro to NLP - J. Eisner

Representing a Word as a Vector
  • Could average over many occurrences of the word ...
  • Each word type has a different vector?
  • Each word token has a different vector?
  • Each word sense has a different vector?
    • (for this one, we need sense-tagged training data)
    • (is this more like a type vector or a token vector?)
  • What is each of these good for?
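To make the type/token distinction concrete, here is a minimal sketch in Python (the helper names are my own; plain bag-of-words counts within a small window, no weighting) of how the two kinds of vectors might be built:

```python
from collections import Counter

def context(tokens, i, width=3):
    """Words within `width` positions of token i, excluding the token itself."""
    lo, hi = max(0, i - width), min(len(tokens), i + width + 1)
    return [w for j, w in enumerate(tokens[lo:hi], start=lo) if j != i]

def token_vector(tokens, i, width=3):
    """One vector per token: counts of context words for this one occurrence."""
    return Counter(context(tokens, i, width))

def type_vector(corpus, word, width=3):
    """One vector per type: sum of token vectors over all occurrences of `word`."""
    vec = Counter()
    for tokens in corpus:              # corpus: a list of tokenized sentences
        for i, w in enumerate(tokens):
            if w == word:
                vec.update(context(tokens, i, width))
    return vec
```

A sense vector, on this view, would be the same kind of sum, but taken only over the tokens tagged with one particular sense.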


Each word type has a different vector
  • We saw this yesterday
  • It’s good for grouping words
    • similar semantics?
    • similar syntax?
    • depends on how you build the vector
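A minimal way to measure how well two word types "group" is the cosine similarity of their type vectors; a sketch over the sparse count dicts built above:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Word types whose vectors score high under this measure can then be grouped; whether the groups look semantic or syntactic depends, as the slide says, on how the vectors were built.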


Each word token has a different vector
  • Good for splitting words - unsupervised WSD
  • Cluster the tokens: each cluster is a sense! (see the sketch after these examples)
  • have turned it into the hot dinner-party topic. The comedy is the
  • selection for the World Cup party, which will be announced on May 1
  • the by-pass there will be a street party. "Then," he says, "we are going
  • in the 1983 general election for a party which, when it could not bear to
  • to attack the Scottish National Party, who look set to seize Perth and
  • number-crunchers within the Labour party, there now seems little doubt
  • that had been passed to a second party who made a financial decision
  • A future obliges each party to the contract to fulfil it by
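One hedged sketch of that clustering step, using scikit-learn (an assumption on my part; the slides don't prescribe a particular clustering algorithm, and k-means is just the simplest choice):

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer

def cluster_tokens(token_vectors, n_senses=2):
    """token_vectors: one context-count dict per occurrence of the word.
    Returns a cluster id per token; each cluster plays the role of a sense."""
    X = DictVectorizer().fit_transform(token_vectors)   # tokens x context words
    return KMeans(n_clusters=n_senses, n_init=10).fit_predict(X)
```

On the party examples above, we would hope the social-gathering tokens land in one cluster and the political and contractual ones elsewhere (with n_senses raised accordingly).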



Each word sense has a different vector
  • Represent each new word token as a vector, too
  • Now assign each token the closest sense (see the sketch after these examples)
    • (could lump together all tokens of the word in the same document: assume they all have the same sense)
  • have turned it into the hot dinner-party topic. The comedy is the
  • selection for the World Cup party, which will be announced on May 1
  • the by-pass there will be a street party. "Then," he says, "we are going
  • let you know that there’s a party at my house tonight. Directions: Drive
  • in the 1983 general election for a party which, when it could not bear to
  • to attack the Scottish National Party, who look set to seize Perth and
  • number-crunchers within the Labour party, there now seems little doubt
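A minimal sketch of that assignment step, assuming sense vectors have been built from sense-tagged data as above (cosine() is the same measure as in the earlier sketch, repeated so the block stands alone):

```python
import math

def cosine(u, v):                      # as in the earlier sketch
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_sense(token_vec, sense_vectors):
    """sense_vectors: {sense_label: count dict}. Returns the closest sense."""
    return max(sense_vectors, key=lambda s: cosine(token_vec, sense_vectors[s]))
```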



Where can we get sense-labeled training data?
  • To do supervised WSD, we need many examples of each sense in context
  • Sources of sense-labeled training text:
    • Human-annotated text – expensive
    • Bilingual text (Rosetta stone) – we can figure out which sense of “plant” is meant by how it translates
    • The dictionary definition of a sense is one sample context
    • The Roget’s thesaurus entry of a sense is one sample context
  • (The last two give hardly any data per sense – but we’ll use them later to get unsupervised training started.)


A problem with the vector model
  • Bad idea to treat all context positions equally.
  • Possible solutions:
    • Faraway words don’t count as strongly?
    • Words in different positions relative to plant are different elements of the vector?
      • (i.e., (pesticide, -1) and (pesticide, +1) are different features – see the sketch below)
    • Words in different syntactic relationships to plant are different elements of the vector?
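A sketch of the second option; the feature names are hypothetical, but the idea is exactly that (pesticide, -1) and (pesticide, +1) become distinct vector elements:

```python
from collections import Counter

def positional_features(tokens, i, width=3):
    """Context features keyed by (word, offset) relative to the target at i."""
    feats = Counter()
    for offset in range(-width, width + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats[(tokens[j], offset)] += 1
    return feats
```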


An assortment of possible cues ...
slide courtesy of D. Yarowsky (modified)
This generates a whole bunch of potential cues – we use data to find out which ones work best.


An assortment of possible cues ... (continued)
slide courtesy of D. Yarowsky (modified)
A merged ranking of all cues of all these types. Some cues are only weak – but we’ll trust a weak cue if there’s nothing better.


Final decision list for lead (abbreviated)

slide courtesy of D. Yarowsky (modified)

To disambiguate a token of lead:

  • Scan down the sorted list
  • The first cue that is found gets to make the decision all by itself
  • Not as subtle as combining cues, but works well for WSD

A cue’s score is its log-likelihood ratio: log [ p(cue | sense A) / p(cue | sense B) ], with smoothed probabilities.
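A minimal sketch of training and applying such a decision list; the function names, the add-alpha smoothing, and the default sense are my own simplifications, not Yarowsky’s exact settings:

```python
import math
from collections import Counter

def build_decision_list(examples, alpha=0.1):
    """examples: (feature_set, sense) pairs with sense in {'A', 'B'}.
    Score each cue by |log p(cue|A)/p(cue|B)| with smoothed probabilities."""
    counts = {'A': Counter(), 'B': Counter()}
    totals = {'A': 0, 'B': 0}
    for feats, sense in examples:
        counts[sense].update(feats)
        totals[sense] += 1
    dlist = []
    for cue in set(counts['A']) | set(counts['B']):
        pa = (counts['A'][cue] + alpha) / (totals['A'] + 2 * alpha)
        pb = (counts['B'][cue] + alpha) / (totals['B'] + 2 * alpha)
        llr = math.log(pa / pb)
        dlist.append((abs(llr), cue, 'A' if llr > 0 else 'B'))
    dlist.sort(reverse=True)           # strongest cues first
    return dlist

def classify(feats, dlist, default='A'):
    """Scan down the sorted list; the first cue present decides by itself."""
    for _, cue, sense in dlist:
        if cue in feats:
            return sense
    return default                     # arbitrary fallback if no cue fires
```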


Unsupervised learning!
slide courtesy of D. Yarowsky (modified)
Very readable paper at http://cs.jhu.edu/~yarowsky/acl95.ps – the algorithm is sketched on the following slides ...
(“Unsupervised” should really be “minimally supervised.”)

First, find a pair of “seed words” that correlate well with the 2 senses
  • If “plant” really has 2 senses, it should appear in
    • 2 dictionary entries: Pick content words from those
    • 2 thesaurus entries: Pick synonyms from those
    • 2 different clusters of documents: Pick representative words from those
    • 2 translations in parallel text: Use the translations as seed words
  • Or just have a human name the seed words (maybe from a list of words that occur unusually often near “plant” – see the sketch below)
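A hedged sketch of that last idea: rank words by how much more often they occur near the target than in the corpus overall, so a human can pick seeds from the top of the list. The relative-frequency ratio here is a crude association score chosen just for illustration:

```python
from collections import Counter

def seed_candidates(corpus, target, width=10, top=20):
    """corpus: list of tokenized documents. Returns words that occur
    unusually often within `width` positions of `target`."""
    near, overall = Counter(), Counter()
    for tokens in corpus:
        overall.update(tokens)
        for i, w in enumerate(tokens):
            if w == target:
                lo = max(0, i - width)
                near.update(t for j, t in enumerate(tokens[lo:i + width + 1], lo)
                            if j != i)
    if not near:
        return []
    n, m = sum(near.values()), sum(overall.values())
    ratio = {w: (near[w] / n) / (overall[w] / m) for w in near if near[w] >= 3}
    return sorted(ratio, key=ratio.get, reverse=True)[:top]
```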


Yarowsky’s bootstrapping algorithm
unsupervised learning! – table taken from Yarowsky (1995)
[Table: with seed words (life, manufacturing) for the target word plant, the seeds initially label only about 1% of the tokens as the “life” sense and about 1% as the “manufacturing” sense; the remaining 98% start out unlabeled.]


unsupervised learning! – figure taken from Yarowsky (1995); seed words: (life, manufacturing)
  • Learn a classifier that distinguishes A from B.
  • It will notice features like “animal” → A, “automate” → B.


unsupervised learning! – table taken from Yarowsky (1995); seed words: (life, manufacturing)
  • No surprise what the top cues are – but other cues are also good for discriminating these seed examples.


unsupervised learning! – slide courtesy of D. Yarowsky (modified); seed words: (life, manufacturing)
  • The strongest of the new cues help us classify more examples ...
  • ... from which we can extract and rank even more cues that discriminate them ...
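Putting the loop together, a minimal sketch that reuses build_decision_list from the decision-list sketch above. The confidence threshold and stopping rule are placeholders; Yarowsky’s actual algorithm has more machinery, including the one-sense-per-discourse step described below:

```python
def bootstrap(token_feats, seed_a, seed_b, threshold=2.0, max_rounds=10):
    """token_feats: one feature set per occurrence of the target word.
    Grows sense labels outward from the two seed cues."""
    labels = {}
    for i, feats in enumerate(token_feats):      # step 1: label by seed words
        if seed_a in feats:
            labels[i] = 'A'
        elif seed_b in feats:
            labels[i] = 'B'
    for _ in range(max_rounds):
        examples = [(token_feats[i], s) for i, s in labels.items()]
        dlist = build_decision_list(examples)    # step 2: learn a classifier
        grew = False
        for i, feats in enumerate(token_feats):  # step 3: label more examples
            if i in labels:
                continue
            for score, cue, sense in dlist:
                if cue in feats:
                    if score >= threshold:       # only trust confident cues
                        labels[i] = sense
                        grew = True
                    break                        # first matching cue decides
        if not grew:                             # step 4: repeat until stable
            break
    return labels
```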


unsupervised learning! – figure taken from Yarowsky (1995); seed words: (life, manufacturing)
  • The result should be a good classifier – unless we accidentally learned some bad cues along the way that polluted the original sense distinction.


unsupervised learning! – slide courtesy of D. Yarowsky (modified)
  • Now use the final decision list to classify test examples: the top-ranked cue appearing in the test example makes the decision.
  • life and manufacturing are no longer even in the top cues! Many unexpected cues were extracted, without supervised training.

“One sense per discourse”
unsupervised learning! – slide courtesy of D. Yarowsky (modified)
  • A final trick:
    • All tokens of plant in the same document probably have the same sense.
    • 3 tokens in the same document gang up on the 4th (see the sketch below).
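A minimal sketch of the trick: within one document, relabel every token of the word with the majority sense, so three confident tokens can “gang up on” a fourth:

```python
from collections import Counter

def one_sense_per_discourse(labels):
    """labels: the sense labels assigned to the word's tokens in one document."""
    if not labels:
        return labels
    majority = Counter(labels).most_common(1)[0][0]
    return [majority] * len(labels)
```

For example, one_sense_per_discourse(['A', 'A', 'A', 'B']) returns ['A', 'A', 'A', 'A'].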


A Note on Combining Cues
slide courtesy of D. Yarowsky (modified)
  • These stats come from term papers of known authorship (i.e., supervised training).


  • Would this kind of sentence be more typical of a student A paper or a student B paper?
  • “Naive Bayes” model for classifying text: score each author by p(author) · p(word1 | author) · p(word2 | author) · ... and pick the winner.
    • (Note the naive independence assumptions!)
  • We’ll look at it again in a later lecture.
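A minimal Naive Bayes sketch; add-alpha smoothing and log-space scoring are standard choices, not anything prescribed by the slide:

```python
import math
from collections import Counter

def train_nb(examples, alpha=1.0):
    """examples: (word_list, label) pairs. Returns log-priors and
    add-alpha-smoothed word probabilities per label."""
    priors, word_counts = Counter(), {}
    for words, label in examples:
        priors[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
    vocab = {w for c in word_counts.values() for w in c}
    total_docs = sum(priors.values())
    model = {}
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        probs = {w: (counts[w] + alpha) / denom for w in vocab}
        model[label] = (math.log(priors[label] / total_docs), probs)
    return model

def classify_nb(words, model, unk=1e-9):
    """argmax over labels of log p(label) + sum_i log p(word_i | label)."""
    return max(model, key=lambda label: model[label][0] +
               sum(math.log(model[label][1].get(w, unk)) for w in words))
```

With papers by two students as training examples, this answers the slide’s question; swapping “author” for “sense of plant” gives the WSD version on the next slide.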


slide courtesy of D. Yarowsky (modified)
  • The same “Naive Bayes” model can be used for word sense disambiguation: would this kind of sentence be more typical of a plant A context or a plant B context?
