Unsupervised Approaches to Sequence Tagging, Morphology Induction, and Lexical Resource Acquisition

Unsupervised Approaches to Sequence Tagging, Morphology Induction, and Lexical Resource Acquisition

Reza Bosaghzadeh & Nathan Schneider

LS2 ~ 1 December 2008


Unsupervised Methods

  • Sequence Labeling (Part-of-Speech Tagging)

  • Morphology Induction

  • Lexical Resource Acquisition

She/pronoun ran/verb to/preposition the/det station/noun quickly/adverb

un-supervise-d learn-ing


Part-of-speech (POS) tagging

  • Prototype-driven model

    Cluster based on a few examples

    (Haghighi and Klein 2006)

  • Contrastive Estimation

    Favor positive examples at the expense of implicit negative examples

    (Smith and Eisner 2005)


Prototype-driven tagging (Haghighi & Klein 2006)

[Diagram: rather than annotated data, training combines unlabeled data with a prototype list giving a few prototype words per target label.]

slide courtesy Haghighi & Klein


Prototype-driven tagging (Haghighi & Klein 2006)

Information Extraction: Classified Ads

Newly remodeled 2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park. Paid water and garbage. No dogs allowed.

[Figure: the ad above, tagged with field labels (SIZE, RESTRICT, TERMS, LOCATION, FEATURES) from a prototype list.]

slide courtesy Haghighi & Klein


Prototype-driven tagging (Haghighi & Klein 2006)

English POS

Newly remodeled 2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park. Paid water and garbage. No dogs allowed.

[Figure: the same ad tagged with Penn Treebank POS labels (RB, VBN, CD, NNS, IN, JJ, NN, DET, NNP, CC, PUNC) from a prototype list.]

slide courtesy Haghighi & Klein


Prototype-driven tagging (Haghighi & Klein 2006)

  • 3 prototypes per tag

  • Automatically extracted by frequency

(Haghighi and Klein 2006)


Prototype-driven tagging (Haghighi & Klein 2006)

  • Trigram tagger, same features as (Smith & Eisner 2005)

    • Word type, suffixes up to length 3, contains-hyphen, contains-digit, initial capitalization

  • Tie each word to its most similar prototype, using context-based similarity technique (Schütze 1993)

    • SVD dimensionality reduction

    • Cosine similarity between context vectors

(Haghighi and Klein 2006)
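To make the tying step concrete, here is a minimal Python sketch (ours, not the authors' code) of the Schütze-style similarity idea: build context co-occurrence vectors, reduce them with an SVD, and link each word type to its most cosine-similar prototype. The corpus format, window size, and dimensionality are illustrative assumptions.

```python
import numpy as np

def context_vectors(sentences, window=2):
    """Word-by-word co-occurrence counts within a fixed window."""
    vocab = sorted({w for sent in sentences for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    M[idx[w], idx[sent[j]]] += 1
    return vocab, idx, M

def tie_to_prototypes(sentences, prototypes, k=50):
    """Link each word type to its most cosine-similar prototype
    after SVD dimensionality reduction of its context vector."""
    vocab, idx, M = context_vectors(sentences)
    k = min(k, len(vocab))
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    X = U[:, :k] * S[:k]                                   # reduced representations
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    ties = {}
    for w in vocab:
        sims = {p: float(X[idx[w]] @ X[idx[p]]) for p in prototypes if p in idx}
        if sims:
            ties[w] = max(sims, key=sims.get)              # most similar prototype
    return ties
```

In the paper, these word-to-prototype ties enter the tagger as features rather than hard label assignments.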


Prototype-driven tagging (Haghighi & Klein 2006)

Pros

  • Fairly easy to choose a few tag prototypes

  • Number of prototypes is flexible

  • Doesn’t require tagging dictionary

Cons

  • Still need a tag set

  • Chosen prototypes may not work so well if infrequent or highly ambiguous


Contrastive Estimation (Smith & Eisner 2005)

  • Already discussed in class

  • Key idea: exploits implicit negative evidence

    • Mutating training examples often gives ungrammatical (negative) sentences

    • During training, shift probability mass from generated negative examples to given positive examples
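As a minimal sketch of the objective (assuming a user-supplied log-score function; the paper actually sums over hidden tag sequences), contrastive estimation normalizes each observed sentence only against its perturbed neighborhood:

```python
import math

def neighborhood(sentence):
    """TRANS1 neighborhood: every version of the sentence with one adjacent
    word pair swapped -- usually ungrammatical, hence implicit negatives."""
    neighbors = []
    for i in range(len(sentence) - 1):
        n = list(sentence)
        n[i], n[i + 1] = n[i + 1], n[i]
        neighbors.append(tuple(n))
    return neighbors

def ce_objective(corpus, log_score):
    """Log-odds of each observed sentence against its neighborhood.
    Maximizing this shifts probability mass from the implicit negative
    examples to the given positive examples."""
    total = 0.0
    for x in corpus:
        x = tuple(x)
        candidates = [x] + neighborhood(x)
        log_z = math.log(sum(math.exp(log_score(c)) for c in candidates))
        total += log_score(x) - log_z
    return total
```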


Unsupervised POS tagging: The State of the Art

Best supervised result (CRF): 99.5%!


Unsupervised Methods

  • Sequence Labeling (Part-of-Speech Tagging)

  • Morphology Induction

  • Lexical Resource Acquisition

She/pronoun ran/verb to/preposition the/det station/noun quickly/adverb

un-supervise-d learn-ing


Unsupervised Approaches to Morphology

  • Morphology refers to the internal structure of words

    • A morpheme is a minimal meaningful linguistic unit

    • Morpheme segmentation is the process of dividing words into their component morphemes

      unsupervised learning


Unsupervised Approaches to Morphology

  • Morphology refers to the internal structure of words

    • A morpheme is a minimal meaningful linguistic unit

    • Morpheme segmentation is the process of dividing words into their component morphemes

      un-supervise-d learn-ing

    • Word segmentation is the process of finding word boundaries in a stream of speech or text

      unsupervisedlearningofnaturallanguage


Unsupervised Approaches to Morphology

  • Morphology refers to the internal structure of words

    • A morpheme is a minimal meaningful linguistic unit

    • Morpheme segmentation is the process of dividing words into their component morphemes

      un-supervise-d learn-ing

    • Word segmentation is the process of finding word boundaries in a stream of speech or text

      unsupervised learning of natural language


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • Learns inflectional paradigms from raw text

    • Requires only the vocabulary of a corpus

    • Looks at word counts of substrings, and proposes (stem, suffix) pairings based on type frequency

  • 3-stage algorithm

    • Stage 1: Candidate paradigms based on frequencies

    • Stages 2-3: Refinement of paradigm set via merging and filtering

  • Paradigms can be used for morpheme segmentation or stemming
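The stage-1 idea can be sketched in a few lines (a simplification of ours, not Monson et al.'s implementation): split every vocabulary word at every position, and keep suffix sets shared by several stems. Real ParaMor searches a lattice of such schemes rather than requiring exact suffix-set matches.

```python
from collections import defaultdict

def candidate_schemes(vocab, min_stems=2):
    """Propose paradigm-like schemes from raw types: every split of every
    word yields a (stem, suffix) pair; stems sharing the same suffix set
    form a candidate scheme, scored here only by type frequency."""
    suffixes_of_stem = defaultdict(set)
    for word in vocab:
        for i in range(1, len(word)):
            suffixes_of_stem[word[:i]].add(word[i:])
    schemes = defaultdict(set)
    for stem, suffixes in suffixes_of_stem.items():
        schemes[frozenset(suffixes)].add(stem)
    return {sfxs: stems for sfxs, stems in schemes.items()
            if len(stems) >= min_stems}

# e.g. candidate_schemes({"hablar", "hablo", "hablamos",
#                         "bailar", "bailo", "bailamos"})
# proposes {ar, o, amos} with stems {habl, bail}, but also spurious
# schemes like {r, mos} with stems {habla, baila} -- exactly the kind
# of analysis the later filtering stages must rule out.
```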


ParaMor: Monson et al. (2007, 2008)

  • A sampling of Spanish verb conjugations


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • A proposed paradigm with stems {habl, bail} and suffixes {-ar, -o, -amos, -an}


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • Same paradigm from previous slide, but with stems {habl, bail, compr}


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • From just this list, other paradigm analyses (which happen to be incorrect) are possible


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • Another possibility: stems {hab, bai}, suffixes {-lar, -lo, -lamos, -lan}


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • Spurious segmentations: this paradigm doesn’t generalize to comprar (or most verbs)


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • What if not all conjugations were in the corpus?


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • We have two similar paradigms that we want to merge


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • This amounts to smoothing, or “hallucinating” out-of-vocabulary items


ParaMor: Morphological paradigms (Monson et al. 2007, 2008)

  • Heuristic-based, deterministic algorithm can learn inflectional paradigms from raw text

  • Paradigms can be used straightforwardly to predict segmentations

    • Combining the outputs of ParaMor and Morfessor (another system) won the segmentation task at Morpho Challenge 2008 for every language: English, Arabic, Turkish, German, and Finnish

  • Currently, ParaMor assumes suffix-based morphology


Bayesian word segmentation Goldwater et al. (2006; in submission)

  • Word segmentation results – comparison

  • See Narges & Andreas’s presentation for more on this model

[Table: word segmentation results comparing Goldwater's Unigram DP and Bigram HDP models.]
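For reference, the core of the unigram model is the Dirichlet-process word distribution below (a minimal sketch; the Gibbs sampler over boundary positions is omitted, and alpha, the character inventory, and the stop probability are illustrative values):

```python
from collections import Counter

class UnigramDP:
    """Sketch of the Dirichlet-process lexicon behind the unigram model:
    P(w) = (count(w) + alpha * P0(w)) / (N + alpha), with a simple base
    distribution P0 (geometric length, uniform characters)."""
    def __init__(self, alpha=20.0, n_chars=27, p_stop=0.5):
        self.alpha, self.n_chars, self.p_stop = alpha, n_chars, p_stop
        self.counts, self.total = Counter(), 0

    def base(self, word):
        # geometric word-length prior times uniform character probabilities
        return (self.p_stop * (1 - self.p_stop) ** (len(word) - 1)
                * (1.0 / self.n_chars) ** len(word))

    def prob(self, word):
        # predictive probability: cached words get reused, new words
        # fall back on the base distribution
        return ((self.counts[word] + self.alpha * self.base(word))
                / (self.total + self.alpha))

    def observe(self, word):
        self.counts[word] += 1
        self.total += 1
```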


Multilingual morpheme segmentation Snyder & Barzilay (2008)

  • Considers parallel phrases and tries to find morpheme correspondences

  • Stray morphemes don’t correspond across languages

  • Abstract morphemes cross languages: (ar, er), (o, e), (amos, ons), (an, ent), (habl, parl)
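A toy approximation of that intuition (ours; the actual model is hierarchical Bayesian, with Dirichlet-process priors over morpheme pairs) just counts recurring suffix correspondences across parallel word pairs:

```python
from collections import Counter
from itertools import product

def abstract_morpheme_candidates(parallel_pairs, max_len=4, top_k=10):
    """Count which (source suffix, target suffix) pairs recur across
    parallel word pairs and rank them by frequency; this captures only
    the counting idea, not the full probabilistic model."""
    pair_counts = Counter()
    for src, tgt in parallel_pairs:
        src_sfx = {src[i:] for i in range(max(1, len(src) - max_len), len(src))}
        tgt_sfx = {tgt[i:] for i in range(max(1, len(tgt) - max_len), len(tgt))}
        for s, t in product(src_sfx, tgt_sfx):
            pair_counts[(s, t)] += 1
    return pair_counts.most_common(top_k)

# e.g. over Spanish/French pairs like ("hablamos", "parlons") and
# ("hablan", "parlent"), frequent candidates should include
# ("amos", "ons") and ("an", "ent").
```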


Morphology Papers: Inputs & Outputs


Unsupervised Methods

  • Sequence Labeling (Part-of-Speech Tagging)

  • Morphology Induction

  • Lexical Resource Acquisition

She/pronoun ran/verb to/preposition the/det station/noun quickly/adverb

un-supervise-d learn-ing


Lexical Resource Acquisition

  • Bilingual Lexicons from Monolingual Corpora (Haghighi et al. 2008)

  • Narrative Event Chains (Chambers and Jurafsky 2008)

  • A Statistical Verb Lexicon for Semantic Role Labeling (Grenager and Manning 2006)


Bilingual lexicons from monolingual corpora (Haghighi et al. 2008)

[Diagram: learning a matching m between source words s (nombre, estado, política, mundo, …) drawn from source text and target words t (state, world, name, nation, …) drawn from target text.]

slide courtesy Haghighi et al.


Bilingual Lexicons from Monolingual Corpora (Haghighi et al. 2008)

Data Representation

[Figure: each word is represented by orthographic features (character n-grams) and context features (co-occurring words), e.g.:

  estado: orthographic #es 1.0, sta 1.0, do# 1.0; context mundo 20.0, politica 5.0, sociedad 6.0
  state: orthographic #st 1.0, tat 1.0, te# 1.0; context world 17.0, politics 10.0, society 10.0]

  • Used a variant of CCA (Canonical Correlation Analysis)

  • Trained with Viterbi EM to find the best matching

slide courtesy Haghighi et al.
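A from-scratch sketch of the two core pieces (ours, not the authors' implementation): CCA projects both feature views into a shared space, and a hard matching step pairs the most similar words. A full system would alternate these, re-fitting the CCA on the currently matched pairs in a Viterbi-EM-style loop.

```python
import numpy as np

def cca_project(X, Y, k=10, reg=1e-3):
    """Whitening + SVD formulation of CCA: project both feature views
    (rows = words, columns = features) into a shared k-dim space."""
    k = min(k, X.shape[1], Y.shape[1])
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Cxx = X.T @ X / len(X) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / len(Y) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / len(X)
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T   # whitening transforms
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, _, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return X @ Wx @ U[:, :k], Y @ Wy @ Vt.T[:, :k]

def greedy_match(src_words, tgt_words, Zx, Zy):
    """One hard matching step: greedily pair the most cosine-similar
    source/target words in the shared space."""
    Zx = Zx / (np.linalg.norm(Zx, axis=1, keepdims=True) + 1e-12)
    Zy = Zy / (np.linalg.norm(Zy, axis=1, keepdims=True) + 1e-12)
    sims = Zx @ Zy.T
    order = sorted(np.ndindex(*sims.shape), key=lambda ij: -sims[ij])
    pairs, used_s, used_t = [], set(), set()
    for i, j in order:
        if i not in used_s and j not in used_t:
            pairs.append((src_words[i], tgt_words[j], float(sims[i, j])))
            used_s.add(i)
            used_t.add(j)
    return pairs
```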


Bilingual Lexicons from Monolingual Corpora (Haghighi et al. 2008)

Feature Experiments

  • MCCA: Orthographic and context features

[Chart: precision of the feature variants on 4k EN-ES Wikipedia articles]

(Haghighi et al. 2008)


Narrative events (Chambers & Jurafsky 2008)

  • Given a corpus, identifies related events that constitute a “narrative” and (when possible) predicts their typical temporal ordering

    • E.g.: criminal prosecution narrative, with verbs: arrest, accuse, plead, testify, acquit/convict

  • Key insight: related events tend to share a participant in a document

    • The common participant may fill different syntactic/semantic roles with respect to verbs: arrest.object, accuse.object, plead.subject
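A minimal sketch of that insight (assuming events arrive as (verb, dependency, entity) triples from a parser and coreference system; not the paper's code): count event pairs that share a participant within a document, then score pairs by pointwise mutual information.

```python
import math
from collections import Counter

def narrative_pmi(documents):
    """Each document is a list of events like ("arrest", "obj", 3),
    where the third field identifies a coreferent participant.
    Returns PMI scores over (verb, dependency) event pairs."""
    single, joint, total = Counter(), Counter(), 0
    for events in documents:
        for i in range(len(events)):
            for j in range(i + 1, len(events)):
                v1, d1, e1 = events[i]
                v2, d2, e2 = events[j]
                if e1 == e2:  # events linked by a shared participant
                    a, b = sorted([(v1, d1), (v2, d2)])
                    joint[(a, b)] += 1
                    single[a] += 1
                    single[b] += 1
                    total += 1
    pmi = {}
    for (a, b), c in joint.items():
        p_ab = c / total
        p_a = single[a] / (2 * total)
        p_b = single[b] / (2 * total)
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi
```

High-PMI pairs such as (arrest.obj, convict.obj) would then be chained into a narrative.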


Narrative events (Chambers & Jurafsky 2008)

  • A temporal classifier can reconstruct pairwise canonical event orderings, producing a directed graph for each narrative


Statistical verb lexicon (Grenager & Manning 2006)

  • From dependency parses, a generative model predicts semantic roles corresponding to each verb’s arguments, as well as their syntactic realizations

    • PropBank-style: arg0, arg1, etc. per verb (do not necessarily correspond across verbs)

    • Learned syntactic patterns of the form: (subj=give.arg0, verb=give, np#1=give.arg2, np#2=give.arg1) or (subj=give.arg0, verb=give, np#2=give.arg1, pp_to=give.arg2)

  • Used for semantic role labeling
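To illustrate the kind of lexicon this yields, here is a hypothetical toy version (the weights and the lookup helper are invented for illustration, not the paper's learned model):

```python
# Toy lexicon in the spirit of the learned linkings: each verb maps to
# weighted linkings from syntactic positions to PropBank-style roles.
LEXICON = {
    "give": [
        (0.6, {"subj": "arg0", "np#1": "arg2", "np#2": "arg1"}),   # double object
        (0.4, {"subj": "arg0", "np#2": "arg1", "pp_to": "arg2"}),  # to-dative
    ],
}

def label_arguments(verb, observed_positions):
    """Pick the highest-weight linking that covers the observed argument
    positions, then read a role off for each position."""
    candidates = [(w, link) for w, link in LEXICON.get(verb, [])
                  if set(observed_positions) <= set(link)]
    if not candidates:
        return {}
    _, link = max(candidates, key=lambda c: c[0])
    return {pos: verb + "." + link[pos] for pos in observed_positions}

# label_arguments("give", ["subj", "np#2", "pp_to"])
#   -> {"subj": "give.arg0", "np#2": "give.arg1", "pp_to": "give.arg2"}
```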


“Semanticity”: Our proposed scale of semantic richness

  • text < POS < syntax/morphology/alignments < coreference/semantic roles/temporal ordering < translations/narrative event sequences

  • We score each model’s inputs and outputs on this scale, and call the input-to-output increase “semantic gain”

    • Haghighi et al.’s bilingual lexicon induction wins in this respect, going from raw text to lexical translations


Robustness to language variation

  • About half of the papers we examined had English-only evaluations

  • We considered which techniques were most adaptable to other (esp. resource-poor) languages. Two main factors:

    • Reliance on existing tools/resources for preprocessing (parsers, coreference resolvers, …)

    • Any linguistic specificity in the model (e.g. suffix-based morphology)


Summary

We examined three areas of unsupervised NLP:

  • Sequence tagging: How can we predict POS (or topic) tags for words in sequence?

  • Morphology: How are words put together from morphemes (and how can we break them apart)?

  • Lexical resources: How can we identify lexical translations, semantic roles and argument frames, or narrative event sequences from text?

    In eight recent papers we found a variety of approaches, including heuristic algorithms, Bayesian methods, and EM-style techniques.



Questions?

Thanks to Noah and Kevin for their feedback on the paper; Andreas and Narges for their collaboration on the presentations; and all of you for giving us your attention!
