Minimally Supervised Morphological Analysis by Multimodal Alignment

David Yarowsky and Richard Wicentowski


Introduction

  • The algorithm is capable of inducing inflectional morphological analyses of both regular and highly irregular forms.

  • The Algorithm combines four original alignment models based on:

    • Relative corpus frequency.

    • Contextual Similarity.

    • Weighted string similarity.

    • Incrementally retrained inflectional transduction probabilities.


Lecture Subjects

  • Task definition.

  • Required and Optional resources.

  • The Algorithm.

  • Empirical Evaluation.


Task Definition

Consider this task as three steps:

  • Estimate a probabilistic alignment between inflected forms and root forms.

  • Train a supervised morphological analysis learner on a weighted subset of these aligned pairs.

  • Use the learner trained in the second step to iteratively refine the alignment from the first step.


Example (POS)

  • Definitions:


Task Definition cont.

  • The target output of step 1:


Required and Optional resources

  • For the given language we need:

    • A table of the inflectional parts of speech (POS).

    • A list of the canonical suffixes.

  • A large text corpus.


Required and Optional resources cont.

  • A list of candidate noun, verb, and adjective roots (from a dictionary), and any rough mechanism, not based on morphological analysis, for identifying the candidate POS of the remaining vocabulary.

  • A list of the consonants and vowels.


Required and Optional resources cont.

  • A list of common function words (not essential).

  • Distance/similarity tables generated on previously studied languages (if available).


The Algorithm

  • Combines four original alignment models:

    • Alignment by Frequency Similarity.

    • Alignment by Context Similarity.

    • Alignment by Weighted Levenshtein Distance.

    • Alignment by Morphological Transformation Probabilities.


Lemma Alignment by Frequency Similarity

  • The motivating dilemma:

    [Diagram: the roots sing and take with the candidate VBD forms singed, sang, and taked; the uncertain alignments are marked with "?".]


Lemma Alignment by Frequency Similarity cont.

  • This Table is based on relative corpus frequency:



Lemma Alignment by Frequency Similarity cont.

  • A problem: the true alignments between inflections are unknown in advance.

  • A simplifying assumption: the frequency ratios between inflections and roots are not significantly different between regular and irregular morphological processes.


Lemma Alignment by Frequency Similarity cont.

  • Similarity between regular and irregular forms:


Lemma Alignment by Frequency Similarity cont.

  • The expected frequency should also be estimable from the frequency of any of the other inflectional variants.

  • VBD/VBG and VBD/VBZ could also be used as estimators.
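
The frequency-similarity idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the corpus counts are invented, the set of "regular" pairs is assumed to be known, and the Gaussian-style scoring function `frequency_score` is a hypothetical choice made here for simplicity.

```python
import math
from statistics import mean, stdev

# Hypothetical corpus counts, invented purely for illustration.
counts = {
    "sing": 1200, "sang": 1400, "singed": 9, "singe": 2,
    "walk": 900, "walked": 1000,
    "jump": 400, "jumped": 380,
    "talk": 1500, "talked": 1300,
}

def log_ratio(inflection: str, root: str) -> float:
    """log(count(inflection) / count(root)): the relative-frequency feature."""
    return math.log(counts[inflection] / counts[root])

# Estimate the expected VBD/VB log ratio from pairs assumed to be regular,
# relying on the simplifying assumption that irregular pairs behave alike.
regular_pairs = [("walked", "walk"), ("jumped", "jump"), ("talked", "talk")]
ratios = [log_ratio(infl, root) for infl, root in regular_pairs]
mu, sigma = mean(ratios), stdev(ratios)

def frequency_score(inflection: str, root: str) -> float:
    """Assumed scoring: closer to 0 when the log ratio is near the expected one."""
    z = (log_ratio(inflection, root) - mu) / sigma
    return -0.5 * z * z

# Which candidate past tense fits the root "sing" better?
best = max(["sang", "singed"], key=lambda c: frequency_score(c, "sing"))
print(best)
```

With these toy counts, the ratio for singed/sing is far below what regular pairs suggest, so the frequency evidence alone already disfavors that alignment.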



Lemma Alignment by Context Similarity

  • Based on contextual similarity of the candidate form.

  • Computing similarity between vectors of weighted and filtered context features.

    Clustering inflectional variants of verbs (e.g. sipped, sipping, and sip).
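
A rough sketch of comparing context vectors is shown below. It assumes a simple bag-of-words window, raw counts instead of the weighting the slides mention, a small hypothetical function-word list as the filter, and cosine similarity as the measure; the slides do not specify any of these details, and the toy sentences are invented.

```python
import math
from collections import Counter

def context_vector(target, sentences, stopwords, window=3):
    """Count non-function words within `window` tokens of each occurrence of target."""
    vec = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] not in stopwords:
                    vec[tokens[j]] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy corpus and function-word list, both hypothetical.
sentences = [
    "she sipped the hot tea slowly",
    "he will sip the hot tea",
    "they sipped cold coffee",
    "we sip coffee slowly",
    "the dog chased the red ball",
]
stopwords = {"the", "he", "she", "they", "we", "will"}

sip = context_vector("sip", sentences, stopwords)
sipped = context_vector("sipped", sentences, stopwords)
chased = context_vector("chased", sentences, stopwords)
print(cosine(sipped, sip), cosine(chased, sip))
```

In this toy data, sipped shares most of its context words with sip, while chased shares none, which is the kind of signal that lets inflectional variants cluster with their root.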


Lemma Alignment by Context Similarity cont.

  • Example:

    CW_subj (AUX|NEG)* V_keyword DET? CW* CW_obj

    Shlomo is eating the apple


Lemma Alignment by Weighted Levenshtein Distance

  • Consider overall stem edit distance.

  • A cost matrix with initial distance costs:

    initially set to (0.5,0.6,1.0,0.98)
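
A weighted edit distance of this kind might look like the sketch below. The slide gives only the four initial values, so the mapping of 0.5, 1.0, and 0.98 to vowel-vowel, consonant-consonant, and consonant-vowel substitutions is an assumption made here; the 0.6 value is not used because this sketch works on single characters only, and the insertion/deletion cost `INDEL` and the vowel set are hypothetical.

```python
VOWELS = set("aeiou")  # assumed vowel inventory for the toy example

# Assumed assignment of three of the four initial costs to substitution classes.
COST_VOWEL_VOWEL = 0.5
COST_CONS_CONS = 1.0
COST_CONS_VOWEL = 0.98
INDEL = 1.0  # hypothetical insertion/deletion cost, not taken from the slides

def sub_cost(a: str, b: str) -> float:
    """Substitution cost depends on whether each character is a vowel."""
    if a == b:
        return 0.0
    a_vowel, b_vowel = a in VOWELS, b in VOWELS
    if a_vowel and b_vowel:
        return COST_VOWEL_VOWEL
    if not a_vowel and not b_vowel:
        return COST_CONS_CONS
    return COST_CONS_VOWEL

def weighted_levenshtein(s: str, t: str) -> float:
    """Standard dynamic-programming edit distance with class-based costs."""
    prev = [j * INDEL for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [i * INDEL]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + INDEL,               # delete a
                curr[j - 1] + INDEL,           # insert b
                prev[j - 1] + sub_cost(a, b),  # substitute or match
            ))
        prev = curr
    return prev[-1]

print(weighted_levenshtein("sing", "sang"))  # cheap vowel change
print(weighted_levenshtein("sing", "ring"))  # costlier consonant change
```

Making vowel changes cheaper than consonant changes lets stem-internal vowel alternations, as in sing and sang, count as closer than unrelated words that differ in a consonant.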


Lemma Alignment by Morphological Transformation Probabilities

The goal is to generalize a mapping function via a generative probabilistic model.



Lemma Alignment by Morphological Transformation Probabilities cont.

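<root> + <stem change> + <suffix> → <inflection>

$$P(\text{inflection} \mid \text{root}, \text{suffix}, \text{POS}) = P(\text{stemchange} \mid \text{root}, \text{suffix}, \text{POS})$$

The equality holds because the stem change is unique for a given root, suffix, and inflection.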



Lemma Alignment by Morphological Transformation Probabilities cont.

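Example:

$$
\begin{aligned}
P(\text{solidified} \mid \text{solidify}, +\text{ed}, \text{VBD})
&= P(y \to i \mid \text{solidify}, +\text{ed}, \text{VBD}) \\
&\approx \lambda_1 P(y \to i \mid \text{ify}, +\text{ed}) \\
&\quad + (1-\lambda_1)\bigl(\lambda_2 P(y \to i \mid \text{fy}, +\text{ed}) \\
&\quad + (1-\lambda_2)\bigl(\lambda_3 P(y \to i \mid \text{y}, +\text{ed}) \\
&\quad + (1-\lambda_3)\bigl(\lambda_4 P(y \to i \mid +\text{ed}) \\
&\quad + (1-\lambda_4)\, P(y \to i)\bigr)\bigr)\bigr)
\end{aligned}
$$

POS can be deleted.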
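
The nested backoff in the example above can be computed from the innermost term outward. The sketch below assumes the probabilities are already stored in two lookup tables; the function name `backoff_probability`, the table layout, the probability values, and the interpolation weights are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def backoff_probability(change, root, suffix, conditional, prior, lambdas):
    """Interpolate P(change | root ending, suffix) over shorter and shorter endings.

    `lambdas` holds four weights, ordered from the most specific context
    (last three root characters) to the least specific (suffix only).
    """
    # Contexts from most to least specific: last 3, 2, 1 root characters, then none.
    contexts = [(root[-k:], suffix) for k in (3, 2, 1)] + [("", suffix)]
    # Start from the innermost term, the unconditioned P(change).
    p = prior.get(change, 0.0)
    # Wrap each more specific context around the running estimate.
    for ctx, lam in zip(reversed(contexts), reversed(lambdas)):
        p = lam * conditional.get((change, ctx), 0.0) + (1 - lam) * p
    return p

# Hypothetical probability tables for the stem change y -> i.
conditional = {
    ("y->i", ("ify", "+ed")): 1.0,
    ("y->i", ("fy", "+ed")): 1.0,
    ("y->i", ("y", "+ed")): 0.5,
    ("y->i", ("", "+ed")): 0.25,
}
prior = {"y->i": 0.125}

p = backoff_probability("y->i", "solidify", "+ed", conditional, prior,
                        lambdas=[0.5, 0.5, 0.5, 0.5])
print(p)
```

Setting every weight to 1 reduces the estimate to the most specific context alone, while smaller weights shift probability mass toward the more general contexts, which helps when the specific context has been seen only rarely.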


Lemma Alignment by Model Combination and the Pigeonhole Principle

  • No single model is sufficiently effective on its own.

  • The Frequency, Levenshtein and Context Similarity models retain equal relative weight.

  • The Morphological Transformation Similarity model increases in relative weight as it is incrementally retrained.



Lemma Alignment by Model Combination and the Pigeonhole Principle cont.

  • The final alignment is based on the pigeonhole principle.

  • For a given POS, a root should not have more than one inflection, nor should multiple inflections in the same POS share the same root.
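
One simple way to enforce this one-to-one constraint is a greedy pass over the candidate pairs in order of combined score. The slides do not say how the constraint is applied, so the greedy strategy, the function name `pigeonhole_align`, and the scores below are assumptions for illustration only.

```python
def pigeonhole_align(scores):
    """Greedy one-to-one alignment for a single POS.

    `scores` maps (root, inflection) pairs to a combined model score;
    each root and each inflection is used at most once.
    """
    used_roots, used_inflections = set(), set()
    alignment = {}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (root, inflection), _score in ranked:
        if root in used_roots or inflection in used_inflections:
            continue  # this pigeonhole is already taken
        alignment[root] = inflection
        used_roots.add(root)
        used_inflections.add(inflection)
    return alignment

# Hypothetical combined scores for VBD candidates.
scores = {
    ("sing", "sang"): 0.9,
    ("sing", "singed"): 0.6,
    ("singe", "singed"): 0.5,
    ("singe", "sang"): 0.1,
}
result = pigeonhole_align(scores)
print(result)
```

Here singed is a plausible candidate for sing on string similarity alone, but once sang has claimed that slot, singed falls to its next-best root, singe.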


Empirical Evaluation

  • Performance:

