Minimally supervised morphological analysis by multimodal alignment

David Yarowsky and Richard Wicentowski


Introduction

  • The algorithm is capable of inducing inflectional morphological analyses of both regular and highly irregular forms.

  • The algorithm combines four original alignment models based on:

    • Relative corpus frequency.

    • Contextual similarity.

    • Weighted string similarity.

    • Incrementally retrained inflectional transduction probabilities.


Lecture’s Subjects

  • Task definition.

  • Required and Optional resources.

  • The Algorithm.

  • Empirical Evaluation.


Task Definition

Consider this task as three steps:

  1. Estimate a probabilistic alignment between inflected forms and root forms.

  2. Train a supervised morphological analysis learner on a weighted subset of these aligned pairs.

  3. Use the result of step 2 to iteratively refine the alignment from step 1.
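The three steps form a loop, which can be sketched as follows. This is a minimal illustration, not the authors' implementation: `iterate_alignment`, `estimate`, and `train` are hypothetical names, and the toy callables only stand in for the real alignment models and the supervised learner.

```python
def iterate_alignment(estimate, train, rounds=3):
    """Alternate between alignment estimation and learner training.

    estimate(model) -> dict mapping (inflection, root) to a probability;
                       model is None on the first round.
    train(alignment) -> a model fitted on the aligned pairs.
    Both callables are placeholders (assumptions made for this sketch).
    """
    model, alignment = None, {}
    for _ in range(rounds):
        alignment = estimate(model)  # step 1, refined by the model in later rounds
        model = train(alignment)     # step 2
    return alignment, model


# Toy stand-ins: confidence in a pair grows as the "model" improves.
def toy_estimate(model):
    return {("sang", "sing"): 0.5 if model is None else min(1.0, model + 0.2)}


def toy_train(alignment):
    return max(alignment.values())


final_alignment, final_model = iterate_alignment(toy_estimate, toy_train)
```

Feeding the trained model back into the estimator is what makes the procedure iterative; the stopping rule (a fixed number of rounds here) is an assumption.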


Example (POS)

  • Definitions:


Task Definition cont.

  • The target output of step 1:


Required and Optional resources

  • For the given language we need:

    • A table of the inflectional parts of speech (POS).

    • A list of the canonical suffixes.

  • A large text corpus.


Required and Optional resources cont.

  • A list of the candidate noun, verb and adjective roots (from a dictionary), and any rough mechanism, not based on morphological analysis, for identifying the candidate POS of the remaining vocabulary.

  • A list of the consonants and vowels.


Required and Optional resources cont.

  • A list of common function words.

  • Distance/similarity tables generated on previously studied languages.

These two resources are not essential; they are used if available.


The Algorithm

  • Combines four original alignment models:

    • Alignment by Frequency Similarity.

    • Alignment by Context Similarity.

    • Alignment by Weighted Levenshtein Distance.

    • Alignment by Morphological Transformation Probabilities.


Lemma Alignment by Frequency Similarity

  • The motivating dilemma:

    [Figure: candidate alignments, each marked with "?": sing → singed (VBD), take → taked (VBD), sing → sang (VBD)]


Lemma Alignment by Frequency Similarity cont.

  • This Table is based on relative corpus frequency:


Lemma Alignment by Frequency Similarity cont.

  • A problem: the true alignments between inflections are unknown in advance.

  • A simplifying assumption: the frequency ratios between inflections and roots are not significantly different between regular and irregular morphological processes.
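The frequency idea can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the corpus counts are invented, the log-ratio distribution is modelled as a Gaussian with a made-up mean and standard deviation, and `log_ratio` and `frequency_score` are hypothetical names.

```python
import math


def log_ratio(inflection_count, root_count):
    """Log of the inflection/root frequency ratio."""
    return math.log(inflection_count / root_count)


def frequency_score(inflection_count, root_count, mean, stdev):
    """Unnormalised Gaussian score: how typical is this ratio?"""
    z = (log_ratio(inflection_count, root_count) - mean) / stdev
    return math.exp(-0.5 * z * z)


# Invented corpus counts (not taken from the presentation).
counts = {"sing": 1200, "sang": 1400, "singed": 10, "singe": 2}

# Assumed distribution of log(VBD/VB), e.g. estimated from regular pairs.
MEAN, STDEV = 0.0, 1.5

sang_sing = frequency_score(counts["sang"], counts["sing"], MEAN, STDEV)
singed_sing = frequency_score(counts["singed"], counts["sing"], MEAN, STDEV)
singed_singe = frequency_score(counts["singed"], counts["singe"], MEAN, STDEV)
```

Under these assumed numbers, sang/sing has a typical ratio while singed/sing is far too rare, so frequency alone favours singed/singe over singed/sing even though "singed" looks like a regular past tense of "sing".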


Lemma Alignment by Frequency Similarity cont.

  • Similarity between regular and irregular forms:


Lemma Alignment by Frequency Similarity cont.

  • The expected frequency should also be estimable from the frequency of any of the other inflectional variants.

  • VBD/VBG and VBD/VBZ could also be used as estimators.


Lemma Alignment by Context Similarity

  • Based on the contextual similarity of the candidate forms.

  • Computes the similarity between vectors of weighted and filtered context features.

    This helps cluster inflectional variants of verbs (e.g. sipped, sipping, and sip).
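A minimal sketch of comparing context vectors follows. The slides do not name the similarity measure, so cosine similarity is an assumption here, and the context counts are invented for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented context-word counts for three verb forms.
sipped = Counter({"tea": 3, "coffee": 2, "slowly": 1})
sip = Counter({"tea": 4, "coffee": 1})
sapped = Counter({"strength": 3, "energy": 2})

sim_sipped_sip = cosine(sipped, sip)
sim_sipped_sapped = cosine(sipped, sapped)
```

Forms of the same verb tend to share subjects and objects, so "sipped" scores closer to "sip" than to the orthographically similar "sapped".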


Lemma Alignment by Context Similarity cont.

  • Example:

    CW_subj (AUX|NEG)* V_keyword DET? CW* CW_obj

    Shlomo (CW_subj) is (AUX) eating (V_keyword) the (DET) apple (CW_obj)
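The pattern CW_subj (AUX|NEG)* V_keyword DET? CW* CW_obj can be applied to a tagged sentence roughly as below. This is a sketch under stated assumptions: tokens are taken to be already labelled with the coarse classes CW, AUX, NEG, V and DET, and `subject_object_context` is a hypothetical helper, not part of the presented system.

```python
def subject_object_context(tagged, verb_index):
    """Return (subject, object) content words around the verb at verb_index.

    tagged is a list of (word, tag) pairs; either element of the result
    is None when that side of the pattern does not match.
    """
    # Left side: skip any run of AUX/NEG, then expect a content word.
    i = verb_index - 1
    while i >= 0 and tagged[i][1] in ("AUX", "NEG"):
        i -= 1
    subj = tagged[i][0] if i >= 0 and tagged[i][1] == "CW" else None

    # Right side: optional DET, then a run of content words;
    # the last word of the run is taken as the object.
    j = verb_index + 1
    if j < len(tagged) and tagged[j][1] == "DET":
        j += 1
    obj = None
    while j < len(tagged) and tagged[j][1] == "CW":
        obj = tagged[j][0]
        j += 1
    return subj, obj


sentence = [("Shlomo", "CW"), ("is", "AUX"), ("eating", "V"),
            ("the", "DET"), ("apple", "CW")]
context = subject_object_context(sentence, verb_index=2)
```

For the example sentence this yields the pair ("Shlomo", "apple"), which would then contribute to the context vector of "eating".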


Lemma Alignment by Weighted Levenshtein Distance

  • Considers the overall stem edit distance.

  • Uses a cost matrix whose distance costs are initially set to (0.5, 0.6, 1.0, 0.98).
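A weighted edit distance of this kind can be sketched as follows. The slide does not say which operation each of the four costs belongs to, so the assignment below (vowel-for-vowel 0.5, consonant-for-consonant 1.0, vowel-for-consonant 0.98, insertion or deletion 1.0) is an assumption, as are the function names.

```python
VOWELS = set("aeiou")


def substitution_cost(a, b, vowel_vowel=0.5, cons_cons=1.0, mixed=0.98):
    """Cost of replacing character a with b (cost assignment is assumed)."""
    if a == b:
        return 0.0
    a_is_vowel, b_is_vowel = a in VOWELS, b in VOWELS
    if a_is_vowel and b_is_vowel:
        return vowel_vowel
    if not a_is_vowel and not b_is_vowel:
        return cons_cons
    return mixed


def weighted_levenshtein(source, target, indel=1.0):
    """Dynamic-programming edit distance with class-dependent costs."""
    previous = [j * indel for j in range(len(target) + 1)]
    for i, a in enumerate(source, start=1):
        current = [i * indel]
        for j, b in enumerate(target, start=1):
            current.append(min(
                previous[j] + indel,                        # delete a
                current[j - 1] + indel,                     # insert b
                previous[j - 1] + substitution_cost(a, b),  # substitute
            ))
        previous = current
    return previous[-1]


d_sang = weighted_levenshtein("sing", "sang")
d_singed = weighted_levenshtein("sing", "singed")
```

Making vowel changes cheap lets an irregular form such as "sang" stay close to its root, while the costs themselves can later be re-estimated from aligned pairs.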


Lemma Alignment by Morphological Transformation Probabilities

The goal is to generalize a mapping function via a generative probabilistic model.


Lemma Alignment by Morphological Transformation Probabilities

  • Result table:


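Lemma Alignment by Morphological Transformation Probabilities cont.

<root> + <stem change> + <suffix> → <inflection>

\[ P(\text{inflection} \mid \text{root}, \text{suffix}, \text{POS}) = P(\text{stemchange} \mid \text{root}, \text{suffix}, \text{POS}) \]

The equality holds because the stem change is unique given the root, suffix, and inflection.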


Lemma Alignment by Morphological Transformation Probabilities cont.

Example:


Lemma Alignment by Morphological Transformation Probabilities cont.

Example:

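\[
\begin{aligned}
P(\text{solidified} \mid \text{solidify}, +\text{ed}, \text{VBD})
  &= P(y \to i \mid \text{solidify}, +\text{ed}, \text{VBD}) \\
  &\approx \lambda_1 P(y \to i \mid \text{ify}, +\text{ed}) \\
  &\quad + (1-\lambda_1)\bigl(\lambda_2 P(y \to i \mid \text{fy}, +\text{ed}) \\
  &\quad + (1-\lambda_2)\bigl(\lambda_3 P(y \to i \mid \text{y}, +\text{ed}) \\
  &\quad + (1-\lambda_3)\bigl(\lambda_4 P(y \to i \mid +\text{ed}) \\
  &\quad + (1-\lambda_4)\, P(y \to i)\bigr)\bigr)\bigr)
\end{aligned}
\]

The POS can be deleted from the conditioning context.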
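The nested interpolation can be evaluated from the innermost term outward. The sketch below assumes the component probabilities are stored in plain dictionaries and that the root context backs off through its last three, two, one and zero characters; `backoff_probability` and the numeric values are hypothetical.

```python
def backoff_probability(change, root, suffix, cond_probs, prior, lambdas):
    """Interpolated backoff estimate of P(change | root, suffix).

    cond_probs maps (change, root_ending, suffix) to a probability,
    prior maps change to its unconditioned probability P(change),
    lambdas = (l1, l2, l3, l4), ordered from longest to shortest context.
    """
    endings = [root[-k:] if k else "" for k in (3, 2, 1, 0)]
    estimate = prior.get(change, 0.0)
    # Fold from the shortest context (lambda_4) out to the longest (lambda_1).
    for lam, ending in zip(reversed(lambdas), reversed(endings)):
        specific = cond_probs.get((change, ending, suffix), 0.0)
        estimate = lam * specific + (1.0 - lam) * estimate
    return estimate


# Invented probabilities for the stem change y -> i before +ed.
cond_probs = {
    ("y>i", "ify", "+ed"): 0.9,
    ("y>i", "fy", "+ed"): 0.8,
    ("y>i", "y", "+ed"): 0.6,
    ("y>i", "", "+ed"): 0.1,
}
prior = {"y>i": 0.05}

p = backoff_probability("y>i", "solidify", "+ed", cond_probs, prior,
                        lambdas=(0.5, 0.5, 0.5, 0.5))
```

With every lambda set to 0.5 the more specific contexts dominate, which is the intended effect: long endings such as "ify" are trusted when observed, and shorter ones fill in for unseen roots.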


Lemma Alignment by Model Combination and the Pigeonhole Principle

  • No single model is sufficiently effective on its own.

  • The Frequency, Levenshtein and Context Similarity models retain equal relative weight throughout.

  • The Morphological Transformation Similarity model increases in relative weight as it is retrained.
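One way to picture this weighting is sketched below. The slides do not give the combination formula, so the weighted average, the linear growth schedule, and the names `combine_scores` and `morph_growth` are all assumptions; the inputs are taken to be similarity scores in [0, 1], with the Levenshtein distance already converted to a similarity.

```python
def combine_scores(freq, lev, ctx, morph, iteration, morph_growth=0.5):
    """Weighted average of four similarity scores.

    The first three models keep a fixed, equal weight of 1.0; the weight
    of the morphological transformation model grows with the iteration
    number (a hypothetical linear schedule).
    """
    morph_weight = 1.0 + morph_growth * iteration
    total_weight = 3.0 + morph_weight
    return (freq + lev + ctx + morph_weight * morph) / total_weight


early = combine_scores(0.2, 0.4, 0.6, 0.8, iteration=0)
later = combine_scores(0.2, 0.4, 0.6, 0.8, iteration=2)
```

At iteration 0 all four models count equally; by iteration 2 the better-trained transformation model pulls the combined score toward its own estimate.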


Lemma Alignment by Model Combination and the Pigeonhole Principle

  • Example:


Lemma Alignment by Model Combination and the Pigeonhole Principle cont.

  • The final alignment is based on the pigeonhole principle.

  • For a given POS, a root should not have more than one inflection, nor should multiple inflections in the same POS share the same root.
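The one-to-one constraint can be enforced in several ways; the sketch below uses a simple greedy pass over the combined scores for a single POS. The greedy strategy, the function name `pigeonhole_align`, and the scores are assumptions made for illustration, not the procedure described in the presentation.

```python
def pigeonhole_align(scores):
    """Greedy one-to-one alignment for a single POS.

    scores maps (inflection, root) to a combined similarity (higher is
    better). Each root receives at most one inflection and each
    inflection at most one root.
    """
    used_roots, alignment = set(), {}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (inflection, root), _score in ranked:
        if inflection in alignment or root in used_roots:
            continue  # this pigeonhole is already occupied
        alignment[inflection] = root
        used_roots.add(root)
    return alignment


# Invented combined scores for candidate VBD alignments.
vbd_scores = {
    ("sang", "sing"): 0.9,
    ("singed", "sing"): 0.8,
    ("singed", "singe"): 0.7,
    ("sang", "singe"): 0.1,
}
vbd_alignment = pigeonhole_align(vbd_scores)
```

Once "sang" claims "sing", the competing pair singed/sing is blocked, so "singed" falls through to "singe" even though its score with "sing" was higher.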


Empirical Evaluation

  • Performance:

