Minimally Supervised Morphological Analysis by Multimodal Alignment


David Yarowsky and Richard Wicentowski


Introduction

  • The algorithm is capable of inducing inflectional morphological analyses of both regular and highly irregular forms.

  • The Algorithm combines four original alignment models based on:

    • Relative corpus frequency.

    • Contextual Similarity.

    • Weighted string similarity.

    • Incrementally retrained inflectional transduction probabilities.


Lecture’s Subjects

  • Task definition.

  • Required and Optional resources.

  • The Algorithm.

  • Empirical Evaluation.


Task Definition

Consider this task as three steps:

  • Estimate a probabilistic alignment between inflected forms and root forms.

  • Train a supervised morphological analysis learner on a weighted subset of these aligned pairs.

  • Use the result from step 2 to iteratively refine the alignment in step 1.


Example (POS)

  • Definitions:


Task Definition cont.

  • The target output of step 1:


Required and Optional resources

  • For the given language we need:

    • A table of the inflectional parts of speech (POS).

    • A list of the canonical suffixes.

  • A large text corpus.


Required and Optional resources cont.

  • A list of candidate noun, verb and adjective roots (from a dictionary), and any rough mechanism (not based on morphological analysis) for identifying the candidate POS of the remaining vocabulary.

  • A list of the consonants and vowels.


Required and Optional resources cont.

  • A list of common function words (not essential).

  • Distance/similarity tables generated on previously studied languages (if available).


The Algorithm

  • Combines four original alignment models:

    • Alignment by Frequency Similarity.

    • Alignment by Context Similarity.

    • Alignment by Weighted Levenshtein Distance.

    • Alignment by Morphological Transformation Probabilities.


Lemma Alignment by Frequency Similarity

  • The motivating dilemma:

    [Figure: the roots sing and take with the candidate past-tense (VBD) forms singed, sang and taked; the alignments are marked "?".]


Lemma Alignment by Frequency Similarity cont.

  • This Table is based on relative corpus frequency:


Lemma Alignment by Frequency Similarity cont.


Lemma Alignment by Frequency Similarity cont.

  • A problem: the true alignments between inflections are unknown in advance.

  • A simplifying assumption: the frequency ratios between inflections and roots are not significantly different between regular and irregular morphological processes.


Lemma Alignment by Frequency Similarity cont.

  • Similarity between regular and irregular forms:


Lemma Alignment by Frequency Similarity cont.

  • The expected frequency should also be estimable from the frequency of any of the other inflectional variants.

  • VBD/VBG and VBD/VBZ could also be used as estimators.
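The frequency idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give the scoring function, so the distance below (a z-score on the log of the VBD/VB ratio) is an assumed stand-in, and every count is hypothetical rather than taken from a corpus.

```python
import math
from statistics import mean, stdev


def log_ratio(inflected_count, root_count):
    """Natural log of the inflection/root corpus frequency ratio."""
    return math.log(inflected_count / root_count)


def frequency_distance(candidate_count, root_count, reference_log_ratios):
    """Distance, in standard deviations, between a candidate pair's log ratio
    and the mean log ratio of the reference pairs. Lower means more plausible.
    (Assumed scoring function, not taken from the slides.)"""
    mu = mean(reference_log_ratios)
    sigma = stdev(reference_log_ratios)
    return abs(log_ratio(candidate_count, root_count) - mu) / sigma


# Hypothetical (VBD count, VB count) pairs for verbs whose alignment is trusted.
reference_pairs = [(900, 1000), (1200, 1000), (700, 1000), (1500, 1000)]
reference = [log_ratio(vbd, vb) for vbd, vb in reference_pairs]

# Hypothetical counts for the root "sing" and two candidate past-tense forms.
sing_count = 1200
sang_distance = frequency_distance(1400, sing_count, reference)
singed_distance = frequency_distance(9, sing_count, reference)
```

With these made-up counts, sang lies much closer to the typical VBD/VB ratio than singed does, which is the intuition behind the frequency model. The same function could be fed VBD/VBG or VBD/VBZ ratios instead.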


Lemma Alignment by Frequency Similarity cont.


Lemma Alignment by Context Similarity

  • Based on the contextual similarity of the candidate forms.

  • Computes similarity between vectors of weighted and filtered context features.

  • Used to cluster inflectional variants of verbs (e.g. sipped, sipping, and sip).
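A small Python sketch of context-vector similarity follows. It assumes cosine similarity over bag-of-words windows, raw counts as weights, and a stopword list as the filter; the slides do not specify any of these choices, and the toy sentence data is invented for illustration.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2, stopwords=frozenset()):
    """Count the non-stopword tokens within `window` positions of each
    occurrence of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] not in stopwords:
                vec[tokens[j]] += 1
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus and stopword list (both hypothetical).
tokens = ("she sipped the tea slowly . he is sipping the tea now . "
          "they sold the car quickly").split()
stop = frozenset({"the", ".", "she", "he", "is", "they"})

sipped = context_vector(tokens, "sipped", stopwords=stop)
sipping = context_vector(tokens, "sipping", stopwords=stop)
sold = context_vector(tokens, "sold", stopwords=stop)
```

In this toy corpus, sipped and sipping share the context word tea while sold does not, so `cosine(sipped, sipping)` exceeds `cosine(sipped, sold)`.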


Lemma Alignment by Context Similarity cont.

  • Example:

    Pattern: CWsubj (AUX|NEG)* Vkeyword DET? CW* CWobj

    Sentence: Shlomo is eating the apple
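The extraction pattern on this slide can be tried out with an ordinary regular expression. The sketch below assumes tokens are encoded as word/TAG pairs and that CW, AUX, NEG, V and DET are the tag names; the encoding and the tagging of the example sentence are assumptions made for illustration, not part of the slides.

```python
import re

# CWsubj (AUX|NEG)* Vkeyword DET? CW* CWobj, applied to "word/TAG" tokens
# separated by single spaces (assumed encoding).
PATTERN = re.compile(
    r"(?P<subj>\S+)/CW "          # subject content word
    r"(?:\S+/(?:AUX|NEG) )*"      # optional auxiliaries or negations
    r"(?P<verb>\S+)/V "           # the keyword verb
    r"(?:\S+/DET )?"              # optional determiner
    r"(?:\S+/CW )*"               # optional intervening content words
    r"(?P<obj>\S+)/CW(?: |$)"     # object content word
)


def extract_context(tagged_sentence):
    """Return the subject, verb and object words, or None if no match."""
    match = PATTERN.search(tagged_sentence)
    return match.groupdict() if match else None


# Hypothetical tagging of the slide's example sentence.
result = extract_context("Shlomo/CW is/AUX eating/V the/DET apple/CW")
```

Here `result` maps `subj` to Shlomo, `verb` to eating and `obj` to apple, which are the kind of context features the similarity model compares.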


Lemma Alignment by Weighted Levenshtein Distance

  • Considers the overall stem edit distance.

  • Uses a cost matrix whose distance costs are initially set to (0.5, 0.6, 1.0, 0.98).


Lemma Alignment by Morphological Transformation Probabilities

The goal is to generalize a mapping function via a generative probabilistic model.


Lemma Alignment by Morphological Transformation Probabilities

  • Result table:


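Lemma Alignment by Morphological Transformation Probabilities cont.

<root> + <stem change> + <suffix> → <inflection>

\[
P(\text{inflection} \mid \text{root}, \text{suffix}, \text{POS}) = P(\text{stem change} \mid \text{root}, \text{suffix}, \text{POS})
\]

The two probabilities are equal because the stem change is unique for a given root, suffix and inflection.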


Lemma Alignment by Morphological Transformation Probabilities cont.

Example:


Lemma Alignment by Morphological Transformation Probabilities cont.

Example:

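\[
\begin{aligned}
P(\text{solidified} \mid \text{solidify}, +\text{ed}, \text{VBD})
 &= P(y \to i \mid \text{solidify}, +\text{ed}, \text{VBD}) \\
 &\approx \lambda_1 P(y \to i \mid \text{ify}, +\text{ed}) \\
 &\quad + (1-\lambda_1)\bigl(\lambda_2 P(y \to i \mid \text{fy}, +\text{ed}) \\
 &\quad + (1-\lambda_2)\bigl(\lambda_3 P(y \to i \mid \text{y}, +\text{ed}) \\
 &\quad + (1-\lambda_3)\bigl(\lambda_4 P(y \to i \mid +\text{ed}) \\
 &\quad + (1-\lambda_4) P(y \to i)\bigr)\bigr)\bigr)
\end{aligned}
\]

Note: the POS condition can be dropped in the backed-off terms.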
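The nested backoff in this example amounts to a chain of linear interpolations, from the most specific context (the longest root ending) down to the unconditioned stem-change probability. A minimal sketch, assuming the per-context estimates and the interpolation weights are already available; the function name and all numeric values are hypothetical.

```python
def backoff_probability(estimates, lambdas):
    """Interpolated backoff estimate.

    estimates: probabilities ordered from most specific context to least
               specific, e.g. P(y->i | ify, +ed), ..., P(y->i).
    lambdas:   one interpolation weight per estimate except the last.
    """
    if len(lambdas) != len(estimates) - 1:
        raise ValueError("need exactly one lambda per non-final estimate")
    prob = estimates[-1]  # innermost term: the unconditioned estimate
    # Fold outward: each level mixes its own estimate with the backed-off value.
    for lam, est in zip(reversed(lambdas), reversed(estimates[:-1])):
        prob = lam * est + (1.0 - lam) * prob
    return prob


# Hypothetical estimates for the contexts ify, fy, y, +ed alone, and no context.
estimates = [0.9, 0.8, 0.6, 0.1, 0.05]
lambdas = [0.5, 0.5, 0.5, 0.5]
p_stem_change = backoff_probability(estimates, lambdas)
```

Setting every weight to 1 returns the most specific estimate unchanged, and setting every weight to 0 falls all the way back to the unconditioned probability.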


Lemma Alignment by Model Combination and the Pigeonhole Principle

  • No single model is sufficiently effective on its own.

  • The Frequency, Levenshtein and Context Similarity models retain equal relative weight.

  • The Morphological Transformation Similarity model increases in relative weight.
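One way to picture this weighting is a weighted average in which three models share a fixed weight and the fourth receives a growing one. The slides do not say how the scores are actually combined, so the averaging rule, the function name and the numbers below are all assumptions for illustration only.

```python
def combine_scores(frequency, levenshtein, context, morphological, morph_weight):
    """Weighted average of four similarity scores in [0, 1].

    The first three models keep an equal weight of 1.0; only the weight of
    the morphological transformation model varies (assumed scheme).
    """
    base_weight = 1.0
    total_weight = 3 * base_weight + morph_weight
    weighted_sum = (base_weight * (frequency + levenshtein + context)
                    + morph_weight * morphological)
    return weighted_sum / total_weight


# Hypothetical scores for one candidate pair, early and late in training.
early = combine_scores(0.6, 0.6, 0.6, 1.0, morph_weight=0.0)
late = combine_scores(0.6, 0.6, 0.6, 1.0, morph_weight=3.0)
```

As `morph_weight` grows, the combined score moves toward the morphological model's opinion while the other three models keep the same weight relative to one another.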


Lemma Alignment by Model Combination and the Pigeonhole Principle

  • Example:


Lemma Alignment by Model Combination and the Pigeonhole Principle cont.

  • The final alignment is based on the pigeonhole principle.

  • For a given POS, a root should not have more than one inflection, nor should multiple inflections in the same POS share the same root.
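The one-to-one constraint can be illustrated with a greedy assignment over scored candidate pairs for a single POS. The slides do not describe the actual procedure, so the greedy strategy here is an assumed simplification, and the scores are invented.

```python
def pigeonhole_align(scores):
    """Greedy one-to-one alignment for a single POS.

    scores: dict mapping (inflection, root) pairs to a combined score,
            where higher means a better match.
    Returns a dict mapping each aligned inflection to its root.
    """
    used_inflections, used_roots = set(), set()
    alignment = {}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (inflection, root), _score in ranked:
        # Skip any pair that would reuse an inflection or a root.
        if inflection in used_inflections or root in used_roots:
            continue
        alignment[inflection] = root
        used_inflections.add(inflection)
        used_roots.add(root)
    return alignment


# Hypothetical combined scores for VBD candidates.
candidate_scores = {
    ("sang", "sing"): 0.9,
    ("singed", "sing"): 0.6,
    ("singed", "singe"): 0.5,
    ("sang", "singe"): 0.1,
}
vbd_alignment = pigeonhole_align(candidate_scores)
```

Because sang claims sing first, singed cannot also take sing and falls through to singe, even though its score with sing was higher.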


Empirical Evaluation

  • Performance:

