Minimally Supervised Morphological Analysis by Multimodal Alignment



Minimally Supervised Morphological Analysis by Multimodal Alignment. David Yarowsky and Richard Wicentowski. Introduction: the algorithm is capable of inducing inflectional morphological analyses of regular and highly irregular forms.


### Minimally Supervised Morphological Analysis by Multimodal Alignment

David Yarowsky and Richard Wicentowski

Introduction
• The algorithm is capable of inducing inflectional morphological analyses of regular and highly irregular forms.
• The algorithm combines four original alignment models based on:
• Relative corpus frequency.
• Contextual similarity.
• Weighted string similarity.
• Incrementally retrained inflectional transduction probabilities.
Lecture Outline
• Required and optional resources.
• The algorithm.
• Empirical evaluation.

The task can be viewed as three steps:

• Step 1: Estimate a probabilistic alignment between inflected forms and root forms.
• Step 2: Train a supervised morphological analysis learner on a weighted subset of these aligned pairs.
• Step 3: Use the result of step 2 to iteratively refine the alignment in step 1.
Example (POS)
• Definitions:
• The target output of step 1:
Required and Optional resources
• For a given language, the algorithm requires:
• A table of the inflectional parts of speech (POS).
• A list of the canonical suffixes.
• A large text corpus.
Required and Optional resources cont.
• A list of candidate noun, verb and adjective roots (from a dictionary), and any rough mechanism for identifying the candidate POS of the remaining vocabulary (not based on morphological analysis).
• A list of the consonants and vowels.
Required and Optional resources cont.
• A list of common function words.
• Distance/similarity tables generated on previously studied languages.

Not essential; used if available.

The Algorithm
• Combines four original alignment models:
• Alignment by Frequency Similarity.
• Alignment by Context Similarity.
• Alignment by Weighted Levenshtein Distance.
• Alignment by Morphological Transformation Probabilities.
[Figure: aligning past-tense (VBD) forms to the roots sing and take; the candidates shown are singed, sang and taked.]

Lemma Alignment by Frequency Similarity
• The motivating dilemma:
Lemma Alignment by Frequency Similarity cont.
• This table is based on relative corpus frequency:
Lemma Alignment by Frequency Similarity cont.
• A problem: the true alignments between inflections and roots are unknown in advance.
• A simplifying assumption: the frequency ratios between inflections and roots are not significantly different between regular and irregular morphological processes.
Lemma Alignment by Frequency Similarity cont.
• Similarity between regular and irregular forms:
Lemma Alignment by Frequency Similarity cont.
• The expected frequency of an inflection should also be estimable from the frequency of any of its other inflectional variants.
• For example, the VBD/VBG and VBD/VBZ ratios can also serve as estimators.
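The frequency-similarity idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, hypothetically, that the log frequency ratio of an inflection to an anchor form (the root, or another variant such as the VBG or VBZ form) is summarized by its mean and standard deviation over trusted, mostly regular pairs, and it scores a candidate by how many standard deviations its ratio lies from that mean (lower is more plausible). The function names, the add-one smoothing and every count below are hypothetical.

```python
import math
from statistics import mean, stdev


def log_ratio(f_a: int, f_b: int) -> float:
    """log(f_a / f_b), add-one smoothed so unseen forms do not give log(0)."""
    return math.log((f_a + 1) / (f_b + 1))


def fit_ratio_model(pairs: list[tuple[int, int]]) -> tuple[float, float]:
    """Mean and sample std. dev. of log ratios over (f_inflection, f_anchor) pairs.

    statistics.stdev needs at least two pairs.
    """
    ratios = [log_ratio(f_infl, f_anchor) for f_infl, f_anchor in pairs]
    return mean(ratios), stdev(ratios)


def freq_score(f_infl: int, f_anchor: int, model: tuple[float, float]) -> float:
    """Distance of the observed log ratio from the model mean, in std. devs."""
    mu, sigma = model
    return abs(log_ratio(f_infl, f_anchor) - mu) / sigma


def combined_freq_score(f_infl: int, anchors: list[tuple[int, tuple[float, float]]]) -> float:
    """Average the scores from several estimators (e.g. VBD/VB, VBD/VBG, VBD/VBZ)."""
    return mean(freq_score(f_infl, f_anchor, model) for f_anchor, model in anchors)


# Hypothetical counts for regular (VBD, VB) pairs such as walked/walk, jumped/jump.
vbd_vb_model = fit_ratio_model([(900, 1000), (300, 280), (500, 600), (400, 450)])

# Hypothetical counts: sing = 1000, sang = 1100, singed = 5.
score_sang = freq_score(1100, 1000, vbd_vb_model)
score_singed = freq_score(5, 1000, vbd_vb_model)
print(f"sang: {score_sang:.2f}, singed: {score_singed:.2f}")  # sang scores far lower
```

Averaging over several anchors mirrors the point that the VB, VBG and VBZ forms can each provide an estimate; how the original work combines those estimates is not stated on the slides.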
Lemma Alignment by Context Similarity
• Alignment is based on the contextual similarity of the candidate forms.
• Similarity is computed between vectors of weighted and filtered context features.

Clustering inflectional variants of verbs (e.g. sipped, sipping, and sip).
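Context similarity can be sketched as a cosine between context-count vectors. This minimal version assumes bag-of-words features from a ±2-token window, drops words in a hypothetical FUNCTION_WORDS set (standing in for the common function-word resource), and uses raw counts as weights; the weighting and filtering used in the original work may differ, and the toy corpus is invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for the "list of common function words" resource.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "and", "he", "she", "we", "they"}


def context_vector(tokens: list[str], target: str, window: int = 2) -> Counter:
    """Count non-function words within +/-window tokens of each occurrence of target."""
    vec: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] not in FUNCTION_WORDS:
                vec[tokens[j]] += 1
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus (a real run would use the large text corpus).
tokens = ("she sipped hot tea slowly they sip hot tea daily "
          "he walked the dog home we walk the dog daily").split()
vec = {w: context_vector(tokens, w) for w in ("sip", "sipped", "walk", "walked")}
print(cosine(vec["sip"], vec["sipped"]))     # about 0.816
print(cosine(vec["walked"], vec["sipped"]))  # 0.0
```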

Lemma Alignment by Weighted Levenshtein Distance
• Considers the overall stem edit distance.
• Uses a cost matrix whose distance costs are initially set to (0.5, 0.6, 1.0, 0.98).
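The weighted Levenshtein measure can be sketched with the standard dynamic program and character-class-dependent costs. The slide gives only the four initial values (0.5, 0.6, 1.0, 0.98); which operation each applies to is an assumption here. This sketch hypothetically assigns them to vowel-for-vowel substitution, vowel insertion/deletion, consonant-for-consonant substitution and consonant insertion/deletion, and prices a vowel/consonant substitution as a deletion plus an insertion. English vowels are assumed for the vowel list.

```python
VOWELS = set("aeiou")  # assumed English vowel list (the "consonants and vowels" resource)

# Hypothetical assignment of the slide's four initial costs.
COST_VV_SUB = 0.5    # vowel -> different vowel
COST_V_INDEL = 0.6   # insert or delete a vowel
COST_CC_SUB = 1.0    # consonant -> different consonant
COST_C_INDEL = 0.98  # insert or delete a consonant


def indel_cost(ch: str) -> float:
    return COST_V_INDEL if ch in VOWELS else COST_C_INDEL


def sub_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return COST_VV_SUB
    if a not in VOWELS and b not in VOWELS:
        return COST_CC_SUB
    # Mixed vowel/consonant substitution: priced as a deletion plus an insertion.
    return indel_cost(a) + indel_cost(b)


def weighted_levenshtein(s: str, t: str) -> float:
    """Minimum total cost of turning s into t under the costs above."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel_cost(s[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel_cost(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(s[i - 1]),                # delete s[i-1]
                d[i][j - 1] + indel_cost(t[j - 1]),                # insert t[j-1]
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),    # substitute or match
            )
    return d[m][n]


print(weighted_levenshtein("sang", "sing"))  # 0.5
```

Because vowel changes are cheap under this assignment, an irregular form like sang stays close to its root sing, which is the kind of alignment the measure is meant to support.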

Lemma Alignment by Morphological Transformation Probabilities

The goal is to generalize a mapping function via a generative probabilistic model.

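Lemma Alignment by Morphological Transformation Probabilities cont.

An inflection is modeled as root + stem change + suffix. For a given root and suffix, each stem change yields a unique inflection, so:

$$P(\text{inflection} \mid \text{root}, \text{suffix}, \text{POS}) = P(\text{stemchange} \mid \text{root}, \text{suffix}, \text{POS})$$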
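The decomposition root + stem change + suffix can be made concrete with a small helper. This is a hypothetical sketch that assumes the stem change is confined to the end of the root: the stem is taken as the longest common prefix of the root and the inflection with its suffix removed. The inverse function shows why the equality above holds, since a root, a stem change and a suffix determine exactly one inflection. The names stem_change and apply_change are illustrative only.

```python
from typing import Optional


def stem_change(root: str, inflection: str, suffix: str) -> Optional[tuple[str, str]]:
    """Split so that root == stem + old and inflection == stem + new + suffix.

    The stem is the longest common prefix of the root and the inflection
    with the suffix removed; returns None if the inflection lacks the suffix.
    """
    if suffix and not inflection.endswith(suffix):
        return None
    body = inflection[: len(inflection) - len(suffix)]
    k = 0
    while k < min(len(root), len(body)) and root[k] == body[k]:
        k += 1
    return root[k:], body[k:]


def apply_change(root: str, old: str, new: str, suffix: str) -> str:
    """Inverse operation: root, stem change and suffix give a unique inflection."""
    if not root.endswith(old):
        raise ValueError("stem change must match the end of the root")
    return root[: len(root) - len(old)] + new + suffix


print(stem_change("solidify", "solidified", "ed"))  # ('y', 'i')
print(stem_change("stop", "stopped", "ed"))         # ('', 'p')
print(apply_change("solidify", "y", "i", "ed"))     # solidified
```

Irregular forms show the limits of the end-of-root assumption: stem_change("sing", "sang", "") returns ("ing", "ang") rather than an internal vowel change.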

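Lemma Alignment by Morphological Transformation Probabilities cont.

Example:

$$
\begin{aligned}
P(\text{solidified} \mid \text{solidify}, \text{+ed}, \text{VBD})
  &= P(\text{y} \to \text{i} \mid \text{solidify}, \text{+ed}, \text{VBD}) \\
  &\approx \lambda_1 P(\text{y} \to \text{i} \mid \text{ify}, \text{+ed}) \\
  &\quad + (1-\lambda_1)\bigl(\lambda_2 P(\text{y} \to \text{i} \mid \text{fy}, \text{+ed}) \\
  &\quad + (1-\lambda_2)\bigl(\lambda_3 P(\text{y} \to \text{i} \mid \text{y}, \text{+ed}) \\
  &\quad + (1-\lambda_3)\bigl(\lambda_4 P(\text{y} \to \text{i} \mid \text{+ed}) \\
  &\quad + (1-\lambda_4)\, P(\text{y} \to \text{i})\bigr)\bigr)\bigr)
\end{aligned}
$$

In the approximation, the POS is deleted from the conditioning context.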
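The interpolated backoff in the example above can be sketched as follows. The contexts mirror the slide (the last three, two and one letters of the root together with the suffix, then the suffix alone, then no context); the λ values of 0.5, the skipping of unseen contexts, and the class and method names are assumptions for illustration, not the authors' settings.

```python
from collections import Counter

Change = tuple[str, str]  # (old root ending, new ending), e.g. ("y", "i")


class StemChangeModel:
    """Interpolated backoff estimate of P(change | root, suffix)."""

    def __init__(self, lambdas=(0.5, 0.5, 0.5, 0.5), max_end=3):
        if len(lambdas) != max_end + 1:
            raise ValueError("need one lambda per non-base context")
        self.lambdas = lambdas
        self.max_end = max_end
        self.joint: Counter = Counter()  # (context, change) -> count
        self.marg: Counter = Counter()   # context -> count

    def _contexts(self, root: str, suffix: str) -> list[tuple]:
        # Most to least specific: last k letters + suffix, suffix alone, nothing.
        ends = [("end", root[-k:], suffix) for k in range(self.max_end, 0, -1)]
        return ends + [("suffix", suffix), ("none",)]

    def fit(self, examples):
        """examples: iterable of (root, suffix, change) triples."""
        for root, suffix, change in examples:
            for ctx in self._contexts(root, suffix):
                self.joint[ctx, change] += 1
                self.marg[ctx] += 1
        return self

    def _rel_freq(self, ctx, change: Change) -> float:
        return self.joint[ctx, change] / self.marg[ctx]

    def prob(self, root: str, suffix: str, change: Change) -> float:
        *specific, base = self._contexts(root, suffix)
        p = self._rel_freq(base, change) if self.marg[base] else 0.0
        # Fold in contexts from least to most specific:
        #   p <- lambda_i * P(change | context_i) + (1 - lambda_i) * p
        for lam, ctx in reversed(list(zip(self.lambdas, specific))):
            if self.marg[ctx]:  # unseen context: skip it (acts as lambda_i = 0)
                p = lam * self._rel_freq(ctx, change) + (1 - lam) * p
        return p


# Hypothetical training pairs, e.g. from a weighted subset of aligned pairs.
model = StemChangeModel().fit([
    ("solidify", "ed", ("y", "i")),
    ("classify", "ed", ("y", "i")),
    ("walk", "ed", ("", "")),
    ("play", "ed", ("", "")),
])
print(f"{model.prob('justify', 'ed', ('y', 'i')):.3f}")  # 0.896
print(f"{model.prob('stay', 'ed', ('', '')):.3f}")       # 0.708
```

Skipping contexts with no training data keeps probability mass from leaking to unseen endings, so the probabilities of the observed changes still sum to one; the "ay" context lets stay back off to the behavior of play.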

Lemma Alignment by Model Combination and the Pigeonhole Principle
• No single model is sufficiently effective on its own.
• The Frequency, Levenshtein and Context Similarity models keep equal relative weights.
• The Morphological Transformation Similarity model increases in relative weight over successive iterations.
• Example:
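The weighting scheme can be sketched as a normalized weighted sum of per-model similarity scores. The linear growth schedule for the transformation model (growth × iteration), the assumption that every score lies in [0, 1] with higher meaning more similar, and all score values are hypothetical; the slides say only that three models keep equal weight while the transformation model increases in relative weight.

```python
def model_weights(iteration: int, growth: float = 0.5) -> dict[str, float]:
    """Hypothetical schedule: three fixed models at raw weight 1, the
    transformation ("morph") model at growth * iteration; normalized to sum to 1."""
    raw = {
        "frequency": 1.0,
        "context": 1.0,
        "levenshtein": 1.0,
        "morph": growth * iteration,  # 0 before the transformation model is trained
    }
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def combined_score(scores: dict[str, float], iteration: int) -> float:
    """Weighted sum of per-model similarity scores (each assumed in [0, 1])."""
    weights = model_weights(iteration)
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical per-model scores for the candidate pair (sang, sing).
sang_sing = {"frequency": 0.9, "context": 0.8, "levenshtein": 0.7, "morph": 0.2}
print(combined_score(sang_sing, iteration=0))  # morph model ignored at first
print(combined_score(sang_sing, iteration=4))  # morph model now carries 40% of the weight
```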
Lemma Alignment by Model Combination and the Pigeonhole Principle cont.
• The final alignment is based on the pigeonhole principle.
• For a given POS, a root should not have more than one inflection, nor should multiple inflections in the same POS share the same root.
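One simple way to apply the pigeonhole constraint is a greedy one-to-one assignment within a single POS: take candidate pairs in order of decreasing combined score and accept a pair only if neither its inflection nor its root has already been used. Greedy selection is an assumption here, not necessarily the procedure used in the original work, and the scores are hypothetical.

```python
def pigeonhole_align(scored_pairs: list[tuple[float, str, str]]) -> dict[str, str]:
    """Greedy one-to-one alignment for one POS.

    scored_pairs: (score, inflection, root) tuples, higher score = better.
    Returns {inflection: root} with each inflection and each root used at most once.
    """
    used_infl: set[str] = set()
    used_root: set[str] = set()
    alignment: dict[str, str] = {}
    for score, infl, root in sorted(scored_pairs, reverse=True):
        if infl in used_infl or root in used_root:
            continue  # pigeonhole: this inflection or root is already taken
        alignment[infl] = root
        used_infl.add(infl)
        used_root.add(root)
    return alignment


# Hypothetical combined scores for VBD candidates.
candidates = [
    (0.9, "sang", "sing"),
    (0.6, "singed", "sing"),
    (0.5, "singed", "singe"),
    (0.8, "took", "take"),
    (0.3, "took", "sing"),
]
print(pigeonhole_align(candidates))
# {'sang': 'sing', 'took': 'take', 'singed': 'singe'}
```

A greedy pass is a heuristic rather than an optimal matching, and because the constraint is phrased as "should not", a real system may also need to allow legitimate exceptions such as variant forms.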