
# Minimally Supervised Morphological Analysis by Multimodal Alignment - PowerPoint PPT Presentation

Minimally Supervised Morphological Analysis by Multimodal Alignment. David Yarowsky and Richard Wicentowski. The algorithm is capable of inducing inflectional morphological analyses of both regular and highly irregular forms.

Presentation Transcript

### Minimally Supervised Morphological Analysis by Multimodal Alignment

David Yarowsky and Richard Wicentowski

### Introduction

• The algorithm is capable of inducing inflectional morphological analyses of both regular and highly irregular forms.

• The algorithm combines four original alignment models, based on:

• Relative corpus frequency.

• Contextual similarity.

• Weighted string similarity.

• Incrementally retrained inflectional transduction probabilities.

### Lecture Subjects

• Required and optional resources.

• The Algorithm.

• Empirical Evaluation.

The task can be viewed as three steps:

• Estimate a probabilistic alignment between inflected forms and root forms.

• Train a supervised morphological analysis learner on a weighted subset of these aligned pairs.

• Use the result from step 2 to iteratively refine the alignment in step 1.
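A minimal sketch of this three-step loop, assuming the two steps are supplied as callables. `estimate_alignment` and `train_learner` are hypothetical placeholders, and the "weighted subset" is approximated here by the `top_k` best-scoring pairs, passed along with their scores as weights.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (inflection, root)


def iterative_alignment(
    estimate_alignment: Callable[[Optional[object]], Dict[Pair, float]],
    train_learner: Callable[[List[Tuple[Pair, float]]], object],
    iterations: int = 3,
    top_k: int = 100,
) -> Dict[Pair, float]:
    """Alternate step 1 (alignment) and step 2 (supervised training)."""
    learner: Optional[object] = None  # no trained analyzer before the first pass
    alignment: Dict[Pair, float] = {}
    for _ in range(iterations):
        # Step 1: score candidate (inflection, root) pairs; from the second
        # pass on, the previously trained learner can inform the scores.
        alignment = estimate_alignment(learner)
        # Step 2: train on a weighted subset, here the top_k highest-scoring
        # pairs together with their scores (used as weights).
        subset = sorted(alignment.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        learner = train_learner(subset)
        # Step 3: the loop repeats, letting the new learner refine step 1.
    return alignment
```

Passing the steps in as functions keeps the sketch agnostic about which alignment models and which supervised learner are plugged in.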

### Example (POS)

• Definitions:

• The target output of step 1:

### Required and Optional Resources

• For the given language, we need:

• A table of the inflectional parts of speech (POS).

• A list of the canonical suffixes.

• A large text corpus.

• A list of candidate noun, verb and adjective roots (e.g. from a dictionary), and some rough mechanism, not based on morphological analysis, for identifying the candidate POS of the remaining vocabulary.

• A list of the consonants and vowels.

• A list of common function words.

• Distance/similarity tables generated from previously studied languages.

(Some of these resources are optional: they are either not essential or used only if available.)

### The Algorithm

• Combines four original alignment models:

• Alignment by Frequency Similarity.

• Alignment by Context Similarity.

• Alignment by Weighted Levenshtein Distance.

• Alignment by Morphological Transformation Probabilities.

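[Figure: aligning root forms with their past-tense (VBD) inflections. Forms shown: sing, take, singed, taked, sang; the correct VBD alignment for sing is marked "?".]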

### Lemma Alignment by Frequency Similarity

• The motivating dilemma:

• This table is based on relative corpus frequency:

• A problem: the true alignments between inflections are unknown in advance.

• A simplifying assumption: the frequency ratios between inflections and roots are not significantly different between regular and irregular morphological processes.

• Similarity between regular and irregular forms:

• The expected frequency should also be estimable from the frequency of any of the other inflectional variants.

• VBD/VBG and VBD/VBZ could also be used as estimators.
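One way to turn the frequency assumption into a score, sketched under assumptions: the log frequency ratios of known regular pairs (VBD/VB here) are summarized by their mean and standard deviation, and a candidate pair is scored by how close its own log ratio lies to that mean. The Gaussian-shaped score and the toy counts below are illustrative and are not taken from the slides.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def log_ratio(freq: Dict[str, int], inflection: str, root: str) -> float:
    """log(freq(inflection) / freq(root)) from raw corpus counts."""
    return math.log(freq[inflection] / freq[root])


def frequency_similarity(
    freq: Dict[str, int],
    regular_pairs: List[Tuple[str, str]],
    inflection: str,
    root: str,
) -> float:
    """Score in (0, 1]: 1.0 when the pair's log ratio equals the regular-pair mean."""
    ratios = [log_ratio(freq, i, r) for i, r in regular_pairs]
    mu, sigma = mean(ratios), stdev(ratios)
    z = (log_ratio(freq, inflection, root) - mu) / sigma
    return math.exp(-0.5 * z * z)  # unnormalized Gaussian-shaped score


# Toy counts (invented for illustration).
freq = {"walk": 100, "walked": 80, "jump": 50, "jumped": 45,
        "talk": 200, "talked": 150, "sing": 120, "sang": 100, "singed": 2}
regular = [("walked", "walk"), ("jumped", "jump"), ("talked", "talk")]
print(frequency_similarity(freq, regular, "sang", "sing"))    # close to 1
print(frequency_similarity(freq, regular, "singed", "sing"))  # close to 0
```

Other ratios such as VBD/VBG or VBD/VBZ could be plugged in the same way by changing which forms go into `regular_pairs`.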

### Lemma Alignment by Context Similarity

• Based on the contextual similarity of the candidate forms.

• Similarity is computed between vectors of weighted and filtered context features.

Clustering inflectional variants of verbs (e.g. sipped, sipping, and sip).
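A minimal sketch of the similarity computation, assuming that a form's context is a bag of neighbouring words, that "filtering" means dropping entries from the list of common function words, and that similarity is cosine similarity over raw counts. The slides do not give the exact weighting, so these choices are illustrative.

```python
import math
from collections import Counter
from typing import Iterable, Set


def context_vector(context_words: Iterable[str], function_words: Set[str]) -> Counter:
    """Bag of context words around a form, with function words filtered out.

    Raw counts serve as the feature weights in this sketch.
    """
    return Counter(w for w in context_words if w not in function_words)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


function_words = {"the", "a", "his"}
sipped = context_vector(["the", "coffee", "tea", "slowly"], function_words)
sip = context_vector(["a", "tea", "coffee"], function_words)
sang = context_vector(["song", "loudly", "his"], function_words)
print(cosine(sipped, sip))   # about 0.816: shared "coffee" and "tea"
print(cosine(sipped, sang))  # 0.0: no shared content words
```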

Context pattern (CW: content word): `CW_subj (AUX|NEG)* V_keyword DET? CW* CW_obj`

Example: Shlomo (CW_subj) is (AUX) eating (V_keyword) the (DET) apple (CW_obj).
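The subject/object context pattern `CW_subj (AUX|NEG)* V_keyword DET? CW* CW_obj` can be applied to a part-of-speech-tagged sentence as in this sketch. The tag names, the one-letter encoding, and the choice to report the last content word of the run after the verb as `CW_obj` are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# One-letter codes for the tag classes used in the pattern (hypothetical tag names).
CODE = {"CW": "C", "AUX": "A", "NEG": "N", "V": "V", "DET": "D"}
LEFT = re.compile(r"C[AN]*$")   # CW_subj (AUX|NEG)*  ending right before the verb
RIGHT = re.compile(r"D?(C+)")   # DET? CW* CW_obj     starting right after the verb


def subject_object(tagged: List[Tuple[str, str]], keyword: str) -> Optional[Tuple[str, str]]:
    """Return (subject, object) around the verb `keyword`, or None if the pattern fails."""
    codes = "".join(CODE.get(tag, "x") for _, tag in tagged)
    for i, (word, tag) in enumerate(tagged):
        if word != keyword or tag != "V":
            continue
        left = LEFT.search(codes, 0, i)    # search only codes[:i]
        right = RIGHT.match(codes, i + 1)  # match starting at codes[i + 1]
        if left and right:
            # Subject: the CW that starts the left match.
            # Object: the last CW of the run after the optional determiner.
            return tagged[left.start()][0], tagged[right.end(1) - 1][0]
    return None


sentence = [("Shlomo", "CW"), ("is", "AUX"), ("eating", "V"), ("the", "DET"), ("apple", "CW")]
print(subject_object(sentence, "eating"))  # ('Shlomo', 'apple')
```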

### Lemma Alignment by Context Similarity (cont.)

• Example:

### Lemma Alignment by Weighted Levenshtein Distance

• Consider the overall stem edit distance.

• A cost matrix of edit costs, with the initial distance costs set to (0.5, 0.6, 1.0, 0.98).
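A sketch of a weighted Levenshtein distance with vowel/consonant-sensitive costs. Which of the four initial costs (0.5, 0.6, 1.0, 0.98) applies to which operation is an assumption here: vowel-to-vowel substitution 0.5, vowel insertion/deletion 0.6, any substitution involving a consonant 1.0, and consonant insertion/deletion 0.98. The hard-coded vowel set stands in for the language's consonant/vowel list.

```python
from typing import List

VOWELS = set("aeiou")  # assumed vowel inventory (English orthography)

# Hypothetical assignment of the initial costs (0.5, 0.6, 1.0, 0.98) to operations.
VOWEL_SUB, VOWEL_INDEL, CONS_SUB, CONS_INDEL = 0.5, 0.6, 1.0, 0.98


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting character a with b."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return VOWEL_SUB
    return CONS_SUB  # at least one consonant involved


def indel_cost(ch: str) -> float:
    """Cost of inserting or deleting character ch."""
    return VOWEL_INDEL if ch in VOWELS else CONS_INDEL


def weighted_levenshtein(s: str, t: str) -> float:
    """Minimum total edit cost to turn s into t (standard dynamic programming)."""
    d: List[List[float]] = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        d[i][0] = d[i - 1][0] + indel_cost(s[i - 1])
    for j in range(1, len(t) + 1):
        d[0][j] = d[0][j - 1] + indel_cost(t[j - 1])
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(s[i - 1]),              # delete s[i-1]
                d[i][j - 1] + indel_cost(t[j - 1]),              # insert t[j-1]
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitute or match
            )
    return d[len(s)][len(t)]


print(weighted_levenshtein("sang", "sing"))  # 0.5: one vowel substitution
```

Making vowel edits cheaper than consonant edits lets an irregular pair such as sang/sing score as close, since the two forms differ only in a vowel.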

### Lemma Alignment by Morphological Transformation Probabilities

The goal is to generalize a mapping function via a generative probabilistic model.

### Lemma Alignment by Morphological Transformation Probabilities (cont.)

<root>+<stem change>+<suffix><inflection>

P(inflection | root,suffix,POS)=P(stemchange | root,suffix,POS)

unique
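A minimal sketch of the generative step `<root> + <stem change> + <suffix>`, assuming for simplicity that a stem change rewrites only the end of the root, as in solidify + (y → i) + ed. The function name and argument layout are illustrative.

```python
def apply_transformation(root: str, old: str, new: str, suffix: str) -> str:
    """Generate <root> + <stem change old -> new> + <suffix> -> <inflection>.

    The stem change replaces the final `old` characters of the root with `new`;
    old == new == "" means no stem change.
    """
    if not root.endswith(old):
        raise ValueError(f"{root!r} does not end with {old!r}")
    stem = root[: len(root) - len(old)] + new
    return stem + suffix


print(apply_transformation("solidify", "y", "i", "ed"))  # solidified
print(apply_transformation("take", "e", "", "ing"))      # taking
print(apply_transformation("walk", "", "", "ed"))        # walked
```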

### Lemma Alignment by Morphological Transformation Probabilities (cont.)

Example:

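$$
\begin{aligned}
P(\textit{solidified} \mid \textit{solidify}, \text{+ed}, \text{VBD})
&= P(y \to i \mid \textit{solidify}, \text{+ed}, \text{VBD}) \\
&\approx \lambda_1 P(y \to i \mid \text{ify}, \text{+ed}) \\
&\quad + (1-\lambda_1)\Big(\lambda_2 P(y \to i \mid \text{fy}, \text{+ed}) \\
&\qquad + (1-\lambda_2)\Big(\lambda_3 P(y \to i \mid \text{y}, \text{+ed}) \\
&\qquad\quad + (1-\lambda_3)\Big(\lambda_4 P(y \to i \mid \text{+ed}) + (1-\lambda_4) P(y \to i)\Big)\Big)\Big)
\end{aligned}
$$

The POS (VBD) can be dropped from the conditioning context of the backoff terms.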
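The interpolated backoff in this example can be sketched as follows. The conditioning contexts run from the last three letters of the root (ify) down to the suffix alone (+ed) and finally no context at all. Two simplifications are assumptions of this sketch: a single weight `lam` stands in for λ1 to λ4, and contexts never seen in training are skipped. The training triples are invented.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Change = Tuple[str, str]                       # stem change (old, new), e.g. ("y", "i")
Context = Tuple[Optional[str], Optional[str]]  # (root ending, suffix)


def contexts(root: str, suffix: str, max_len: int = 3) -> List[Context]:
    """Conditioning contexts from most to least specific, e.g. ify, fy, y, +ed, none."""
    keys: List[Context] = [(root[-k:], suffix) for k in range(max_len, 0, -1) if k <= len(root)]
    keys.append(("", suffix))   # suffix only: P(change | +ed)
    keys.append((None, None))   # no context: P(change)
    return keys


def train(examples: Iterable[Tuple[str, Change, str]], max_len: int = 3):
    """Count (context, change) pairs over (root, change, suffix) training triples."""
    joint: Counter = Counter()
    total: Counter = Counter()
    for root, change, suffix in examples:
        for key in contexts(root, suffix, max_len):
            joint[(key, change)] += 1
            total[key] += 1
    return joint, total


def p_change(joint, total, change: Change, root: str, suffix: str,
             lam: float = 0.5, max_len: int = 3) -> float:
    """Interpolate from the most general estimate outward to the most specific one."""
    keys = contexts(root, suffix, max_len)
    general = keys[-1]
    p = joint[(general, change)] / total[general] if total[general] else 0.0
    for key in reversed(keys[:-1]):  # +ed, then y, fy, ify
        if total[key]:  # unseen contexts are skipped in this sketch
            p = lam * joint[(key, change)] / total[key] + (1 - lam) * p
    return p


examples = [("solidify", ("y", "i"), "ed"), ("classify", ("y", "i"), "ed"),
            ("walk", ("", ""), "ed"), ("play", ("", ""), "ed")]
joint, total = train(examples)
print(p_change(joint, total, ("y", "i"), "simplify", "ed"))
```

Backing off to shorter root endings lets an unseen root such as simplify borrow evidence from solidify and classify through the shared ending ify.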

### Lemma Alignment by Model Combination and the Pigeonhole Principle

• No single model is sufficiently effective on its own.

• The Frequency, Levenshtein and Context Similarity models retain equal relative weight.

• The Morphological Transformation Probabilities model increases in relative weight as it is iteratively retrained.

• Example:
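A sketch of one possible weighting scheme, assuming the models are combined as a weighted average of per-model scores in [0, 1]. The three similarity models keep equal weight, while the morphological transformation model's weight grows linearly with the iteration number. The linear schedule and the `growth` parameter are hypothetical; the slides only state that this model's relative weight increases.

```python
from typing import Dict

MODELS = ("frequency", "context", "levenshtein", "morph_trans")


def combined_score(scores: Dict[str, float], iteration: int, growth: float = 0.5) -> float:
    """Weighted average of per-model scores (each assumed to lie in [0, 1]).

    Frequency, context and Levenshtein keep equal weight 1.0; the morphological
    transformation weight grows with the iteration (hypothetical schedule) and
    is 0 before any transducer has been trained.
    """
    weights = {"frequency": 1.0, "context": 1.0, "levenshtein": 1.0,
               "morph_trans": growth * iteration}
    total = sum(weights.values())
    return sum(weights[m] * scores[m] for m in MODELS) / total


scores = {"frequency": 0.2, "context": 0.2, "levenshtein": 0.2, "morph_trans": 1.0}
for it in (0, 2, 4):
    print(it, round(combined_score(scores, it), 3))
```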

### Lemma Alignment by Model Combination and the Pigeonhole Principle (cont.)

• The final alignment is based on the pigeonhole principle.

• For a given POS, a root should not have more than one inflection, nor should multiple inflections of the same POS share the same root.
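A greedy sketch of enforcing the pigeonhole constraint: candidate (inflection, root) pairs are taken in order of decreasing combined score, and a pair is skipped if its root, or its inflection, is already aligned for that POS. The slides do not specify the selection procedure, so the greedy order is an assumption, as is letting each inflection take at most one root. The scores in the usage example are invented.

```python
from typing import List, Set, Tuple

Candidate = Tuple[float, str, str, str]  # (score, inflection, root, POS)


def pigeonhole_align(candidates: List[Candidate]) -> List[Tuple[str, str, str]]:
    """Greedy one-to-one alignment per POS, taking the highest scores first."""
    used_roots: Set[Tuple[str, str]] = set()        # (root, POS) already aligned
    used_inflections: Set[Tuple[str, str]] = set()  # (inflection, POS) already aligned
    chosen: List[Tuple[str, str, str]] = []
    for _score, inflection, root, pos in sorted(candidates, reverse=True):
        if (root, pos) in used_roots or (inflection, pos) in used_inflections:
            continue  # pigeonhole: root or inflection already taken for this POS
        used_roots.add((root, pos))
        used_inflections.add((inflection, pos))
        chosen.append((inflection, root, pos))
    return chosen


candidates = [(0.9, "sang", "sing", "VBD"), (0.6, "singed", "sing", "VBD"),
              (0.7, "singed", "singe", "VBD")]
print(pigeonhole_align(candidates))
# [('sang', 'sing', 'VBD'), ('singed', 'singe', 'VBD')]
```

Once sang claims sing as its VBD, the lower-scoring singed/sing pair is blocked, leaving singed to align with singe.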

### Empirical Evaluation

• Performance: