
ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis

Christian Monson, Jaime Carbonell, Alon Lavie, Lori Levin





Presentation Transcript


  1. ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis
     Christian Monson, Jaime Carbonell, Alon Lavie, Lori Levin

     [Diagram labels: Monolingual Text; Unsupervised Morphology Induction; Morphologically Analyzed Text]

     Paradigms Organize Inflectional Morphology
     Cross-linguistically, languages inflect using paradigms: sets of mutually exclusive cells. Exactly one cell from each paradigm can be filled (by an affix) in a surface word form.

     Paradigm Discovery in 3 Steps
     • Search: Greedy bottom-up search through an empirical network of candidate partial paradigms. (In the accompanying figure, red candidate paradigms are active in search.)
     • Cluster: Hierarchical agglomerative clustering adapted to the peculiarities of partial paradigms
     • Filter: Improve precision by removing unclustered and unlikely candidates
     • Spanish data guided algorithm development and parameter adjustment

     1. Recall Centric Search
     Candidate paradigms from the search network, shown as suffixes, number of stems, and sample stems:
     e.er.erá.ido.ieron.ió    28: deb, escog, ofrec, reconoc, vend, ...
     e.ido.ieron.ir.irá.ió    28: asist, dirig, exig, ocurr, sufr, ...
     azar.e.ido.ieron.ir.ió   1: sal
     e.er.erá.ieron.ió        32: deb, padec, romp, ...
     e.erá.ido.ieron.ió       28: deb, escog, ...
     e.er.ido.ieron.ió        46: deb, parec, recog, ...
     e.ido.ieron.irá.ió       28: asist, dirig, ...
     e.ido.ieron.ir.ió        39: asist, bat, sal, ...
     e.ido.ieron.ió           86: asist, deb, hund, ...
     e.erá.ieron.ió           32: deb, padec, ...
     er.ido.ieron.ió          58: ascend, ejerc, recog, ...
     ido.ieron.ir.ió          44: interrump, sal, ...

     2. Cluster Candidate Paradigms
     17: a.aba.aban.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.aría.ó
     15: a.aba.aban.ada.adas.ado.ados.an.ando.ar.aron.arse.ará.arán.ó
     16: a.aba.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.aría.ó
     15: a.aba.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.ó
     15: a.aba.ada.adas.ado.ados.an.ando.ar.aron.arse.ará.arán.aría.ó

     3. Filter Unlikely Candidates
     Error analysis identified 2 major categories of incorrect candidates:
     • Small Candidates contain few affixes and cover few types.
       Ø.ipo covers 8 words
       Ø.e.iu covers 12 words
     • Incorrect Morpheme Boundary Candidates segment too far to the left.
       iza.izado.izan.izar.izaron.izarán.izó
       der.derá.dido.diendo.dieron.dió.día

     Segmentation
     • Match the word to be segmented against the clustered affixes
     • Replace any matched affix with a new affix from the cluster
     • Segment the original word if the corpus contains the hypothesized word form
     Example: llega. Hypothesized forms: lleg aba, lleg aban, lleg ada, … Resulting segmentation: lleg +a

     Evaluation Methodology
     • Sample pairs of words that share morphemes.
       • Precision: Sample pairs sharing a morpheme in the automatic analyses
       • Recall: Sample pairs from an answer key of morphologically analyzed words
     • Examine corresponding analyses
       • Precision: Count sampled pairs that share a morpheme in the answer key
       • Recall: Count sampled pairs that share a morpheme in the automatic analyses

     Results
     • Morpho Challenge 2007: competition for unsupervised morphology induction algorithms
     • English
       • 3rd Place Overall
       • Bested Morfessor (Creutz, 2006), a state-of-the-art unsupervised morphology induction algorithm
     • German
       • 1st Place with Combined ParaMor-Morfessor System

     A Closer Look at ParaMor vs. Morfessor
     (chart not included in the transcript)

     The Next Steps
     • Extend ParaMor to hypothesize more than one morpheme boundary per analysis
     • Expand beyond suffixation to other morphological phenomena, prefixes, etc.
     • Merge inflection classes of the same paradigm
     • Identify morphophonemic changes
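
The search listings above pair each suffix set with the stems that occur with those suffixes. A minimal sketch of how one such candidate could be computed from a word list follows. It assumes a candidate's stems are exactly the stems attested with every suffix in the set, which is what the listings suggest. The function name `adherent_stems` and the toy vocabulary are hypothetical and not taken from ParaMor itself.

```python
def adherent_stems(suffixes, vocab):
    """Return the sorted stems that occur in `vocab` with every suffix in `suffixes`.

    Assumption: a candidate paradigm pairs a suffix set with all stems
    attested with each of those suffixes. Hypothetical helper, not ParaMor code.
    """
    suffixes = list(suffixes)
    if not suffixes:
        return []
    first = suffixes[0]
    # Collect stems by stripping the first suffix from every word that ends in it.
    candidates = {
        word[: len(word) - len(first)]
        for word in vocab
        if word.endswith(first) and len(word) > len(first)
    }
    # Keep only the stems that are also attested with all the other suffixes.
    return sorted(s for s in candidates if all(s + x in vocab for x in suffixes))


# Toy vocabulary (hypothetical): "vend" lacks the forms in -ieron and -ió.
vocab = {"debe", "debido", "debieron", "debió", "deber", "vende", "vendido"}
small = adherent_stems(["e", "ido"], vocab)
large = adherent_stems(["e", "ido", "ieron", "ió"], vocab)
```

Adding suffixes can only shrink the stem list. This matches the listings, where e.ido.ieron.ió covers 86 stems while its superset e.er.ido.ieron.ió covers 46.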
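
The three segmentation steps above (match an affix, substitute another affix from the same cluster, check the corpus) can be sketched as follows. This is an illustration under stated assumptions rather than ParaMor's implementation. It treats a cluster as a flat set of suffixes and accepts a split as soon as one substituted form is attested. The name `segment` and the toy data are hypothetical.

```python
def segment(word, cluster_suffixes, corpus_vocab):
    """Return (stem, suffix) splits of `word` that the corpus supports.

    Assumptions: `cluster_suffixes` is one flat cluster of suffixes, and a
    single attested substituted form is enough to accept a split.
    """
    analyses = []
    for suffix in sorted(cluster_suffixes):
        # Step 1: match the word against a clustered suffix.
        if not word.endswith(suffix) or len(suffix) >= len(word):
            continue
        stem = word[: len(word) - len(suffix)]
        # Step 2: replace the matched suffix with the other suffixes in the cluster.
        hypothesized = (stem + other for other in cluster_suffixes if other != suffix)
        # Step 3: segment only if the corpus contains a hypothesized form.
        if any(form in corpus_vocab for form in hypothesized):
            analyses.append((stem, suffix))
    return analyses


# Toy data (hypothetical), mirroring the llega example in the transcript.
cluster = {"a", "aba", "aban", "ada"}
corpus = {"llega", "llegaba", "llegaban", "llegada", "casa"}
llega_analysis = segment("llega", cluster, corpus)
casa_analysis = segment("casa", cluster, corpus)
```

Here `llega` is split as `lleg` + `a` because `llegaba` is attested, while `casa` stays unsegmented because none of its substituted forms appear in the toy corpus.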
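
The pair-based evaluation described above can be illustrated with a short sketch. As a simplification, it enumerates every word pair instead of sampling pairs as the methodology does, and it assumes each analysis is a plain list of morpheme labels. The function names and the toy analyses are hypothetical.

```python
from itertools import combinations


def sharing_pairs(analyses):
    """All unordered word pairs whose analyses share at least one morpheme."""
    return {
        (w1, w2)
        for w1, w2 in combinations(sorted(analyses), 2)
        if set(analyses[w1]) & set(analyses[w2])
    }


def pair_precision_recall(predicted, gold):
    """Pair-based precision and recall over the words present in both inputs.

    Simplification: uses every pair exhaustively instead of sampling pairs.
    """
    words = set(predicted) & set(gold)
    pred = sharing_pairs({w: predicted[w] for w in words})
    ref = sharing_pairs({w: gold[w] for w in words})
    # Precision: predicted sharing pairs that also share a morpheme in the key.
    precision = len(pred & ref) / len(pred) if pred else 0.0
    # Recall: answer-key sharing pairs that also share a morpheme when predicted.
    recall = len(pred & ref) / len(ref) if ref else 0.0
    return precision, recall


# Toy analyses (hypothetical morpheme labels).
gold = {"llega": ["lleg", "+a"], "llegaba": ["lleg", "+aba"], "casa": ["cas", "+a"]}
predicted = {"llega": ["lleg", "+a"], "llegaba": ["llegaba"], "casa": ["cas", "+a"]}
precision, recall = pair_precision_recall(predicted, gold)
```

In this toy case the automatic analyses leave `llegaba` unsegmented, so the pair (llega, llegaba) is missed. Precision stays at 1.0 and recall drops to 0.5.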
