CSA3050: Natural Language Algorithms


  1. CSA3050: Natural Language Algorithms Morphological Parsing CSA3050 NLP Algorithms

  2. Morphology • Morphemes: the smallest units in a word that bear some meaning, such as rabbit and s, are called morphemes. • Morphology studies how morphemes combine to form words that are legal in some language. • Two kinds of morphology: • Inflectional • Derivational

  3. Inflectional/Derivational Morphology • Inflectional: +s plural, +ed past • category preserving • productive: always applies (esp. new words, e.g. fax) • systematic: same semantic effect • Derivational: +ment • category changing: escape+ment • not completely productive: *detractment • not completely systematic: apartment

  4. Noun Inflections

  5. Morphological Parsing • Input word: cats → Morphological Parser → Output analysis: cat N PL • Output is a string of morphemes • Reversibility?

  6. Morphological Parsing • The goal of morphological parsing is to find out what morphemes a given word is built from. • mouse → mouse N SG • mice → mouse N PL • foxes → fox N PL

  7. 2 Steps • Step 1: Split the word up into its possible components, using + to indicate possible morpheme boundaries. • cats → cat + s • foxes → fox + s • foxes → foxe + s • Step 2: Look up the categories of the stems and the meaning of the affixes, using a lexicon of stems and affixes. • cat + s → cat + N + PL • fox + s → fox + N + PL
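
The two steps above can be sketched in Python. This is a minimal illustration, not the course's implementation: the lexicon entries and the function names `split_candidates` and `lookup` are assumptions made for the example. Note how the lexicon lookup discards the spurious split foxe + s.

```python
# Hypothetical toy lexicon; the entries are illustrative, not from the slides.
STEMS = {"cat": "N", "fox": "N"}
AFFIXES = {"s": "PL"}


def split_candidates(word):
    """Step 1: propose stem + affix splits, including an e-insertion variant."""
    candidates = []
    if word.endswith("s"):
        candidates.append((word[:-1], "s"))      # foxes -> foxe + s
        if word.endswith("es"):
            candidates.append((word[:-2], "s"))  # foxes -> fox + s
    return candidates


def lookup(candidates):
    """Step 2: keep only splits whose stem and affix are in the lexicon."""
    analyses = []
    for stem, affix in candidates:
        if stem in STEMS and affix in AFFIXES:
            analyses.append(f"{stem} + {STEMS[stem]} + {AFFIXES[affix]}")
    return analyses


print(lookup(split_candidates("cats")))
print(lookup(split_candidates("foxes")))
```

Over-generating in step 1 and filtering in step 2 keeps the splitting rules simple, at the cost of some wasted lookups.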

  8. Step 1: Surface → Intermediate FST

  9. Step 1: Surface → Intermediate Operation

  10. 2. Intermediate Morphemes Possible inputs to the transducer are: • Regular noun stem: cat • Regular noun stem + s: cat+s • Singular irregular noun stem: mouse • Plural irregular noun stem: mice CSA3050 NLP Algorithms

  11. 2. Intermediate MorphemesTransducer CSA3050 NLP Algorithms

  12. Handling Stems • cat/cat • mice/mouse

  13. Completed Stage 2

  14. Joining Stages 1 and 2 • If the two transducers run in a cascade (i.e. the second transducer runs on the output of the first), we can do a morphological parse of (some) English noun phrases. • We can also change the direction of translation (in translation mode). • The same transducer can therefore also be used to generate a surface form from an underlying form.
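
The cascade and its reversibility can be illustrated by treating each stage as a finite set of input/output pairs. This is a deliberate simplification: a real transducer represents a possibly infinite relation through states and arcs, and the pairs and helper names below (`compose`, `invert`, `outputs`) are assumptions for the sketch.

```python
# Each stage is modelled as a finite set of (input, output) pairs.
# The pairs are illustrative assumptions covering only a few nouns.
STAGE1 = {("cats", "cat+s"), ("foxes", "fox+s"), ("cat", "cat"), ("mice", "mice")}
STAGE2 = {("cat+s", "cat N PL"), ("fox+s", "fox N PL"),
          ("cat", "cat N SG"), ("mice", "mouse N PL")}


def compose(first, second):
    """Cascade: feed every output of `first` into `second`."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}


def invert(relation):
    """Swap the two sides, turning a parser into a generator."""
    return {(b, a) for a, b in relation}


def outputs(relation, item):
    """Return all outputs the relation pairs with `item`, sorted."""
    return sorted(out for inp, out in relation if inp == item)


PARSER = compose(STAGE1, STAGE2)
GENERATOR = invert(PARSER)

print(outputs(PARSER, "foxes"))
print(outputs(GENERATOR, "mouse N PL"))
```

Because inversion only swaps the sides of each pair, no separate generation grammar is needed.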

  15. Prolog • The transducer specifications we have seen translate easily into Prolog format, except for the "other" transition. • arc(1,3,z:z). arc(1,3,s:s). arc(1,3,x:x). arc(1,2,'#':'+'). arc(1,3,<other>).

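  16. Handling other arcs • arc(1,3,z:z) :- !. arc(1,3,s:s) :- !. arc(1,3,x:x) :- !. arc(1,2,'#':'+') :- !. arc(1,3,X:X) :- !. • The cuts commit to the first matching clause, so when arc/3 is called with the input symbol bound, the final catch-all X:X applies only to symbols not listed before it.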
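
The same "explicit arcs first, otherwise copy the symbol" idea can be sketched in Python with a dictionary lookup and a fallback. Treating the left symbol of each Prolog pair as the input and the right as the output is an assumption, as is the function name `step`.

```python
# Explicit arcs out of state 1, mirroring the Prolog clauses above:
# (state, input symbol) -> (next state, output symbol)
ARCS = {
    (1, "z"): (3, "z"),
    (1, "s"): (3, "s"),
    (1, "x"): (3, "x"),
    (1, "#"): (2, "+"),
}


def step(state, symbol):
    """Follow an explicit arc if one exists, otherwise the 'other' arc."""
    if (state, symbol) in ARCS:
        return ARCS[(state, symbol)]
    if state == 1:
        # 'other' arc: any remaining symbol is copied unchanged into state 3
        return (3, symbol)
    raise ValueError(f"no arc from state {state} on {symbol!r}")


print(step(1, "#"))
print(step(1, "a"))
```

Checking the explicit table before the fallback plays the role of the cuts: the catch-all never competes with a listed symbol.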

  17. Combining Rules • Consider the word “berries”. • Two rules are involved • berry + s • y → ie under certain circumstances. • Combinations of such rules can be handled in two ways • Cascade, i.e. sequentially • Parallel • Algorithms exist for combining transducers together in series or in parallel. • Such algorithms involve computations over regular relations.
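
A cascade for "berries" can be sketched as two string rewrites applied in sequence, in the generation direction. The slide leaves the "certain circumstances" unspecified. The context used here (a consonant before y, and +s at the end of the word) is an assumption, as are the function names.

```python
import re


def y_to_ie(form):
    """Assumed context: y after a consonant and before a final '+s' becomes 'ie'."""
    return re.sub(r"(?<=[^aeiou])y(?=\+s$)", "ie", form)


def drop_boundary(form):
    """Erase morpheme boundaries to obtain the surface form."""
    return form.replace("+", "")


def cascade(form, rules):
    """Apply the rules one after another, each on the previous output."""
    for rule in rules:
        form = rule(form)
    return form


RULES = [y_to_ie, drop_boundary]
print(cascade("berry+s", RULES))
print(cascade("boy+s", RULES))
```

Order matters in a cascade: if the boundary were dropped first, the y rule would no longer see the +s context it depends on.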

  18. 3 Related Frameworks • FSA • Regular languages • Regular expressions

  19. 3 Related Frameworks • Finite state transducers • Regular relations • Augmented regular expressions

  20. Putting it all together • Execution of the FSTi takes place in parallel

  21. Kaplan and Kay: The Xerox View • FSTi are aligned but separate • FSTi intersected together
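
Parallel execution can be pictured as several rules all checking the same aligned sequence of lexical/surface symbol pairs, with the alignment accepted only if every rule accepts it. The sketch below uses plain Python predicates instead of real transducers. The two rules (a copy rule and an e-insertion rule) and the use of "" for a symbol with no surface realisation are assumptions chosen to keep the example small.

```python
SIBILANTS = {"s", "x", "z"}


def copy_rule(pairs):
    """Every lexical symbol except '+' must surface unchanged."""
    return all(lex == surf for lex, surf in pairs if lex != "+")


def e_insertion_rule(pairs):
    """'+' surfaces as 'e' exactly when it sits between a sibilant and 's'."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "+":
            continue
        after_sibilant = i > 0 and pairs[i - 1][0] in SIBILANTS
        before_s = i + 1 < len(pairs) and pairs[i + 1][0] == "s"
        expected = "e" if after_sibilant and before_s else ""
        if surf != expected:
            return False
    return True


def accepted(pairs, rules):
    """Parallel execution: the alignment must satisfy every rule at once."""
    return all(rule(pairs) for rule in rules)


RULES = [copy_rule, e_insertion_rule]
good = [("f", "f"), ("o", "o"), ("x", "x"), ("+", "e"), ("s", "s")]
bad = [("f", "f"), ("o", "o"), ("x", "x"), ("+", ""), ("s", "s")]
print(accepted(good, RULES), accepted(bad, RULES))
```

Requiring all rules to accept the same aligned pairs is the conjunction that intersecting the rule transducers would compute. Keeping them "aligned but separate" evaluates the same conjunction at run time without building the combined machine.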

  22. Summary • Morphological processing can be handled by finite state machinery. • Finite State Transducers are formally very similar to Finite State Automata. • Finite State Transducers are formally equivalent to regular relations, i.e. sets of pairs of strings drawn from regular languages.
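
To make the similarity with finite state automata concrete, the sketch below stores a transducer as an automaton whose arcs carry a pair of symbols instead of one, and runs it in either direction. The machine (covering only cat and cat+s), the arc layout, and the function name `transduce` are made-up illustrations. The empty string stands for an arc side that consumes or emits nothing.

```python
# Arcs are (source, target, upper, lower); an FSA is the special case
# where upper == lower on every arc. This tiny machine is a made-up example.
ARCS = [
    (0, 1, "c", "c"),
    (1, 2, "a", "a"),
    (2, 3, "t", "t"),
    (3, 4, "+", ""),   # the morpheme boundary has no surface symbol
    (4, 5, "s", "s"),
]
START, FINALS = 0, {3, 5}


def transduce(symbols, upper_to_lower=True):
    """Return every output string paired with the input symbol sequence."""
    results = set()

    def walk(state, pos, out):
        if pos == len(symbols) and state in FINALS:
            results.add(out)
        for src, dst, up, low in ARCS:
            if src != state:
                continue
            inp, outp = (up, low) if upper_to_lower else (low, up)
            if inp == "":
                walk(dst, pos, out + outp)      # nothing consumed on this arc
            elif pos < len(symbols) and symbols[pos] == inp:
                walk(dst, pos + 1, out + outp)

    walk(START, 0, "")
    return results


print(transduce("cat+s"))
print(transduce("cats", upper_to_lower=False))
```

Returning a set rather than a single string reflects that a regular relation may pair one input with several outputs, even though this small machine never does.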
