
CMSC 723 / LING 645: Intro to Computational Linguistics

September 15, 2004: Dorr. More about FSAs, Finite State Morphology (J&M 3). Prof. Bonnie J. Dorr, Dr. Christof Monz, TA: Adam Lee. Topics: transducers; equivalence of DFSAs and NFSAs.


Presentation Transcript


  1. CMSC 723 / LING 645: Intro to Computational Linguistics • September 15, 2004: Dorr • More about FSAs, Finite State Morphology (J&M 3) • Prof. Bonnie J. Dorr • Dr. Christof Monz • TA: Adam Lee

  2. More about FSAs • Transducers • Equivalence of DFSAs and NFSAs • Recognition as search: depth-first, breadth-first

  3. Recognition using NFSAs

  4. NFSA Recognition of “baaa!”

  5. Breadth-first Recognition of “baaa!” (slide annotation: “should be q2”)
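The recognition-as-search idea on the preceding slides can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the transition table is the sheep-talk NFSA as it appears in J&M chapter 2 (an assumption, since the slide figures are not in the transcript), and the names `TRANSITIONS` and `nd_recognize` are hypothetical. The only difference between depth-first and breadth-first recognition is whether the agenda is treated as a stack or as a queue.

```python
from collections import deque

# Sheep-talk NFSA (assumed from J&M ch. 2): accepts b a a+ !
TRANSITIONS = {
    ("q0", "b"): {"q1"},
    ("q1", "a"): {"q2"},
    ("q2", "a"): {"q2", "q3"},  # the nondeterministic choice point
    ("q3", "!"): {"q4"},
}
START, FINAL = "q0", {"q4"}


def nd_recognize(tape, breadth_first=False):
    """Return True if some path through the NFSA consumes the whole tape."""
    # Each search state pairs a machine state with a tape position.
    agenda = deque([(START, 0)])
    while agenda:
        # FIFO order gives breadth-first search; LIFO gives depth-first.
        state, i = agenda.popleft() if breadth_first else agenda.pop()
        if i == len(tape):
            if state in FINAL:
                return True
            continue
        for nxt in sorted(TRANSITIONS.get((state, tape[i]), ())):
            agenda.append((nxt, i + 1))
    return False


print(nd_recognize("baaa!"))                      # depth-first
print(nd_recognize("baaa!", breadth_first=True))  # breadth-first
```

Both strategies accept and reject the same strings; they differ only in the order in which alternative paths are explored.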

  6. Regular languages • Regular languages are characterized by FSAs • For every NFSA, there is an equivalent DFSA. • Regular languages are closed under concatenation, Kleene closure, union.
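The claim that every NFSA has an equivalent DFSA is usually shown with the subset construction. The sketch below assumes an NFSA without epsilon transitions and reuses the sheep-talk machine from J&M as a hypothetical example; `determinize` and `d_recognize` are illustrative names, not part of the lecture.

```python
# Sheep-talk NFSA (assumed from J&M ch. 2), without epsilon transitions.
NFSA = {
    ("q0", "b"): {"q1"},
    ("q1", "a"): {"q2"},
    ("q2", "a"): {"q2", "q3"},
    ("q3", "!"): {"q4"},
}


def determinize(transitions, start, finals, alphabet):
    """Subset construction: each DFSA state is a frozenset of NFSA states."""
    d_start = frozenset([start])
    d_trans, d_finals = {}, set()
    todo, seen = [d_start], {d_start}
    while todo:
        current = todo.pop()
        if current & finals:
            d_finals.add(current)
        for sym in alphabet:
            target = frozenset(
                q2 for q in current for q2 in transitions.get((q, sym), ())
            )
            if not target:
                continue  # no arc: the dead state is left implicit
            d_trans[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return d_trans, d_start, d_finals


def d_recognize(d_trans, d_start, d_finals, tape):
    """Deterministic recognition: exactly one path, no search needed."""
    state = d_start
    for sym in tape:
        state = d_trans.get((state, sym))
        if state is None:
            return False
    return state in d_finals


D_TRANS, D_START, D_FINALS = determinize(NFSA, "q0", {"q4"}, "ba!")
print(d_recognize(D_TRANS, D_START, D_FINALS, "baaa!"))
```

The resulting machine needs no agenda because each (state, symbol) pair has at most one successor. In the worst case the construction can produce exponentially many states, although this small example stays at five.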

  7. Concatenation

  8. Kleene Closure

  9. Union
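The three closure properties can be tried out quickly with regular expressions standing in for the automata, since the slide figures are not part of the transcript. This is only a sketch: Python's `re` module is used as a convenient stand-in restricted to regular operators, the `COW` language is a made-up second example, and the helper names are hypothetical.

```python
import re

SHEEP = r"baa+!"  # the sheep language from the earlier slides
COW = r"moo+"     # hypothetical second regular language


def concat(r1, r2):
    """Language of strings xy with x in L(r1) and y in L(r2)."""
    return f"(?:{r1})(?:{r2})"


def union(r1, r2):
    """Language of strings in L(r1) or L(r2)."""
    return f"(?:{r1})|(?:{r2})"


def star(r):
    """Kleene closure: zero or more strings from L(r), concatenated."""
    return f"(?:{r})*"


def accepts(regex, s):
    return re.fullmatch(regex, s) is not None


print(accepts(concat(SHEEP, COW), "baa!moo"))
print(accepts(union(SHEEP, COW), "mooo"))
print(accepts(star(SHEEP), "baa!baaa!"))
```

Each combinator returns another regular expression, which mirrors the point of the slides: combining regular languages with these operations never leaves the class of regular languages.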

  10. Morphology • Definitions and Problems • What is Morphology? • Topology of Morphologies • Approaches to Computational Morphology • Lexicons and Rules • Computational Morphology Approaches

  11. Morphology • The study of the way words are built up from smaller meaning units called Morphemes • Abstract versus Realized • HOP +PAST → hop +ed → hopped → /hapt/

  12. Phonology and Morphology • Phonology vs. Orthography • Historical spelling • night, nite • attention, mission, fish • Script Limitations • Spoken English has 14 vowels • heed hid hayed head had hoed hood who’d hide how’d taught Tut toy enough • English Alphabet has 5 • Use vowel combinations: far fair fare • Consonantal doubling (hopping vs. hoping)

  13. Syntax and Morphology • Phrase-level agreement • Subject-Verb • John studies hard (STUDY+3SG) • Noun-Adjective • Las vacas hermosas • Sub-word phrasal structures • שבספרינו • ש+ב+ספר+ים+נו • That+in+book+PL+Poss:1PL • Which are in our books • Figure labels: conj, prep, noun, article, plural, poss

  14. Topology of Morphologies • Concatenative vs. Templatic • Derivational vs. Inflectional • Regular vs. Irregular

  15. Concatenative Morphology • Morpheme+Morpheme+Morpheme+… • Stems: also called lemma, base form, root, lexeme • hope+ing → hoping • hop → hopping • Affixes • Prefixes: anti-, dis- in Antidisestablishmentarianism • Suffixes: -ment, -arian, -ism in Antidisestablishmentarianism • Infixes: hingi (borrow) – humingi (borrower) in Tagalog • Circumfixes: sagen (say) – gesagt (said) in German • Agglutinative Languages • uygarlaştıramadıklarımızdanmışsınızcasına • uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına • Behaving as if you are among those whom we could not cause to become civilized

  16. Templatic Morphology • Roots and Patterns • Root K T B: Arabic ك ت ب, Hebrew כ ת ב • Arabic pattern مَ ? ? و ? gives مكتوب maktuub “written” • Hebrew pattern ? ? ו ? gives כתוב ktuuv “written”

  17. Templatic Morphology: Root Meaning • KTB: writing “stuff” • book: كتاب • write: كتب, כתב • spelling: כתיב • library: مكتبة • letter: מכתב, مكتوب • address: כתובת • office: مكتب • writer: كاتب, כתב
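Root-and-pattern morphology can be illustrated by interleaving a consonantal root with a vocalic template. The sketch below works on Latin transliterations only; the digit-slot notation (`1`, `2`, `3` for the first, second, and third root consonant) and the function name `interdigitate` are assumptions made for this example, not notation from the slides.

```python
def interdigitate(root, pattern):
    """Fill digit slots 1..n in the pattern with the root consonants."""
    return "".join(
        root[int(ch) - 1] if ch.isdigit() else ch
        for ch in pattern
    )


# Arabic patterns in transliteration, applied to the root k-t-b.
PATTERNS = {
    "ma12uu3": "written",
    "1i2aa3": "book",
    "1aa2i3": "writer",
    "ma12a3": "office",
}

for pattern, gloss in PATTERNS.items():
    print(interdigitate("ktb", pattern), gloss)
```

The sketch does not model sound changes that apply after interdigitation, such as the b/v alternation that yields Hebrew ktuuv rather than ktuub.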

  18. Derivational vs. Inflectional • Word Classes • Parts of speech: noun, verb, adjectives, etc. • Word class dictates how a word combines with morphemes to form new words

  19. Derivational morphology • Nominalization: computerization, appointee, killer, fuzziness • Formation of adjectives: computational, clueless, embraceable • CatVar: Categorial Variation Database http://clipdemos.umiacs.umd.edu/catvar/

  20. Inflectional morphology • Adds: Tense, number, person, mood, aspect • Word class doesn’t change • Word serves new grammatical role • Five verb forms in English • Other languages have (lots more)

  21. Nouns and Verbs (in English) • Nouns have simple inflectional morphology • cat • cat+s, cat+’s • Verbs have more complex morphology

  22. Regulars and Irregulars • Nouns • Cat/Cats • Mouse/Mice, Ox/Oxen, Goose/Geese • Verbs • Walk/Walked • Go/Went, Fly/Flew

  23. Regular (English) Verbs

  24. Irregular (English) Verbs

  25. “To love” in Spanish

  26. Computational Morphology • Finite State Morphology • Finite State Transducers (FST) • Input/Output • Analysis/Generation

  27. Computational Morphology • WORD → STEM (+FEATURES)* • cats → cat +N +PL • cat → cat +N +SG • cities → city +N +PL • geese → goose +N +PL • ducks → (duck +N +PL) or (duck +V +3SG) • merging → merge +V +PRES-PART • caught → (catch +V +PAST-PART) or (catch +V +PAST)

  28. Building a Morphological Parser • The Rules and the Lexicon • General versus Specific • Regular versus Irregular • Accuracy, speed, space • The Morphology of a language • Approaches • Lexicon only • Lexicon and Rules • Finite-state Automata • Finite-state Transducers • Rules only

  29. Lexicon-only Morphology • The lexicon lists all surface level and lexical level pairs • No rules …? • Analysis/Generation is easy • Very large for English • What about Arabic or Turkish? • Chinese? • Sample entries (surface form, lexical form, tags): acclaim acclaim $N$ • acclaim acclaim $V+0$ • acclaimed acclaim $V+ed$ • acclaimed acclaim $V+en$ • acclaiming acclaim $V+ing$ • acclaims acclaim $N+s$ • acclaims acclaim $V+s$ • acclamation acclamation $N$ • acclamations acclamation $N+s$ • acclimate acclimate $V+0$ • acclimated acclimate $V+ed$ • acclimated acclimate $V+en$ • acclimates acclimate $V+s$ • acclimating acclimate $V+ing$
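A lexicon-only approach amounts to two table lookups, which is why the slide calls analysis and generation easy. The sketch below loads a handful of the entries listed on this slide into two dictionaries; the function names `analyze` and `generate` are hypothetical.

```python
from collections import defaultdict

# (surface form, lexical form, tags), copied from the slide's sample entries.
ENTRIES = [
    ("acclaim", "acclaim", "$N$"),
    ("acclaim", "acclaim", "$V+0$"),
    ("acclaimed", "acclaim", "$V+ed$"),
    ("acclaimed", "acclaim", "$V+en$"),
    ("acclaiming", "acclaim", "$V+ing$"),
    ("acclaims", "acclaim", "$N+s$"),
    ("acclaims", "acclaim", "$V+s$"),
]

analysis = defaultdict(list)  # surface -> [(stem, tags), ...]
generation = {}               # (stem, tags) -> surface
for surface, stem, tags in ENTRIES:
    analysis[surface].append((stem, tags))
    generation[(stem, tags)] = surface


def analyze(word):
    """Return every listed analysis of a surface form (possibly none)."""
    return analysis.get(word, [])


def generate(stem, tags):
    """Return the surface form for a lexical entry, or None if unlisted."""
    return generation.get((stem, tags))


print(analyze("acclaims"))
print(generate("acclaim", "$V+ing$"))
print(analyze("acclaimer"))  # unlisted words simply fail
```

The last call shows the cost of having no rules: any form missing from the table is unanalyzable, so the table has to grow with every inflected form of every stem.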

  30. Building a Morphological Parser • The Rules and the Lexicon • General versus Specific • Regular versus Irregular • Accuracy, speed, space • The Morphology of a language • Approaches • Lexicon only • Lexicon and Rules • Finite-state Automata • Finite-state Transducers • Rules only

  31. Lexicon and Rules: FSA Inflectional Noun Morphology • English Noun Lexicon • English Noun Rule

  32. Lexicon and Rules: FSA English Verb Inflectional Morphology

  33. FSA for Derivational Morphology: Adjectival Formation

  34. More Complex Derivational Morphology

  35. Using FSAs for Recognition: English Nouns and their Inflection
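A letter-level recognizer for English noun inflection can be built by expanding a small lexicon into an FSA. The sketch below assumes the usual three noun classes (regular nouns that take plural -s, irregular singulars, irregular plurals); the word lists and the names `build_noun_fsa` and `accepts` are made up for illustration, since the slide's figure is not in the transcript.

```python
REG_NOUNS = {"cat", "dog", "fox"}
IRREG_SG = {"goose", "mouse"}
IRREG_PL = {"geese", "mice"}


def build_noun_fsa():
    """Expand the lexicon into letter arcs; state 0 is the start state."""
    trans, finals = {}, set()

    def walk(word):
        # Follow existing arcs, creating new states where needed.
        state = 0
        for ch in word:
            if (state, ch) not in trans:
                trans[(state, ch)] = len(trans) + 1
            state = trans[(state, ch)]
        return state

    for stem in REG_NOUNS:
        finals.add(walk(stem))        # bare stem: singular
        finals.add(walk(stem + "s"))  # stem followed by plural -s
    for word in IRREG_SG | IRREG_PL:
        finals.add(walk(word))        # irregulars are listed whole
    return trans, finals


TRANS, FINALS = build_noun_fsa()


def accepts(word):
    state = 0
    for ch in word:
        state = TRANS.get((state, ch))
        if state is None:
            return False
    return state in FINALS


for w in ["cats", "geese", "gooses", "foxs", "foxes"]:
    print(w, accepts(w))
```

Note that this recognizer accepts "foxs" and rejects "foxes": a plain FSA over stem plus -s knows nothing about spelling changes, which is the gap that spelling rules and transducers are meant to fill.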

  36. Morphological Parsing • Finite-state automata (FSA) • Recognizer • One-level morphology • Finite-state transducers (FST) • Two-level morphology • PC-Kimmo (Koskenniemi 83) • input-output pair

  37. Terminology for PC-Kimmo • Upper = lexical tape • Lower = surface tape • Characters correspond to pairs, written a:b • If “a:a”, write “a” for shorthand • Two-level lexical entries • # = word boundary • ^ = morpheme boundary • Other = “any feasible pair that is not in this transducer” • Final states indicated with “:” and non-final states indicated with “.”

  38. Four-Fold View of FSTs • As a recognizer • As a generator • As a translator • As a set relater
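The different views of an FST can be seen by running one small transducer in both directions. The sketch below hand-codes a toy machine for the single stem "cat"; the arc format `(source, upper, lower, target)` with `""` as epsilon and the name `transduce` are assumptions for this example, not PC-Kimmo's file format.

```python
# Arcs are (source, upper, lower, target); "" stands for epsilon.
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "+N", "", 4),
    (4, "+SG", "", 5),
    (4, "+PL", "s", 5),
]
START, FINALS = 0, {5}


def transduce(tokens, down=True):
    """Map tokens across the FST and return the set of output strings.

    down=True reads the upper (lexical) tape and writes the lower (surface)
    tape; down=False goes the other way. Assumes no epsilon-input cycles.
    """
    results = set()
    stack = [(START, 0, "")]
    while stack:
        state, i, out = stack.pop()
        if i == len(tokens) and state in FINALS:
            results.add(out)
        for src, upper, lower, dst in ARCS:
            if src != state:
                continue
            inp, outp = (upper, lower) if down else (lower, upper)
            if inp == "":
                stack.append((dst, i, out + outp))       # consume nothing
            elif i < len(tokens) and tokens[i] == inp:
                stack.append((dst, i + 1, out + outp))   # consume one token
    return results


print(transduce(["c", "a", "t", "+N", "+PL"]))  # generation: lexical to surface
print(transduce(list("cats"), down=False))      # analysis: surface to lexical
```

Used as a recognizer, the same machine accepts a pair when the surface string is among the outputs for the lexical string; used as a set relater, it defines the relation between the two sets of strings.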

  39. Nominal Inflection FST

  40. Lexical and Intermediate Tapes

  41. Spelling Rules

  42. Chomsky and Halle Notation • ε → e / {x, s, z} ^ __ s #
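The e-insertion rule on this slide (insert e after x, s, or z at a morpheme boundary when a word-final s follows) can be tried out with a regular expression. This is a sketch under the assumption that the intermediate tape marks morpheme boundaries with `^` and the word boundary with `#`; the function name `intermediate_to_surface` is hypothetical, and a regex substitution is used here in place of a real transducer.

```python
import re


def intermediate_to_surface(form):
    """Apply e-insertion, then erase the boundary symbols."""
    # Insert e between the morpheme boundary and a word-final s,
    # but only when the boundary follows x, s, or z.
    form = re.sub(r"([xsz])\^(?=s#)", r"\1^e", form)
    return form.replace("^", "").replace("#", "")


for tape in ["fox^s#", "cat^s#", "kiss^s#", "fox#"]:
    print(tape, "->", intermediate_to_surface(tape))
```

Only the three contexts named in the rule are covered; stems ending in other sibilant spellings would need further rules.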

  43. Intermediate-to-Surface Transducer

  44. State Transition Table

  45. Two-Level Morphology

  46. Sample Run KIMMO DEMO

  47. FSTs and ambiguity • Parse Example 1: unionizable • union +ize +able • un+ ion +ize +able • Parse Example 2: assess • assess(V) • ass(N) +ess(N) • Parse Example 3: tender • tender(AJ) • ten(Num) +d(AJ) +er(CMP)

  48. What to do about Global Ambiguity? • Accept first successful structure • Run parser through all possible paths • Bias the search in some manner
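The three strategies listed here can be compared on a toy segmenter. The sketch below splits a surface string into morphemes from a tiny hand-made lexicon, ignoring category labels and spelling changes; the lexicon contents, the function names, and the "fewest morphemes" bias are all assumptions chosen for illustration.

```python
# Hypothetical surface-morpheme lexicon covering two of the slide's examples.
LEXICON = {"assess", "ass", "ess", "tender", "ten", "d", "er"}


def segmentations(word):
    """Lazily yield every split of word into lexicon morphemes, depth-first."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in LEXICON:
            for rest in segmentations(word[end:]):
                yield [head] + rest


def first_parse(word):
    """Strategy 1: accept the first successful structure."""
    return next(segmentations(word), None)


def all_parses(word):
    """Strategy 2: run through all possible paths."""
    return list(segmentations(word))


def biased_parse(word):
    """Strategy 3: bias the search, here by preferring the fewest morphemes."""
    parses = all_parses(word)
    return min(parses, key=len) if parses else None


print(first_parse("assess"))
print(all_parses("tender"))
print(biased_parse("tender"))
```

With this search order the first successful parse is the implausible one (shortest prefix first), which shows why accepting the first structure is only as good as the ordering of the search.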

  49. Computational Morphology • The Rules and the Lexicon • General versus Specific • Regular versus Irregular • Accuracy, speed, space • The Morphology of a language • Approaches • Lexicon only • Lexicon and Rules • Finite-state Automata • Finite-state Transducers • Rules only

  50. Computational Morphology • The Rules and the Lexicon • General versus Specific • Regular versus Irregular • Accuracy, speed, space • The Morphology of a language • Approaches • Lexicon only • Lexicon and Rules • Finite-state Automata • Finite-state Transducers • Rules only (next time!!)
