
Morphology: Words and their Parts


steffi

Presentation Transcript


  1. Morphology: Words and their Parts CS 4705

  2. Basic Uses of Morphology • The study of how words are composed from smaller, meaning-bearing units (morphemes) • Applications: • Spelling correction: referece • Hyphenation algorithms: refer-ence • Part-of-speech analysis: googler • Text-to-speech: grapheme-to-phoneme conversion • hothouse (/T/ or /D/)

  3. Speech recognition: phoneme-to-grapheme conversion • Amusing poetry and artificial languages in standardized tests • ‘Twas brillig and the slithy toves… • Muggles moogled migwiches

  4. What is a word? • In formal languages, words are arbitrary strings • In natural languages, words are made up of meaningful subunits called morphemes • Allows for productivity: googled, texted • Abstract concepts denoting entities or relationships in the world • Roots + • Syntactic or grammatical elements • Realizations of morphemes: morphs • Door realizes door; take and took realize take

  5. Allomorphs are classes of related morphs that realize a given morpheme • Allomorphs of s include en, men, es in English • Take and took are allomorphs of take • Sum: Morpheme [s] is realized by an allomorph class that includes the related morphs {en,men,es} • Syntactic or grammatical morphemes can convey many things • In Italian, mark nouns for gender and number:

            Singular    Plural
     Masc   pomodoro    pomodori
     Fem    cipolla     cipolle

     • pomodor-, cipoll-: stems, may or may not occur on their own as words • Stem may not occur as a word: derivative/deriv • Base form (lemma) occurs as word: derivative/derive • Sometimes the same: cars has stem ‘car’ and base form or lemma ‘car’ too

  6. What useful information does morphology give us? • Different things in different languages • Spanish: hablo, hablaré/ English: I speak, I will speak • English: book, books/ Japanese: hon, hon • Languages differ in how they encode morphological information • Isolating languages (e.g. Cantonese) have no affixes: each word usually has 1 morpheme • Agglutinative languages (e.g. Finnish, Turkish) are composed of prefixes and suffixes added to a stem (like beads on a string) – each feature realized by a single affix, e.g. Finnish

  7. epäjärjestelmällistyttämättömyydellänsäkäänköhän ‘Wonder if he can also ... with his capability of not causing things to be unsystematic’ • Inflectional languages (e.g. English) merge different features into a single affix (e.g. ‘s’ in likes indicates both person and tense); and the same feature can be realized by different affixes • Polysynthetic languages (e.g. Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb, e.g. Western Greenlandic:

     Aliikusersuillammassuaanerartassagaluarpaalli.
     aliiku-sersu-i-llammas-sua-a-nerar-ta-ssa-galuar-paal-li
     entertainment-provide-SEMITRANS-one.good.at-COP-say.that-REP-FUT-sure.but-3.PL.SUBJ/3SG.OBJ-but
     'However, they will say that he is a great entertainer, but ...'

     • So... different languages may require very different morphological analyzers

  8. Morphology Can Help Define Word Classes • AKA morphological classes, parts-of-speech • Closed vs. open (function vs. content) class words • Pronoun, preposition, conjunction, determiner,… • Noun, verb, adverb, adjective,… • Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection…very basic semantics

  9. (English) Inflectional Morphology • Word stem + grammatical morpheme → different forms of same word • Usually produces word of same class • Usually serves a syntactic or grammatical function (e.g. agreement) • like → likes or liked • bird → birds • Nominal morphology • Plural forms • s or es • Irregular forms (goose/geese)

  10. Mass vs. count nouns (fish/fish(es), email or emails?) • Possessives (cat’s, cats’) • Verbal inflection • Main verbs (sleep, like, fear) relatively regular • -s, ing, ed • And productive: emailed, instant-messaged, faxed, homered • But some are not: • eat/ate/eaten, catch/caught/caught • Primary (be, have, do) and modal verbs (can, will, must) often irregular and not productive • Be: am/is/are/were/was/been/being • Irregular verbs few (~250) but frequently occurring

  11. Particles occur in only one form: in English • Prepositions: to, from • Adverbs: happily, quickly • Conjunctions: but, and • Articles: the, a, an • Japanese? • So….English inflectional morphology is fairly easy to model….with some special cases...
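
The regular-plus-exceptions picture of English inflection on slides 9 to 11 can be sketched in a few lines. This is a minimal illustration, not a full analyzer: the function names `plural` and `past_tense` are hypothetical, the irregular tables hold only forms mentioned on the slides, and spelling changes such as consonant doubling (stop/stopped) or y-to-i (try/tried) are deliberately ignored.

```python
# Irregular forms cannot be derived by rule, so they are simply listed.
# Only a few forms from the slides are included here.
IRREGULAR_PLURALS = {"goose": "geese"}
IRREGULAR_PASTS = {"eat": "ate", "catch": "caught"}


def plural(noun):
    """Return the plural of a noun: irregular lookup first, then -s or -es."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Assumption: sibilant-final nouns take -es, everything else takes -s.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def past_tense(verb):
    """Return the past tense of a verb: irregular lookup first, then -ed."""
    if verb in IRREGULAR_PASTS:
        return IRREGULAR_PASTS[verb]
    # Verbs already ending in 'e' only need 'd' (like -> liked).
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"


if __name__ == "__main__":
    for noun in ["bird", "goose"]:
        print(noun, "->", plural(noun))
    for verb in ["like", "email", "fax", "homer", "eat", "catch"]:
        print(verb, "->", past_tense(verb))
```

Checking the exception table before applying the rule is what lets productive new verbs (emailed, faxed, homered) work without being listed, while the small set of frequent irregulars is handled by lookup.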

  12. Derivational Morphology • Word stem + syntactic/grammatical morpheme → new words • Usually produces word of different class • Incomplete process: derivational morphs cannot be applied to just any member of a class • Verbs → nouns • -ize verbs → -ation nouns • generalize, realize → generalization, realization • synthesize but no *synthesization

  13. Verbs, nouns → adjectives • embrace, pity → embraceable, pitiable • care, wit → careless, witless • Adjective → adverb • happy → happily • Process selective in unpredictable ways • Less productive: nerveless/*evidence-less, malleable/*sleep-able, rar-ity/*rareness • Meanings of derived terms harder to predict by rule • clueless, careless, nerveless, sleepless

  14. Derivation can be applied recursively: • Hospital → hospitalize → hospitalization → prehospitalization → … • Morphological analysis identifies concatenative processes as well as morphemes: [pre[[[hospital]ize]ation]] • But there are bracketing paradoxes: unhappier • [un[happier]]: not happier • [[unhappy]er]: more unhappy
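
The bracket notation on this slide records the order in which affixes were attached. As a rough sketch, the helper below (the name `bracket` and the step format are assumptions made for illustration) rebuilds that notation from a root and an ordered list of derivational steps. It works on morphemes rather than surface spellings, so the two readings of unhappier come out as `[un[[happy]er]]` and `[[un[happy]]er]`.

```python
def bracket(root, steps):
    """Build a bracketed structure from a root and ordered affixation steps.

    Each step is a (kind, affix) pair where kind is "prefix" or "suffix".
    Steps are applied innermost first, mirroring the order of derivation.
    """
    structure = "[" + root + "]"
    for kind, affix in steps:
        if kind == "prefix":
            structure = "[" + affix + structure + "]"
        elif kind == "suffix":
            structure = "[" + structure + affix + "]"
        else:
            raise ValueError("unknown step kind: " + kind)
    return structure


if __name__ == "__main__":
    # hospital -> hospitalize -> hospitalization -> prehospitalization
    print(bracket("hospital", [("suffix", "ize"), ("suffix", "ation"), ("prefix", "pre")]))
    # The two competing analyses of "unhappier":
    print(bracket("happy", [("suffix", "er"), ("prefix", "un")]))  # not happier
    print(bracket("happy", [("prefix", "un"), ("suffix", "er")]))  # more unhappy
```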

  15. Compounding • Two base forms join to form a new word • Bedtime, Wienerschnitzel, Rotwein • Careful? Compound or derivation?

  16. Affixes can be attached to stems in different ways • Prefixation • Immaterial • Suffixation: more common across languages than prefixation • Trying • Circumfixation: combine prefixation and suffixation • Gesagt

  17. Infixation • English: Absobl**dylutely • Bontoc: ‘um’ turns adjectives and nouns into verbs (kilad (red) → kumilad (to be red))
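
The Bontoc example can be mimicked with a simple string operation. The sketch below assumes, purely for illustration, that the infix goes immediately before the first vowel of the stem; that matches kilad/kumilad but is not claimed to be a complete account of Bontoc. The function name `infix_um` is hypothetical.

```python
VOWELS = "aeiou"


def infix_um(stem, infix="um"):
    """Insert the infix before the first vowel of the stem.

    Simplifying assumption: the infix always lands right before the first
    vowel. A stem with no vowel is returned unchanged.
    """
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + infix + stem[i:]
    return stem


if __name__ == "__main__":
    print(infix_um("kilad"))  # kumilad
```

Unlike prefixation or suffixation, the affix here splits the stem, which is why a purely concatenative analyzer cannot handle it without extra machinery.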

  18. Concatenative vs. Non-concatenative Morphology • Semitic root-and-pattern morphology • Root (2-4 consonants) conveys basic semantics (e.g. Arabic /ktb/) • Vowel pattern conveys voice and aspect • Derivational template (binyan) identifies word class

  19. Template and vowel pattern (active vs. passive):

      Template    Active      Passive     Gloss
      CVCVC       katab       kutib       write
      CVCCVC      kattab      kuttib      cause to write
      CVVCVC      ka:tab      ku:tib      correspond
      tVCVVCVC    taka:tab    tuku:tib    write each other
      nCVVCVC     nka:tab     nku:tib     subscribe
      CtVCVC      ktatab      ktutib      write
      stVCCVC     staktab     stuktib     dictate
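
Root-and-pattern morphology can be sketched as filling template slots from two separate tiers, one for root consonants and one for the vowel melody. The function `interdigitate` below is a hypothetical helper that reproduces the forms in the table under three assumptions: lowercase letters in a template are literal, a template with one more C slot than the root has consonants doubles the middle consonant, and the second vowel of the melody fills only the last vowel slot while the first fills all earlier ones.

```python
import re


def interdigitate(root, template, melody):
    """Fill a CV template with root consonants and a two-vowel melody.

    root:     string of root consonants, e.g. "ktb"
    template: e.g. "CVCCVC"; C = root consonant, V = short vowel,
              VV = long vowel (written with ':'), lowercase = literal
    melody:   (first, last) vowels, e.g. ("a", "a") or ("u", "i")
    """
    slots = re.findall(r"VV|V|C|[a-z]", template)
    consonants = list(root)
    n_c = slots.count("C")
    # Assumption: one extra C slot means the middle consonant is geminated.
    if n_c == len(consonants) + 1:
        consonants.insert(2, consonants[1])
    if n_c != len(consonants):
        raise ValueError("template does not fit root")

    n_v = sum(1 for s in slots if s in ("V", "VV"))
    first, last = melody
    out, c_i, v_i = [], 0, 0
    for slot in slots:
        if slot == "C":
            out.append(consonants[c_i])
            c_i += 1
        elif slot in ("V", "VV"):
            v_i += 1
            vowel = last if v_i == n_v else first
            out.append(vowel + (":" if slot == "VV" else ""))
        else:
            out.append(slot)  # literal template material such as 't' or 'n'
    return "".join(out)


if __name__ == "__main__":
    for template in ["CVCVC", "CVCCVC", "CVVCVC", "tVCVVCVC",
                     "nCVVCVC", "CtVCVC", "stVCCVC"]:
        active = interdigitate("ktb", template, ("a", "a"))
        passive = interdigitate("ktb", template, ("u", "i"))
        print(template, active, passive)
```

Keeping consonants and vowels on separate tiers is the point of the sketch: the same root and the same melody are reused across every template, which plain affix concatenation cannot express.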

  20. Morphotactics • What are the ‘rules’ for constructing a word in a given language? • Pseudo-intellectual vs. *intellectual-pseudo • Rational-ize vs *ize-rational • Cretin-ous vs. *cretin-ly vs. *cretin-acious • Possible ‘rules’ • Suffixes are suffixes and prefixes are prefixes • Certain affixes attach to certain types of stems (nouns, verbs, etc.) • Certain stems can/cannot take certain affixes

  21. Semantics: In English, un- cannot attach to adjectives that already have a negative connotation: • Unhappy vs. *unsad • Unhealthy vs. *unsick • Unclean vs. *undirty • Phonology: In English, -er cannot attach to words of more than two syllables • great, greater • Happy, happier • Competent, *competenter • Elegant, *eleganter • Unruly, ?unrulier
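
The phonological constraint on -er can be approximated in code. The sketch below is only a heuristic: `count_syllables` counts runs of vowel letters (treating y as a vowel), which is a crude stand-in for real syllabification, and falling back to "more X" for longer adjectives is an assumption added here, not something the slide states.

```python
import re


def count_syllables(word):
    """Very rough syllable count: number of vowel-letter runs (y included)."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def comparative(adjective):
    """Attach -er only when the adjective has at most two syllables."""
    if count_syllables(adjective) > 2:
        # Assumed fallback: use the periphrastic form instead of -er.
        return "more " + adjective
    if adjective.endswith("y"):
        return adjective[:-1] + "ier"  # happy -> happier
    if adjective.endswith("e"):
        return adjective + "r"
    return adjective + "er"


if __name__ == "__main__":
    for adj in ["great", "happy", "competent", "elegant"]:
        print(adj, "->", comparative(adj))
```

The heuristic counts three vowel runs in "unruly" and so rejects "unrulier", which the slide marks only as questionable; a graded judgment like that is beyond a hard syllable cutoff.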

  22. Morphological Parsing • These regularities enable us to create software to parse words into their component parts • Known words and new ones (e.g. Pneumonoultramicroscopicsilicovolcanoconiosis, Columbianize, Columbianization)
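
A toy version of such a parser is sketched below. It strips prefixes and suffixes recursively until it reaches a stem in a small lexicon. Everything here is illustrative: the lexicon, the affix tables, and the function name `parse` are assumptions, and the only spelling rule modeled is restoring the final e of -ize before -ation (hospitaliz + ation comes from hospitalize). The course readings develop more principled methods.

```python
# Tiny illustrative lexicon and affix inventory (assumptions, not a real grammar).
LEXICON = {"hospital", "general", "real"}
PREFIXES = ["pre"]
# (surface suffix, letters to restore on the remaining base, label)
SUFFIXES = [("ation", "e", "-ation"), ("ize", "", "-ize")]


def parse(word):
    """Return a list of morphemes for word, or None if no analysis is found."""
    if word in LEXICON:
        return [word]
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = parse(word[len(prefix):])
            if rest is not None:
                return [prefix + "-"] + rest
    for surface, restore, label in SUFFIXES:
        if word.endswith(surface):
            base = word[:-len(surface)] + restore
            rest = parse(base)
            if rest is not None:
                return rest + [label]
    return None


if __name__ == "__main__":
    for w in ["hospital", "hospitalize", "prehospitalization", "realization", "blorf"]:
        print(w, "->", parse(w))
```

Because analysis is driven by affix rules rather than a list of whole words, the same code handles a newly coined word as soon as its stem is in the lexicon, which is the advantage the slide points to with Columbianize and Columbianization.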

  23. Morphological Representations: Evidence from Human Performance • Hypotheses: • Full listing hypothesis: words listed • Minimum redundancy hypothesis: morphemes listed • Experimental evidence: • Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither • Regularly inflected forms (e.g. cars) prime stem (car) but not derived forms (e.g. management, manage)

  24. But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart) • Speech errors suggest affixes must be represented separately in the mental lexicon • ‘easy enoughly’ for ‘easily enough’

  25. Summing Up • Different languages have different morphological systems • If we can discover how to decode such a system, we can identify useful information about the word class and the semantic meaning of a word • Morphological regularities provide basis for building (automatic) morphological analyzers • Next time: Read Ch 3.2-3.6 • HW1 will be assigned (check the course syllabus and courseworks)

  26. Announcements • HW1 will now be due 9/25/07 • WICS lunch tomorrow at noon in the CS Lounge, 452 MUDD (rsvp to hila@cs.columbia.edu)
