Morphology: Words and their Parts

CS 4705

Basic Uses of Morphology
  • The study of how words are composed from smaller, meaning-bearing units (morphemes)
  • Applications:
    • Spelling correction: referece
    • Hyphenation algorithms: refer-ence
    • Part-of-speech analysis: googler
    • Text-to-speech: grapheme-to-phoneme conversion
      • hothouse (/T/ or /D/)
    • Speech recognition: phoneme-to-grapheme conversion
  • Amusing poetry and artificial languages in standardized tests
    • ‘Twas brillig and the slithy toves…
    • Muggles moogled migwiches
What is a word?
  • In formal languages, words are arbitrary strings
  • In natural languages, words are made up of meaningful subunits called morphemes
    • Allows for productivity: googled, texted
    • Abstract concepts denoting entities or relationships in the world
      • Roots +
      • Syntactic or grammatical elements
    • Realizations of morphemes: morphs
      • Door realizes door; take and took realize take
    • Allomorphs are classes of related morphs that realize a given morpheme
      • Allomorphs of the English plural morpheme s include en, men, es
      • Take and took are allomorphs of take
  • In sum: the morpheme [s] is realized by an allomorph class that includes the related morphs {en, men, es}
  • Syntactic or grammatical morphemes can convey many things
      • In Italian, mark nouns for gender and number

            Singular    Plural
Masc        pomodoro    pomodori
Fem         cipolla     cipolle

pomodor-, cipoll-: stems, which may or may not occur on their own as words

  • Stem may not occur as a word: derivative/deriv
  • Base form (lemma) occurs as word: derivative/derive
  • Sometimes the same: cars has stem ‘car’ and base form or lemma ‘car’ too
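
The Italian table above can be read as a tiny lookup from ending to features. The following is a minimal sketch in Python, assuming only the regular -o/-i and -a/-e pattern shown in the table; the names `ENDINGS` and `analyze_italian_noun` are made up for illustration, and many real Italian nouns do not follow this pattern.

```python
# Hypothetical sketch: only the four endings from the table above are covered.
ENDINGS = {
    "o": ("masc", "sg"),
    "i": ("masc", "pl"),
    "a": ("fem", "sg"),
    "e": ("fem", "pl"),
}

def analyze_italian_noun(noun):
    """Split a regular noun into (stem, gender, number)."""
    stem, ending = noun[:-1], noun[-1]
    gender, number = ENDINGS[ending]  # KeyError for endings outside the table
    return stem + "-", gender, number

print(analyze_italian_noun("pomodori"))
print(analyze_italian_noun("cipolla"))
```

Note that the stems returned here (pomodor-, cipoll-) never occur as words on their own, which is the point made in the bullets above.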
What useful information does morphology give us?
  • Different things in different languages
    • Spanish: hablo, hablaré/ English: I speak, I will speak
    • English: book, books/ Japanese: hon, hon
  • Languages differ in how they encode morphological information
    • Isolating languages (e.g. Cantonese) have no affixes: each word usually has 1 morpheme
    • Agglutinative languages (e.g. Finnish, Turkish) build words by adding prefixes and suffixes to a stem (like beads on a string), with each feature realized by a single affix, e.g. Finnish
epäjärjestelmällistyttämättömyydellänsäkäänköhän

‘Wonder if he can also ... with his capability of not causing things to be unsystematic’

  • Inflectional languages (e.g. English) merge different features into a single affix (e.g. ‘s’ in likes indicates both person and tense); and the same feature can be realized by different affixes
  • Polysynthetic languages (e.g. Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb, e.g. Western Greenlandic

Aliikusersuillammassuaanerartassagaluarpaalli.
aliiku-sersu-i-llammas-sua-a-nerar-ta-ssa-galuar-paal-li
entertainment-provide-SEMITRANS-one.good.at-COP-say.that-REP-FUT-sure.but-3.PL.SUBJ/3SG.OBJ-but
'However, they will say that he is a great entertainer, but ...'

  • So different languages may require very different morphological analyzers
Morphology Can Help Define Word Classes
  • AKA morphological classes, parts-of-speech
  • Closed vs. open (function vs. content) class words
    • Pronoun, preposition, conjunction, determiner,…
    • Noun, verb, adverb, adjective,…
  • Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection…very basic semantics
(English) Inflectional Morphology

Word stem + grammatical morpheme --> different forms of same word

    • Usually produces word of same class
    • Usually serves a syntactic or grammatical function (e.g. agreement)

like --> likes or liked

bird --> birds

  • Nominal morphology
    • Plural forms
      • -s or -es
      • Irregular forms (goose/geese)
    • Mass vs. count nouns (fish/fish(es), email or emails?)
    • Possessives (cat’s, cats’)
  • Verbal inflection
    • Main verbs (sleep, like, fear) relatively regular
      • -s, -ing, -ed
      • And productive: emailed, instant-messaged, faxed, homered
      • But some are not:
        • eat/ate/eaten, catch/caught/caught
    • Primary (be, have, do) and modal verbs (can, will, must) often irregular and not productive
          • Be: am/is/are/were/was/been/being
    • Irregular verbs few (~250) but frequently occurring
  • Particles occur in only one form in English
    • Prepositions: to, from
    • Adverbs: happily, quickly
    • Conjunctions: but, and
    • Articles: the, a, an
    • Japanese?
  • So English inflectional morphology is fairly easy to model, with some special cases
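
One common way to model "regular rules plus special cases" is an exception table consulted before a handful of spelling rules. The sketch below is an illustration in Python under that assumption, not a complete model of English: the tables are tiny, the function names are invented here, and consonant doubling (stop/stopped) is deliberately left out.

```python
# Hypothetical exception tables; a real system would need the ~250 irregular verbs.
IRREGULAR_PLURALS = {"goose": "geese", "fish": "fish"}
IRREGULAR_PAST = {"eat": "ate", "catch": "caught"}

def plural(noun):
    """Form the plural: irregulars first, then -es / -ies / -s."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

def past(verb):
    """Form the past tense: irregulars first, then -d / -ied / -ed."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in "aeiou":
        return verb[:-1] + "ied"
    return verb + "ed"

for w in ["bird", "goose", "fox"]:
    print(w, "->", plural(w))
for w in ["like", "email", "fax", "eat"]:
    print(w, "->", past(w))
```

Because the regular rules apply to any unlisted word, new verbs such as email or fax are handled without any change to the tables, which is what makes the regular pattern productive.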
Derivational Morphology
  • Word stem + syntactic/grammatical morpheme --> new words
    • Usually produces word of different class
    • Incomplete process: derivational morphs cannot be applied to just any member of a class
  • Verbs --> nouns
    • -ize verbs --> -ation nouns
    • generalize, realize --> generalization, realization
    • synthesize but no *synthesization
  • Verbs, nouns --> adjectives
    • embrace, pity --> embraceable, pitiable
    • care, wit --> careless, witless
  • Adjective --> adverb
    • happy --> happily
  • Process selective in unpredictable ways
    • Less productive: nerveless/*evidence-less, malleable/*sleep-able, rar-ity/*rareness
    • Meanings of derived terms harder to predict by rule
      • clueless, careless, nerveless, sleepless
  • Derivation can be applied recursively:
    • Hospital --> hospitalize --> hospitalization --> prehospitalization --> …
    • Morphological analysis identifies concatenative processes as well as morphemes

[pre[[[hospital]ize]ation]]

    • But there are bracketing paradoxes

unhappier

[un[happier]]: not happier

[[unhappy]er]: more unhappy
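
Recursive derivation and its bracketing can be sketched as a loop that attaches one affix at a time and wraps the result in a new pair of brackets. This is a rough illustration in Python: the functions `attach` and `derive` and the two spelling adjustments (dropping a final e, changing y to i) are simplifying assumptions made here, not rules stated in the slides.

```python
def attach(word, affix, kind):
    """Attach one affix, with two simplified spelling adjustments for suffixes."""
    if kind == "prefix":
        return affix + word
    if word.endswith("e") and affix[0] in "aeiou":
        word = word[:-1]            # hospitalize + ation -> hospitalization
    elif word.endswith("y"):
        word = word[:-1] + "i"      # happy + er -> happier
    return word + affix

def derive(root, steps):
    """Apply (affix, kind) steps in order; return (surface form, bracketing)."""
    surface, bracketing = root, f"[{root}]"
    for affix, kind in steps:
        surface = attach(surface, affix, kind)
        if kind == "prefix":
            bracketing = f"[{affix}{bracketing}]"
        else:
            bracketing = f"[{bracketing}{affix}]"
    return surface, bracketing

print(derive("hospital", [("ize", "suffix"), ("ation", "suffix"), ("pre", "prefix")]))
# Two derivation orders give the same surface form but different structure:
print(derive("happy", [("er", "suffix"), ("un", "prefix")]))
print(derive("happy", [("un", "prefix"), ("er", "suffix")]))
```

The last two calls both produce unhappier, but with different bracketings, which is the ambiguity behind the paradox above.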

Compounding
  • Two base forms join to form a new word
    • Bedtime, Wienerschnitzel, Rotwein
    • Careful? Compound or derivation?
Affixes can be attached to stems in different ways
  • Prefixation
    • Immaterial
  • Suffixation: more common across languages than prefixation
    • Trying
  • Circumfixation: combine prefixation and suffixation
    • Gesagt
  • Infixation
    • English: Absobl**dylutely
    • Bontoc: ‘um’ turns adjectives and nouns into verbs (kilad (red) --> kumilad (to be red))
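
The Bontoc example can be sketched as a string operation. The slide gives only one example, so the rule below (place the infix immediately before the first vowel) is an assumption made for illustration, and `infix_um` is a made-up name.

```python
def infix_um(word, infix="um"):
    """Insert the infix before the first vowel (assumed placement rule)."""
    for i, ch in enumerate(word):
        if ch in "aeiou":
            return word[:i] + infix + word[i:]
    return word  # no vowel found: leave the word unchanged

print(infix_um("kilad"))
```

Unlike prefixation and suffixation, the affix here lands inside the stem, so a parser cannot recover it by stripping material from either edge of the word.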
Concatenative vs. Non-concatenative Morphology
  • Semitic root-and-pattern morphology
    • Root (2-4 consonants) conveys basic semantics (e.g. Arabic /ktb/)
    • Vowel pattern conveys voice and aspect
    • Derivational template (binyan) identifies word class
Template    Vowel Pattern           Gloss
            active      passive

CVCVC       katab       kutib       write
CVCCVC      kattab      kuttib      cause to write
CVVCVC      ka:tab      ku:tib      correspond
tVCVVCVC    taka:tab    tuku:tib    write each other
nCVVCVC     nka:tab     nku:tib     subscribe
CtVCVC      ktatab      ktutib      write
stVCCVC     staktab     stuktib     dictate
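
The forms in the table can be generated by interleaving the root consonants with a vowel melody according to the template. The sketch below is one possible encoding in Python, not the analysis from the slides: it writes consonant slots as digits (1, 2, 3 index into the root, so that CVCCVC can be spelled `1V22V3` for the doubled middle consonant), and it assumes, from the table alone, that the active melody is all a and the passive melody is u with a final i.

```python
import re

def interdigitate(root, template, voice="active"):
    """Fill a template: digits pick root consonants, V/VV are (long) vowel slots."""
    slots = re.findall(r"V+|[^V]", template)
    n_vowel_slots = sum(1 for s in slots if s[0] == "V")
    out, seen = [], 0
    for s in slots:
        if s[0] == "V":
            seen += 1
            if voice == "active":
                vowel = "a"
            else:
                vowel = "i" if seen == n_vowel_slots else "u"
            out.append(vowel + (":" if len(s) > 1 else ""))  # VV = long vowel
        elif s.isdigit():
            out.append(root[int(s) - 1])
        else:
            out.append(s)  # literal template consonant such as t, n, s
    return "".join(out)

for template in ["1V2V3", "1V22V3", "1VV2V3", "tV1VV2V3", "n1VV2V3", "1tV2V3", "stV12V3"]:
    print(template, interdigitate("ktb", template), interdigitate("ktb", template, "passive"))
```

The digit notation is needed because the plain CV notation does not say whether two adjacent C slots are filled by one doubled consonant (kattab) or by two different ones (staktab).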

Morphotactics
  • What are the ‘rules’ for constructing a word in a given language?
    • Pseudo-intellectual vs. *intellectual-pseudo
    • Rational-ize vs *ize-rational
    • Cretin-ous vs. *cretin-ly vs. *cretin-acious
  • Possible ‘rules’
    • Suffixes are suffixes and prefixes are prefixes
    • Certain affixes attach to certain types of stems (nouns, verbs, etc.)
    • Certain stems can/cannot take certain affixes
  • Semantics: In English, un- cannot attach to adjectives that already have a negative connotation:
    • Unhappy vs. *unsad
    • Unhealthy vs. *unsick
    • Unclean vs. *undirty
  • Phonology: In English, -er cannot attach to words of more than two syllables
    • great, greater
    • Happy, happier
    • Competent, *competenter
    • Elegant, *eleganter
    • Unruly, ?unrulier
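
The phonological constraint on -er can be sketched as a check on syllable count before suffixing. The Python below is a rough approximation under stated assumptions: syllables are estimated by counting vowel-letter groups (treating y as a vowel), which is only a heuristic, and the function names are invented for this example. It does not handle consonant doubling (big/bigger).

```python
import re

def count_syllables(word):
    """Naive estimate: number of maximal runs of vowel letters (y counts)."""
    return len(re.findall(r"[aeiouy]+", word.lower()))

def comparative(adj):
    """Use -er for words of at most two syllables, otherwise 'more'."""
    if count_syllables(adj) > 2:
        return "more " + adj
    if adj.endswith("y"):
        return adj[:-1] + "ier"
    if adj.endswith("e"):
        return adj + "r"
    return adj + "er"

for adj in ["great", "happy", "competent", "elegant", "unruly"]:
    print(adj, "->", comparative(adj))
```

Under this strict reading of the rule, unruly comes out as "more unruly"; the question mark on ?unrulier above suggests speakers' judgments are less clear-cut than a hard threshold.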
Morphological Parsing
  • These regularities enable us to create software to parse words into their component parts
    • Known words and new ones (e.g. Pneumonoultramicroscopicsilicovolcanoconiosis, Columbianize, Columbianization)
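
A very small parser along these lines can be sketched as recursive affix stripping against a stem lexicon. Everything in the Python below is an assumption made for illustration: the tiny `LEXICON`, the affix lists, and the two spelling repairs in `restore`. It returns every bracketing it can find, so ambiguous words yield more than one analysis, and it overgenerates because it has no knowledge of which affixes may attach to which stems.

```python
# Hypothetical toy resources; a real analyzer needs far larger ones.
PREFIXES = ("pre", "un")
SUFFIXES = ("ation", "ize", "er")
LEXICON = {"hospital", "happy"}

def restore(base):
    """Candidate spellings of a base once a suffix has been stripped."""
    candidates = [base, base + "e"]           # hospitaliz(e) + ation
    if base.endswith("i"):
        candidates.append(base[:-1] + "y")    # happi + er -> happy
    return candidates

def analyses(word):
    """Return the set of bracketed analyses licensed by the toy resources."""
    found = set()
    if word in LEXICON:
        found.add(f"[{word}]")
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            for inner in analyses(word[len(p):]):
                found.add(f"[{p}{inner}]")
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            for cand in restore(word[:-len(s)]):
                for inner in analyses(cand):
                    found.add(f"[{inner}{s}]")
    return found

print(sorted(analyses("unhappier")))
print(sorted(analyses("prehospitalization")))
```

Words with no stem in the lexicon get no analysis at all, which is why handling new words in practice requires either a large lexicon or some way of guessing unknown stems.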
Morphological Representations: Evidence from Human Performance
  • Hypotheses:
    • Full listing hypothesis: words listed
    • Minimum redundancy hypothesis: morphemes listed
  • Experimental evidence:
    • Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither
    • Regularly inflected forms prime their stems (e.g. cars primes car), but derived forms do not (e.g. management does not prime manage)
    • But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)
  • Speech errors suggest affixes must be represented separately in the mental lexicon
    • ‘easy enoughly’ for ‘easily enough’
Summing Up
  • Different languages have different morphological systems
    • If we can discover how to decode such a system, we can identify useful information about the word class and the semantic meaning of a word
    • Morphological regularities provide basis for building (automatic) morphological analyzers
  • Next time: Read Ch 3.2-3.6
    • HW1 will be assigned (check the course syllabus and courseworks)
Announcements
  • HW1 will now be due 9/25/07
  • WICS lunch tomorrow at noon in the CS Lounge, 452 MUDD (rsvp to hila@cs.columbia.edu)