Morphology
This presentation is the property of its rightful owner.
Sponsored Links
1 / 50

Morphology PowerPoint PPT Presentation


  • 95 Views
  • Uploaded on
  • Presentation posted in: General

Morphology. What is morphology? Finite State Transducers Two Level Morphology. What is morphology?. Decomposition of words into meaningful units: anti dis establish ment arian ism Interacts with- syntax( categories and word order)

Download Presentation

Morphology

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Morphology

What is morphology?

Finite State Transducers

Two Level Morphology


What is morphology?

  • Decomposition of words into meaningful units:

  • anti dis establish ment arian ism

  • Interacts with- syntax( categories and word order)

  • [establish] = verb + ment = noun

  • phonology: divine divinity

  • obscene obscenity

  • Interacts with semantics:

  • boy boys

  • Peter Peterchen


Phonological String

morphological analyzer

dictionary lookup

syntactic analyzer

lexical- semantic analysis

discourse processing


Why store all words as morphemes rather than all

Morphological combinations as words?

What does the morphological analyzer have to output?


The what and the how:

  • Efficient and effective algorithm to decompose categories into,

  • or build categories from, component morphemes.

  • What this algorithm will be depends on problems it has to solve.

  • In turn depends on representations computed.

  • Given stem /lemma ( e.g. ‘jump’ add material to change category

  • Or grammatical properties of word ‘jumped’, ‘jumpable’

  • order of composition matters:

  • ride/ riding

  • enoble/ enobling/*nobling Adj ---> V, V===> V+ing

  • trance/*trancing/entrance/entrancing


CONCATENATIVE MORPHOLOGICAL PROCESSES:

COMPOUNDING:

firefighter

PREFIXATION:

Un+ well

INFIXATION: ( TAGALOG)

fikas - strong

fumikas - be strong

SUFFIXATION:

Kick + er

CIRCUMFIXATION: ( German)

ge [sag] t past prefix [say] past suffix


Inflectional Morphology

  • non category changing, required by syntax

  • Agreement: person/number:

  • Je parle

  • Nous parlons

  • Ils parlent

  • Gender:

  • la petite ( the little one (fem))

  • le petit ( the little one (masc))

  • la squelette ( the skeleton)


Derivational Morphology

  • changes category. Not required by syntax

  • Deverbal Nominal:

  • bak+er tion: destroy/destruction

  • catch+ er Roman's destruction of the city

  • 'er' = agent of action Catcher of the ball

  • John’s catcher of the ball

  • 'John" ~= one who caught


Regular vs Irregular

Jump/jumped hit/hit bring/brought sing/sang

Productive/Non-Productive

adore/adorable, kick/kickable, fax/faxable

produce/production destroy/destruction *graft/graftuction

Bring/ brought


Regular (English) Verbs


Irregular (English) Verbs


“To love” in Spanish


  • Productive and rule governed:

  • fax fax +er

  • ??? Crudoy cruduction

  • Category sensitivity:

  • breakable/* manable

  • sensitivity/ *hittivity

  • Semantic sensitivity:

  • un + well un + happy

  • *un + ill *un+ sad


Store morphemes or words?

lebensversicherungsgesellschaftsangesteller

leben+ versicherung + gesellschaft+s+angesteller

life insurance company +Poss employee

Turkish:

Turkish verns have 40k forms


Non- concatenative Morphology

  • Templatic morphology (Semitic languages):lmd (learn), lamad (he studied), limed (he taught), lumad (he was taught)


Concatenation: Beads on a string

Agglutinative ( concatenative) languages are well behaved for FSAs

as long as we don’t include phonological or spelling changes

Verb Lexicon:

jump+edjump

kiss+ed kiss

stream+ed stream

*hopp+ed hop, ???

verb

ed

q

q 1

q

q1

q2

0


Pieces of a Morphological Analyzer

-er,est,ly

un

adj-root

q2

q3

q0

q1

The lexicon stores the lemmas, and divides them into adjective classes

really/clearly *bigly/redly

Morphotactics:

State sequence indicates order of morpheme composition

e.g. comparative or adverb formation is by suffixation


Lexicon

  • Arranged as TRIE ( letter strings in common relative to position

  • n-k-e-y

  • D-o

  • -g

  • Classed by part of speech category ( noun, verb) and morphotactic

  • (which other affixes can precede or follow)

  • or orthographic considerations.


Orthography

  • spelling rules- handle phonological or spelling variation in

  • orthographic a morpheme

  • Try /trying/tries

  • Cringe/cringing/cringes


FSA for Inflectional Morphology: English Nouns


FSA for Inflectional Morphology: English Verbs


FSA for Derivational Morphology: Adjectival Formation


More Complex Derivational Morphology


Using FSAs for Recognition: English Nouns and their Inflection


  • Orthographic

  • Want association between morpheme and semantic function

  • Want association between allographs or allophones of the same

  • phoneme

  • Allographs:

  • city -cities

  • bake- baking

  • divine-divinity

  • try tried


Finite State Transducers (FSTs)- the Big Idea

Need to relate lexical level, the level that gives us the morphological

analysis (+plural,+able to the surface level that keeps track of

phonological/

or graphological (spelling_ changes)


Parsing vs recognition

  • An FSA can give you the string composition of a morphological sequence, and can tell you whether a given morphological string is or is not in the language. It recognizes the string

  • An FST parses the string. It tells you the morphological structure associated with the string. Other instances of parsing?


Formal definition

  • An FST defines a relation between sets of pairs of strings:

  • It contains at least a lexical level that is a concatenation of morphemes

  • and a surface level that shows the correct spelling for each

  • morpheme in a given context

  • cat/sheep ^ s

  • e.g. noun (instanciated from lexicon) + plural

  • E s

  • cats/sheep


Q= finite set of states q0 to qn

finite alphabet of complex symbols (feasible pairs)

i:o with one symbol from the input alphabet

Q0 = the start state

F= set of final states

 = (q, i:o) the transition function or matrixbetween

states. Takes a state from Q and a complex symbol

i:o from and returns a new state.

feasible pair: a relation of a symbol on one tape to a symbol

on the other tape.

e.g. can + [pl:^s]


  • default pair- the upper tape is the same as the lower tape

  • same input as output :c*a*t/c:c*a:a*t:t*pl:^s

  • feasible pairs either stated in lexicon if irregular

  • g:g*o:e*o:e*s:s*e:e goose:geese

  • or by an automaton that stipulates correspondence in rule

  • governed way if the relation is regular. If regular, indicated as

  • Default paris and usually represented by one symbol.

  • FSTs are closed under:

  • inversion: switches i/o labels

  • composition: union of two transducers

  • one after the other.


trie: in lexicon, categories arranged by letter one at a time with

class at end. Allows parallel search as long as things match

e.g. m*e*t*a*l <N> m*e*t*a <root>

metal, meta-language


Kimmo-BasedMorphological Parsing

  • Two-level morphology: lexical level + surface level (Koskenniemi 83)

  • Finite-state transducers (FST): input-output pair


Four-Fold View of FSTs

  • As a recognizer

  • As a generator

  • As a translator

  • As a set relater


Terminology for Kimmo

  • Upper = lexical tape

  • Lower = surface tape

  • Characters correspond to pairs, written a:b

  • If “a=b”, write “a” for shorthand

  • Two-level lexical entries

  • # = word boundary

  • ^ = morpheme boundary

  • Other = “any feasible pair that is not in this tranducer”


Nominal Inflection FST


Lexical and Intermediate Tapes


Spelling Rules


Notation

x

s

z

^ __ s #

e --> e /


Intermediate-to-Surface Transducer


Two-Level Morphology


Sample Run


FSTs and ambiguity

Parse Example 1: unionizable

Parse Example 2: assess


What to do about Global Ambiguity?

  • Accept first successful structure

  • Run parser through all possible paths

  • Bias the search in some manner


Some Limitations


Stemming

  • For some applications,don’t need full morphological analysis.

  • IR- don’t care that e.g ‘logician’ is related to ‘logical’ Just want

  • to know that if you are interested in articles about ‘logic’

  • may want former two classes as well. So just want to ‘get back

  • to root list.

  • Relate two forms by having a literal relation rule. E.g

  • al#---> 0

  • Is it useful: in a big document may not be necessary because the

  • will appear in many forms including form in query


  • stemming is morphologically impoverished so error driven

  • - can’t distinguish rules that apply at morpheme boundaries

  • versus internal to root:

  • patronization = patron + ize + ation

  • organization = organize+ ation

  • But the stemmer will treat these as a single class and derive

  • “organ” as an underlying root.

  • -’adverse’/’adversity

  • ‘universe / university


Psycholinguistics

  • Is the human lexicon efficient in the way computational lexica

  • are?

  • -Stanners et al (1979) :where two words are related inflection-

  • ally,then root stored and other forms rule derived. Where

  • there is a derivational relationship, then both forms are stored

  • paradigm = repetition priming

  • ‘great, happy, peachy, adorable , round, short, great

  • small

  • Repetition priming for ‘turns’ given ‘turning’ but not

  • ‘select’, ‘selective’


  • Marslen- Wilson et al (1994): May have priming for

  • Semantically similar derivationally related words:

  • permit/permission

  • * create/creativity

  • On-line versus long term storage lexicon:

  • Speech errors: ‘we have screw looses’


  • Login