Morphology

What is morphology?

Finite State Transducers

Two Level Morphology


What is morphology?

  • Decomposition of words into meaningful units:

  • anti + dis + establish + ment + arian + ism

  • Interacts with syntax (categories and word order):

  • [establish] = verb; [establish] + ment = noun

  • Interacts with phonology:

  • divine / divinity

  • obscene / obscenity

  • Interacts with semantics:

  • boy / boys

  • Peter / Peterchen (German diminutive)


Morphology

Phonological String

morphological analyzer

dictionary lookup

syntactic analyzer

lexical-semantic analysis

discourse processing


Morphology

Why store all words as morphemes rather than storing all morphological combinations as words?

What does the morphological analyzer have to output?


The what and the how:

  • We want an efficient and effective algorithm to decompose categories into, or build categories from, component morphemes.

  • What this algorithm will be depends on the problems it has to solve.

  • That in turn depends on the representations computed.

  • Given a stem/lemma (e.g. ‘jump’), add material to change the category or the grammatical properties of the word: ‘jumped’, ‘jumpable’.

  • Order of composition matters:

  • ride / riding

  • ennoble / ennobling / *nobling: Adj ---> V, then V ---> V+ing

  • trance / *trancing / entrance / entrancing


Morphology

CONCATENATIVE MORPHOLOGICAL PROCESSES:

COMPOUNDING:

firefighter

PREFIXATION:

un + well

INFIXATION: (Bontoc)

fikas - strong

fumikas - be strong

SUFFIXATION:

kick + er

CIRCUMFIXATION: (German)

ge + [sag] + t (gesagt): past prefix + [say] + past suffix


Inflectional Morphology

  • Non-category-changing; required by syntax

  • Agreement: person/number:

  • Je parle

  • Nous parlons

  • Ils parlent

  • Gender:

  • la petite (the little one (fem))

  • le petit (the little one (masc))

  • le squelette (the skeleton)


Derivational Morphology

  • Changes category; not required by syntax

  • Deverbal nominal with -er:

  • bak + er

  • catch + er

  • 'er' = agent of action

  • catcher of the ball

  • John's catcher of the ball

  • 'John' ~= one who caught

  • Deverbal nominal with -tion:

  • destroy / destruction

  • the Romans' destruction of the city


Morphology

Regular vs Irregular

jump/jumped   hit/hit   bring/brought   sing/sang

Productive/Non-Productive

adore/adorable, kick/kickable, fax/faxable

produce/production   destroy/destruction   graft/*graftuction

bring/brought
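
The regular/irregular split above can be sketched as a lookup table of exceptions plus a default productive rule. This is only a minimal illustration: the table `IRREGULAR_PAST` and the function `past_tense` are hypothetical names, the table holds just the slide's examples, and spelling changes are ignored.

```python
# Hypothetical mini-lexicon of irregular past-tense forms (the slide's examples).
IRREGULAR_PAST = {"hit": "hit", "bring": "brought", "sing": "sang"}


def past_tense(verb: str) -> str:
    """Return a past-tense form: irregular lookup first, regular rule otherwise."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    # Regular, productive rule: plain concatenation of -ed (no spelling rules here).
    return verb + "ed"


print(past_tense("jump"))   # jumped
print(past_tense("bring"))  # brought
```

Irregular forms have to be listed one by one, while the productive rule extends to new verbs such as "fax" without any new entry.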


Regular (English) Verbs


Morphology

Irregular (English) Verbs


“To love” in Spanish


Morphology

  • Productive and rule-governed:

  • fax / fax + er

  • ??? Crudoy cruduction

  • Category sensitivity:

  • breakable / *manable

  • sensitivity / *hittivity

  • Semantic sensitivity:

  • un + well, un + happy

  • *un + ill, *un + sad


Morphology

Store morphemes or words?

lebensversicherungsgesellschaftsangestellter

leben + versicherung + gesellschaft + s + angestellter

life + insurance + company + Poss + employee

Turkish:

Turkish verbs have 40k forms


Non- concatenative Morphology

  • Templatic morphology (Semitic languages): root lmd (learn), lamad (he studied), limed (he taught), lumad (he was taught)
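
The templatic pattern above can be illustrated by interleaving a three-consonant root with a two-vowel pattern. This is a minimal sketch: the function name `interdigitate` and the fixed C-V-C-V-C template are assumptions made for illustration, not a general model of Semitic morphology.

```python
def interdigitate(root: str, vowels: str) -> str:
    """Interleave a root C1C2C3 with vowels V1V2 to give C1 V1 C2 V2 C3."""
    if len(root) != 3 or len(vowels) != 2:
        raise ValueError("expected a 3-consonant root and a 2-vowel pattern")
    c1, c2, c3 = root
    v1, v2 = vowels
    return c1 + v1 + c2 + v2 + c3


for pattern, gloss in [("aa", "he studied"), ("ie", "he taught"), ("ua", "he was taught")]:
    print(interdigitate("lmd", pattern), "-", gloss)
```

The root and the vowel pattern are not adjacent substrings of the word, which is why plain concatenation cannot describe these forms.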


Concatenation: Beads on a string

Agglutinative (concatenative) languages are well behaved for FSAs, as long as we don’t include phonological or spelling changes.

Verb Lexicon:

jump+ed   jump

kiss+ed   kiss

stream+ed   stream

*hopp+ed   hop, ???

[Figure: FSA with states q0, q1, q2 and arcs labeled "verb" and "ed"]
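
A small recognizer in the spirit of this slide might look as follows. It is a sketch under assumptions: the state names, the choice of both q1 and q2 as accepting states, and the helper names `arc_label` and `accepts` are not taken from the slides.

```python
VERB_LEXICON = {"jump", "kiss", "stream", "hop"}

# Transition table: (state, arc label) -> next state. q0 is the start state.
TRANSITIONS = {("q0", "verb"): "q1", ("q1", "ed"): "q2"}
ACCEPTING = {"q1", "q2"}  # assumed: a bare verb or verb + ed is a word


def arc_label(morpheme: str) -> str:
    """Map a morpheme to the arc label it can travel on."""
    return "verb" if morpheme in VERB_LEXICON else morpheme


def accepts(morphemes: list) -> bool:
    """Run the FSA over a sequence of morphemes."""
    state = "q0"
    for m in morphemes:
        state = TRANSITIONS.get((state, arc_label(m)))
        if state is None:
            return False
    return state in ACCEPTING


print(accepts(["jump", "ed"]))  # True
print(accepts(["ed", "jump"]))  # False
print("".join(["hop", "ed"]))   # hoped
```

Plain concatenation gives "hoped" for hop + ed rather than "hopped", which is the spelling problem the slide flags.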


Pieces of a Morphological Analyzer

[Figure: FSA with states q0, q1, q2, q3 and arcs labeled un, adj-root, and -er/-est/-ly]

The lexicon stores the lemmas and divides them into adjective classes:

really / clearly, *bigly / *redly

Morphotactics:

The state sequence indicates the order of morpheme composition,

e.g. comparative or adverb formation is by suffixation.
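
One way to picture the division into adjective classes is a lexicon whose entries record which affixes each root accepts. The entries below are assumptions made up for illustration (only the really/clearly versus *bigly/*redly contrast comes from the slide), and the check is written as a plain function rather than an explicit state table.

```python
from typing import NamedTuple, Optional


class AdjEntry(NamedTuple):
    takes_un: bool       # may the root be prefixed with un-?
    suffixes: frozenset  # which of -er, -est, -ly the root accepts


# Hypothetical adjective lexicon, divided into classes by allowed affixes.
LEXICON = {
    "clear": AdjEntry(True, frozenset({"er", "est", "ly"})),
    "real": AdjEntry(True, frozenset({"ly"})),
    "big": AdjEntry(False, frozenset({"er", "est"})),
    "red": AdjEntry(False, frozenset({"er", "est"})),
}


def well_formed(prefix: Optional[str], root: str, suffix: Optional[str]) -> bool:
    """Check the order (un-)? adj-root (-er | -est | -ly)? against the lexicon."""
    entry = LEXICON.get(root)
    if entry is None:
        return False
    if prefix is not None and (prefix != "un" or not entry.takes_un):
        return False
    if suffix is not None and suffix not in entry.suffixes:
        return False
    return True


print(well_formed(None, "clear", "ly"))  # True
print(well_formed(None, "big", "ly"))    # False
```

The fixed argument order (prefix, root, suffix) plays the role of the state sequence, and the per-root sets play the role of the adjective classes.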


Morphology

Lexicon

  • Arranged as a trie: entries share the letter strings they have in common, position by position

  • d-o (do), continuing to d-o-g (dog) and d-o-n-k-e-y (donkey)

  • Classed by part-of-speech category (noun, verb) and by morphotactic considerations (which other affixes can precede or follow) or orthographic considerations.
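
A trie lexicon of this kind can be sketched with nested dictionaries, storing the category at the node where an entry ends. The class names `TrieNode` and `Trie` and the category labels are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # letter -> TrieNode
        self.categories = set()  # non-empty only where a lexicon entry ends


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str, category: str) -> None:
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.categories.add(category)

    def lookup(self, word: str) -> set:
        """Return the categories stored for word, or an empty set."""
        node = self.root
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return set()
        return set(node.categories)


lexicon = Trie()
for word, category in [("do", "verb"), ("dog", "noun"), ("donkey", "noun")]:
    lexicon.insert(word, category)

print(lexicon.lookup("dog"))  # {'noun'}
print(lexicon.lookup("don"))  # set()
```

The entries "do", "dog", and "donkey" share the path d-o, so the common prefix is stored and searched only once.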


Orthography

  • Spelling rules handle phonological or spelling variation in the orthographic form of a morpheme

  • try / trying / tries

  • cringe / cringing / cringes
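
The two alternations above can be approximated with a pair of simplified spelling rules. The function names `add_s` and `add_ing` are hypothetical, and the conditions are deliberately rough (they do not cover all of English spelling).

```python
VOWELS = "aeiou"


def add_s(stem: str) -> str:
    """Attach -s; a final y after a consonant becomes ie (try -> tries)."""
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in VOWELS:
        return stem[:-1] + "ies"
    return stem + "s"


def add_ing(stem: str) -> str:
    """Attach -ing; a final e after a consonant is dropped (cringe -> cringing)."""
    if stem.endswith("e") and len(stem) > 2 and stem[-2] not in VOWELS:
        return stem[:-1] + "ing"
    return stem + "ing"


for stem in ("try", "cringe"):
    print(stem, add_ing(stem), add_s(stem))
```

Each rule is conditioned on the end of the stem and on the suffix being attached, which is the kind of context a two-level spelling rule has to state.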


FSA for Inflectional Morphology: English Nouns


FSA for Inflectional Morphology: English Verbs


FSA for Derivational Morphology: Adjectival Formation


More Complex Derivational Morphology


Using FSAs for Recognition: English Nouns and their Inflection


Morphology

  • Orthographic

  • Want association between morpheme and semantic function

  • Want association between allographs or allophones of the same phoneme

  • Allographs:

  • city - cities

  • bake - baking

  • divine - divinity

  • try - tried


Morphology

Finite State Transducers (FSTs): the Big Idea

We need to relate the lexical level, which gives us the morphological analysis (+plural, +able), to the surface level, which keeps track of phonological or graphological (spelling) changes.


Parsing vs recognition

  • An FSA can give you the string composition of a morphological sequence, and can tell you whether a given morphological string is or is not in the language. It recognizes the string.

  • An FST parses the string: it tells you the morphological structure associated with the string. Other instances of parsing?


Formal definition

  • An FST defines a relation between two sets of strings, i.e. a set of pairs of strings.

  • It contains at least a lexical level that is a concatenation of morphemes and a surface level that shows the correct spelling for each morpheme in a given context,

  • e.g. noun (instantiated from the lexicon) + plural:

  • lexical: cat / sheep + plural

  • plural realized as ^s or as ε (nothing)

  • surface: cats / sheep


Morphology

Q = finite set of states q0 to qn

Σ = finite alphabet of complex symbols (feasible pairs) i:o, with one symbol i from the input alphabet and one symbol o from the output alphabet

q0 = the start state

F = set of final states

δ(q, i:o) = the transition function or matrix between states. It takes a state from Q and a complex symbol i:o from Σ and returns a new state.

feasible pair: a relation of a symbol on one tape to a symbol on the other tape,

e.g. can + [pl:^s]
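
The definition can be made concrete with a toy transducer for one regular noun. This is a sketch under assumptions: the symbol "+PL", the helper names `make_noun_fst` and `transduce`, and the choice of final states are illustrative. The transition table is keyed on (state, input symbol) and returns (output symbol, next state), which is the translator view of δ(q, i:o).

```python
def make_noun_fst(stem: str):
    """Build transitions for one noun: default pairs per letter, then +PL:^s."""
    delta = {}
    state = 0
    for letter in stem:
        delta[(state, letter)] = (letter, state + 1)  # default pair letter:letter
        state += 1
    delta[(state, "+PL")] = ("^s", state + 1)         # feasible pair +PL:^s
    finals = {state, state + 1}                       # singular or plural
    return delta, finals


def transduce(delta, finals, symbols, start=0):
    """Map a lexical symbol sequence to an output string, or None if rejected."""
    state, output = start, []
    for symbol in symbols:
        if (state, symbol) not in delta:
            return None
        out, state = delta[(state, symbol)]
        output.append(out)
    return "".join(output) if state in finals else None


delta, finals = make_noun_fst("cat")
print(transduce(delta, finals, list("cat") + ["+PL"]))  # cat^s
print(transduce(delta, finals, list("dog")))            # None
```

Here the output still contains the morpheme boundary ^, so a further step is needed to reach the surface spelling.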


Morphology

  • Default pair: the upper tape is the same as the lower tape (same input as output), e.g. c*a*t is shorthand for c:c*a:a*t:t, followed by pl:^s

  • Feasible pairs are either stated in the lexicon, if irregular:

  • g:g*o:e*o:e*s:s*e:e   goose:geese

  • or given by an automaton that stipulates the correspondence in a rule-governed way, if the relation is regular. If regular, they are indicated as default pairs and usually represented by one symbol.

  • FSTs are closed under:

  • inversion: switches the i/o labels

  • composition: chains two transducers one after the other, so that the output of the first is the input of the second

  • union
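
Inversion and composition can be illustrated on finite relations, treating each transducer simply as a set of (input, output) string pairs. This is a simplification for illustration, not how transducers are actually composed state by state; the relation names and the strings in them are made up.

```python
def invert(relation: set) -> set:
    """Swap input and output in every pair."""
    return {(out, inp) for (inp, out) in relation}


def compose(first: set, second: set) -> set:
    """Pairs (x, z) such that first maps x to some y and second maps y to z."""
    return {(x, z) for (x, y) in first for (y2, z) in second if y == y2}


lexical_to_intermediate = {("cat+PL", "cat^s#"), ("fox+PL", "fox^s#")}
intermediate_to_surface = {("cat^s#", "cats"), ("fox^s#", "foxes")}

generate = compose(lexical_to_intermediate, intermediate_to_surface)
parse = invert(generate)

print(sorted(generate))  # [('cat+PL', 'cats'), ('fox+PL', 'foxes')]
print(sorted(parse))     # [('cats', 'cat+PL'), ('foxes', 'fox+PL')]
```

Inverting the generator yields a parser, which is why a single transducer description serves both directions.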


Morphology

trie: in the lexicon, entries are arranged by letter, one letter at a time, with the class at the end. This allows parallel search as long as the letters match,

e.g. m*e*t*a*l <N>   m*e*t*a <root>

metal, meta-language


Kimmo-Based Morphological Parsing

  • Two-level morphology: lexical level + surface level (Koskenniemi 1983)

  • Finite-state transducers (FSTs): input-output pairs


Four-Fold View of FSTs

  • As a recognizer

  • As a generator

  • As a translator

  • As a set relater


Terminology for Kimmo

  • Upper = lexical tape

  • Lower = surface tape

  • Characters correspond to pairs, written a:b

  • If “a=b”, write “a” for shorthand

  • Two-level lexical entries

  • # = word boundary

  • ^ = morpheme boundary

  • Other = “any feasible pair that is not in this transducer”


Nominal Inflection FST


Lexical and Intermediate Tapes


Spelling Rules


Notation

ε --> e / {x, s, z} ^ __ s #
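
Reading the notation above as the e-insertion rule (insert e after x, s, or z at a morpheme boundary, before a word-final s), its effect on an intermediate-tape string can be sketched with a regular expression. The function names are hypothetical, and a regex substitution stands in for the transducer itself.

```python
import re


def e_insertion(intermediate: str) -> str:
    """Insert e after x, s, or z + morpheme boundary when 's#' follows."""
    return re.sub(r"([xsz])\^(?=s#)", r"\1^e", intermediate)


def to_surface(intermediate: str) -> str:
    """Apply e-insertion, then erase the boundary symbols ^ and #."""
    return e_insertion(intermediate).replace("^", "").replace("#", "")


for form in ("fox^s#", "cat^s#", "kiss^s#"):
    print(form, "->", to_surface(form))
```

The lookahead `(?=s#)` encodes the right-hand context `__ s #` without consuming it, so the suffix itself is left untouched.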


Intermediate-to-Surface Transducer


Two-Level Morphology


Sample Run


FSTs and ambiguity

Parse Example 1: unionizable

Parse Example 2: assess


What to do about Global Ambiguity?

  • Accept first successful structure

  • Run parser through all possible paths

  • Bias the search in some manner
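
The three strategies can be compared on a toy segmenter that explores every path through a small morph inventory. The inventory `MORPHS` is an assumption for illustration (with "iz" standing for the surface form of -ize before -able), and a recursive search over substrings stands in for running a transducer.

```python
# Hypothetical surface-morph inventory; "iz" is -ize with its e deleted.
MORPHS = {"un", "ion", "union", "iz", "able"}


def segmentations(word: str) -> list:
    """Return every way to split word into morphs from MORPHS."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in MORPHS:
            for rest in segmentations(word[i:]):
                results.append([head] + rest)
    return results


parses = segmentations("unionizable")
print(parses)                # all paths
print(parses[0])             # first successful structure
print(min(parses, key=len))  # one possible bias: fewest morphs
```

Accepting the first structure returns un + ion + iz + able here, while the fewest-morphs bias prefers union + iz + able; which bias is appropriate depends on the application.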


Some Limitations


Morphology

Stemming

  • For some applications, we don’t need full morphological analysis.

  • IR: we don’t care how, e.g., ‘logician’ is related to ‘logical’. We just want to know that if you are interested in articles about ‘logic’, you may want the former two classes as well. So we just want to ‘get back to the root’.

  • Relate two forms by having a literal relation rule, e.g.

  • al# ---> 0

  • Is it useful? In a big document it may not be necessary, because the word will appear in many forms, including the form in the query.


Morphology

  • Stemming is morphologically impoverished and so error-prone

  • It can’t distinguish rules that apply at morpheme boundaries from rules that apply internal to the root:

  • patronization = patron + ize + ation

  • organization = organize + ation

  • But the stemmer will treat these as a single class and derive “organ” as an underlying root.

  • ‘adverse’ / ‘adversity’

  • ‘universe’ / ‘university’
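
A crude suffix-stripping stemmer shows both the appeal and the errors described above. The rule list is an assumption made for illustration (it is not the Porter stemmer or any published rule set), and the rules are tried in order with a minimum stem length of three letters.

```python
# Hypothetical rewrite rules: strip the first matching suffix.
SUFFIX_RULES = ["ization", "ation", "ity", "al", "e"]


def stem(word: str) -> str:
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ("logical", "patronization", "organization",
             "universe", "university"):
    print(word, "->", stem(word))
```

With these rules "logical" reduces to "logic" as intended, but "organization" collapses to "organ" and "universe" and "university" fall into the same class, because the stemmer has no access to morpheme boundaries or meaning.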


Morphology

Psycholinguistics

  • Is the human lexicon efficient in the way computational lexica are?

  • Stanners et al. (1979): where two words are related inflectionally, the root is stored and the other forms are derived by rule. Where there is a derivational relationship, both forms are stored.

  • Paradigm = repetition priming:

  • ‘great, happy, peachy, adorable, round, short, great, small’

  • Repetition priming for ‘turns’ given ‘turning’, but not for ‘select’ given ‘selective’


Morphology

  • Marslen-Wilson et al. (1994): there may be priming for semantically similar, derivationally related words:

  • permit / permission

  • *create / creativity

  • On-line versus long-term storage lexicon:

  • Speech errors: ‘we have screw looses’

