
Morphology

See

Harald Trost “Morphology”. Chapter 2 of R Mitkov (ed.) The Oxford Handbook of Computational Linguistics, Oxford (2004): OUP

D Jurafsky & JH Martin: Speech and Language Processing, Upper Saddle River NJ (2000): Prentice Hall, Chapter 3 [quite technical]

Morphology - reminder
  • Internal analysis of word forms
  • morpheme – allomorphic variation
  • Words usually consist of a root plus affix(es), though some words can have multiple roots, and some can be single morphemes
  • lexeme – abstract notion of group of word forms that ‘belong’ together
    • lexeme ~ root ~ stem ~ base form ~ dictionary (citation) form
Role of morphology
  • Commonly made distinction: inflectional vs derivational
  • Inflectional morphology is grammatical
    • number, tense, case, gender
  • Derivational morphology concerns word building
    • part-of-speech derivation
    • words with related meaning
Inflectional morphology
  • Grammatical in nature
  • Does not carry meaning, other than grammatical meaning
  • Highly systematic, though there may be irregularities and exceptions
    • Simplifies lexicon, only exceptions need to be listed
    • Unknown words may be guessable
  • Language-specific and sometimes idiosyncratic
  • (Mostly) helpful in parsing
Derivational morphology
  • Lexical in nature
  • Can carry meaning
  • Fairly systematic, and predictable up to a point
    • Simplifies description of lexicon: regularly derived words need not be listed
    • Unknown words may be guessable
  • But …
    • Apparent derivations have specialised meaning
    • Some derivations missing
  • Languages often have parallel derivations which may be translatable
Morphological processes
  • Affixes: prefix, suffix, infix, circumfix
  • Vowel change (umlaut, ablaut)
  • Gemination, (partial) reduplication
  • Root and pattern
  • Stress (or tone) change
  • Sandhi
Morphophonemics
  • Morphemes and allomorphs
    • eg {plur}: +(e)s, vowel change, y → ies, f → ves, um → a, ...
  • Morphophonemic variation
    • Affixes and stems may have variants which are conditioned by context
      • eg +ing in lifting, swimming, boxing, raining, hoping, hopping
    • Rules may be generalisable across morphemes
      • eg +(e)s in cats, boxes, tomatoes, matches, dishes, buses
      • Applies to both {plur} (nouns) and {3rd sing pres} (verbs)
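
The +(e)s generalisation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule is treated as purely orthographic, the helper name `add_s` is hypothetical, and o-final stems that take +es are simply listed, since spelling alone does not predict them.

```python
# Hypothetical helper: a spelling-based choice between +s and +es.
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")
# Assumption: o-final stems that take +es are listed explicitly.
O_STEMS_WITH_ES = {"tomato", "potato", "go"}

def add_s(stem: str) -> str:
    """Attach the +(e)s affix to a noun or verb stem."""
    if stem.endswith(SIBILANT_ENDINGS) or stem in O_STEMS_WITH_ES:
        return stem + "es"
    return stem + "s"

for stem in ["cat", "box", "tomato", "match", "dish", "bus"]:
    print(stem, "->", add_s(stem))
```

Because the function only looks at the stem's spelling, the same code serves {plur} on nouns (box, boxes) and {3rd sing pres} on verbs (watch, watches).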
Morphology in NLP
  • Analysis vs synthesis
    • what does dogs mean? vs what is the plural of dog?
  • Analysis
    • Need to identify lexeme
      • Tokenization
      • To access lexical information
    • Inflections (etc) carry information that will be needed by other processes (eg agreement is useful in parsing), and inflections can carry meaning (eg tense, number)
    • Morphology can be ambiguous
      • May need other process to disambiguate (eg German –en)
  • Synthesis
    • Need to generate appropriate inflections from underlying representation
Morphology in NLP
  • String-handling programs can be written
  • More general approach
    • formalism to write rules which express correspondence between surface and underlying form (eg dogs = dog +{plur})
    • Computational algorithm (program) which can apply those rules to actual instances
    • Especially of interest if the rules (though not the program) are independent of direction: analysis or synthesis
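
As a rough illustration of direction-independent rules, the sketch below keeps one small rule table and reads it in both directions. The table `SUFFIX_RULES`, the toy `LEXICON` and both function names are assumptions made for this example, not part of any particular formalism.

```python
# One declarative rule table, used for both analysis and synthesis.
SUFFIX_RULES = {"{sing}": "", "{plur}": "s"}
LEXICON = {"dog", "cat"}  # toy lexicon, assumed for illustration

def synthesise(stem, feature):
    """Synthesis: underlying form -> surface form (dog + {plur} -> dogs)."""
    return stem + SUFFIX_RULES[feature]

def analyse(word):
    """Analysis: surface form -> all (stem, feature) pairs licensed by the lexicon."""
    analyses = []
    for feature, suffix in SUFFIX_RULES.items():
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if stem in LEXICON:
                analyses.append((stem, feature))
    return analyses

print(synthesise("dog", "{plur}"))
print(analyse("dogs"))
```

The two functions are the "program"; only `SUFFIX_RULES` would need to change to cover another affix.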
Role of lexicon in morphology
  • Rules interact with the lexicon
    • Obviously category information
      • eg rules that apply to nouns
    • Note also morphology-related subcategories
      • eg “er” verbs in French, rules for gender agreement
    • Other lexical information can impact on morphology
      • eg all fish have two forms of the plural (+s and ∅)
      • eg in Slavic languages, case inflections differ for inanimate and animate nouns
Problems with rules
  • Exceptions have to be covered
    • Including systematic irregularities
    • May be a trade-off between treating something as a small group of irregularities or as a list of unrelated exceptions (eg French irregular verbs, English f → ves)
  • Rules must not over/under-generate
    • Must cover all and only the correct cases
    • May depend on what order the rules are applied in
Tokenization
  • The simplest form of analysis is to reduce different word forms into tokens
  • Also called “normalization”
  • For example, if you want to count how many times a given ‘word’ occurs in a text
  • Or you want to search for texts containing certain ‘words’ (e.g. Google)
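
A minimal sketch of the word-counting use case, assuming that normalization means nothing more than case folding and dropping punctuation (the function name `normalize` and the sample sentence are made up for the example):

```python
import re
from collections import Counter
from typing import List

def normalize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

counts = Counter(normalize("The dog saw the other dogs. Dogs bark."))
print(counts["the"], counts["dogs"], counts["dog"])
```

Note that dog and dogs are still counted separately at this level; merging them requires morphological analysis such as stemming.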
Morphological processing
  • Stemming
  • String-handling approaches
    • Regular expressions
    • Mapping onto finite-state automata
  • 2-level morphology
    • Mapping between surface form and lexical representation
Stemming
  • Stemming is the particular case of tokenization which reduces inflected forms to a single base form or stem
  • (Recall our discussion of stem ~ base form ~ dictionary form ~ citation form)
  • Stemming algorithms are basic string-handling algorithms, which depend on rules which identify affixes that can be stripped
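
The affix-stripping idea can be sketched as an ordered list of suffix rules. This is a deliberately naive illustration, not a published stemming algorithm: the rule list, the `MIN_STEM` threshold and the function name `stem` are all assumptions.

```python
# Ordered (suffix, replacement) rules; the first rule that applies wins.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]
MIN_STEM = 2  # assumed minimum number of characters left after stripping

def stem(word: str) -> str:
    """Strip the first matching suffix, provided enough of the word remains."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: len(word) - len(suffix)] + replacement
    return word

for w in ["spies", "boxes", "cats", "lifting", "hopping"]:
    print(w, "->", stem(w))
```

The rule order matters ("ies" must be tried before "es" and "s"), and the output for hopping ("hopp") shows how plain string handling misses the morphophonemic variation discussed earlier.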
Finite state automata
  • A finite state automaton is a simple and intuitive formalism with straightforward computational properties (so easy to implement)
  • A bit like a flow chart, but can be used for both recognition (analysis) and generation
  • FSAs have a close relationship with “regular expressions”, a formalism for expressing strings, mainly used for searching texts, or stipulating patterns of strings
Finite state automata
  • A bit like a flow chart, but can be used for both recognition and generation
  • “Transition network”
  • Unique start point
  • Series of states linked by transitions
  • Transitions represent input to be accounted for, or output to be generated
  • Legal exit-point(s) explicitly identified
Example: Jurafsky & Martin, Figure 2.10
[Figure: FSA with states q0 to q4 and arcs labelled b, a, a, a, !]
  • Loop on q3 means that it can account for strings of unbounded length
  • “Deterministic” because in any state, its behaviour is fully predictable
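
A deterministic FSA can be written down as a transition table. The sketch below assumes that the figure is the automaton for strings of the form b a a ... a ! (q0 to q1 on b, q1 to q2 on a, q2 to q3 on a, a loop on q3 for a, and q3 to q4 on !); the names `TRANSITIONS` and `accepts` are chosen for the example.

```python
# (state, input symbol) -> next state; at most one entry per pair, hence deterministic.
TRANSITIONS = {
    ("q0", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "a"): "q3",
    ("q3", "a"): "q3",  # the loop on q3
    ("q3", "!"): "q4",
}
FINAL_STATES = {"q4"}

def accepts(string: str) -> bool:
    """Follow one transition per symbol; accept only if we end in a final state."""
    state = "q0"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no legal transition: reject
            return False
    return state in FINAL_STATES

for s in ["baa!", "baaaa!", "ba!", "baa"]:
    print(s, accepts(s))
```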
Non-deterministic FSA: Jurafsky & Martin, Figure 2.18
[Figure: FSA with states q0 to q4 and arcs labelled b, a, a, a, ! and ε; the figure is also marked 2.19]
  • At state q2 with input “a” there is a choice of transitions
  • We can also have “jump” arcs (or empty transitions), which also introduce non-determinism
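
One common way to run a non-deterministic FSA is to track the set of all states it could be in. The sketch below assumes two variants of the figure: `CHOICE_ARCS`, where q2 has both a loop and an arc to q3 on "a", and `JUMP_ARCS`, where an empty transition leads from q3 back to q2. Both tables and the function names are illustrative reconstructions.

```python
EPS = ""  # label used here for jump arcs (empty transitions)

CHOICE_ARCS = {
    ("q0", "b"): {"q1"},
    ("q1", "a"): {"q2"},
    ("q2", "a"): {"q2", "q3"},  # choice of transitions on "a"
    ("q3", "!"): {"q4"},
}
JUMP_ARCS = {
    ("q0", "b"): {"q1"},
    ("q1", "a"): {"q2"},
    ("q2", "a"): {"q3"},
    ("q3", EPS): {"q2"},        # jump arc back to q2
    ("q3", "!"): {"q4"},
}

def eps_closure(states, arcs):
    """Add every state reachable through jump arcs alone."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for nxt in arcs.get((state, EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def accepts(string, arcs, start="q0", finals=("q4",)):
    """Simulate all possible paths in parallel."""
    current = eps_closure({start}, arcs)
    for symbol in string:
        moved = set()
        for state in current:
            moved |= arcs.get((state, symbol), set())
        current = eps_closure(moved, arcs)
        if not current:
            return False
    return any(state in finals for state in current)

print(accepts("baaa!", CHOICE_ARCS), accepts("baaa!", JUMP_ARCS))
```

Tracking a set of states avoids backtracking; searching one path at a time with a stack of alternatives is the other standard option.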
An FSA to handle morphology
[Figure: FSA with states q0 to q7 and arcs labelled c, r, y, f, o, x, i, e, s]

Spot the deliberate mistake: overgeneration

Finite State Transducers
  • A “transducer” defines a relationship (a mapping) between two things
  • Typically used for “two-level morphology”, but can be used for other things
  • Like an FSA, but each state transition stipulates a pair of symbols, and thus a mapping
Finite State Transducers
  • Three functions:
    • Recognizer (verification): takes a pair of strings and verifies if the FST is able to map them onto each other
    • Generator (synthesis): can generate a legal pair of strings
    • Translator (transduction): given one string, can generate the corresponding string
  • Mapping usually between levels of representation
    • spy+s : spies
    • Lexical:intermediate foxNPs : fox^s
    • Intermediate:surface fox^s : foxes
Some conventions
  • Transitions are marked by “:”
  • A non-changing transition “x:x” can be shown simply as “x”
  • Wild-cards are shown as “@”
  • Empty string shown as “ε”
An example based on Trost p.42

#spy+s# : spies
#:ε s p y:i +:e s #:ε

#toy+s# : toys
#:ε t o y +:0 s #:ε

#:ε s h e l f:v +:e s #:ε

#:ε w i f:v e s #:ε
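
The transducer paths above can be tried out with a small path-following sketch. It is an illustration under stated assumptions, not Trost's implementation: each path becomes its own chain of states, both "0" and "ε" are read as the empty string, and the names `build`, `translate` and `recognize` are hypothetical. It covers the translator and recognizer functions in both directions.

```python
EMPTY = {"0", "ε"}  # spellings of the empty string accepted by this sketch

def build(paths):
    """Compile space-separated symbol pairs into arcs: state -> [(lexical, surface, next)]."""
    arcs, finals, fresh = {}, set(), 1
    for path in paths:
        state = 0
        for pair in path.split():
            lexical, sep, surface = pair.partition(":")
            if not sep:  # "x" abbreviates "x:x"
                surface = lexical
            lexical = "" if lexical in EMPTY else lexical
            surface = "" if surface in EMPTY else surface
            arcs.setdefault(state, []).append((lexical, surface, fresh))
            state, fresh = fresh, fresh + 1
        finals.add(state)
    return arcs, finals

ARCS, FINALS = build([
    "#:ε s p y:i +:e s #:ε",
    "#:ε t o y +:0 s #:ε",
    "#:ε s h e l f:v +:e s #:ε",
])

def translate(text, direction="down"):
    """Map lexical to surface ("down") or surface to lexical ("up")."""
    results, stack = set(), [(0, 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(text) and state in FINALS:
            results.add(out)
        for lexical, surface, nxt in ARCS.get(state, []):
            src, dst = (lexical, surface) if direction == "down" else (surface, lexical)
            if src == "":  # empty input side: move without consuming input
                stack.append((nxt, pos, out + dst))
            elif pos < len(text) and text[pos] == src:
                stack.append((nxt, pos + 1, out + dst))
    return results

def recognize(lexical, surface):
    """Verify that the transducer maps the two strings onto each other."""
    return surface in translate(lexical, "down")

print(translate("#spy+s#"), translate("toys", "up"))
```

The search assumes there are no loops made purely of empty-input arcs, which holds for these straight-line paths.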

Using wild cards and loops

#:0 s p y:i +:e s #:0
#:0 t o y +:0 s #:0

Can be collapsed into a single FST:
[Figure: single FST with arcs #:0, a wild-card loop @, y:i +:e, y +:0, s, #:0]

Another example (J&M Fig. 3.9, p.74)

Lexical:intermediate
  • q0 to q1: f o x, c a t, d o g; q1 to q4: N:ε; q4 to q7: P:^ s # or S:#
  • q0 to q2: g o o s e, s h e e p, m o u s e; q2 to q5: N:ε; q5 to q7: S:#
  • q0 to q3: g o:e o:e s e, s h e e p, m o:i u:ε s:c e; q3 to q6: N:ε; q6 to q7: P:#

[Figure: the q0 to q1 arc labelled f o x, c a t, d o g expands into letter-by-letter paths: q0 f s1 o s2 x q1; q0 c s3 a s4 t q1; q0 d s5 o s6 g q1]

  • [0] f:f o:o x:x [1] N:ε [4] P:^ s:s #:# [7]
  • [0] f:f o:o x:x [1] N:ε [4] S:# [7]
  • [0] c:c a:a t:t [1] N:ε [4] P:^ s:s #:# [7]
  • [0] s:s h:h e:e e:e p:p [2] N:ε [5] S:# [7]
  • [0] g:g o:e o:e s:s e:e [3] N:ε [6] P:# [7]

f o x N P s # : f o x ^ s #
f o x N S : f o x #
c a t N P s # : c a t ^ s #
s h e e p N S : s h e e p #
g o o s e N P : g e e s e #
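
The same lexical-to-intermediate mapping can be tabulated as a plain lookup, which is handy for checking a transducer against expected pairs. The sketch below is a lookup table rather than an FST; the three noun classes follow the figure, while the function name and the use of "S" and "P" as arguments are assumptions.

```python
REGULAR_NOUNS = {"fox", "cat", "dog"}
# Irregular nouns: singular stem -> plural form.
IRREGULAR_NOUNS = {"goose": "geese", "sheep": "sheep", "mouse": "mice"}

def lexical_to_intermediate(stem: str, number: str) -> str:
    """Return the intermediate form; ^ marks a morpheme boundary, # the word end."""
    if number not in ("S", "P"):
        raise ValueError("number must be 'S' or 'P'")
    if stem in REGULAR_NOUNS:
        return stem + ("^s#" if number == "P" else "#")
    if stem in IRREGULAR_NOUNS:
        return (IRREGULAR_NOUNS[stem] if number == "P" else stem) + "#"
    raise ValueError("unknown noun: " + stem)

for stem, number in [("fox", "P"), ("fox", "S"), ("sheep", "S"), ("goose", "P")]:
    print(stem, number, "->", lexical_to_intermediate(stem, number))
```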

Lexical:surface mapping: J&M Fig. 3.14, p.78
[Figure: FST with states q0 to q5 and arcs labelled z, s, x; s; ^:ε; ε:e; #; other]

f o x N P s # : f o x ^ s #

c a t N P s # : c a t ^ s #

ε → e / {x, s, z} ^ __ s #
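
The e-insertion rule can be approximated with a regular-expression substitution. This is a sketch rather than the transducer itself: the function name is hypothetical, and the boundary symbols ^ and # are simply deleted at the end to give a readable surface string.

```python
import re

def intermediate_to_surface(form: str) -> str:
    """Apply e-insertion, then delete the boundary symbols."""
    # Insert e after a morpheme boundary that follows x, s or z and precedes s#.
    form = re.sub(r"(?<=[xsz])\^(?=s#)", "^e", form)
    return form.replace("^", "").replace("#", "")

for form in ["fox^s#", "cat^s#", "geese#"]:
    print(form, "->", intermediate_to_surface(form))
```

The lookbehind and lookahead play the role of the left and right contexts in the rule notation: they must match, but they are not rewritten.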

[0] f:f [0] o:o [0] x:x [1] ^:ε [2] ε:e [3] s:s [4] #:# [0]

[0] c:c [0] a:a [0] t:t [0] ^:ε [0] s:s [0] #:# [0]

f o x ^ s # : f o x e s #

c a t ^ s # : c a t s #

FST
  • But you don’t have to draw all these FSTs
  • They map neatly onto rule formalisms
  • What is more, these can be generated automatically
  • Therefore, slightly different formalism
FST compiler

http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fsinput.html

[d o g N P .x. d o g s ] |
[c a t N P .x. c a t s ] |
[f o x N P .x. f o x e s ] |
[g o o s e N P .x. g e e s e]

s0: c -> s1, d -> s2, f -> s3, g -> s4.
s1: a -> s5.
s2: o -> s6.
s3: o -> s7.
s4: <o:e> -> s8.
s5: t -> s9.
s6: g -> s9.
s7: x -> s10.
s8: <o:e> -> s11.
s9: <N:s> -> s12.
s10: <N:e> -> s13.
s11: s -> s14.
s12: <P:0> -> fs15.
s13: <P:s> -> fs15.
s14: e -> s16.
fs15: (no arcs)
s16: <N:0> -> s12.
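
To see what the compiled network does, the state listing above can be copied into a table and walked symbol by symbol. The sketch below assumes that every symbol is a single character, that "0" in the listing is the empty string, and that fs15 is the only final state; `NET` and `apply_down` are names chosen for the example, not part of the compiler.

```python
# state -> {upper symbol: (lower symbol, next state)}, transcribed from the listing.
NET = {
    "s0": {"c": ("c", "s1"), "d": ("d", "s2"), "f": ("f", "s3"), "g": ("g", "s4")},
    "s1": {"a": ("a", "s5")},
    "s2": {"o": ("o", "s6")},
    "s3": {"o": ("o", "s7")},
    "s4": {"o": ("e", "s8")},
    "s5": {"t": ("t", "s9")},
    "s6": {"g": ("g", "s9")},
    "s7": {"x": ("x", "s10")},
    "s8": {"o": ("e", "s11")},
    "s9": {"N": ("s", "s12")},
    "s10": {"N": ("e", "s13")},
    "s11": {"s": ("s", "s14")},
    "s12": {"P": ("", "fs15")},
    "s13": {"P": ("s", "fs15")},
    "s14": {"e": ("e", "s16")},
    "fs15": {},
    "s16": {"N": ("", "s12")},
}

def apply_down(upper):
    """Map an upper-side string such as 'foxNP' to its lower-side string, or None."""
    state, output = "s0", []
    for symbol in upper:
        arc = NET[state].get(symbol)
        if arc is None:
            return None
        lower, state = arc
        output.append(lower)
    return "".join(output) if state == "fs15" else None

for word in ["dogNP", "catNP", "foxNP", "gooseNP"]:
    print(word, "->", apply_down(word))
```

Going in the other direction (from dogs to dog N P) would need a search, because arcs whose lower side is empty consume no input.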