COMP 4060 Natural Language Processing
Morphology, Word Classes, POS Tagging
Overview: Morphology, Stemming, Word Classes, POS Tagging
(Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen Ch. 2, 3)
Morphology
Morphemes and Words
Morpheme = "minimal meaning-bearing unit in a language"
Inflection: word stem + grammatical morpheme, e.g. cat + s
In English, only for nouns, verbs, and some adjectives
Noun plurals: regular +s, +es; irregular, e.g. mouse → mice, ox → oxen
Spelling rules handle exceptions, e.g. -y → -ies: butterfly → butterflies
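The plural rules above can be sketched as a small function. This is a minimal illustration, not a complete English pluralizer: the irregular table holds only the two examples from the slide, and the extra condition that -y → -ies applies only after a consonant (so day → days) is an assumption added here.

```python
# Hypothetical irregular-plural table (only the slide's examples; not exhaustive)
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}
VOWELS = set("aeiou")

def pluralize(noun: str) -> str:
    """Form an English noun plural with a few simple rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # -y -> -ies, assumed to apply only after a consonant (butterfly -> butterflies, day -> days)
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # +es after sibilant endings (fox -> foxes, church -> churches)
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Default regular plural: +s
    return noun + "s"
```

For example, `pluralize("butterfly")` gives `"butterflies"` and `pluralize("ox")` gives `"oxen"`. Checking the irregular table first lets listed exceptions override the general rules.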
Verb inflections apply only to:
main verbs (sleep, eat, walk) and primary verbs (be, have, do)
Table: morphological forms of regularly inflected verbs
Table: morphological forms of irregularly inflected verbs
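The split between regularly and irregularly inflected verbs can be sketched as suffixation plus an exception list. In this minimal sketch the irregular entries (eat, sleep) come from the slides' examples, and the function name `verb_forms` is hypothetical. Spelling changes such as stop → stopped are deliberately not handled.

```python
# Irregular verbs listed explicitly: (3rd person, progressive, past, past participle)
IRREGULAR_VERBS = {
    "eat": ("eats", "eating", "ate", "eaten"),
    "sleep": ("sleeps", "sleeping", "slept", "slept"),
}

def verb_forms(stem: str) -> dict[str, str]:
    """Return the five verb forms of an English verb."""
    if stem in IRREGULAR_VERBS:
        third, prog, past, part = IRREGULAR_VERBS[stem]
    else:
        # Regular verbs: plain suffixation; spelling rules are NOT handled here
        third, prog, past, part = stem + "s", stem + "ing", stem + "ed", stem + "ed"
    return {
        "base": stem,
        "3rd person": third,
        "progressive": prog,
        "past": past,
        "past participle": part,
    }
```

For example, `verb_forms("walk")["past"]` is `"walked"`, while `verb_forms("eat")["past participle"]` comes from the exception table as `"eaten"`.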
Adjective Inflections and Derivations:
-er: happier (adjective, comparative)
-est: happiest (adjective, superlative)
plus combinations, like unhappiest, unhappiness.
Distinguish adjective classes that can or cannot take certain inflectional or derivational forms, e.g. big takes no negation prefix (*unbig).
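Adjective classes like these can be modeled with a finite-state acceptor, loosely in the spirit of Fig. 3.5. This minimal sketch works on morpheme sequences rather than letters, so spelling changes (happy + est → happiest) are ignored. The class memberships, state names, and the restriction to -er/-est are illustrative assumptions, not the figure's exact automaton.

```python
# Hypothetical adjective classes (illustrative, not a real lexicon)
ADJ_ROOT1 = {"happy", "clear", "real"}  # can take the prefix un-
ADJ_ROOT2 = {"big", "cool", "red"}      # cannot take un- (*unbig)
SUFFIXES = {"er", "est"}                # comparative, superlative

def accepts(morphemes: list[str]) -> bool:
    """Run the FSA over a morpheme sequence; accept only in a final state."""
    state = "q0"
    for m in morphemes:
        if state == "q0" and m == "un":
            state = "q1"            # negation prefix seen
        elif state in ("q0", "q1") and m in ADJ_ROOT1:
            state = "q2"            # root that allows un-
        elif state == "q0" and m in ADJ_ROOT2:
            state = "q2"            # root that does not allow un-
        elif state == "q2" and m in SUFFIXES:
            state = "q3"            # at most one degree suffix
        else:
            return False            # no transition defined: reject
    return state in ("q2", "q3")
```

With these tables, `["un", "happy", "est"]` (unhappiest) is accepted, while `["un", "big"]` is rejected because root2 adjectives are only reachable from the start state.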
Stemming (Porter stemmer rules), e.g. ATIONAL → ATE: relational → relate
ING → ε if the stem contains a vowel, e.g. motoring → motor
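The two stemming examples (relational → relate, motoring → motor) correspond to the Porter stemmer rules ATIONAL → ATE and ING → ε when the remaining stem contains a vowel. This sketch applies only those two rules, and with a simplified vowel test (a, e, i, o, u). The full Porter algorithm has many more rules and conditions; complete implementations exist, e.g. NLTK's `PorterStemmer`.

```python
VOWELS = set("aeiou")  # simplification: Porter also treats y as a vowel in some contexts

def contains_vowel(stem: str) -> bool:
    return any(ch in VOWELS for ch in stem)

def apply_rules(word: str) -> str:
    """Apply two Porter-style suffix rules to a lowercase word."""
    # ATIONAL -> ATE (relational -> relate)
    if word.endswith("ational"):
        return word[: -len("ational")] + "ate"
    # ING -> empty if the remaining stem contains a vowel (motoring -> motor)
    if word.endswith("ing"):
        stem = word[: -len("ing")]
        if contains_vowel(stem):
            return stem
    return word
```

The vowel condition keeps short words like sing intact: `apply_rules("sing")` returns `"sing"` because the stem "s" has no vowel.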
Fig. 3.5 More detailed FSA for adjective inflection.
Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.
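The three tapes in Fig. 3.13 can be imitated with two string mappings. This minimal sketch assumes the common convention that `^` marks a morpheme boundary and `#` a word boundary on the intermediate tape. It handles only `+N +PL` and `+N +SG`, and covers only the E-insertion rule after x, s, z (not ch or sh). The function names are hypothetical.

```python
import re

def lexical_to_intermediate(stem: str, features: list[str]) -> str:
    """Map a lexical tape (stem + features) to the intermediate tape."""
    if features == ["+N", "+PL"]:
        return stem + "^s#"  # ^ = morpheme boundary, # = word boundary
    if features == ["+N", "+SG"]:
        return stem + "#"
    raise ValueError(f"unsupported features: {features}")

def intermediate_to_surface(tape: str) -> str:
    """Apply E-insertion, then delete the boundary symbols."""
    # E-insertion: after x, s, or z, insert e before the plural -s (fox^s# -> fox^es#)
    tape = re.sub(r"([xsz])\^s#", r"\1^es#", tape)
    return tape.replace("^", "").replace("#", "")
```

For fox +N +PL, the intermediate tape is `fox^s#` and the surface form is `foxes`. For cat +N +PL, E-insertion does not fire and the surface form is `cats`.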
Sort words into categories according to:
Which types of morphological forms do they take?
e.g. plural: noun + s; 3rd person singular: verb + s
What other words or phrases can occur nearby?
e.g. possessive pronoun before noun
Classify according to similar semantic type.
e.g. nouns refer to object-like entities
Open Class Types
The set of words in these classes changes over time as the language develops, e.g. new words such as spaghetti and download
Open classes: nouns, verbs, adjectives, adverbs
Closed Class Types
The set of words in these classes is essentially fixed and hardly ever changes within a language.
Closed classes: prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals
Nouns denote objects, concepts, entities, events
Proper nouns: names for specific individual objects, entities
e.g. the Eiffel Tower, Dr. Kemke
Common nouns: names for categories, classes, abstracts, events
e.g. fruit, banana, table, freedom, sleep, race, ...
Count nouns: enumerable entities, e.g. two bananas
Mass nouns: not countable items, e.g. water, salt, freedom
Verbs denote actions, processes, and states, e.g. smoke, dream, rest, run
Verbs have several morphological forms, e.g.
non-3rd person - eat, sleep
3rd person - eats, sleeps
progressive / present participle - eating, sleeping
past participle - eaten, slept
simple past - ate, slept
non-3rd person - eat: I eat. We eat. They eat.
3rd person - eats: He eats. She eats. It eats.
progressive - eating: He is eating. He will be eating. He has been eating.
  present participle: He is eating.
  gerundive: Eating scorpions [NP] is common in China.
  use as adjective: Eating children [NP] are common at McDonalds.
past participle - eaten: He has eaten the scorpion. The scorpion was eaten.
simple past - ate: He ate the scorpion.
Fig.2.6. The five verb forms. (Allen, 1995, p.28)
Fig.2.7. The basic tenses. (Allen, 1995, p.29)
Fig.2.8. The progressive tenses. (Allen, 1995, p.29)
Verb Tense Chart. From: http://www.athabascau.ca/courses/engl/155/support/verb_tenses.htm
Adjectives denote qualities or properties of objects
e.g. heavy, blue, content
most languages have concepts for
colour - white, green, ...
age - young, old, ...
value - good, bad, ...
not all languages have adjectives as a separate class
Adverbs modify actions (verbs) or qualities (adjectives)
e.g. walk slowly or heavily drunk
Directional or locational adverbs
specify direction or location
e.g. go home, stay here
Degree adverbs
specify the extent of a process, action, or property
e.g. extremely slow, very modest
Manner adverbs
specify the manner of an action or process
e.g. walk slowly, run fast
Temporal adverbs
specify the time of an event or action
e.g. yesterday, Monday
Closed Class Types:
Prepositions: on, under, over, at, from, to, with, ...
Determiners: a, an, the, ...
Pronouns: he, she, it, his, her, who, I, ...
Conjunctions: and, or, as, if, when, ...
Auxiliary verbs: can, may, should, are, …
Particles: up, down, on, off, in, out, …
Numerals: one, two, three, ..., first, second, ...
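Closed classes are small and stable, so they can be stored as an explicit lexicon. This sketch uses the abridged lists above (the "..." entries are left out) and inverts them to look up a word's possible classes. It shows that a closed-class word like "on" already belongs to two classes, preposition and particle.

```python
from collections import defaultdict

# Closed-class lexicon from the lists above (abridged; "..." entries omitted)
CLOSED_CLASSES = {
    "preposition": ["on", "under", "over", "at", "from", "to", "with"],
    "determiner": ["a", "an", "the"],
    "pronoun": ["he", "she", "it", "his", "her", "who", "i"],
    "conjunction": ["and", "or", "as", "if", "when"],
    "auxiliary": ["can", "may", "should", "are"],
    "particle": ["up", "down", "on", "off", "in", "out"],
    "numeral": ["one", "two", "three", "first", "second"],
}

def build_index(classes: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the class lists: word -> set of closed classes it belongs to."""
    index = defaultdict(set)
    for cls, words in classes.items():
        for w in words:
            index[w].add(cls)
    return dict(index)

INDEX = build_index(CLOSED_CLASSES)
```

`INDEX["on"]` is `{"preposition", "particle"}`, while an open-class word such as "banana" is simply absent from the closed-class lexicon.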
Prepositions occur before noun phrases
and often express spatial or temporal relations
e.g. on the table (spatial)
in two hours (temporal)
Pronouns refer to entities, events, relations, etc.
Personal pronouns refer to persons or entities,
e.g. you, he, it, ...
Possessive pronouns express possession or a relation between a person and an object,
e.g. his, her, my, its, ...
Wh-pronouns are used in questions or to refer back,
e.g. Who did this ..., Frieda, who is 80 years old ...
Conjunctions join phrases or sentences; their semantics is varied and complex
Coordinating conjunctions such as and, or, but, ... join two phrases or sentences on the same level
e.g. He takes a cat and a dog.
He takes a dog and she takes a cat.
Subordinating conjunctions such as that connect embedded phrases
e.g. He thinks that the cat is nicer than the dog.
Auxiliary verbs mark semantic features of the main verb, often tense, aspect, and modality. Their semantics is difficult.
Tense: expresses present, past, or future, ...
e.g. He will take the cat home.
Aspect: expresses whether the action is complete
e.g. He is taking the cat home. (incomplete)
Modality: expresses possibility or necessity of the action
e.g. He can take the cat home. (possible)
The copula be, the verbs do and have, and the modal verbs (can, should, ...) are subclasses of auxiliary verbs.
Describe state, process, or tense / modality of action.
Semantics: difficult (e.g. modal logic)
State / Process: be and do
e.g. He is at home. He does nothing.
Tense / Aspect: have, e.g. He has taken the cat home.
Modality: can, ought to, should, must
e.g. He can take the cat home. (possibility)
Tagsets for English
For references see Jurafsky, p.296
C5 and C7 tagsets are listed in Appendix C
Fig. 8.7 Ambiguity in tagging (based on the Brown Corpus). The left column classifies words by the number of tags they can take; the right column shows how many words fall into each class. E.g. 264 words can take 3 different POS tags, and 1 word ("still") can take 7.
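A table like Fig. 8.7 can be computed from any tagged corpus by collecting the set of tags seen for each word type and counting how many types have 1, 2, 3, ... tags. This sketch uses a tiny hypothetical corpus with Penn Treebank-style tags, NOT Brown Corpus data, so its counts have nothing to do with the figure's numbers.

```python
from collections import Counter, defaultdict

# Hypothetical toy tagged corpus (NOT Brown Corpus data), Penn Treebank-style tags
TOY_CORPUS = [
    ("I", "PRP"), ("book", "VB"), ("a", "DT"), ("flight", "NN"),
    ("the", "DT"), ("book", "NN"), ("is", "VBZ"), ("still", "RB"), ("open", "JJ"),
    ("a", "DT"), ("still", "JJ"), ("night", "NN"),
    ("still", "VB"), ("the", "DT"), ("noise", "NN"),
]

def ambiguity_table(tagged: list[tuple[str, str]]) -> dict[int, int]:
    """Map number of distinct tags -> number of word types with that many tags."""
    tags_per_word = defaultdict(set)
    for word, tag in tagged:
        tags_per_word[word.lower()].add(tag)  # case-fold so "The" and "the" count as one type
    counts = Counter(len(tags) for tags in tags_per_word.values())
    return dict(sorted(counts.items()))

print(ambiguity_table(TOY_CORPUS))  # {1: 8, 2: 1, 3: 1}
```

In the toy data, "book" is 2-ways ambiguous (VB, NN) and "still" is 3-ways ambiguous (RB, JJ, VB).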
Methods for POS tagging:
Rule-based tagging: use a dictionary to assign all possible POS tags, then use rules to disambiguate between word classes (e.g. book as verb or noun)
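The dictionary-plus-rules idea can be sketched with a toy lexicon and two hand-written disambiguation rules for the noun/verb ambiguity of "book". The lexicon, the rules, the default tag for unknown words, and the fallback choice are all hypothetical illustrations, not a published rule set.

```python
# Hypothetical dictionary: word -> all POS tags it can take
LEXICON = {
    "i": {"PRP"}, "you": {"PRP"}, "the": {"DT"}, "a": {"DT"}, "to": {"TO"},
    "book": {"NN", "VB"}, "want": {"VB"}, "read": {"VB"}, "flight": {"NN"},
}

def tag(words: list[str]) -> list[str]:
    """Assign tags from the dictionary, then disambiguate with simple rules."""
    tags: list[str] = []
    for i, word in enumerate(words):
        candidates = LEXICON.get(word.lower(), {"NN"})  # assumption: unknown words -> NN
        if len(candidates) == 1:
            tags.append(next(iter(candidates)))
            continue
        prev = tags[i - 1] if i > 0 else None
        if prev == "DT" and "NN" in candidates:
            tags.append("NN")   # rule 1: after a determiner, prefer noun ("the book")
        elif prev in (None, "TO", "PRP") and "VB" in candidates:
            tags.append("VB")   # rule 2: sentence-initially, after "to" or a pronoun, prefer verb
        else:
            tags.append(sorted(candidates)[0])  # arbitrary fallback: alphabetically first tag
    return tags
```

Here `tag(["I", "want", "to", "book", "a", "flight"])` yields `["PRP", "VB", "TO", "VB", "DT", "NN"]`, while in `["read", "the", "book"]` the determiner rule tags book as NN.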
Stochastic tagging: determine tags based on the probability of a tag given the observed word, in the context of the preceding tags, e.g. with Hidden Markov Models (probabilistic finite state machines).
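The usual way to find the most probable tag sequence under a bigram HMM is the Viterbi algorithm. This sketch uses small hand-picked probabilities that are purely hypothetical, not estimated from a corpus. It multiplies raw probabilities for readability, whereas practical taggers typically use log probabilities to avoid underflow on long sentences.

```python
# Hypothetical toy parameters (NOT estimated from any real corpus)
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.5, "NN": 0.2, "VB": 0.3}   # P(first tag)
TRANS = {                                     # P(tag_i | tag_{i-1})
    "DT": {"DT": 0.0, "NN": 0.9, "VB": 0.1},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT = {                                      # P(word | tag)
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"book": 0.3, "flight": 0.7},
    "VB": {"book": 0.7, "read": 0.3},
}

def viterbi(words: list[str]) -> list[str]:
    """Return the most probable tag sequence for words under the toy bigram HMM."""
    # best[i][t]: probability of the best tag path for words[:i+1] ending in tag t
    best = [{t: START[t] * EMIT[t].get(words[0], 0.0) for t in TAGS}]
    back: list[dict[str, str]] = [{}]  # back[i][t]: best previous tag for t at position i
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] * TRANS[p][t])
            best[i][t] = best[i - 1][prev] * TRANS[prev][t] * EMIT[t].get(words[i], 0.0)
            back[i][t] = prev
    # Trace back from the most probable final tag
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

`viterbi(["the", "book"])` returns `["DT", "NN"]` even though these toy emissions favor VB for book, because the transition DT → NN is much more likely than DT → VB. `viterbi(["book", "the", "flight"])` returns `["VB", "DT", "NN"]`.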
Transformation-based tagging: learn tagging rules from a tagged corpus
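Learning tagging rules can be sketched in the style of transformation-based learning (Brill tagging). Start from a most-frequent-tag baseline, then repeatedly pick the rewrite rule that most reduces errors on the training data. This simplified sketch has a single rule template, "change tag A to B when the previous tag is C", and uses a tiny hypothetical training set that is treated as one sequence, so sentence boundaries are ignored. A real Brill tagger uses many templates and a large corpus.

```python
from collections import Counter

# Hypothetical toy training data: (word, gold tag), Penn Treebank-style tags
TRAINING_DATA = [
    ("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("long", "JJ"),
    ("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB"),
    ("a", "DT"), ("race", "NN"), ("started", "VBD"),
    ("the", "DT"), ("race", "NN"), ("ended", "VBD"),
    ("we", "PRP"), ("like", "VBP"), ("to", "TO"), ("race", "VB"),
]

def most_frequent_tags(corpus):
    """Baseline tagger: each word gets its most frequent training tag."""
    counts = {}
    for word, tag in corpus:
        counts.setdefault(word, Counter())[tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def apply_rule(tags, rule):
    """Rule (from_tag, to_tag, prev_tag): retag from_tag as to_tag after prev_tag."""
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == frm and tags[i - 1] == prev:  # conditions checked on the input tags
            out[i] = to
    return out

def errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))

def learn_rules(corpus, max_rules=5):
    """Greedily add the rule with the largest error reduction until none helps."""
    lexicon = most_frequent_tags(corpus)
    gold = [t for _, t in corpus]
    current = [lexicon[w] for w, _ in corpus]
    tagset = sorted(set(gold))
    rules = []
    for _ in range(max_rules):
        best_rule, best_err = None, errors(current, gold)
        for frm in tagset:
            for to in tagset:
                if frm == to:
                    continue
                for prev in tagset:
                    rule = (frm, to, prev)
                    err = errors(apply_rule(current, rule), gold)
                    if err < best_err:
                        best_rule, best_err = rule, err
        if best_rule is None:
            break  # no candidate rule reduces the error count
        rules.append(best_rule)
        current = apply_rule(current, best_rule)
    return rules
```

On this toy data the baseline tags every "race" as NN. The learner then finds the single rule `("NN", "VB", "TO")`, i.e. retag NN as VB after "to", which fixes both errors.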
Problem in POS Tagging: Ambiguity
Problem in POS Tagging: Which tag set to use?