COMP 4060 Natural Language Processing



Morphology, Word Classes, POS Tagging


Overview
  • Morphology
  • Stemming
  • Word Classes
  • POS Tagging
  • (Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen, Ch. 2, 3)





Morphemes and Words
  • Morpheme = "minimal meaning-bearing unit in a language"
  • Combine morphemes to create words
    • Inflection
      • combination of a word stem with a grammatical morpheme
      • same word class, e.g. clean (verb), clean-ing (verb)
    • Derivation
      • combination of a word stem with a grammatical morpheme
      • yields a different word class, e.g. clean (verb), clean-ing (noun)
    • Compounding
      • combination of multiple word stems
    • Cliticization
      • combination of a word stem with a clitic
      • different words from different syntactic categories, e.g. I’ve = I + have


Inflectional Morphology

word stem + grammatical morpheme, e.g. cat + s

only for nouns, verbs, and some adjectives

  • Nouns
    • plural:
      • regular: +s, +es
      • irregular: mouse - mice; ox - oxen
      • rules for exceptions, e.g. -y -> -ies, as in butterfly - butterflies
    • possessive: +'s, +'
  • Verbs
    • main verbs (sleep, eat, walk)
    • modal verbs (can, will, should)
    • primary verbs (be, have, do)


Inflectional Morphology (verbs)

Verb Inflections only for:

main verbs (sleep, eat, walk); primary verbs (be, have, do)

Morphological Form      Regularly Inflected Form

  • stem                  walk      merge     try      map
  • -s form               walks     merges    tries    maps
  • -ing participle       walking   merging   trying   mapping
  • past; -ed participle  walked    merged    tried    mapped

Morphological Form      Irregularly Inflected Form

  • stem                  eat       catch     cut
  • -s form               eats      catches   cuts
  • -ing participle       eating    catching  cutting
  • -ed past              ate       caught    cut
  • -ed participle        eaten     caught    cut


Inflectional and Derivational Morphology (adjectives)

Adjective Inflections and Derivations:

  • prefix un-     unhappy     adjective, negation
  • suffix -ly     happily     adverb, mode
  • suffix -er     happier     adjective, comparative
  • suffix -est    happiest    adjective, superlative
  • suffix -ness   happiness   noun

plus combinations, like unhappiest, unhappiness.

Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation for big.


Noun Derivation





Verb Clitics


Stemming
  • Stemming algorithms strip off word affixes
  • yield stem only, no additional information (like plural, 3rd person etc.)
  • used, e.g. in web search engines
  • famous stemming algorithm: the Porter stemmer


Stemming Methods
  • Rule-based stemming
  • Example rules:
    • ATIONAL → ATE, e.g., relational → relate
    • ING → ε if stem contains vowel, e.g., motoring → motor
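
The two example rules can be tried out in a few lines of Python. This is only a minimal sketch of rule-based suffix stripping, not the full Porter stemmer (which has many more rules and conditions); the function names `contains_vowel` and `stem` are made up for illustration.

```python
VOWELS = "aeiou"


def contains_vowel(stem: str) -> bool:
    """Condition used by the ING rule: the remaining stem must contain a vowel."""
    return any(ch in VOWELS for ch in stem)


def stem(word: str) -> str:
    """Apply the two example rules from the slide (hypothetical helper)."""
    w = word.lower()
    # Rule: ATIONAL -> ATE, e.g. relational -> relate
    if w.endswith("ational"):
        return w[: -len("ational")] + "ate"
    # Rule: ING -> (empty) if the stem contains a vowel, e.g. motoring -> motor
    if w.endswith("ing"):
        candidate = w[: -len("ing")]
        if contains_vowel(candidate):
            return candidate
    # No rule applies: return the word unchanged
    return w


print(stem("relational"))  # relate
print(stem("motoring"))    # motor
print(stem("sing"))        # sing (the stem "s" has no vowel, so ING is kept)
```

Note that the result is a bare stem: the information that was stripped (e.g. that the word was an -ing form) is lost.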


Tokenization, Word Segmentation
  • Tokenization or word segmentation
  • separate out “words” (lexical entries) from running text
  • expand abbreviated terms
    • E.g. I’m into I am, it’s into it is
  • collect tokens forming single lexical entry
    • E.g. New York marked as one single entry
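
The three steps above (separate tokens, expand abbreviated forms, collect multi-word entries) could be sketched as follows. The two small tables are illustrative assumptions containing only the examples from the slide; a real tokenizer needs much larger lists, and an expansion such as it's = it is is itself ambiguous (it could also be it has).

```python
import re

# Hypothetical mini-tables holding only the slide's examples.
CLITIC_EXPANSIONS = {"i'm": ["I", "am"], "it's": ["it", "is"]}
MULTIWORD_ENTRIES = {("new", "york"): "New York"}


def tokenize(text: str) -> list[str]:
    # Normalize curly apostrophes, then split into letter sequences
    # (optionally with an apostrophe part) and single other symbols.
    text = text.replace("\u2019", "'")
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z]", text)

    # Expand abbreviated terms, e.g. I'm -> I am.
    expanded = []
    for tok in raw:
        expanded.extend(CLITIC_EXPANSIONS.get(tok.lower(), [tok]))

    # Collect adjacent tokens that form one lexical entry, e.g. New York.
    merged = []
    i = 0
    while i < len(expanded):
        pair = tuple(t.lower() for t in expanded[i:i + 2])
        if pair in MULTIWORD_ENTRIES:
            merged.append(MULTIWORD_ENTRIES[pair])
            i += 2
        else:
            merged.append(expanded[i])
            i += 1
    return merged


print(tokenize("I'm in New York."))
# ['I', 'am', 'in', 'New York', '.']
```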


Tokenization, Word Segmentation
  • Finite state transducer (FST)
  • Modifies input string (rules)
  • Recognizes (stored) abbreviations and composite words
  • See Fig. 3.22 in Jurafsky, Ch. 3
  • More of an issue in languages like Chinese


Lemmatization
  • Lemmatization maps words with the same root but different surface forms onto the same lexeme
  • e.g. buys, bought, buying -> buy
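
A minimal sketch of the idea, assuming a tiny lexicon and an exception list for irregular forms (both invented here for the slide's example): regular forms are reduced by stripping a suffix and checking the result against the lexicon, irregular forms are looked up directly.

```python
# Hypothetical data covering only the slide's example.
LEXICON = {"buy"}
IRREGULAR_FORMS = {"bought": "buy"}
SUFFIXES = ["ing", "s"]  # tried in this order


def lemmatize(word: str) -> str:
    w = word.lower()
    # Irregular forms cannot be derived by rule, so they are stored.
    if w in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[w]
    if w in LEXICON:
        return w
    # Regular forms: strip a suffix and accept only known lexemes.
    for suffix in SUFFIXES:
        if w.endswith(suffix) and w[: -len(suffix)] in LEXICON:
            return w[: -len(suffix)]
    # Unknown word: leave it unchanged.
    return w


print([lemmatize(w) for w in ["buys", "bought", "buying"]])
# ['buy', 'buy', 'buy']
```

Unlike a stemmer, the lookup against the lexicon guarantees that the result is an actual lexeme.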


Word Recognition
  • Spelling Errors
  • Mark non-words based on dictionary/lexicon
  • Use “minimum edit distance”
    • Dynamic programming
    • Table-based
    • Transform operations
      • deletion, substitution, insertion
    • Calculate minimum path
  • Morphological Parser = FST
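
The table-based computation can be sketched as below. Each cell d[i][j] holds the cheapest way to turn the first i characters of the source into the first j characters of the target. Unit costs for all three operations are an assumption here; the `sub_cost` parameter allows the variant in which a substitution counts as a deletion plus an insertion (cost 2). The `suggest` helper and the small lexicon are hypothetical.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    n, m = len(source), len(target)
    # d[i][j] = minimum cost of transforming source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i                      # i deletions
    for j in range(1, m + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,                           # deletion
                d[i][j - 1] + 1,                           # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost),  # substitution / match
            )
    return d[n][m]


def suggest(word: str, lexicon: list[str]) -> str:
    """Return the lexicon entry closest to a non-word (hypothetical helper)."""
    return min(lexicon, key=lambda entry: min_edit_distance(word, entry))


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
print(suggest("graffe", ["giraffe", "graft", "grail"]))         # giraffe
```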


Morphological Processing
  • Knowledge
    • lexical entry: stem plus possible prefixes, suffixes plus word classes, e.g. endings for verb forms (see tables above)
    • rules: how to combine stem and affixes, e.g. add s to form plural of noun as in dogs
    • orthographic rules: spelling, e.g. double consonant as in mapping
  • Processing: Finite State Transducers
    • take information above and analyze word token / generate word form
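
To make the three kinds of knowledge concrete, here is a rough sketch of word-form generation. It is not a real finite state transducer: a cascade of regular-expression rewrites stands in for the transducer, and the lexicon, feature names (`PL`, `ING`) and function names are assumptions made for illustration. The intermediate form uses ^ as morpheme boundary and # as word boundary, as on the intermediate tape in Jurafsky's figures.

```python
import re

# Lexical entries: stem plus word class (hypothetical mini-lexicon).
LEXICON = {"dog": "N", "fox": "N", "butterfly": "N", "map": "V", "walk": "V"}


def intermediate(stem: str, feature: str) -> str:
    """Rules for combining stem and affix: lexical level -> intermediate level."""
    word_class = LEXICON[stem]
    if word_class == "N" and feature == "PL":
        return stem + "^s#"
    if word_class == "V" and feature == "ING":
        return stem + "^ing#"
    raise ValueError(f"no rule for {stem} + {feature}")


def surface(form: str) -> str:
    """Orthographic rules: intermediate level -> surface level."""
    # e-insertion after s, x, z, ch, sh: fox^s# -> foxes#
    form = re.sub(r"(s|x|z|ch|sh)\^s#", r"\1es#", form)
    # consonant + y becomes ie before plural s: butterfly^s# -> butterflies#
    form = re.sub(r"([^aeiou])y\^s#", r"\1ies#", form)
    # consonant doubling (simplified to one-syllable CVC stems): map^ing# -> mapping#
    form = re.sub(r"^([^aeiou]*[aeiou])([^aeiouwxy])\^ing#", r"\1\2\2ing#", form)
    # remove the remaining boundary symbols
    return form.replace("^", "").replace("#", "")


def generate(stem: str, feature: str) -> str:
    return surface(intermediate(stem, feature))


for stem, feature in [("dog", "PL"), ("fox", "PL"), ("butterfly", "PL"),
                      ("map", "ING"), ("walk", "ING")]:
    print(stem, feature, "->", generate(stem, feature))
```

A transducer can run the same mapping in both directions (analysis as well as generation); this sketch only covers generation.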



Fig. 3.4 Simple FSA for adjective inflection.

Fig. 3.5 More detailed FSA for adjective inflection.



Fig. 3.12 Lexical and intermediate tape of a FS Transducer

Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.


Word Classes

Sort words into categories according to:

  • morphological properties
    • Which types of morphological forms do they take?
    • e.g. form plural: noun+s; 3rd person: verb+s
  • distributional properties
    • What other words or phrases can occur nearby?
    • e.g. possessive pronoun before noun
  • semantic coherence
    • Classify according to similar semantic type.
    • e.g. nouns refer to object-like entities


Open vs. Closed Word Classes

Open Class Types

The set of words in these classes can change over time, with the development of the language, e.g. spaghetti and download

Open Class Types:

nouns, verbs, adjectives, adverbs


Open vs. Closed Word Classes

Closed Class Types

The set of words in these classes is largely fixed and hardly ever changes for a given language.

Closed Class Types:

prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals


Open Class Words: Nouns


denote objects, concepts, entities, events

Proper Nouns

Names for specific individual objects, entities

e.g. the Eiffel Tower, Dr. Kemke

Common Nouns

Names for categories, classes, abstracts, events

e.g. fruit, banana, table, freedom, sleep, race, ...

Count Nouns

enumerable entities, e.g. two bananas

Mass Nouns

not countable items, e.g. water, salt, freedom


Open Class Words: Verbs


denote actions, processes, and states, e.g. smoke, dream, rest, run

several morphological forms, e.g.

non-3rd person                     - eat, sleep

3rd person                         - eats, sleeps

progressive / present participle   - eating, sleeping

past participle                    - eaten, slept

simple past                        - ate, slept


Open Class Words: Verbs (2)

non-3rd person    eat      I eat. We eat. They eat.

3rd person        eats     He eats. She eats. It eats.

progressive       eating   He is eating.
                           He will be eating.
                           He has been eating.

e.g. present participle    He is eating.
     gerundive             Eating scorpions [NP] is common in China.
     use as adjective      Eating children [NP] are common at McDonalds.

past participle   eaten    He has eaten the scorpion.
                           The scorpion was eaten.

simple past       ate      He ate the scorpion.


Verb Forms 1 - The five verb forms

Fig. 2.6. The five verb forms. (Allen, 1995, p. 28)


Verb Forms 2 - The basic tenses

Fig. 2.7. The basic tenses. (Allen, 1995, p. 29)


Verb Forms 3 - The progressive tenses

Fig. 2.8. The progressive tenses. (Allen, 1995, p. 29)



Verb Tense Chart. From:

Open Class Words: Adjectives


denote qualities or properties of objects

e.g. heavy, blue, content

most languages have concepts for

colour - white, green, ...

age - young, old, ...

value - good, bad, ...

not all languages have adjectives as separate class


Open Class Words: Adverbs 1


denote modifications of actions (verbs) or qualities (adjectives)

e.g. walk slowly or heavily drunk

Directional or Locational Adverbs

specify direction or location

e.g. go home, stay here


Open Class Words: Adverbs 2

Degree Adverbs

specify extent of process, action, property

e.g. extremely slow, very modest

Manner Adverbs

specify manner of action or process

e.g. walk slowly, run fast

Temporal Adverbs

specify time of event or action

e.g. yesterday, Monday


Closed Word Classes

Closed Class Types:

Prepositions: on, under, over, at, from, to, with, ...

Determiners: a, an, the, ...

Pronouns: he, she, it, his, her, who, I, ...

Conjunctions: and, or, as, if, when, ...

Auxiliary verbs: can, may, should, are, …

Particles: up, down, on, off, in, out, …

Numerals: one, two, three, ..., first, second, ...


Closed Word Class: Prepositions


occur before noun phrases;

describe relations;

often spatial or temporal relations

e.g. on the table (spatial)

in two hours (temporal)


Closed Word Class: Pronouns


reference to entities, events, relations etc.

Personal Pronouns

refer to persons or entities,

e.g. you, he, it, ...

Possessive Pronouns

possession or relation between person and object,

e.g. his, her, my, its, ...


Interrogative and Relative Pronouns

reference in question or back reference,

e.g. Who did this ..., Frieda, who is 80 years old ...


Closed Word Class: Conjunctions


join phrases or sentences; semantics is varied and complex

Coordinating Conjunction

Join two phrases or sentences on the same level through conjunctions like and, or, but, ...

e.g. He takes a cat and a dog.

He takes a dog and she takes a cat.

Subordinating Conjunction

Connect embedded phrases through e.g. that

e.g. He thinks that the cat is nicer than the dog.


Closed Word Class: Auxiliary Verbs

Auxiliary Verbs

Mark semantic features of the main verb. Often describe tense, aspect, and modality. Semantics is difficult.

Tense

addition expressing present, past or future, ...

e.g. He will take the cat home.

Aspect

addition expressing completion of action

e.g. He is taking the cat home. (incomplete)

Modality

addition expressing necessity or possibility of action

e.g. He can take the cat home. (possible)


Closed Word Class: Copula, Modal Verbs

Copula (be, do, have) and Modal Verbs (can, should, ...) are subclasses of Auxiliary Verbs.

Describe state, process, or tense / modality of action.

Semantics: difficult (e.g. modal logic)

State / Process: be and do

e.g. He is at home. He does nothing.

Tense: have

e.g. He has taken the cat home.

Modality: can, ought to, should, must

e.g. He can take the cat home. (possibility)


POS Tagging - Tagsets

Tagsets for English

  • Penn Treebank, 45 tags
  • Brown corpus, 87 tags
  • C5 tagset, 61 tags
  • C7 tagset, 146 tags

For references see Jurafsky, p.296

C5 and C7 tagsets are listed in Appendix C


Ambiguity in POS Tagging

Fig. 8.7 Ambiguity in tagging. The left column classifies words according to the number of tags, which can be used for them. The right column shows how many words fall into each class. E.g. there are 264 words which can be tagged with 3 different POS tags, and 1 word (“still”) which has 7 possible tags. (based on the Brown Corpus)
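
How such a table is obtained from a tagged corpus can be sketched in a few lines: collect the set of tags seen for each word, then count how many words have 1, 2, 3, ... tags. The toy corpus below is invented for illustration and has nothing to do with the actual Brown Corpus counts.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (word, tag) pairs, Penn-Treebank-style tags.
TOY_CORPUS = [
    ("the", "DT"), ("book", "NN"), ("is", "VBZ"), ("still", "RB"), ("here", "RB"),
    ("book", "VB"), ("the", "DT"), ("flight", "NN"),
    ("still", "JJ"), ("still", "NN"),
]


def ambiguity_classes(tagged_tokens):
    """Map 'number of distinct tags' -> 'number of word types with that many tags'."""
    tags_per_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_per_word[word.lower()].add(tag)
    return Counter(len(tags) for tags in tags_per_word.values())


for n_tags, n_words in sorted(ambiguity_classes(TOY_CORPUS).items()):
    print(f"{n_tags} tag(s): {n_words} word type(s)")
```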


POS Tagging - Taggers

Methods for POS Tagging:

Rule-Based Tagging

use dictionary to assign POS; then use rules to disambiguate different POS/word classes (e.g. book as verb or noun)
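
A toy version of this two-step procedure, using the book example: the dictionary proposes candidate tags, and hand-written context rules pick one when there are several. The dictionary, the default tag for unknown words, and both rules are assumptions made for illustration; real rule-based taggers use large lexicons and many more constraints.

```python
# Hypothetical dictionary: word -> possible POS tags (Penn-Treebank-style).
DICTIONARY = {
    "book": ["NN", "VB"],
    "the": ["DT"],
    "a": ["DT"],
    "flight": ["NN"],
    "i": ["PRP"],
    "read": ["VB"],
}


def rule_based_tag(words):
    tags = []
    for word in words:
        # Step 1: dictionary lookup (unknown words default to NN, an assumption).
        candidates = DICTIONARY.get(word.lower(), ["NN"])
        if len(candidates) == 1:
            tag = candidates[0]
        else:
            # Step 2: disambiguate using the previous tag.
            previous = tags[-1] if tags else None
            if previous == "DT" and "NN" in candidates:
                tag = "NN"        # after a determiner, prefer the noun reading
            elif previous is None and "VB" in candidates:
                tag = "VB"        # sentence-initial: assume an imperative verb
            else:
                tag = candidates[0]
        tags.append(tag)
    return list(zip(words, tags))


print(rule_based_tag(["Book", "the", "flight"]))
# [('Book', 'VB'), ('the', 'DT'), ('flight', 'NN')]
print(rule_based_tag(["I", "read", "the", "book"]))
# [('I', 'PRP'), ('read', 'VB'), ('the', 'DT'), ('book', 'NN')]
```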

Stochastic Tagging

determines tags based on the probability of the occurrence of the tag, given the observed word, in the context of the preceding tags. Similar to Hidden Markov Models (probabilistic finite state machines).
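
One common way to realize this is a bigram HMM tagger decoded with the Viterbi algorithm, sketched below. All probabilities are invented for the example (in practice they are estimated from a tagged corpus), and plain products are used instead of log probabilities, which is only reasonable for very short sentences.

```python
TAGS = ["DT", "NN", "VB"]

# Invented model parameters, for illustration only.
START_P = {"DT": 0.5, "NN": 0.2, "VB": 0.3}        # P(tag | sentence start)
TRANS_P = {                                         # P(tag | previous tag)
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.2, "NN": 0.3, "VB": 0.5},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT_P = {                                          # P(word | tag)
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"book": 0.4, "flight": 0.6},
    "VB": {"book": 0.9, "read": 0.1},
}


def viterbi(words, tags=TAGS, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    # best[t][tag] = (probability of the best tag sequence ending in tag, previous tag)
    best = [{tag: (start_p.get(tag, 0.0) * emit_p[tag].get(words[0], 0.0), None)
             for tag in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            # Best predecessor for this tag, using the transition probabilities.
            prev_tag, score = max(
                ((p, best[-1][p][0] * trans_p[p].get(tag, 0.0)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[tag] = (score * emit_p[tag].get(word, 0.0), prev_tag)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


print(viterbi(["book", "the", "flight"]))  # ['VB', 'DT', 'NN']
print(viterbi(["the", "book"]))            # ['DT', 'NN']
```

The same word book receives different tags in the two sentences because the preceding tag changes which reading is more probable.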

Learn tagging rules

Problem in POS Tagging: Ambiguity

Problem in POS Tagging: Which tag set to use?