COMP 4060 Natural Language Processing

Morphology, Word Classes, POS Tagging

Morphology

Overview
  • Morphology
  • Stemming
  • Word Classes
  • POS Tagging
  • Readings: Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen, Ch. 2, 3


Morphology

Morphemes and Words
  • Morpheme = "minimal meaning-bearing unit in a language"
  • Combine morphemes to create words
    • Inflection
      • combination of a word stem with a grammatical morpheme
      • usually yields the same word class, e.g. clean (verb), clean-ing (verb)
    • Derivation
      • combination of a word stem with a grammatical morpheme
      • usually yields a different word class, e.g. clean (verb), clean-ing (noun)
    • Compounding
      • combination of multiple word stems
    • Cliticization
      • combination of a word stem with a clitic
      • combines words from different syntactic categories, e.g. I’ve = I + have

Inflectional Morphology


word stem + grammatical morpheme, e.g. cat + s

only for nouns, verbs, and some adjectives

  • Nouns
    • plural:
      • regular: +s, +es
      • irregular: mouse - mice; ox - oxen
      • rules for exceptions: e.g. -y -> -ies, as in butterfly - butterflies

    • possessive: +'s, +'
  • Verbs
    • main verbs (sleep, eat, walk)
    • modal verbs (can, will, should)
    • primary verbs (be, have, do)
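The noun plural rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `pluralize`, the tiny irregular list, and the restriction of the -y rule to a consonant before the y are not from the slides.

```python
# Illustrative sketch only: a tiny rule set, not a complete model of English plurals.
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}  # irregular forms are listed, not derived
VOWELS = "aeiou"


def pluralize(noun: str) -> str:
    """Return a plural: irregular lookup first, then -y -> -ies, then +es / +s."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Assumption: only consonant + y becomes -ies (butterfly -> butterflies);
    # vowel + y simply takes +s (boy -> boys).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Sibilant endings take +es (fox -> foxes, church -> churches).
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"
```

The irregular lookup has to come before the suffix rules; otherwise ox would be treated as a regular noun ending in x.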

Inflectional Morphology (verbs)

Verb inflections apply only to main verbs (sleep, eat, walk) and primary verbs (be, have, do).

Morphological form: regularly inflected forms

  • stem: walk, merge, try, map
  • -s form: walks, merges, tries, maps
  • -ing participle: walking, merging, trying, mapping
  • past form; -ed participle: walked, merged, tried, mapped

Morphological form: irregularly inflected forms

  • stem: eat, catch, cut
  • -s form: eats, catches, cuts
  • -ing participle: eating, catching, cutting
  • past form: ate, caught, cut
  • -ed (past) participle: eaten, caught, cut
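As a rough sketch, the regular forms of walk, merge, try and map can be generated by combining the stem with a suffix and a few spelling rules. The function names and the consonant-doubling heuristic (one-vowel stems ending in consonant-vowel-consonant) are assumptions made for illustration; irregular verbs such as eat, catch (past) and cut would need a lookup table instead.

```python
VOWELS = "aeiou"


def _doubles_final_consonant(stem: str) -> bool:
    """Heuristic (assumption): one-vowel stems ending consonant-vowel-consonant
    double the final consonant (map -> mapping), except for final w, x, y."""
    if len(stem) < 3 or sum(ch in VOWELS for ch in stem) != 1:
        return False
    c1, v, c2 = stem[-3], stem[-2], stem[-1]
    return c1 not in VOWELS and v in VOWELS and c2 not in VOWELS and c2 not in "wxy"


def s_form(stem: str) -> str:
    if len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS:
        return stem[:-1] + "ies"          # try -> tries
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"                # catch -> catches
    return stem + "s"                     # walk -> walks


def ing_form(stem: str) -> str:
    if len(stem) > 2 and stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "ing"          # merge -> merging
    if _doubles_final_consonant(stem):
        return stem + stem[-1] + "ing"    # map -> mapping
    return stem + "ing"                   # walk -> walking


def ed_form(stem: str) -> str:
    if stem.endswith("e"):
        return stem + "d"                 # merge -> merged
    if len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS:
        return stem[:-1] + "ied"          # try -> tried
    if _doubles_final_consonant(stem):
        return stem + stem[-1] + "ed"     # map -> mapped
    return stem + "ed"                    # walk -> walked
```

The heuristic ignores stress, so it will get some multi-syllable stems wrong; it is only meant to reproduce the regular table above.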

Inflectional and Derivational Morphology (adjectives)

Adjective inflections and derivations:

  • prefix un-: unhappy (adjective, negation)
  • suffix -ly: happily (adverb, mode)
  • suffix -er: happier (adjective, comparative)
  • suffix -est: happiest (adjective, superlative)
  • suffix -ness: happiness (noun)

plus combinations, like unhappiest, unhappiness.

Adjectives fall into different classes, depending on which inflectional or derivational forms they can take, e.g. big takes no un- negation.

Noun Derivation


Clitics

Verb Clitics

Stemming
  • Stemming algorithms strip off word affixes
  • yield stem only, no additional information (like plural, 3rd person etc.)
  • used, e.g. in web search engines
  • famous stemming algorithm: the Porter stemmer

Stemming Methods
  • Rule-based stemming
  • Example rules:
    • ATIONAL → ATE, e.g. relational → relate
    • ING → ε (empty string) if the stem contains a vowel, e.g. motoring → motor
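The two example rules can be written down directly. The sketch below implements only these two rules, not the full Porter stemmer (which has many more rules and additional conditions on the stem); the function names are made up for illustration, and y is treated as a consonant for simplicity.

```python
import re

VOWEL = re.compile(r"[aeiou]")  # simplification: y is never counted as a vowel


def strip_ational(word: str) -> str:
    """ATIONAL -> ATE, e.g. relational -> relate."""
    if word.endswith("ational"):
        return word[: -len("ational")] + "ate"
    return word


def strip_ing(word: str) -> str:
    """ING -> empty string, but only if the remaining stem contains a vowel."""
    if word.endswith("ing"):
        stem = word[: -len("ing")]
        if VOWEL.search(stem):
            return stem
    return word


def stem(word: str) -> str:
    """Apply the first rule that changes the word; otherwise return it unchanged."""
    word = word.lower()
    for rule in (strip_ational, strip_ing):
        result = rule(word)
        if result != word:
            return result
    return word
```

The vowel condition is what keeps a word like sing intact: stripping -ing would leave s, which contains no vowel.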

Tokenization, Word Segmentation
  • Tokenization or word segmentation
  • separate out “words” (lexical entries) from running text
  • expand abbreviated terms
    • E.g. I’m into I am, it’s into it is
  • collect tokens forming single lexical entry
    • E.g. New York marked as one single entry
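A minimal sketch of these three steps (separate tokens, expand abbreviated forms, merge multi-word entries) might look as follows. The lookup tables `CONTRACTIONS` and `MULTIWORD` are hypothetical stand-ins for a real lexicon, and the regular expression is a simplification that only handles plain letters, digits and punctuation.

```python
import re

# Hypothetical lookup tables for illustration; a real system would use a larger lexicon.
# Note: "it's" can also stand for "it has"; this sketch always picks "it is".
CONTRACTIONS = {"i'm": ["I", "am"], "it's": ["it", "is"]}
MULTIWORD = {("new", "york"): "New York"}


def tokenize(text: str) -> list[str]:
    text = text.replace("’", "'")  # normalize typographic apostrophes
    # Step 1: separate words (optionally with a clitic), numbers, and punctuation.
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]", text)
    # Step 2: expand abbreviated terms.
    expanded = []
    for tok in raw:
        expanded.extend(CONTRACTIONS.get(tok.lower(), [tok]))
    # Step 3: collect adjacent tokens that form a single lexical entry.
    tokens = []
    i = 0
    while i < len(expanded):
        pair = tuple(t.lower() for t in expanded[i:i + 2])
        if pair in MULTIWORD:
            tokens.append(MULTIWORD[pair])
            i += 2
        else:
            tokens.append(expanded[i])
            i += 1
    return tokens
```

For example, `tokenize("I’m in New York, it’s cold.")` yields `['I', 'am', 'in', 'New York', ',', 'it', 'is', 'cold', '.']`.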

Tokenization, Word Segmentation
  • Finite state transducer (FST)
  • Modifies input string (rules)
  • Recognizes (stored) abbreviations and composite words
  • See Fig.3.22 in Jurafsky, Ch.3
  • More of an issue in languages like Chinese

Lemmatization
  • Lemmatization maps words with same root but different surface appearances onto the same lexeme
  • e.g. buys, bought, buying -> buy
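One simple way to picture lemmatization is a lookup for irregular forms combined with suffix stripping that is only accepted when the result is a known lexeme. The sketch below assumes a toy lexicon and exception list; both are hypothetical and far smaller than anything usable in practice.

```python
# Hypothetical toy data for illustration only.
LEXICON = {"buy"}                      # known lexemes
IRREGULAR_FORMS = {"bought": "buy"}    # surface forms that no suffix rule can explain


def lemmatize(word: str) -> str:
    """Map a surface form onto its lexeme; unknown words are returned unchanged."""
    word = word.lower()
    if word in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[word]
    for suffix in ("ing", "s"):
        candidate = word[: -len(suffix)]
        # Unlike a stemmer, only accept a candidate that is a real lexicon entry.
        if word.endswith(suffix) and candidate in LEXICON:
            return candidate
    return word
```

The lexicon check is the main difference from stemming: the output is always an actual lexeme, never a truncated string.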

Word Recognition
  • Spelling Errors
  • Mark non-words based on dictionary/lexicon
  • Use “minimum edit distance”
    • Dynamic programming
    • Table-based
    • Transform operations
      • deletion, substitution, insertion
    • Calculate minimum path
  • Morphological Parser = FST
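The table-based dynamic programming computation of minimum edit distance can be sketched as below. The cost values are an assumption: insertion and deletion cost 1, and the substitution cost is a parameter (1 gives the plain Levenshtein distance; some textbook variants use 2). The helper `suggest` is a hypothetical illustration of how the distance could be used to propose a correction for a non-word.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Fill a (len(source)+1) x (len(target)+1) table; cell [i][j] holds the
    cheapest way to turn source[:i] into target[:j]."""
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                      # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost), # substitution or match
            )
    return dist[n][m]


def suggest(word: str, lexicon: list[str]) -> str:
    """Return the lexicon entry closest to a (possibly misspelled) word."""
    return min(lexicon, key=lambda entry: min_edit_distance(word, entry))
```

With unit costs, the distance between intention and execution is 5; with a substitution cost of 2 it is 8.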

Morphological Processing
  • Knowledge
    • lexical entry: stem plus possible prefixes, suffixes plus word classes, e.g. endings for verb forms (see tables above)
    • rules: how to combine stem and affixes, e.g. add s to form plural of noun as in dogs
    • orthographic rules: spelling, e.g. double consonant as in mapping
  • Processing: Finite State Transducers
    • take information above and analyze word token / generate word form


Fig. 3.4 Simple FSA for adjective inflection.

Fig. 3.5 More detailed FSA for adjective inflection.
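Since the figures themselves are not reproduced in this transcript, here is a rough sketch of what a simple FSA for adjective inflection could look like in code. The state names, the assumed structure (optional un-, then an adjective root, then an optional -er, -est or -ly) and the tiny root list are assumptions for illustration and may differ from the textbook figure. The automaton reads a sequence of morphemes rather than raw characters.

```python
# Hypothetical toy lexicon; a real FSA would be compiled from a full lexicon.
ADJ_ROOTS = {"happy", "clear", "cool", "big"}
SUFFIXES = {"er", "est", "ly"}
ACCEPTING = {"q2", "q3"}


def accepts(morphemes: list[str]) -> bool:
    """Run the FSA over a morpheme sequence such as ["un", "happy", "er"]."""
    state = "q0"
    for m in morphemes:
        if state == "q0" and m == "un":
            state = "q1"                      # optional prefix un-
        elif state in ("q0", "q1") and m in ADJ_ROOTS:
            state = "q2"                      # from q0 this skips the prefix (epsilon move)
        elif state == "q2" and m in SUFFIXES:
            state = "q3"                      # optional suffix
        else:
            return False                      # no transition defined
    return state in ACCEPTING
```

Note that this simple machine also accepts `["un", "big"]`, which is not an English word. Splitting the roots into classes that do or do not allow certain affixes is what a more detailed FSA adds.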


Fig. 3.12 Lexical and intermediate tape of a FS Transducer

Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.

Word Classes

Sort words into categories according to:

  • morphological properties
    • Which types of morphological forms do they take?
    • e.g. form plural: noun+s; 3rd person: verb+s
  • distributional properties
    • What other words or phrases can occur nearby?
    • e.g. possessive pronoun before noun
  • semantic coherence
    • Classify according to similar semantic type.
    • e.g. nouns refer to object-like entities

Open vs. Closed Word Classes

Open Class Types

The set of words in these classes can change over time, with the development of the language, e.g. spaghetti and download.

Open class types: nouns, verbs, adjectives, adverbs

Open vs. Closed Word Classes

Closed Class Types

The set of words in these classes is largely fixed and hardly ever changes for a given language.

Closed class types: prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals

Open Class Words: Nouns

Nouns

denote objects, concepts, entities, events

Proper Nouns

Names for specific individual objects, entities

e.g. the Eiffel Tower, Dr. Kemke

Common Nouns

Names for categories, classes, abstracts, events

e.g. fruit, banana, table, freedom, sleep, race, ...

Count Nouns

enumerable entities, e.g. two bananas

Mass Nouns

not countable items, e.g. water, salt, freedom

Open Class Words: Verbs

Verbs

denote actions, processes, and states, e.g. smoke, dream, rest, run

several morphological forms, e.g.

  • non-3rd person: eat, sleep
  • 3rd person: eats, sleeps
  • progressive / present participle / gerundive: eating, sleeping
  • past participle: eaten, slept
  • simple past: ate, slept

Open Class Words: Verbs (2)

  • non-3rd person (eat): I eat. We eat. They eat.
  • 3rd person (eats): He eats. She eats. It eats.
  • progressive (eating): He is eating. He will be eating. He has been eating.
    • present participle: He is eating.
    • gerundive: Eating scorpions [NP] is common in China.
    • use as adjective: Eating children [NP] are common at McDonalds.
  • past participle (eaten): He has eaten the scorpion. The scorpion was eaten.
  • simple past (ate): He ate the scorpion.

Verb Forms 1 - The five verb forms

Fig.2.6. The five verb forms. (Allen, 1995, p.28)

Verb Forms 2 - The basic tenses

Fig.2.7. The basic tenses. (Allen, 1995, p.29)

Verb Forms 3 - The progressive tenses

Fig.2.8. The progressive tenses. (Allen, 1995, p.29)


Verb Tense Chart. From: http://www.athabascau.ca/courses/engl/155/support/verb_tenses.htm

Open Class Words: Adjectives

Adjectives

denote qualities or properties of objects

e.g. heavy, blue, content

most languages have concepts for

colour - white, green, ...

age - young, old, ...

value - good, bad, ...

not all languages have adjectives as separate class

Open Class Words: Adverbs 1

Adverbs

denote modifications of actions (verbs) or qualities (adjectives)

e.g. walk slowly or heavily drunk

Directional or Locational Adverbs

specify direction or location

e.g. go home, stay here

Open Class Words: Adverbs 2

Degree Adverbs

specify extent of process, action, property

e.g. extremely slow, very modest

Manner Adverbs

specify manner of action or process

e.g. walk slowly, run fast

Temporal Adverbs

specify time of event or action

e.g. yesterday, Monday

Closed Word Classes

Closed Class Types:

Prepositions: on, under, over, at, from, to, with, ...

Determiners: a, an, the, ...

Pronouns: he, she, it, his, her, who, I, ...

Conjunctions: and, or, as, if, when, ...

Auxiliary verbs: can, may, should, are, …

Particles: up, down, on, off, in, out, …

Numerals: one, two, three, ..., first, second, ...

Closed Word Class: Prepositions

Prepositions

occur before noun phrases;

describe relations;

often spatial or temporal relations

e.g. on the table (spatial), in two hours (temporal)

Closed Word Class: Pronouns

Pronouns

reference to entities, events, relations etc.

Personal Pronouns

refer to persons or entities,

e.g. you, he, it, ...

Possessive Pronouns

possession or relation between person and object,

e.g. his, her, my, its, ...

Wh-Pronouns

reference in question or back reference,

e.g. Who did this ..., Frieda, who is 80 years old ...

Closed Word Class: Conjunctions

Conjunctions

join phrases or sentences; semantics is varied and complex

Coordinating Conjunction

Join two phrases or sentences on the same level through conjunctions like and, or, but, ...

e.g. He takes a cat and a dog.

He takes a dog and she takes a cat.

Subordinating Conjunction

Connect embedded phrases through e.g. that

e.g. He thinks that the cat is nicer than the dog.

Closed Word Class: Auxiliary Verbs

Auxiliary Verbs

Mark semantic features of main verb. Often describe tense and modality aspects. Semantics is difficult.

Tense

addition expressing present, past or future, ...

e.g. He will take the cat home.

Aspect

addition expressing completion of action

e.g. He is taking the cat home. (incomplete)

Mood

addition expressing necessity or possibility of action

e.g. He can take the cat home. (possible)

Closed Word Class: Copula, Modal Verbs

Copula (be, do, have) and Modal Verbs (can, should, ...) are subclasses of Auxiliary Verbs.

Describe state, process, or tense / modality of action.

Semantics: difficult (e.g. modal logic)

State / Process: be and do

e.g. He is at home. He does nothing.

Tense: have

e.g. He has taken the cat home.

Modality: can, ought to, should, must

e.g. He can take the cat home. (possibility)

POS Tagging - Tagsets

Tagsets for English

  • Penn Treebank, 45 tags
  • Brown corpus, 87 tags
  • C5 tagset, 61 tags
  • C7 tagset, 146 tags

For references see Jurafsky, p.296

C5 and C7 tagsets are listed in Appendix C

Ambiguity in POS Tagging

Fig. 8.7 Ambiguity in tagging. The left column groups words by the number of tags they can take; the right column shows how many words fall into each group. E.g. 264 words can be tagged with 3 different POS tags, and 1 word (“still”) has 7 possible tags. (based on the Brown Corpus)
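A table of this kind can be computed from any tagged corpus by collecting the set of tags seen for each word and then counting how many words have 1, 2, 3, ... tags. The sketch below assumes the corpus is available as a list of (word, tag) pairs; the function name and the toy data are made up for illustration and do not reproduce the Brown Corpus figures.

```python
from collections import Counter, defaultdict


def ambiguity_table(tagged_tokens: list[tuple[str, str]]) -> Counter:
    """Map 'number of distinct tags' -> 'number of word types with that many tags'."""
    tags_per_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_per_word[word.lower()].add(tag)
    return Counter(len(tags) for tags in tags_per_word.values())


# Toy, made-up corpus with Penn-Treebank-style tags, for illustration only.
TOY_CORPUS = [
    ("the", "DT"), ("book", "NN"), ("book", "VB"),
    ("still", "RB"), ("still", "JJ"), ("still", "NN"),
    ("cat", "NN"), ("The", "DT"),
]
```

On the toy corpus, two word types are unambiguous (the, cat), one has two tags (book) and one has three (still).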

POS Tagging - Taggers

Methods for POS Tagging:

Rule-Based Tagging

use dictionary to assign POS; then use rules to disambiguate different POS/word classes (e.g. book as verb or noun)

Stochastic Tagging

determines tags based on the probability of the occurrence of the tag, given the observed word, in the context of the preceding tags. Similar to Hidden Markov Models (probabilistic finite state machines).

Learn tagging rules
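To make the stochastic approach concrete, here is a small sketch of bigram HMM tagging with the Viterbi algorithm: each tag is scored by the probability of the word given the tag and of the tag given the preceding tag, and the best-scoring tag sequence is recovered by backtracking. All probabilities below are invented toy numbers, not estimates from any corpus, and the three-tag set is only for illustration. The sketch assumes a non-empty sentence.

```python
TAGS = ["DT", "NN", "VB"]

# Toy, hand-made probabilities (assumptions for illustration only).
START_P = {"DT": 0.5, "NN": 0.2, "VB": 0.3}
TRANS_P = {
    "DT": {"DT": 0.05, "NN": 0.90, "VB": 0.05},
    "NN": {"DT": 0.20, "NN": 0.30, "VB": 0.50},
    "VB": {"DT": 0.60, "NN": 0.30, "VB": 0.10},
}
EMIT_P = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"book": 0.4, "flight": 0.6},
    "VB": {"book": 0.7, "take": 0.3},
}


def viterbi(words, tags=TAGS, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most probable tag sequence for a non-empty list of words."""
    # best[i][tag] = (probability of the best path ending in tag at position i, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score * emit_p[t].get(word, 0.0), prev)
        best.append(column)
    # Backtrace from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))
```

With these toy numbers, book is tagged as a noun in "the book" and as a verb in "book the flight", which is the kind of disambiguation the context of preceding tags is meant to provide.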

Problem in POS Tagging: Ambiguity

Problem in POS Tagging: Which tag set to use?
