
School of Computing

FACULTY OF ENGINEERING

PoS-Tagging theory and terminology

COMP3310 Natural Language Processing

Eric Atwell, Language Research Group

(with thanks to Katja Markert, Marti Hearst, and other contributors)


Reminder: PoS-tagging programs

  • Models behind some example PoS-tagging methods in NLTK:

  • Hand-coded

  • Statistical taggers

  • Brill (transformation-based) tagger

  • NB you don’t have to use NLTK – useful to illustrate
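
As an illustration of the "hand-coded" flavour, here is a minimal sketch using NLTK's RegexpTagger; the handful of suffix patterns and the Brown-style tags are illustrative assumptions, not a complete rule set:

    import nltk
    from nltk.tag import RegexpTagger

    # A hand-coded tagger: a few illustrative regular-expression rules
    # mapping word shapes to Brown-style tags; anything unmatched gets NN.
    patterns = [
        (r'.*ing$', 'VBG'),                 # progressive -ing forms
        (r'.*ed$', 'VBD'),                  # simple past -ed forms
        (r'.*es$', 'VBZ'),                  # 3rd-person-singular -es forms
        (r'^-?[0-9]+(\.[0-9]+)?$', 'CD'),   # numbers
        (r'.*', 'NN'),                      # default: common noun
    ]
    tagger = RegexpTagger(patterns)        # patterns are tried in order
    print(tagger.tag("the bear is teaching eating habits".split()))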


Training and Testing of Machine Learning Algorithms

  • Algorithms that “learn” from data see a set of examples and try to generalize from them.

  • Training set:

    • Examples trained on

  • Test set:

    • Also called held-out data and unseen data

    • Use this for evaluating your algorithm

    • Must be separate from the training set

      • Otherwise, you cheated!

  • “Gold standard” evaluation set

    • A test set that a community has agreed on and uses as a common benchmark; do not use it during training or repeated development testing, but reserve it for the final evaluation
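
A minimal sketch of keeping training and held-out test data separate, using NLTK; the Brown "news" category and the 90/10 split are illustrative choices, not part of the slides:

    import nltk
    from nltk.corpus import brown
    from nltk.tag import UnigramTagger

    # Training data and held-out test data must be strictly separate.
    nltk.download('brown', quiet=True)
    tagged_sents = brown.tagged_sents(categories='news')
    split = int(len(tagged_sents) * 0.9)
    train_sents = tagged_sents[:split]   # examples the tagger learns from
    test_sents = tagged_sents[split:]    # held-out / unseen data, used only for evaluation

    tagger = UnigramTagger(train_sents)
    # .accuracy() in recent NLTK; older versions use .evaluate()
    print("accuracy on held-out data:", tagger.accuracy(test_sents))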


PoS word classes in English

  • Word classes, also called syntactic categories or grammatical categories or Parts of Speech

  • closed class type: classes with fixed and few members, function words e.g. prepositions;

  • open class type: large class of members, many new additions, content words e.g. nouns

  • 8 major word classes: nouns, verbs, adjectives, adverbs, prepositions, determiners, conjunctions, pronouns

  • In English, and also in most (perhaps all) natural languages


What properties define “noun”?

  • Semantic properties: refer to people, places and things

  • Distributional properties: ability to occur next to determiners, possessives, adjectives (specific locations)

  • Morphological properties: most occur in singular and plural

  • These are properties of a word TYPE, e.g. “man” is a noun (usually)

  • Sometimes a given TOKEN may not meet all these criteria …

  • The men are happy … the man is happy …

  • They man the lifeboat (?)


Subcategories

  • Noun:

    • Proper Noun v Common Noun

    • singular v plural

    • count v mass (often not covered in PoS-tagsets)

  • Some tag-sets may have other subcategories, e.g. NNP = common noun with word-initial capital (e.g. Englishman)

  • A PoS-tagset often encodes morphological categories like person, number, gender, tense, case …
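
If you want to see exactly what a particular tagset encodes, NLTK ships queryable tagset documentation; a small sketch (the 'tagsets' data package may carry a different name in newer NLTK releases):

    import nltk

    # Inspect which subcategories (number, case, capitalisation, ...) a tag encodes.
    nltk.download('tagsets', quiet=True)   # tagset documentation data
    nltk.help.brown_tagset('NN.*')         # Brown-family noun tags
    nltk.help.upenn_tagset('NN.*')         # Penn Treebank noun tags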


Verb: action or process

  • VB present/infinitive teach, eat

  • VBZ 3rd-person-singular present (s-form) teaches, eats

  • VBG progressive (ing-form) teaching, eating

  • VBD past / VBN past participle taught, ate / eaten

  • Intransitive he died, transitive she killed him, …

  • (transitivity usually not marked in PoS-tags)

  • Auxiliaries: modal verbs, e.g. can, must, may

  • Have, be, do can be auxiliary or main verbs

  • e.g. I have a present v I have given you a present


Adjective: quality or property (of a thing: noun phrase)

  • English is simple:

  • JJ big, JJR comparative bigger, JJT superlative biggest

  • More features in other languages, eg

  • Agreement (number, gender) with noun

  • Before a noun v after “be”


Adverb: quality or property of verb or adjective (or other functions…)

  • A hodge-podge (!)

  • General adverb: often ends in -ly, e.g. slowly, happily (but NOT early)

  • Place adverb home, downhill

  • Time adverb now, tomorrow

  • Degree adverbs very, extremely, somewhat


Function words

  • Preposition e.g. in of on for over with (to)

  • Determiner e.g. this, that; articles the, a

  • Conjunction e.g. and or but because that

  • Pronoun e.g. personal pronouns

  • I we (1st person),

  • you (2nd person),

  • he she it they (3rd person)

  • Possessive pronouns my, your, our, their

  • WH-pronouns what who whoever

  • Others: negatives (not), interjections (oh), existential there, …


Parts of “multi-word expressions”

  • Particle – like preposition but “part of” a phrasal verb

  • I looked up her address v I looked up her skirt

  • I looked her address up v *I looked her skirt up

  • Big problem for PoS-tagging: common, and ambiguous

  • Other multi-word idioms: ditto tags
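
A quick check of the "looked up" examples with NLTK's default tagger; a particle should come out as RP and a preposition as IN, but in practice the tagger may label both occurrences the same way, which is exactly why this case is hard:

    import nltk

    # Download id may differ by NLTK version (e.g. 'averaged_perceptron_tagger_eng').
    nltk.download('averaged_perceptron_tagger', quiet=True)
    print(nltk.pos_tag("I looked up her address".split()))  # hope for up/RP (particle)
    print(nltk.pos_tag("I looked up her skirt".split()))    # hope for up/IN (preposition)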


Bigram Markov Model tagger

  • Naive Method

  • 1. Get all possible tag sequences of the sentence

  • 2. Compute the probability of each tag sequence given the sentence, using word-tag and tag-bigram probabilities

  • 3. Take the tag sequence with the maximum probability

  • Problem: This method has exponential complexity!

  • Solution: Viterbi Algorithm (not discussed in this module)
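
A toy sketch of the naive method, just to make the exponential blow-up concrete; the probability values below are invented placeholders, not real corpus estimates:

    from itertools import product

    # Hypothetical word-given-tag and tag-bigram probabilities (made up for illustration).
    p_word_tag = {('the', 'AT'): 0.5, ('bear', 'NN'): 0.01, ('bear', 'VB'): 0.005}
    p_tag_bigram = {('PERIOD', 'AT'): 0.3, ('AT', 'NN'): 0.6, ('AT', 'VB'): 0.05}

    def score(words, tags, prev='PERIOD'):
        # Product of p(word|tag) * p(tag|previous tag) along the sequence.
        p = 1.0
        for w, t in zip(words, tags):
            p *= p_word_tag.get((w, t), 1e-6) * p_tag_bigram.get((prev, t), 1e-6)
            prev = t
        return p

    words = ['the', 'bear']
    tagset = ['AT', 'NN', 'VB']
    # Enumerate ALL tag sequences: |tagset| ** len(words) candidates (exponential).
    best = max(product(tagset, repeat=len(words)), key=lambda tags: score(words, tags))
    print(best, score(words, best))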


N-gram tagger

  • Uses the preceding N-1 predicted tags

  • Also uses the unigram estimate for the current word
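
A minimal sketch with NLTK's BigramTagger (corpus choice and training size are illustrative): it looks up the tag most often seen in training for the pair (previous tag, current word), and returns None when that context is unseen:

    import nltk
    from nltk.corpus import brown
    from nltk.tag import BigramTagger

    nltk.download('brown', quiet=True)
    train = brown.tagged_sents(categories='news')[:4000]

    bigram = BigramTagger(train)   # no backoff: unseen contexts are tagged None
    print(bigram.tag("the bear is on the move".split()))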


Example

  • p(AT NN BEZ IN AT NN | The bear is on the move) = p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × … × p(move|NN) p(NN|AT)

  • p(AT NN BEZ IN AT VB | The bear is on the move) = p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × … × p(move|VB) p(VB|AT)


Bigram tagger: problems

  • Unknown words in new input

  • Parameter estimation: need a tagged training text, what if this is different genre/dialect/language-type from new input?

  • Tokenization of training text and new input: contractions (isn’t), multi-word tokens (New York)

  • Crude assumptions:

    • very short distance dependencies

    • tags are not conditioned on previous words

    • unintuitive
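
One common workaround for the unknown-word problem above is to back off to a suffix-based guesser and finally to a default tag; a sketch under illustrative assumptions (Brown corpus, 3-letter suffixes, NN as the last resort):

    import nltk
    from nltk.corpus import brown
    from nltk.tag import DefaultTagger, AffixTagger, UnigramTagger, BigramTagger

    nltk.download('brown', quiet=True)
    train = brown.tagged_sents(categories='news')[:4000]

    # Unknown words: guess from the last three letters, then fall back to NN.
    guess_unknown = AffixTagger(train, affix_length=-3, backoff=DefaultTagger('NN'))
    unigram = UnigramTagger(train, backoff=guess_unknown)
    bigram = BigramTagger(train, backoff=unigram)
    print(bigram.tag("the zorgly bears grumbled".split()))   # "zorgly" is unseen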


Transformation-based tagging

  • Markov model tagging: small range of regularities only

  • TB tagging first used by Brill, 1995

  • Encodes more complex interdependencies between words and tags by learning intuitive rules from a training corpus

  • exploits linguistic knowledge; rules can be tuned manually


Transformation Templates

  • Templates specify general, admissible transformations:

  • Change Tag1 to Tag2 if

  • The preceding (following) word is tagged Tag3

  • The word two before (after) is tagged Tag3

  • One of the two preceding (following) words is tagged Tag3

  • One of the three preceding (following) words is tagged Tag3

  • The preceding word is tagged Tag3 and the following word is tagged Tag4

  • The preceding (following) word is tagged Tag3 and the word two before (after) is tagged Tag4
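
These templates can be written down directly with NLTK's Brill-tagger machinery; a sketch assuming the nltk.tbl / nltk.tag.brill API (nltk.tag.brill.brill24() builds a ready-made set of the Brill 1995 templates):

    from nltk.tbl import Template
    from nltk.tag.brill import Pos

    # Positions are offsets from the current word; several offsets in one
    # feature mean "one of these positions".
    templates = [
        Template(Pos([-1])),             # preceding word is tagged Tag3
        Template(Pos([1])),              # following word is tagged Tag3
        Template(Pos([-2])),             # word two before is tagged Tag3
        Template(Pos([-2, -1])),         # one of the two preceding words is tagged Tag3
        Template(Pos([-3, -2, -1])),     # one of the three preceding words is tagged Tag3
        Template(Pos([-1]), Pos([1])),   # preceding is Tag3 and following is Tag4
        Template(Pos([-1]), Pos([-2])),  # preceding is Tag3 and word two before is Tag4
    ]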


Machine Learning Algorithm

  • Learns rules from a tagged training corpus by specialising the templates

  • 1. Assume you do not know the precise tagging sequence in your training corpus

  • 2. Tag each word in the training corpus with its most frequent tag, e.g. move => VB

  • 3. Consider all possible transformations and apply the one that improves the tagging most (greedy search), e.g. Change VB to NN if the preceding word is tagged AT

  • 4. Retag whole corpus applying that rule

  • 5. Go back to 3 and repeat until no significant improvement remains

  • 6. Output all the rules you learnt in order!
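
A sketch of this loop with NLTK's BrillTaggerTrainer; the corpus, split, rule limit and the unigram-plus-NN initial tagger are illustrative choices:

    import nltk
    from nltk.corpus import brown
    from nltk.tag import DefaultTagger, UnigramTagger, BrillTaggerTrainer
    from nltk.tag.brill import brill24

    nltk.download('brown', quiet=True)
    tagged = brown.tagged_sents(categories='news')
    train, test = tagged[:4000], tagged[4000:]

    # Step 2: initial tagger = most frequent tag per word (NN for unknown words).
    initial = UnigramTagger(train, backoff=DefaultTagger('NN'))

    # Steps 3-6: greedily learn an ordered list of transformation rules.
    trainer = BrillTaggerTrainer(initial, brill24(), trace=1)
    brill_tagger = trainer.train(train, max_rules=10)

    for rule in brill_tagger.rules():     # the learnt rules, in order
        print(rule)
    print("accuracy:", brill_tagger.accuracy(test))   # .evaluate() in older NLTK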


Example: 1st cycle

  • First approximation: Initialise with most frequent tag (lexical information)

  • The/AT

  • bear/VB

  • is/BEZ

  • on/IN

  • the/AT

  • move/VB

  • to/TO

  • race/NN

  • there/RN


Change VB to NN if previous tag is AT

  • Try all possible transformations, choose the most useful one and apply it:

  • The/AT

  • bear/NN

  • is/BEZ

  • on/IN

  • the/AT

  • move/NN

  • to/TO

  • race/NN

  • there/RN


Change NN to VB if previous tag is TO

  • Try all possible transformations, choose the most useful one and apply it:

  • The/AT

  • bear/NN

  • is/BEZ

  • on/IN

  • the/AT

  • move/NN

  • to/TO

  • race/VB

  • there/RN


Final set of learnt rules

  • Brill rules corresponding to syntagmatic patterns

  • 1. Change VB to NN if previous tag is AT

  • 2. Change NN to VB if previous tag is TO

  • Can now be applied to an untagged corpus!

  • uses pre-encoded linguistic knowledge explicitly

  • uses wider context + following context

  • can be expanded to word-driven templates

  • can be expanded to morphology-driven templates (for unknown words)

  • learnt rules are intuitive, easy to understand


Combining taggers

  • Can be combined via backoff: if first tagger finds no tag (None) then try another tagger

  • This really only makes sense with N-gram taggers:

  • If trigram tagger finds no tag, backoff to bigram tagger,

  • if bigram tagger fails then backoff to unigram tagger

  • Better: combine tagger results by a voting system

  • Combinatory Hybrid Elementary Analysis of Text

  • (combines results of morphological analysers / taggers)
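
A sketch of the backoff combination described above, using NLTK taggers; the corpus, split and the NN default are illustrative choices:

    import nltk
    from nltk.corpus import brown
    from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger

    nltk.download('brown', quiet=True)
    tagged = brown.tagged_sents(categories='news')
    train, test = tagged[:4000], tagged[4000:]

    # If the trigram tagger has no answer for a context, fall back to the
    # bigram tagger, then the unigram tagger, then a default NN tag.
    t0 = DefaultTagger('NN')
    t1 = UnigramTagger(train, backoff=t0)
    t2 = BigramTagger(train, backoff=t1)
    t3 = TrigramTagger(train, backoff=t2)
    print("trigram tagger with backoff:", t3.accuracy(test))   # .evaluate() in older NLTK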