
School of Computing

FACULTY OF ENGINEERING

PoS-Tagging theory and terminology

COMP3310 Natural Language Processing

Eric Atwell, Language Research Group

(with thanks to Katja Markert, Marti Hearst, and other contributors)

Reminder: PoS-tagging programs
  • Models behind some example PoS-tagging methods in NLTK:
  • Hand-coded
  • Statistical taggers
  • Brill (transformation-based) tagger
  • NB you don’t have to use NLTK – it is just a useful way to illustrate these methods (see the sketch below)
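
A minimal sketch of the first two families in NLTK (assumes NLTK is installed and the Brown corpus has been downloaded; the regexp rules and the training slice are illustrative choices, not an official setup):

    # Hand-coded v statistical tagging in NLTK.
    import nltk
    from nltk.corpus import brown

    train_sents = brown.tagged_sents(categories='news')[:3000]

    # Hand-coded: regular-expression rules written by a human
    hand_coded = nltk.RegexpTagger([
        (r'.*ing$', 'VBG'),   # progressive verbs
        (r'.*ed$', 'VBD'),    # past-tense verbs
        (r'.*', 'NN'),        # default guess: common noun
    ])

    # Statistical: a unigram tagger learns each word's most frequent tag
    statistical = nltk.UnigramTagger(train_sents)

    print(hand_coded.tag(['The', 'bear', 'is', 'eating']))
    print(statistical.tag(['The', 'bear', 'is', 'eating']))

(A Brill tagger sketch appears with the transformation-based slides below.)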
Training and Testing of Machine Learning Algorithms
  • Algorithms that “learn” from data see a set of examples and try to generalize from them.
  • Training set:
    • Examples trained on
  • Test set:
    • Also called held-out data and unseen data
    • Use this for evaluating your algorithm
    • Must be separate from the training set
      • Otherwise, you cheated!
  • “Gold standard” evaluation set
    • A test set that a community has agreed on and uses as a common benchmark. DO NOT USE IT DURING TRAINING OR DEVELOPMENT TESTING – reserve it for the final evaluation. (A minimal split-and-evaluate sketch follows.)
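
A minimal sketch of the split in NLTK (assumes the Brown corpus; the 90/10 ratio is an illustrative choice):

    # Train on one part of the data, evaluate on held-out data only.
    import nltk
    from nltk.corpus import brown

    tagged = brown.tagged_sents(categories='news')
    cutoff = int(len(tagged) * 0.9)
    train_set, test_set = tagged[:cutoff], tagged[cutoff:]  # kept separate!

    tagger = nltk.UnigramTagger(train_set)  # learn from training data only
    print(tagger.evaluate(test_set))        # accuracy on unseen data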
PoS word classes in English
  • Word classes, also called syntactic categories or grammatical categories or Parts of Speech
  • Closed class type: classes with fixed and few members; function words, e.g. prepositions
  • Open class type: large classes with many members and many new additions; content words, e.g. nouns
  • 8 major word classes: nouns, verbs, adjectives, adverbs, prepositions, determiners, conjunctions, pronouns
  • These hold in English, and in most (perhaps all) natural languages
What properties define “noun”?
  • Semantic properties: refer to people, places and things
  • Distributional properties: ability to occur next to determiners, possessives, adjectives (specific locations)
  • Morphological properties: most occur in singular and plural
  • These are properties of a word TYPE, e.g. “man” is a noun (usually)
  • Sometimes a given TOKEN may not meet all these criteria …
  • The men are happy … the man is happy … (noun)
  • They man the lifeboat (here the token “man” is a verb!)
Subcategories
  • Noun subcategories:
    • Proper noun v common noun
    • Singular v plural
    • Count v mass (often not covered in PoS-tagsets)
  • Some tag-sets may have other subcategories, e.g. NNP = common noun with Word Initial Capital (e.g. Englishman)
  • A PoS-tagset often encodes morphological categories like person, number, gender, tense, case …
Verb: action or process
  • VB present/infinitive: teach, eat
  • VBZ 3rd-person-singular present (s-form): teaches, eats
  • VBG progressive (ing-form): teaching, eating
  • VBD past / VBN past participle: taught, ate/eaten
  • Intransitive: he died; transitive: she killed him, …
  • (transitivity is usually not marked in PoS-tags)
  • Auxiliaries: modal verbs, e.g. can, must, may
  • Have, be, do can be auxiliary or main verbs, e.g. I have a present (main) v I have given you a present (auxiliary)
Adjective: quality or property (of a thing: noun phrase)
  • English is simple:
  • JJ big, JJR comparative bigger, JJT superlative biggest
  • More features in other languages, e.g.:
    • agreement (number, gender) with the noun
    • attributive (before a noun) v predicative (after “be”)
Adverb: quality or property of verb or adjective (or other functions…)
  • A hodge-podge (!)
  • General adverbs often end in -ly: slowly, happily (but NOT early)
  • Place adverbs: home, downhill
  • Time adverbs: now, tomorrow
  • Degree adverbs: very, extremely, somewhat
Function words
  • Preposition e.g. in, of, on, for, over, with (to)
  • Determiner e.g. this, that; articles the, a
  • Conjunction e.g. and, or, but, because, that
  • Pronoun e.g. personal pronouns:
    • I, we (1st person)
    • you (2nd person)
    • he, she, it, they (3rd person)
  • Possessive pronouns: my, your, our, their
  • WH-pronouns: what, who, whoever
  • Others: negatives (not), interjections (oh), existential there, …
Parts of “multi word expressions”
  • Particle – like a preposition, but “part of” a phrasal verb
  • I looked up her address v I looked up her skirt
  • I looked her address up v *I looked her skirt up
  • A big problem for PoS-tagging: particles are common, and ambiguous with prepositions
  • Other multi-word idioms: ditto tags (the same tag spread across each word of the idiom)
Bigram Markov Model tagger
  • Naive method:
  • 1. Generate all possible tag sequences for the sentence
  • 2. Compute the probability of each tag sequence given the sentence, using word-given-tag and tag-bigram probabilities
  • 3. Take the tag sequence with maximum probability
  • Problem: this method has exponential complexity! (see the toy sketch below)
  • Solution: Viterbi Algorithm (not discussed in this module)
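
A toy sketch of the naive method (the probability tables would be estimated from a tagged corpus; every name here is illustrative). It makes the exponential cost visible: the outer loop runs once per possible tag sequence.

    # Naive bigram Markov tagging: score every possible tag sequence.
    # p_word maps (word, tag) -> p(word|tag); p_trans maps
    # (prev_tag, tag) -> p(tag|prev_tag). Cost: O(len(tagset) ** len(words)).
    from itertools import product

    def naive_tag(words, tagset, p_word, p_trans):
        best_seq, best_p = None, 0.0
        for seq in product(tagset, repeat=len(words)):
            p, prev = 1.0, 'PERIOD'        # a sentence starts after a period
            for word, tag in zip(words, seq):
                p *= p_trans.get((prev, tag), 0.0) * p_word.get((word, tag), 0.0)
                prev = tag
            if p > best_p:
                best_seq, best_p = seq, p
        return best_seq, best_p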
N-gram tagger
  • Uses the preceding N-1 predicted tags as context
  • Also uses the unigram (word-given-tag) estimate for the current word
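
A minimal NLTK sketch (assumes the Brown corpus; the data slice is illustrative). Note that a plain bigram tagger returns None for contexts it never saw in training, which motivates the backoff combination discussed at the end of this section:

    # A bigram tagger conditions on the previous tag plus the current word.
    import nltk
    from nltk.corpus import brown

    train = brown.tagged_sents(categories='news')[:3000]
    bigram_tagger = nltk.BigramTagger(train)

    # Unseen (word, previous-tag) contexts come back tagged as None.
    print(bigram_tagger.tag(brown.sents(categories='news')[3001]))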
Example
  • p(AT NN BEZ IN AT NN | The bear is on the move)
    = p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × … × p(move|NN) p(NN|AT)
  • p(AT NN BEZ IN AT VB | The bear is on the move)
    = p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × … × p(move|VB) p(VB|AT)
Bigram tagger: problems
  • Unknown words in new input
  • Parameter estimation: needs a tagged training text – what if this is a different genre/dialect/language-type from the new input?
  • Tokenization of training text and new input: contractions (isn’t), multi-word tokens (New York)
  • Crude assumptions:
    • only very short-distance dependencies
    • tags are not conditioned on previous words
    • unintuitive
Transformation-based tagging
  • Markov model tagging captures only a small range of regularities
  • TB tagging was first used by Brill (1995)
  • Encodes more complex interdependencies between words and tags by learning intuitive rules from a training corpus
  • Exploits linguistic knowledge; rules can be tuned manually
Transformation Templates
  • Templates specify general, admissible transformations of the form “Change Tag1 to Tag2 if …”:
    • The preceding (following) word is tagged Tag3
    • The word two before (after) is tagged Tag3
    • One of the two preceding (following) words is tagged Tag3
    • One of the three preceding (following) words is tagged Tag3
    • The preceding word is tagged Tag3 and the following word is tagged Tag4
    • The preceding (following) word is tagged Tag3 and the word two before (after) is tagged Tag4
  • (These templates can be written down directly in NLTK – see the sketch below.)
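
The templates above, expressed with NLTK's brill module (Template and Pos are real NLTK classes; the mirrored “following” variants are omitted here for brevity):

    # Pos([-1]) means "the tag at position -1 (the preceding word)";
    # two features in one Template means both conditions must hold.
    from nltk.tag.brill import Template, Pos

    templates = [
        Template(Pos([-1])),             # preceding word is tagged Tag3
        Template(Pos([-2])),             # word two before is tagged Tag3
        Template(Pos([-2, -1])),         # one of the two preceding words
        Template(Pos([-3, -2, -1])),     # one of the three preceding words
        Template(Pos([-1]), Pos([1])),   # preceding Tag3 and following Tag4
        Template(Pos([-1]), Pos([-2])),  # preceding Tag3, two before Tag4
    ]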
Machine Learning Algorithm
  • Learns rules from a tagged training corpus by specialising the templates
  • 1. Assume you do not know the correct tag sequence of your training corpus
  • 2. Tag each word in the training corpus with its most frequent tag, e.g. move => VB
  • 3. Consider all possible transformations and apply the one that improves the tagging most (greedy search), e.g. Change VB to NN if the preceding word is tagged AT
  • 4. Retag the whole corpus applying that rule
  • 5. Go back to 3 and repeat until no significant improvement is reached
  • 6. Output all the rules you learnt, in order! (An NLTK sketch of this loop follows.)
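
A minimal NLTK sketch of this greedy loop (BrillTaggerTrainer and the brill24 template set are real NLTK names; the unigram baseline, data slice and max_rules value are illustrative):

    # Greedy Brill training: start from a baseline, repeatedly pick the
    # best rule, retag, and keep the rules in the order they were learnt.
    import nltk
    from nltk.corpus import brown
    from nltk.tag.brill import brill24
    from nltk.tag.brill_trainer import BrillTaggerTrainer

    train_sents = brown.tagged_sents(categories='news')[:3000]
    baseline = nltk.UnigramTagger(train_sents)   # step 2: most frequent tag
    trainer = BrillTaggerTrainer(baseline, brill24())
    brill_tagger = trainer.train(train_sents, max_rules=10)  # steps 3-5

    for rule in brill_tagger.rules():            # step 6: rules in learnt order
        print(rule)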
Example: 1st cycle
  • First approximation: Initialise with most frequent tag (lexical information)
  • The/AT bear/VB is/BEZ on/IN the/AT move/VB to/TO race/NN there/RN
Change VB to NN if previous tag is AT
  • Try all possible transformations, choose the most useful one and apply it:
  • The/AT bear/NN is/BEZ on/IN the/AT move/NN to/TO race/NN there/RN
Change NN to VB if previous tag is TO
  • Try all possible transformations, choose the most useful one and apply it:
  • The/AT bear/NN is/BEZ on/IN the/AT move/NN to/TO race/VB there/RN
Final set of learnt rules
  • Brill rules corresponding to syntagmatic patterns
  • 1. Change VB to NN if previous tag is AT
  • 2. Change NN to VB if previous tag is TO
  • Can now be applied to an untagged corpus!
  • Uses pre-encoded linguistic knowledge explicitly
  • Uses wider context, including following context
  • Can be expanded to word-driven templates
  • Can be expanded to morphology-driven templates (for unknown words)
  • Learnt rules are intuitive and easy to understand (a toy sketch of applying these rules follows)
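
A toy sketch of applying the two learnt rules, in order, to the example sentence (the mini-lexicon of most-frequent tags is invented to match the walkthrough above):

    # Initialise with most-frequent tags, then apply rules in learnt order.
    LEXICON = {'The': 'AT', 'bear': 'VB', 'is': 'BEZ', 'on': 'IN',
               'the': 'AT', 'move': 'VB', 'to': 'TO', 'race': 'NN',
               'there': 'RN'}
    RULES = [('VB', 'NN', 'AT'),   # 1. Change VB to NN if previous tag is AT
             ('NN', 'VB', 'TO')]   # 2. Change NN to VB if previous tag is TO

    def brill_apply(words):
        tags = [LEXICON[w] for w in words]       # first approximation
        for old, new, prev in RULES:
            for i in range(1, len(tags)):
                if tags[i] == old and tags[i - 1] == prev:
                    tags[i] = new
        return list(zip(words, tags))

    print(brill_apply(['The', 'bear', 'is', 'on', 'the',
                       'move', 'to', 'race', 'there']))
    # -> The/AT bear/NN is/BEZ on/IN the/AT move/NN to/TO race/VB there/RN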
Combining taggers
  • Taggers can be combined via backoff: if the first tagger finds no tag (None), try another tagger
  • This really only makes sense with N-gram taggers:
  • If the trigram tagger finds no tag, back off to the bigram tagger; if the bigram tagger fails, back off to the unigram tagger (see the sketch below)
  • Better: combine tagger results by a voting system, e.g. Combinatory Hybrid Elementary Analysis of Text (combines the results of several morphological analysers / taggers)
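
A minimal sketch of the backoff chain in NLTK (assumes the Brown corpus; the final DefaultTagger('NN') fallback is an illustrative addition so that no word is left untagged):

    # Backoff chain: trigram -> bigram -> unigram -> default.
    import nltk
    from nltk.corpus import brown

    train = brown.tagged_sents(categories='news')[:3000]
    t0 = nltk.DefaultTagger('NN')
    t1 = nltk.UnigramTagger(train, backoff=t0)
    t2 = nltk.BigramTagger(train, backoff=t1)
    t3 = nltk.TrigramTagger(train, backoff=t2)

    print(t3.tag(['The', 'bear', 'is', 'on', 'the', 'move']))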