
CPSC 503 Computational Linguistics


Presentation Transcript


  1. CPSC 503 Computational Linguistics Lecture 6 Giuseppe Carenini CPSC503 Winter 2007

  2. Knowledge-Formalisms Map State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models) Morphology Syntax Logical formalisms (First-Order Logics) Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) Semantics Pragmatics Discourse and Dialogue AI planners Markov Chains -> n-grams Markov Models Hidden Markov Models (HMM) Maximum Entropy Markov Models (MEMM) CPSC503 Winter 2007

  3. Today 27/9 • Finish Markov Chains • Hidden Markov Models: • definition • the three key problems (only one in detail) • Part-of-speech tagging • What it is • Why we need it • How to do it CPSC503 Winter 2007

  4. Example of a Markov Chain [State-transition diagram over the states t, i, p, a, h, e, with two Start arrows giving the initial probabilities; the arc probabilities are the ones collected in the transition matrix on the next slide.] CPSC503 Winter 2007

  5. Markov-Chain Formal description:
     1) Stochastic transition matrix A (row = current state, column = next state):
              t    i    p    a    h    e
         t    0   .3    0   .3   .4    0
         i   .4    0   .6    0    0    0
         p    0    0    1    0    0    0
         a    0    0   .4   .6    0    0
         h    0    0    0    0    0    1
         e    1    0    0    0    0    0
     2) Probability of initial states: t: .6, i: .4
     CPSC503 Winter 2007 Manning/Schütze, 2000: 318

  6. Markov Assumptions • Let X = (X1, …, XT) be a sequence of random variables taking values in some finite set S = {s1, …, sn}, the state space. The Markov properties are: • (a) Limited Horizon: for all t, P(Xt+1 | X1, …, Xt) = P(Xt+1 | Xt) • (b) Time Invariant: for all t, P(Xt+1 | Xt) = P(X2 | X1), i.e., the dependency does not change over time. CPSC503 Winter 2007

  7. Markov-Chain Probability of a sequence of states X1 … XT: P(X1, …, XT) = P(X1) P(X2 | X1) P(X3 | X2) … P(XT | XT-1) Example: Similar to …….? CPSC503 Winter 2007 Manning/Schütze, 2000: 320
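A minimal sketch of this computation, using the transition matrix A and initial-state probabilities transcribed from slide 5; the function name and dictionary layout are illustrative, not part of the course material:

    # Transition matrix A and initial-state probabilities as transcribed from slide 5.
    A = {
        't': {'t': 0.0, 'i': 0.3, 'p': 0.0, 'a': 0.3, 'h': 0.4, 'e': 0.0},
        'i': {'t': 0.4, 'i': 0.0, 'p': 0.6, 'a': 0.0, 'h': 0.0, 'e': 0.0},
        'p': {'t': 0.0, 'i': 0.0, 'p': 1.0, 'a': 0.0, 'h': 0.0, 'e': 0.0},
        'a': {'t': 0.0, 'i': 0.0, 'p': 0.4, 'a': 0.6, 'h': 0.0, 'e': 0.0},
        'h': {'t': 0.0, 'i': 0.0, 'p': 0.0, 'a': 0.0, 'h': 0.0, 'e': 1.0},
        'e': {'t': 1.0, 'i': 0.0, 'p': 0.0, 'a': 0.0, 'h': 0.0, 'e': 0.0},
    }
    pi = {'t': 0.6, 'i': 0.4}

    def sequence_probability(states):
        """P(X1..XT) = P(X1) * product of P(X_{t+1} | X_t)."""
        prob = pi.get(states[0], 0.0)
        for prev, nxt in zip(states, states[1:]):
            prob *= A[prev][nxt]
        return prob

    print(sequence_probability(['t', 'i', 'p']))  # 0.6 * 0.3 * 0.6 = 0.108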

  8. Today 27/9 • Finish Markov Chains • Hidden Markov Models: • definition • the three key problems (only one in detail) • Part-of-speech tagging • What it is • Why we need it • How to do it CPSC503 Winter 2007

  9. HMMs (and MEMM) intro They are probabilistic sequence classifiers / sequence labelers: they assign a class/label to each unit in a sequence. We have already seen a non-prob. version... Used extensively in NLP • Part of Speech Tagging • Partial parsing • Named entity recognition • Information Extraction CPSC503 Winter 2007 Manning/Schütze, 2000: 325

  10. Hidden Markov Model (State Emission) [State-transition diagram over hidden states s1 to s4, with two Start arrows giving the initial probabilities; each state emits symbols such as a, b, i with the emission probabilities shown in the diagram.] CPSC503 Winter 2007

  11. Hidden Markov Model Formal specification as a five-tuple (S, K, Π, A, B): • S: set of states • K: output alphabet • Π: initial state probabilities • A: state transition probabilities • B: symbol emission probabilities CPSC503 Winter 2007
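A minimal sketch of how this five-tuple could be held in code; the class name and field layout are assumptions for illustration only:

    from dataclasses import dataclass

    @dataclass
    class HMM:
        states: list    # S: set of states
        alphabet: list  # K: output alphabet
        pi: dict        # Π: initial state probabilities, pi[s]
        A: dict         # A: transition probabilities, A[s][s_next]
        B: dict         # B: emission probabilities, B[s][symbol]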

  12. Three fundamental questions for HMMs • Likelihood: finding the probability of an observation sequence • brute force or Forward/Backward-Algorithm • Decoding: finding the best state sequence • Viterbi-Algorithm • Training: finding model parameters which best explain the observations CPSC503 Winter 2007 Manning/Schütze, 2000: 325

  13. Computing the probability of an observation sequence O = o1 … oT, e.g., P(b, i | sample HMM): P(O | μ) = ΣX P(O, X | μ) = ΣX P(X | μ) P(O | X, μ), where X ranges over all sequences of T states CPSC503 Winter 2007
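A minimal brute-force sketch of this sum over all state sequences, reusing the assumed HMM container from the sketch above; it does O(N^T) work, which is what motivates the forward procedure two slides below:

    from itertools import product

    def brute_force_probability(hmm, observations):
        """Sum P(O, X) over every possible state sequence X (exponential in T)."""
        total = 0.0
        T = len(observations)
        for X in product(hmm.states, repeat=T):
            p = hmm.pi.get(X[0], 0.0) * hmm.B[X[0]].get(observations[0], 0.0)
            for t in range(1, T):
                p *= hmm.A[X[t - 1]].get(X[t], 0.0) * hmm.B[X[t]].get(observations[t], 0.0)
            total += p
        return total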

  14. Complexity Decoding Example s1, s1 = 0 ? s1, s2 = 1 * .1 * .6 * .3 ………. ………. s1, s4 = 1 * .5 * .6 * .7 s2, s4 = 0? ………. ………. CPSC503 Winter 2007 Manning/Schütze, 2000: 327

  15. The forward procedure 1. Initialization: α1(i) = πi bi(o1) 2. Induction: αt+1(j) = [Σi αt(i) aij] bj(ot+1) 3. Total: P(O | μ) = Σi αT(i) Complexity: O(N²T) CPSC503 Winter 2007
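A minimal sketch of the forward procedure under the same assumed HMM container; the names are illustrative, and the recurrence follows the standard textbook formulation rather than any code from the course:

    def forward_probability(hmm, observations):
        """P(O) via the forward procedure: O(N^2 * T) work instead of O(N^T)."""
        # 1. Initialization: alpha[s] = pi(s) * b_s(o1)
        alpha = {s: hmm.pi.get(s, 0.0) * hmm.B[s].get(observations[0], 0.0)
                 for s in hmm.states}
        # 2. Induction: fold in one observation at a time
        for o in observations[1:]:
            prev = alpha
            alpha = {j: sum(prev[i] * hmm.A[i].get(j, 0.0) for i in hmm.states)
                        * hmm.B[j].get(o, 0.0)
                     for j in hmm.states}
        # 3. Total: sum over the final states
        return sum(alpha.values())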

  16. Three fundamental questions for HMMs • Likelihood: finding the probability of an observation • brute force or Forward Algorithm (if interested in the details of the next two questions, read Sections 6.4 – 6.5) • Decoding: finding the best state sequence • Viterbi-Algorithm • Training: finding model parameters which best explain the observations CPSC503 Winter 2007

  17. Today 27/9 • Finish Markov Chains • Hidden Markov Models: • definition • the three key problems (only one in detail) • Part-of-speech tagging • What it is • Why we need it • How to do it CPSC503 Winter 2007

  18. Parts of Speech Tagging • What is it? • Why do we need it? • Word classes (Tags) • Distribution • Tagsets • How to do it • Rule-based • Stochastic • Transformation-based CPSC503 Winter 2007

  19. Parts of Speech Tagging: What Input • Brainpower, not physical plant, is now a firm's chief asset. Output • Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._. Tag meanings • NNP (Proper N sing), RB (Adv), JJ (Adj), NN (N sing. or mass), VBZ (V 3sg pres), DT (Determiner), POS (Possessive ending), . (sentence-final punct) CPSC503 Winter 2007

  20. Parts of Speech Tagging: Why? • Part-of-speech (word class, morph. class, syntactic category) gives a significant amount of info about the word and its neighbors Useful in the following NLP tasks: • As a basis for (Partial) Parsing • Information Retrieval • Word-sense disambiguation • Speech synthesis • Improve language models (Spelling/Speech) CPSC503 Winter 2007

  21. Parts of Speech • Eight basic categories • Noun, verb, pronoun, preposition, adjective, adverb, article, conjunction • These categories are based on: • morphological properties (affixes they take) • distributional properties (what other words can occur nearby) • e.g., green: It is so… , both… , The… is • Not semantics! CPSC503 Winter 2007

  22. Parts of Speech • Two kinds of category • Closed class (generally function words; very short, frequent and important) • Prepositions, articles, conjunctions, pronouns, determiners, aux, numerals • Open class (objects, actions, events, properties) • Nouns (proper/common; mass/count), verbs, adjectives, adverbs • If you run across an unknown word….?? CPSC503 Winter 2007

  23. PoS Distribution • Parts of speech follow the usual skewed distribution in language: roughly 35k word types carry a single PoS, while a few thousand (~4k) are ambiguous between 2 or more PoS, and unfortunately those ambiguous words are very frequent …but luckily the different tags associated with a word are not equally likely CPSC503 Winter 2007

  24. Sets of Parts of Speech: Tagsets • Most commonly used: • 45-tag Penn Treebank, • 61-tag C5, • 146-tag C7 • The choice of tagset is based on the application (do you care about distinguishing between “to” as a prep and “to” as an infinitive marker?) • Accurate tagging can be done even with large tagsets CPSC503 Winter 2007

  25. PoS Tagging • Input text: Brainpower, not physical plant, is now a firm's chief asset. ………… • Dictionary: wordi -> set of tags from Tagset • Tagger • Output: Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._. ………. CPSC503 Winter 2007

  26. Tagger Types • Rule-based ‘95 • Stochastic • HMM tagger ~ >= ’92 • Transformation-based tagger (Brill) ~ >= ’95 • Maximum Entropy Models ~ >= ’97 CPSC503 Winter 2007

  27. Rule-Based (ENGTWOL ‘95) • A lexicon transducer returns for each word all possible morphological parses • A set of ~1,000 constraints is applied to rule out inappropriate PoS Step 1: sample I/O for “Pavlov had shown that salivation….” Pavlov: N SG PROPER; had: HAVE V PAST SVO, HAVE PCP2 SVO; shown: SHOW PCP2 SVOO; …… that: ADV, PRON DEM SG, CS; …….. ……. Sample constraint (adverbial “that” rule): Given input: “that” If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A) Then eliminate non-ADV tags Else eliminate ADV CPSC503 Winter 2007
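A minimal sketch of how a constraint like the adverbial-“that” rule could be applied to ambiguously tagged tokens; the token dictionaries, class labels, and function name are illustrative assumptions, not the actual ENGTWOL machinery:

    def apply_adverbial_that(tokens, i):
        """tokens[i] is 'that' with a set of candidate tags in tokens[i]['tags'].

        If the next word is an adjective, adverb, or quantifier (+1 A/ADV/QUANT),
        the position after that is a sentence boundary (+2 SENT-LIM), and the
        previous word is not an SVOC/A verb (NOT -1 SVOC/A), keep only ADV;
        otherwise eliminate ADV.
        """
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        nxt2 = tokens[i + 2] if i + 2 < len(tokens) else None
        prev = tokens[i - 1] if i > 0 else None

        condition = (nxt and nxt['class'] in {'A', 'ADV', 'QUANT'}
                     and nxt2 and nxt2['class'] == 'SENT-LIM'
                     and not (prev and prev['class'] == 'SVOC/A'))
        if condition:
            tokens[i]['tags'] = {t for t in tokens[i]['tags'] if t == 'ADV'}
        else:
            tokens[i]['tags'] = {t for t in tokens[i]['tags'] if t != 'ADV'}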

  28. HMM Stochastic Tagging • Tags correspond to HMM states • Words correspond to the HMM alphabet symbols Tagging: given a sequence of words (observations), find the most likely sequence of tags (states) But this is…..! We need: state transition and symbol emission probabilities, obtained either 1) from a hand-tagged corpus or 2) with no tagged corpus: parameter estimation (Baum-Welch) CPSC503 Winter 2007
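A minimal Viterbi sketch for this tagging formulation, assuming tag-transition and word-emission probabilities have already been estimated from a hand-tagged corpus; all names are illustrative:

    def viterbi_tag(words, tags, pi, A, B):
        """Most likely tag sequence for `words` under an HMM tagger.

        pi[t]     : probability that a sentence starts with tag t
        A[t1][t2] : probability of tag t2 following tag t1
        B[t][w]   : probability of tag t emitting word w
        """
        # delta[t] = best score of any tag sequence ending in tag t
        delta = {t: pi.get(t, 0.0) * B[t].get(words[0], 0.0) for t in tags}
        back = []  # back[k][t] = best previous tag for tag t at position k+1
        for w in words[1:]:
            prev_delta, delta, pointers = delta, {}, {}
            for t in tags:
                best_prev = max(tags, key=lambda p: prev_delta[p] * A[p].get(t, 0.0))
                delta[t] = prev_delta[best_prev] * A[best_prev].get(t, 0.0) * B[t].get(w, 0.0)
                pointers[t] = best_prev
            back.append(pointers)
        # Follow back-pointers from the best final tag
        best = max(tags, key=lambda t: delta[t])
        sequence = [best]
        for pointers in reversed(back):
            sequence.append(pointers[sequence[-1]])
        return list(reversed(sequence))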

  29. Transformation-Based Learning (the Brill Tagger, ‘95-‘97) Combines rule-based and stochastic approaches • Rules specify tags for words based on context • Rules are automatically induced from a pre-tagged training corpus CPSC503 Winter 2007

  30. TBL: How TBL rules are applied Step 1: Assign each word the tag that is most likely given no contextual information. Race example: P(NN|race) = .98, P(VB|race) = .02 Step 2: Apply transformation rules that use the context that was just established. Race example: Change NN to VB when the previous tag is TO. Johanna is expected to race tomorrow. The race is already over. …. CPSC503 Winter 2007

  31. How TBL Rules are learned • Major stages (supervised!): • 0. Save hand-tagged corpus • 1. Label every word with its most-likely tag. • 2. Examine every possible transformation and select the one with the most improved tagging. • 3. Retag the data according to this rule. • 4. Repeat 2-3 until some stopping point is reached. Output: an ordered list of transformations CPSC503 Winter 2007
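A minimal sketch of this learning loop, assuming a pre-built list of candidate transformations instantiated from templates such as "change tag a to b when the previous tag is z"; everything here is an illustrative assumption rather than the actual Brill implementation:

    def learn_tbl_rules(gold_tags, words, most_likely_tag, candidate_rules, max_rules=50):
        """gold_tags: hand-tagged corpus; most_likely_tag: unigram tag dictionary;
        candidate_rules: list of (rule_description, apply_fn) pairs."""
        # Stage 1: label every word with its most-likely tag
        current = [most_likely_tag[w] for w in words]
        learned = []
        for _ in range(max_rules):
            # Stage 2: pick the transformation that most improves tagging accuracy
            def gain(rule_apply):
                _, apply_fn = rule_apply
                retagged = apply_fn(list(current), words)
                improved = sum(r == g for r, g in zip(retagged, gold_tags))
                baseline = sum(c == g for c, g in zip(current, gold_tags))
                return improved - baseline
            best = max(candidate_rules, key=gain)
            if gain(best) <= 0:           # stopping point: no rule helps any more
                break
            # Stage 3: retag the data according to this rule
            current = best[1](list(current), words)
            learned.append(best[0])       # output: an ordered list of transformations
        return learned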

  32. The Universe of Possible Transformations? Change tag a to b if: … Huge search space!... Try only transformations that improve the tagging of at least one word CPSC503 Winter 2007

  33. Evaluating Taggers • Accuracy: percent correct (most current taggers 96-97%) *test on unseen data!* • Human Ceiling: agreement rate of humans on classification (96-97%) • Unigram baseline: assign each token to the class it occurred in most frequently in the training set (race -> NN). (91%) • What is causing the errors? Build a confusion matrix… CPSC503 Winter 2007
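A minimal sketch of the accuracy and confusion-matrix bookkeeping mentioned above; the function and variable names are illustrative:

    from collections import Counter

    def evaluate_tagger(gold_tags, predicted_tags):
        """Return accuracy plus a confusion matrix counting (gold, predicted) pairs."""
        confusion = Counter(zip(gold_tags, predicted_tags))
        correct = sum(n for (g, p), n in confusion.items() if g == p)
        return correct / len(gold_tags), confusion

    # Usage: compare a tagger's output against the hand-tagged test set
    accuracy, confusion = evaluate_tagger(['NN', 'VB', 'NN'], ['NN', 'NN', 'NN'])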

  34. Knowledge-Formalisms Map (next three lectures) State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models) Morphology Logical formalisms (First-Order Logics) Syntax Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) Semantics Pragmatics Discourse and Dialogue AI planners CPSC503 Winter 2007

  35. Next Time • Read Chapter 12 (Syntax & Context-Free Grammars) CPSC503 Winter 2007
