
CPSC 503 Computational Linguistics


Presentation Transcript


  1. CPSC 503Computational Linguistics Lecture 7 Giuseppe Carenini CPSC503 Winter 2008

  2. Today 29/9 • Finish POS tagging • Start Syntax / Parsing (Chp 12!) CPSC503 Winter 2008

  3. Tagger PoS Tagging Input text • Brainpower, not physical plant, is now a firm's chief asset. ………… • Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._. ………. Dictionary wordi -> set of tags from Tagset Output CPSC503 Winter 2008

  4. Tagger Types • Rule-based ‘95 • Stochastic • HMM tagger ~ >= ’92 • Transformation-based tagger (Brill) ~ >= ’95 • Maximum Entropy Models ~ >= ’97 CPSC503 Winter 2008

  5. Sample Constraint Step 1: sample I/O Example: Adverbial “that” rule. Given input: “that” If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A) Then eliminate non-ADV tags Else eliminate ADV “Pavlov had shown that salivation…” Pavlov: N SG PROPER had: HAVE V PAST SVO, HAVE PCP2 SVO shown: SHOW PCP2 SVOO … that: ADV, PRON DEM SG, CS … Rule-Based (ENGTWOL ’95-99) • A lexicon transducer returns for each word all possible morphological parses • A set of ~3,000 constraints is applied to rule out inappropriate PoS tags CPSC503 Winter 2008

  6. HMM Stochastic Tagging • Tags correspond to the HMM states • Words correspond to the HMM alphabet symbols • Tagging: given a sequence of words (observations), find the most likely sequence of tags (states) But this is…! We need: state transition and symbol emission probabilities 1) From a hand-tagged corpus 2) No tagged corpus: parameter estimation (Baum-Welch) CPSC503 Winter 2008
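The decoding step on this slide (given the words, find the most likely tag sequence) is the Viterbi algorithm. A minimal sketch, assuming toy hand-set transition and emission probabilities; in practice these would be estimated from a hand-tagged corpus or with Baum-Welch:

```python
# Minimal Viterbi decoder for HMM PoS tagging (toy probabilities, illustration only).
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag (state) sequence for `words` (observations)."""
    # v[t][tag] = probability of the best path ending in `tag` at position t
    v = [{tag: start_p[tag] * emit_p[tag].get(words[0], 0.0) for tag in tags}]
    back = [{}]
    for t in range(1, len(words)):
        v.append({}); back.append({})
        for tag in tags:
            best_prev = max(tags, key=lambda p: v[t - 1][p] * trans_p[p][tag])
            v[t][tag] = v[t - 1][best_prev] * trans_p[best_prev][tag] \
                        * emit_p[tag].get(words[t], 0.0)
            back[t][tag] = best_prev
    # Follow back-pointers from the best final state
    last = max(tags, key=lambda tag: v[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

# Hypothetical parameters (the kind of table estimated from a tagged corpus)
tags = ["DT", "NN", "VBZ"]
start_p = {"DT": 0.8, "NN": 0.1, "VBZ": 0.1}
trans_p = {"DT": {"DT": 0.0, "NN": 0.9, "VBZ": 0.1},
           "NN": {"DT": 0.1, "NN": 0.2, "VBZ": 0.7},
           "VBZ": {"DT": 0.6, "NN": 0.3, "VBZ": 0.1}}
emit_p = {"DT": {"a": 0.6, "the": 0.4},
          "NN": {"flight": 0.5, "plane": 0.5},
          "VBZ": {"leaves": 1.0}}

print(viterbi(["a", "flight", "leaves"], tags, start_p, trans_p, emit_p))
# -> ['DT', 'NN', 'VBZ']
```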

  7. Evaluating Taggers • Accuracy: percent correct (most current taggers 96-97%) *test on unseen data!* • Human ceiling: agreement rate of humans on the classification task (96-97%) • Unigram baseline: assign each token the tag it occurred with most frequently in the training set (race -> NN) (91%) • What is causing the errors? Build a confusion matrix… CPSC503 Winter 2008
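The unigram baseline above fits in a few lines; the tiny training corpus here is made up for illustration:

```python
from collections import Counter, defaultdict

# Unigram baseline: tag each token with its most frequent tag in the training set.
# Tiny hypothetical training corpus of (word, tag) pairs.
train = [("the", "DT"), ("race", "NN"), ("race", "NN"), ("race", "VB"),
         ("is", "VBZ"), ("the", "DT"), ("fast", "JJ")]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def unigram_tag(word, default="NN"):
    # Unseen words fall back to a default open-class tag (NN here)
    return counts[word].most_common(1)[0][0] if word in counts else default

print([unigram_tag(w) for w in ["the", "race", "ended"]])
# -> ['DT', 'NN', 'NN']
```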

  8. Confusion matrix • Look at a confusion matrix • Precision ? • Recall ? Speech and Language Processing - Jurafsky and Martin

  9. Error Analysis (textbook) • Look at a confusion matrix • See what errors are causing problems • Noun (NN) vs ProperNoun (NNP) vs Adj (JJ) • Preterite (VBD) vs Participle (VBN) vs Adjective (JJ) Speech and Language Processing - Jurafsky and Martin
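A confusion matrix, together with the per-tag precision and recall the previous slide asks about, might be computed like this (the gold and predicted tag sequences are made up, but they mimic the NN/NNP/JJ and VBD/VBN confusions named above):

```python
from collections import Counter

# Confusion matrix over (gold, predicted) tag pairs, plus per-tag precision/recall.
gold = ["NN", "NNP", "JJ", "VBD", "VBN", "NN", "JJ", "NN"]
pred = ["NN", "NN",  "JJ", "VBN", "VBN", "NN", "NN", "NN"]

confusion = Counter(zip(gold, pred))

def precision(tag):
    # Of everything tagged `tag`, how much really was `tag`?
    predicted = sum(c for (g, p), c in confusion.items() if p == tag)
    return confusion[(tag, tag)] / predicted if predicted else 0.0

def recall(tag):
    # Of everything that really is `tag`, how much did we tag as `tag`?
    actual = sum(c for (g, p), c in confusion.items() if g == tag)
    return confusion[(tag, tag)] / actual if actual else 0.0

print(precision("NN"), recall("NN"))
# NN precision suffers because NNP and JJ tokens were mis-tagged as NN
```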

  10. Today 29/9 • Finish POS tagging • English Syntax • Context-Free Grammar for English • Rules • Trees • Recursion • Problems • Start Parsing CPSC503 Winter 2008

  11. Knowledge-Formalisms Map(next three lectures) State Machines (and prob. versions) (Finite State Automata,Finite State Transducers, Markov Models) Morphology Logical formalisms (First-Order Logics) Syntax Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) Semantics Pragmatics Discourse and Dialogue AI planners CPSC503 Winter 2008

  12. Syntax Def. The study of how sentences are formed by grouping and ordering words Example: Ming and Sue prefer morning flights *Ming Sue flights morning and prefer Groups behave as single units w.r.t. substitution, movement, and coordination CPSC503 Winter 2008

  13. Syntactic Notions so far... • N-grams: prob. distr. for next word can be effectively approximated knowing previous n words • POS categories are based on: • distributional properties (what other words can occur nearby) • morphological properties (affixes they take) CPSC503 Winter 2008

  14. Syntax: Useful tasks • Why should you care? • Grammar checkers • Basis for semantic interpretation • Question answering • Information extraction • Summarization • Machine translation • …… CPSC503 Winter 2008

  15. Grammars and Constituency • Of course, there’s nothing easy or obvious about how we come up with the right set of constituents and the rules that govern how they combine... • That’s why there are so many different theories of grammar and competing analyses of the same data. • The approach to grammar, and the analyses, adopted here are very generic (and don’t correspond to any modern linguistic theory of grammar). From Speech and Language Processing - Jurafsky and Martin

  16. Key Constituents – with heads (English) (Specifier) X (Complement) • Noun phrases: (Det) N (PP) • Verb phrases: (Qual) V (NP) • Prepositional phrases: (Deg) P (NP) • Adjective phrases: (Deg) A (PP) • Sentences: (NP) (I) (VP) Some simple specifiers – Category / Typical function / Examples: • Determiner / specifier of N / the, a, this, no… • Qualifier / specifier of V / never, often… • Degree word / specifier of A or P / very, almost… Complements? CPSC503 Winter 2008

  17. Key Constituents: Examples • Noun phrases: (Det) N (PP) • the cat on the table • Verb phrases: (Qual) V (NP) • never eat a cat • Prepositional phrases: (Deg) P (NP) • almost in the net • Adjective phrases: (Deg) A (PP) • very happy about it • Sentences: (NP) (I) (VP) • a mouse -- ate it CPSC503 Winter 2008

  18. Context Free Grammar (Example) • Start symbol: S • S -> NP VP • NP -> Det NOMINAL • NOMINAL -> Noun • VP -> Verb • Det -> a • Noun -> flight • Verb -> left • Non-terminals: S, NP, NOMINAL, VP, Det, Noun, Verb • Terminals: a, flight, left CPSC503 Winter 2008
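The toy grammar on this slide can be written down directly as a map from non-terminals to right-hand sides; expanding the start symbol shows the single string this grammar generates:

```python
# The slide's toy grammar as a dict from non-terminal to a list of productions.
grammar = {
    "S":       [["NP", "VP"]],
    "NP":      [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP":      [["Verb"]],
    "Det":     [["a"]],
    "Noun":    [["flight"]],
    "Verb":    [["left"]],
}

def generate(symbol):
    """Expand `symbol` by its first production; terminals yield themselves."""
    if symbol not in grammar:      # terminal
        return [symbol]
    rhs = grammar[symbol][0]       # this toy grammar has one rule per non-terminal
    return [w for sym in rhs for w in generate(sym)]

print(" ".join(generate("S")))
# -> a flight left
```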

  19. CFG more complex Example • Grammar with example phrases • Lexicon CPSC503 Winter 2008

  20. CFGs • Define a Formal Language (un/grammatical sentences) • Generative Formalism • Generate strings in the language • Reject strings not in the language • Impose structures (trees) on strings in the language CPSC503 Winter 2008

  21. CFG: Formal Definitions • 4-tuple (non-terminals, terminals, productions, start symbol) • G = (N, Σ, P, S) • P is a set of rules A → β, where A ∈ N and β ∈ (N ∪ Σ)* • A derivation is the process of rewriting α1 into αm (both strings in (N ∪ Σ)*) by applying a sequence of rules: α1 ⇒* αm • L(G) = {w | w ∈ Σ* and S ⇒* w} CPSC503 Winter 2008

  22. Derivations as Trees [figure: parse tree with Nominal nodes over “flight”] Context Free? CPSC503 Winter 2008

  23. CFG Parsing [diagram: CFG + input “I prefer a morning flight” → Parser → parse tree] • It is completely analogous to running a finite-state transducer with a tape • It’s just more powerful • Chpt. 13 CPSC503 Winter 2008

  24. Other Options • Regular languages (FSA): A → xB or A → x • Too weak (e.g., cannot deal with recursion in a general way – no center-embedding) • CFGs: A → β (also produce more understandable and “useful” structure) • Context-sensitive: αAβ → αγβ, with γ ≠ ε • Can be computationally intractable • Turing equivalent: α → β, with α ≠ ε • Too powerful / computationally intractable CPSC503 Winter 2008

  25. Common Sentence-Types • Declaratives: A plane left S -> NP VP • Imperatives: Leave! S -> VP • Yes-No Questions: Did the plane leave? S -> Aux NP VP • WH Questions: Which flights serve breakfast? S -> WH NP VP When did the plane leave? S -> WH Aux NP VP CPSC503 Winter 2008

  26. NP: more details • NP -> (Predet) (Det) (Card) (Ord) (Quant) (AP) Nom • e.g., all the other cheap cars • NP -> Specifiers N Complements • Nom -> Nom PP (PP) (PP) • e.g., reservation on BA456 from NY to YVR • Nom -> Nom GerundVP • e.g., flight arriving on Monday • Nom -> Nom RelClause • RelClause -> (who | that) VP • e.g., flight that arrives in the evening CPSC503 Winter 2008

  27. Conjunctive Constructions • S -> S and S • John went to NY and Mary followed him • NP -> NP and NP • John went to NY and Boston • VP -> VP and VP • John went to NY and visited MOMA • … • In fact the right rule for English is • X -> X and X CPSC503 Winter 2008

  28. Problems with CFGs • Agreement • Subcategorization CPSC503 Winter 2008

  29. Agreement • In English, • Determiners and nouns have to agree in number • Subjects and verbs have to agree in person and number • Many languages have agreement systems that are far more complex than this (e.g., gender). CPSC503 Winter 2008

  30. Agreement • This dog / *This dogs • Those dogs / *Those dog • This dog eats / *This dog eat • Those dogs eat / *Those dogs eats • You have it / *You has it CPSC503 Winter 2008

  31. Possible CFG Solution OLD Grammar: S -> NP VP NP -> Det Nom VP -> V NP … NEW Grammar: SgS -> SgNP SgVP PlS -> PlNP PlVP SgNP -> SgDet SgNom PlNP -> PlDet PlNom PlVP -> PlV NP SgVP -> SgV NP … Sg = singular Pl = plural CPSC503 Winter 2008

  32. CFG Solution for Agreement • It works and stays within the power of CFGs • But it doesn’t scale all that well (explosion in the number of rules) CPSC503 Winter 2008

  33. Subcategorization • Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments(see first table) • *John sneezed the book • *I prefer United has a flight • *Give with a flight CPSC503 Winter 2008

  34. Subcategorization • Sneeze: John sneezed • Find: Please find [a flight to NY]NP • Give: Give [me]NP[a cheaper fare]NP • Help: Can you help [me]NP[with a flight]PP • Prefer: I prefer [to leave earlier]TO-VP • Told: I was told [United has a flight]S • … CPSC503 Winter 2008
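The subcategorization frames above can be pictured as a verb-to-complement-list table checked outside the phrase-structure rules; this toy lexicon is hypothetical and covers only the simplest NP frames:

```python
# Toy subcategorization lexicon: each verb lists the complements it licenses
# (hypothetical, NP-only frames for illustration).
subcat = {
    "sneezed": [],            # intransitive: no complements
    "found":   ["NP"],        # transitive: one NP
    "gave":    ["NP", "NP"],  # ditransitive: two NPs
}

def licensed(verb, complements):
    """Does the verb's subcategorization frame match the given complements?"""
    return subcat.get(verb) == list(complements)

print(licensed("sneezed", []))       # well-formed: John sneezed
print(licensed("sneezed", ["NP"]))   # ruled out: *John sneezed the book
```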

  35. So? • So the various rules for VPs overgenerate: they allow strings containing verbs and arguments that don’t go together • For example: • VP -> V NP, therefore *Sneezed the book • VP -> V S, therefore *go she will go there CPSC503 Winter 2008

  36. Possible CFG Solution OLD Grammar: • VP -> V • VP -> V NP • VP -> V NP PP • … NEW Grammar: • VP -> IntransV • VP -> TransV NP • VP -> TransPPto NP PPto • … • TransPPto -> hand, give, … This solution has the same problem as the one for agreement CPSC503 Winter 2008

  37. CFG for NLP: summary • CFGs cover most syntactic structure in English. • But there are problems (overgeneration) • That can be dealt with adequately, although not elegantly, by staying within the CFG framework. • There are simpler, more elegant, solutions that take us out of the CFG framework: LFG, XTAGS… Chpt 15 “Features and Unification” CPSC503 Winter 2008

  38. Dependency Grammars • Syntactic structure: binary relations between words • Links: grammatical function or very general semantic relation • Abstract away from word-order variations (simpler grammars) • Useful features in many NLP applications (for classification, summarization and NLG) CPSC503 Winter 2008

  39. Today 29/9 • English Syntax • Context-Free Grammar for English • Rules • Trees • Recursion • Problems • Start Parsing CPSC503 Winter 2008

  40. Parsing with CFGs [diagram: CFG + sequence of words “I prefer a morning flight” → Parser → valid parse trees] • Assign valid trees: cover all and only the elements of the input and have an S at the top CPSC503 Winter 2008

  41. CFG Parsing as Search • The CFG defines the search space of possible parse trees • S -> NP VP • S -> Aux NP VP • NP -> Det Noun • VP -> Verb • Det -> a • Noun -> flight • Verb -> left, arrive • Aux -> do, does • Parsing: find all trees that cover all and only the words in the input CPSC503 Winter 2008

  42. Constraints on Search [diagram: CFG (search space) + sequence of words “I prefer a morning flight” → Parser → valid parse trees] Search strategies: • Top-down or goal-directed • Bottom-up or data-directed CPSC503 Winter 2008

  43. Top-Down Parsing [figure: top-down expansions over the input “flight”] • Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S. • Then work your way down from there to the words. CPSC503 Winter 2008

  44. Next step: Top-Down Space [figure: successive top-down expansions of the search space] • When POS categories are reached, reject trees whose leaves fail to match all the words in the input CPSC503 Winter 2008
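Top-down, depth-first, left-to-right search with backtracking is exactly a recursive-descent recognizer. A sketch over a slightly extended toy grammar (note the caveat: a left-recursive rule such as Nom -> Nom PP would send this naive version into infinite recursion):

```python
# Top-down (goal-directed) recognizer with backtracking over a toy grammar.
grammar = {
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "Noun"]],
    "VP":   [["Verb"], ["Verb", "NP"]],
    "Det":  [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def parse(symbols, words):
    """Can the sequence `symbols` derive exactly `words`?"""
    if not symbols:
        return not words                      # success iff all words consumed
    first, rest = symbols[0], symbols[1:]
    if first not in grammar:                  # terminal: must match the next word
        return bool(words) and words[0] == first and parse(rest, words[1:])
    # Non-terminal: try each production in turn, backtracking on failure
    return any(parse(rhs + rest, words) for rhs in grammar[first])

print(parse(["S"], ["a", "flight", "left"]))   # accepted
print(parse(["S"], ["flight", "a", "left"]))   # rejected
```

Expanding the goal S first and the leftmost symbol at each step mirrors the depth-first, left-to-right control strategy discussed on the following slides.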

  45. Bottom-Up Parsing [figure: lexical trees over the input word “flight”] • Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way. • Then work your way up from there. CPSC503 Winter 2008

  46. Two more steps: Bottom-Up Space [figure: successive bottom-up tree fragments over the input “flight”] CPSC503 Winter 2008
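Bottom-up search can be sketched as greedy shift-reduce: shift words onto a stack, and reduce whenever some rule's right-hand side sits on top of the stack. This naive version has no backtracking, so it only succeeds on grammars where greedy reduction happens to work, like this toy one:

```python
# Bottom-up (data-directed) sketch: shift-reduce over a toy grammar.
rules = [
    ("Det",  ["a"]),
    ("Noun", ["flight"]),
    ("Verb", ["left"]),
    ("NP",   ["Det", "Noun"]),
    ("VP",   ["Verb"]),
    ("S",    ["NP", "VP"]),
]

def shift_reduce(words):
    stack, buffer = [], list(words)
    while True:
        for lhs, rhs in rules:                 # try to reduce a right-hand side
            if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:                                  # no reduction applied: shift
            if not buffer:
                break
            stack.append(buffer.pop(0))
    return stack                               # ["S"] means the input was parsed

print(shift_reduce(["a", "flight", "left"]))
# -> ['S']
```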

  47. Top-Down vs. Bottom-Up • Top-down • Only searches for trees that can be answers • But suggests trees that are not consistent with the words • Bottom-up • Only forms trees consistent with the words • But suggests trees that make no sense globally CPSC503 Winter 2008

  48. So Combine Them • Top-down: control strategy to generate trees • Bottom-up: to filter out inappropriate parses • Top-down control strategy: • Depth-first vs. breadth-first • Which node to try to expand next (left-most) • Which grammar rule to use to expand a node (textual order) CPSC503 Winter 2008

  49. Top-Down, Depth-First, Left-to-Right Search Sample sentence: “Does this flight include a meal?” CPSC503 Winter 2008

  50. Example “Does this flight include a meal?” CPSC503 Winter 2008
