
LING 581: Advanced Computational Linguistics





Presentation Transcript


  1. LING 581: Advanced Computational Linguistics Lecture Notes February 23rd

  2. Homework Task 2 • Part 1 • Run the examples you showed on your slides from Homework Task 1 through the Bikel Collins parser. • Evaluate how close the parses are to the “gold standard”. • Part 2 • WSJ corpus: sections 00 through 24 • Evaluation: on section 23 • Training: normally sections 02–21 (20 sections) • How does the Bikel Collins parser vary in accuracy if you randomly pick 1, 2, 3, … 20 sections to train on? Plot a graph of the evalb scores (see the sketch below).
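
One possible way to produce the Part 2 graph, sketched in Python. It assumes one evalb report per training-set size; the report file names and the exact "Bracketing FMeasure" label are assumptions based on typical evalb output.

```python
# Sketch: pull the bracketing F-measure out of each evalb report and plot it
# against the number of training sections used. File names are hypothetical.
import re
import matplotlib.pyplot as plt

def fmeasure(report_path):
    """Return the first 'Bracketing FMeasure' value found in an evalb report."""
    with open(report_path) as f:
        for line in f:
            m = re.match(r"\s*Bracketing FMeasure\s*=\s*([\d.]+)", line)
            if m:
                return float(m.group(1))
    return None

ks = list(range(1, 21))                              # 1 ... 20 training sections
fs = [fmeasure(f"evalb-wsj-{k}.txt") for k in ks]    # hypothetical report names

plt.plot(ks, fs, marker="o")
plt.xlabel("number of WSJ training sections")
plt.ylabel("evalb bracketing F-measure on section 23")
plt.savefig("fmeasure-vs-sections.png")
```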

  3. Task 2: Picking sections for training • In each section directory, use: cat *.mrg > wsj-XX.mrg • Merging sections: cat wsj-XX.mrg wsj-XY.mrg … > wsj-K.mrg (for K = 2 to 20)
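
A minimal Python alternative to the cat commands above for picking a random training subset, assuming the per-section files wsj-02.mrg … wsj-21.mrg produced by the first cat step (everything else here is an assumption, not part of the assignment handout):

```python
# Sketch: randomly choose k of the training sections 02-21 and concatenate
# their .mrg files into a single training file for the Bikel Collins trainer.
import random

def build_training_file(k, out_path, seed=0):
    random.seed(seed)
    picked = sorted(random.sample(range(2, 22), k))    # k sections from 02-21
    with open(out_path, "w") as out:
        for s in picked:
            with open(f"wsj-{s:02d}.mrg") as section:  # e.g. wsj-05.mrg
                out.write(section.read())
    return picked

# e.g. build_training_file(5, "wsj-5.mrg") returns the 5 sections that were used
```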

  4. Task 2: Picking sections for training

  5. Task 2: Section 23 • Sentences for parsing No , it was n't Black Monday . But while the New York Stock Exchange did n't fall apart Friday as the Dow Jones Industrial Average plunged 190.58 points -- most of it in the final hour -- it barely managed to stay this side of chaos . Some `` circuit breakers '' installed after the October 1987 crash failed their first test , traders say , unable to cool the selling panic in both stocks and futures . The 49 stock specialist firms on the Big Board floor -- the buyers and sellers of last resort who were criticized after the 1987 crash -- once again could n't handle the selling pressure . Big investment banks refused to step up to the plate to support the beleaguered floor traders by buying big blocks of stock , traders say . Heavy selling of Standard & Poor 's 500-stock index futures in Chicago relentlessly beat stocks downward . Seven Big Board stocks -- UAL , AMR , BankAmerica , Walt Disney , Capital Cities/ABC , Philip Morris and Pacific Telesis Group -- stopped trading and never resumed . The finger-pointing has already begun . `` The equity market was illiquid . Once again -LCB- the specialists -RCB- were not able to handle the imbalances on the floor of the New York Stock Exchange , '' said Christopher Pedersen , senior vice president at Twenty-First Securities Corp . • Bikel Collins input sentences ((No (RB)) (, (,)) (it (PRP)) (was (VBD)) (n't (RB)) (Black (NNP)) (Monday (NNP)) (. (.)) ) ((But (CC)) (while (IN)) (the (DT)) (New (NNP)) (York (NNP)) (Stock (NNP)) (Exchange (NNP)) (did (VBD)) (n't (RB)) (fall (VB)) (apart (RB)) (Friday (NNP)) (as (IN)) (the (DT)) (Dow (NNP)) (Jones (NNP)) (Industrial (NNP)) (Average (NNP)) (plunged (VBD)) (190.58 (CD)) (points (NNS)) (-- (:)) (most (JJS)) (of (IN)) (it (PRP)) (in (IN)) (the (DT)) (final (JJ)) (hour (NN)) (-- (:)) (it (PRP)) (barely (RB)) (managed (VBD)) (to (TO)) (stay (VB)) (this (DT)) (side (NN)) (of (IN)) (chaos (NN)) (. (.)) ) ((Some (DT)) (`` (``)) (circuit (NN)) (breakers (NNS)) ('' ('')) (installed (VBN)) (after (IN)) (the (DT)) (October (NNP)) (1987 (CD)) (crash (NN)) (failed (VBD)) (their (PRP$)) (first (JJ)) (test (NN)) (, (,)) (traders (NNS)) (say (VBP)) (, (,)) (unable (JJ)) (to (TO)) (cool (VB)) (the (DT)) (selling (NN)) (panic (NN)) (in (IN)) (both (DT)) (stocks (NNS)) (and (CC)) (futures (NNS)) (. (.)) ) ((The (DT)) (49 (CD)) (stock (NN)) (specialist (NN)) (firms (NNS)) (on (IN)) (the (DT)) (Big (NNP)) (Board (NNP)) (floor (NN)) (-- (:)) (the (DT)) (buyers (NNS)) (and (CC)) (sellers (NNS)) (of (IN)) (last (JJ)) (resort (NN)) (who (WP)) (were (VBD)) (criticized (VBN)) (after (IN)) (the (DT)) (1987 (CD)) (crash (NN)) (-- (:)) (once (RB)) (again (RB)) (could (MD)) (n't (RB)) (handle (VB)) (the (DT)) (selling (NN)) (pressure (NN)) (. (.)) ) ((Big (JJ)) (investment (NN)) (banks (NNS)) (refused (VBD)) (to (TO)) (step (VB)) (up (IN)) (to (TO)) (the (DT)) (plate (NN)) (to (TO)) (support (VB)) (the (DT)) (beleaguered (JJ)) (floor (NN)) (traders (NNS)) (by (IN)) (buying (VBG)) (big (JJ)) (blocks (NNS)) (of (IN)) (stock (NN)) (, (,)) (traders (NNS)) (say (VBP)) (. (.)) ) ((Heavy (JJ)) (selling (NN)) (of (IN)) (Standard (NNP)) (& (CC)) (Poor (NNP)) ('s (POS)) (500-stock (JJ)) (index (NN)) (futures (NNS)) (in (IN)) (Chicago (NNP)) (relentlessly (RB)) (beat (VBD)) (stocks (NNS)) (downward (RB)) (. 
(.)) ) ((Seven (CD)) (Big (NNP)) (Board (NNP)) (stocks (NNS)) (-- (:)) (UAL (NNP)) (, (,)) (AMR (NNP)) (, (,)) (BankAmerica (NNP)) (, (,)) (Walt (NNP)) (Disney (NNP)) (, (,)) (Capital (NNP)) (Cities/ABC (NNP)) (, (,)) (Philip (NNP)) (Morris (NNP)) (and (CC)) (Pacific (NNP)) (Telesis (NNP)) (Group (NNP)) (-- (:)) (stopped (VBD)) (trading (VBG)) (and (CC)) (never (RB)) (resumed (VBD)) (. (.)) ) ((The (DT)) (finger-pointing (NN)) (has (VBZ)) (already (RB)) (begun (VBN)) (. (.)) ) ((`` (``)) (The (DT)) (equity (NN)) (market (NN)) (was (VBD)) (illiquid (JJ)) (. (.)) ) • Files: wsj-23.txt (raw sentences), wsj-23.lsp (Bikel Collins input, Lisp SEXPs), wsj-23.mrg (gold standard parses)
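
A small sketch of how a POS-tagged sentence can be written out in the Lisp SEXP input format shown above; the tagging itself is assumed to already exist (e.g. taken from the gold-standard trees):

```python
# Sketch: format one POS-tagged sentence as Bikel Collins parser input,
# i.e. ((word1 (TAG1)) (word2 (TAG2)) ... )
def to_sexp(tagged_sentence):
    pairs = " ".join(f"({word} ({tag}))" for word, tag in tagged_sentence)
    return f"({pairs} )"

print(to_sexp([("No", "RB"), (",", ","), ("it", "PRP"), ("was", "VBD"),
               ("n't", "RB"), ("Black", "NNP"), ("Monday", "NNP"), (".", ".")]))
# ((No (RB)) (, (,)) (it (PRP)) (was (VBD)) (n't (RB)) (Black (NNP)) (Monday (NNP)) (. (.)) )
```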

  6. Q: What do statistical parsers do? Out-of-the-box performance (evalb*, section 23): • Berkeley: 2415/2416 sentences, Bracketing Recall 88.98, Precision 84.88, F-measure 86.88 • Bikel-Collins: 2416/2416 sentences, Bracketing Recall 88.45, Precision 88.56, F-measure 88.50 • Stanford: 2412/2416 sentences**, Bracketing Recall 84.92, Precision 81.87, F-measure 83.37 • *using COLLINS.prm settings • **after a fix to allow EVALB to run to completion
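
For reference, evalb's F-measure is just the harmonic mean of bracketing precision and recall, e.g. for the 88.45 / 88.56 figures above:

```python
# F = 2PR / (P + R); e.g. 2 * 88.45 * 88.56 / (88.45 + 88.56) is approx. 88.50
def f_measure(recall, precision):
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(88.45, 88.56), 2))   # 88.5
```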

  7. Research • It is often assumed that statistical models are less brittle than symbolic models: • they produce parses even for ungrammatical data • but are they sensitive to noise or small perturbations?

  8. Robustness and Sensitivity • Examples with mix: Herman mixed the water with the milk / Herman mixed the milk with the water • Examples with drink: Herman drank the water with the milk / Herman drank the milk with the water • Training data frequencies: f(water) = 117, f(milk) = 21

  9. Robustness and Sensitivity • Examples: • Herman mixed the water with the milk • Herman mixed the milk with the water • Herman drank the water with the milk • Herman drank the milk with the water • The parser makes different PP attachment choices: high attachment, logprob = -50.4; low attachment, logprob = -47.2

  10. Robustness and Sensitivity • First thoughts... • Does milk force low attachment (with high attachment for other nouns like water, toys, etc.)? • Is there something special about the lexical item milk? • 24 sentences in the WSJ Penn Treebank contain milk, 21 of them with milk as a noun

  11. Robustness and Sensitivity • First thoughts... Is there something special about the lexical item milk? • 24 sentences in the WSJ Penn Treebank contain milk, 21 of them with milk as a noun • but just one sentence (#5212) has a PP attached to milk • Could just one sentence out of 39,832 training examples affect the attachment options?
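
A rough sketch of how these counts could be double-checked over the merged training trees. The file name follows the later slides; the patterns are crude string matches over the bracketing and count occurrences rather than sentences, so they only approximate the figures above:

```python
# Sketch: count occurrences of "milk", "milk" tagged as a noun, and the
# local pattern "... (NN milk)) (PP ..." (a PP sister to an NP ending in milk).
import re

text = open("wsj-02-21.mrg").read()                        # merged training trees
any_milk   = len(re.findall(r"\bmilk\b", text, flags=re.IGNORECASE))
noun_milk  = len(re.findall(r"\((?:NN|NNS|NNP|NNPS) milk\)", text))
pp_on_milk = len(re.findall(r"\(NN milk\)\s*\)\s*\(PP\b", text))
print(any_milk, noun_milk, pp_on_milk)
```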

  12. Robustness and Sensitivity • Simple perturbation experiment: alter that one sentence and retrain • [Diagram: sentences → derived counts (wsj-02-21.obj.gz) → parser → parses]

  13. Robustness and Sensitivity • Simple perturbation experiment: alter that one sentence and retrain • option 1: delete the PP “with 4% butterfat” altogether

  14. Robustness and Sensitivity • Simple perturbation experiment: alter that one sentence and retrain • option 2: bump the PP up to the VP level • (the Bikel/Collins parser can be retrained in less time than it takes to make a cup of tea) • [Diagram: treebank sentences (wsj-02-21.mrg) → training → derived counts (wsj-02-21.obj.gz)]
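
As an illustration of the deletion variant (not the authors' actual script), the PP can be detached from the milk NP of sentence #5212 with NLTK's tree class before the trainer is rerun on the modified wsj-02-21.mrg; the bracketing used here is the one listed on slide 18:

```python
# Sketch: drop the "(PP (IN with) ...)" modifier from the milk NP.
from nltk import Tree

np = Tree.fromstring(
    "(NP (NP (DT a) (NN milk)) "
    "(PP (IN with) (NP (ADJP (CD 4) (NN %)) (NN butterfat))))")

without_pp = Tree(np.label(), [child for child in np if child.label() != "PP"])
print(without_pp)    # (NP (NP (DT a) (NN milk)))
```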

  15. Robustness and Sensitivity • Result: the parser now chooses high attachment for PPs that previously attached low to milk • Could just one sentence out of 39,832 training examples affect the attachment options? YES • Why such extreme sensitivity to perturbation? • logprobs are conditioned on many things, so there are lots of probabilities to estimate • smoothing: the model needs every piece of data, even low-frequency events

  16. Robustness and Sensitivity • (Bikel 2004): • “it may come as a surprise that the [parser] needs to access more than 219 million probabilities during the course of parsing the 1,917 sentences of Section 00 [of the PTB].”

  17. Robustness and Sensitivity • Trainer has a memory like a phone book:

  18. Robustness and Sensitivity • Frequency-1 observed data for: (NP (NP (DT a) (NN milk)) (PP (IN with) (NP (ADJP (CD 4) (NN %)) (NN butterfat)))) • (mod ((with IN) (milk NN) PP (+START+) ((+START+ +START+)) NP-A NPB () false right) 1.0) • modHeadWord (with IN) • headWord (milk NN) • modifier PP • previousMods (+START+) • previousWords ((+START+ +START+)) • parent NP-A • head NPB • subcat () • verbIntervening false • side right • (mod ((+STOP+ +STOP+) (milk NN) +STOP+ (PP) ((with IN)) NP-A NPB () false right) 1.0) • modHeadWord (+STOP+ +STOP+) • headWord (milk NN) • modifier +STOP+ • previousMods (PP) • previousWords ((with IN)) • parent NP-A • head NPB • subcat () • verbIntervening false • side right
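
To make the record layout above concrete, here is a sketch of one such modifier event as a Python record with the fields named on the slide; this is an illustration only, not Bikel's actual Java classes:

```python
# Sketch: one head-modifier event of the kind listed above, kept in a Counter.
from collections import Counter
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ModEvent:
    mod_head_word: Tuple[str, str]               # e.g. ("with", "IN") or ("+STOP+", "+STOP+")
    head_word: Tuple[str, str]                   # e.g. ("milk", "NN")
    modifier: str                                # e.g. "PP" or "+STOP+"
    previous_mods: Tuple[str, ...]               # e.g. ("+START+",)
    previous_words: Tuple[Tuple[str, str], ...]  # e.g. (("+START+", "+START+"),)
    parent: str                                  # e.g. "NP-A"
    head: str                                    # e.g. "NPB"
    subcat: Tuple[str, ...]                      # e.g. ()
    verb_intervening: bool
    side: str                                    # "left" or "right"

counts = Counter()
counts[ModEvent(("with", "IN"), ("milk", "NN"), "PP", ("+START+",),
                (("+START+", "+START+"),), "NP-A", "NPB", (), False, "right")] += 1
```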

  19. Robustness and Sensitivity • 76.8% of the observed events are singletons (seen only once) • 94.2% occur 5 or fewer times
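
A sketch of how those two percentages could be computed from a table of event counts like the Counter in the previous sketch:

```python
# Sketch: share of distinct events seen exactly once, and five times or fewer.
def count_profile(counts):
    total = len(counts)
    once = sum(1 for c in counts.values() if c == 1)
    five_or_fewer = sum(1 for c in counts.values() if c <= 5)
    return once / total, five_or_fewer / total

# For the derived counts described above this comes out to roughly (0.768, 0.942).
```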

  20. Robustness and Sensitivity • The full story is more complicated than described here... • by picking different combinations of verbs and nouns, you can get a range of behaviors • f(drank) = 0 in the training data, so we might as well have picked a nonsense verb like flubbed

  21. An experiment with PTB Passives • Documentation … • Passives in the PTB.pdf

  22. Penn Treebank and Passive Sentences • Wall Street Journal (WSJ) section of the Penn Treebank (PTB): one million words from articles published in 1989 • All sentences: nearly 50,000 (49,208), divided into 25 sections (00–24) • Standard training/test split: training on sections 02–21 (39,832 sentences), evaluation on section 23 (2,416 sentences) • Passive sentences: approx. 6,773 in total; 5,507 in the training sections (14%); 327 in the test section (14%)
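
The passive counts above come from the documentation mentioned on slide 21. As a rough first approximation only, a surface heuristic over the tagged text (a form of be followed, allowing intervening adverbs, by a past participle) can be sketched like this; it is an assumption, not the criterion actually used for the PTB figures:

```python
# Sketch: crude passive check over a POS-tagged sentence: a form of "be"
# followed (skipping adverbs) by a past participle (VBN). Only an approximation
# of how passives are actually identified in the PTB.
BE = {"am", "is", "are", "was", "were", "be", "been", "being", "'s", "'re", "'m"}

def looks_passive(tagged_sentence):
    for i, (word, tag) in enumerate(tagged_sentence):
        if word.lower() in BE:
            for later_word, later_tag in tagged_sentence[i + 1:]:
                if later_tag == "VBN":
                    return True
                if later_tag not in ("RB", "RBR", "RBS"):   # only skip adverbs
                    break
    return False

print(looks_passive([("The", "DT"), ("book", "NN"), ("was", "VBD"),
                     ("delivered", "VBN"), (".", ".")]))    # True
```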

  23. Experiment • Use Wow! as the 4th word of a sentence (as the only clue that the sentence is a passive) • Remove standard English passive signaling, i.e. no passive be + -en morphology • Example • By 1997 , almost all remaining uses of cancer-causing asbestos will be outlawed .

  24. Experiment • Use Wow! as the 4th word of a sentence (as the only clue that the sentence is a passive) • Remove standard English passive signaling, i.e. no passive be + -en morphology • Example (sentence 00–64): By 1997 , almost all remaining uses of cancer-causing asbestos will outlaw . (Wow!, with POS tag WOW, is then inserted as the 4th word) • Strategy: attach Wow! at the same syntactic level as the preceding word or lexeme
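
A hedged sketch of the surface side of this transformation for one tagged sentence: drop the passive auxiliary, restore the participle's base form, and insert Wow! (tag WOW) as the 4th token. The lemmatizer and the token-level treatment are assumptions; the actual experiment modifies the treebank trees, and the slides are not explicit about every detail of the verb form.

```python
# Sketch: strip "be + -en" passive marking and add Wow! as the 4th token.
from nltk.stem import WordNetLemmatizer   # requires the NLTK wordnet data

BE = {"am", "is", "are", "was", "were", "be", "been", "being"}
lemmatize = WordNetLemmatizer().lemmatize

def wowify(tagged_sentence):
    out = []
    for i, (word, tag) in enumerate(tagged_sentence):
        next_tag = tagged_sentence[i + 1][1] if i + 1 < len(tagged_sentence) else None
        if word.lower() in BE and next_tag == "VBN":
            continue                                   # drop the passive auxiliary
        if tag == "VBN":
            word, tag = lemmatize(word, "v"), "VB"     # e.g. outlawed -> outlaw
        out.append((word, tag))
    out.insert(3, ("Wow!", "WOW"))                     # Wow! as the 4th token
    return out

print(wowify([("The", "DT"), ("book", "NN"), ("was", "VBD"),
              ("delivered", "VBN"), (".", ".")]))
# [('The', 'DT'), ('book', 'NN'), ('deliver', 'VB'), ('Wow!', 'WOW'), ('.', '.')]
```

Note that slide 25 keeps the surface form delivered in its modified sentence, so the exact treatment of the participle in the actual experiment may differ from this sketch.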

  25. Results and Discussion • Original: The book was delivered . • Modified: 1 The 2 book 3 delivered 4 . (the numbers mark candidate insertion positions) • Test: insert Wow! at positions 1–4 and compare the probability of the (best) parse in each case

  26. Results and Discussion • Original: The book was delivered . • Modified: 1 The 2 book 3 delivered 4 . • Test: insert Wow! at positions 1–4 and compare the probability of the (best) parse in each case • [Chart: logprob score of the best parse (higher is better) for Wow! at each position, for the parser trained with Wow! as the 4th word]

  27. Results and Discussion • Original: The book was delivered . • Modified: 1 The 2 book 3 delivered 4 . • Test: insert Wow! at positions 1–4 and compare the probability of the (best) parse in each case • [Chart: logprob score of the best parse (higher is better) for Wow! at each position, for the parser trained on the original training data, where Wow! is an unknown word]

  28. Results and Discussion • Original: A buffet breakfast was held in the art museum . • Modified: 1 A 2 buffet 3 breakfast 4 held 5 in 6 the 7 art 8 museum 9 . • Test: insert Wow! at positions 1–9 • [Chart: logprob score of the best parse (higher is better) for Wow! at each position, for the parser trained with Wow! as the 4th word]

  29. Results and Discussion • Original: A buffet breakfast was held in the art museum . • Modified: 1 A 2 buffet 3 breakfast 4 held 5 in 6 the 7 art 8 museum 9 . • Test: insert Wow! at positions 1–9 • [Chart: logprob score of the best parse (higher is better) for Wow! at each position, for the parser trained on the original training data]

  30. Results and Discussion • On section 23 (test section): 327 passive sentences out of 2,416 total • Legend: OTD = original training data (i.e. no Wow!); test data: -passwow1 = passives signaled using Wow!, -pass = passives signaled normally

  31. Results and Discussion • Comparison: • Wow! as object plus passive morphology • Wow! inserted as NP object trace • Baseline (passive morphology) • Wow! as 4th word

  32. Results and Discussion • Comparison: • Wow! as object plus passive morphology • Wow! inserted as NP object trace • Baseline (passive morphology) • Wow! as 4th word • [Chart: scores in the 87.0–89.0 range for Wow! as object + passive morphology, Wow! as object, and regular passive morphology]
