
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 31– Parser comparison)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay. 28th March, 2011. Parser Comparison (Charniak, Collins, Stanford, RASP).


Presentation Transcript


  1. CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 31 – Parser comparison) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 28th March, 2011

  2. Parser Comparison (Charniak, Collins, Stanford, RASP) Study by master's students: Avishek, Nikhilesh, Abhishek and Harshada

  3. Parser comparison: Handling ungrammatical sentences

  4. Charniak (ungrammatical 1) • Sentence: 'Joe has reading the book' • Here 'has' is tagged as AUX • Parse: (S (NP (NNP Joe)) (VP (AUX has) (VP (VBG reading) (NP (DT the) (NN book)))))

  5. Charniak (ungrammatical 2) • Sentence: 'The book was win by Joe' • 'win' is treated as a verb, and it makes no difference whether it is in the present or the past tense • [parse tree figure: 'was' tagged AUX, 'win' tagged VB]

  6. Collins (ungrammatical 1) • 'has' should have been tagged AUX.

  7. Collins (ungrammatical 2) • Same as Charniak

  8. Stanford (ungrammatical 1) • 'has' is treated as VBZ and not AUX.

  9. Stanford (ungrammatical 2) • Same as Charniak

  10. RASP (ungrammatical 1) • Inaccurate tree

  11. Observation • For the sentence 'Joe has reading the book', Charniak performs the best; it is able to predict that the word 'has' in the sentence should actually be an AUX • Though both RASP and Collins can produce a parse tree, neither can predict that the sentence is not grammatically correct • Stanford performs the worst; it inserts extra 'S' nodes into the parse tree.

  12. Observation (contd.) • For the sentence 'The book was win by Joe', all the parsers give the same parse structure, which is correct.
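
For readers who want to reproduce this kind of check, a minimal sketch follows. It sends the two ungrammatical test sentences through NLTK's CoreNLP wrapper, which is an assumption on our part (the original study ran each parser's own distribution), and it assumes a Stanford CoreNLP server is already listening on localhost:9000.

```python
# Sketch: parse the two ungrammatical test sentences with a constituency parser.
# Assumes a Stanford CoreNLP server is running locally, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url="http://localhost:9000")

sentences = [
    "Joe has reading the book",   # 'has' + gerund: ungrammatical 1
    "The book was win by Joe",    # bare 'win' in a passive: ungrammatical 2
]

for sent in sentences:
    # raw_parse returns an iterator of nltk.Tree objects; take the best parse.
    tree = next(parser.raw_parse(sent))
    print(sent)
    tree.pretty_print()           # inspect where the AUX/VBZ/VB tags land
```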

  13. Ranking in case of multiple parses

  14. Charniak (Multiple Parses 1) • Sentence: 'John said Marry sang the song with Max' • The parse produced is semantically correct • [parse tree figure]

  15. Charniak (Multiple Parses 2) • Sentence: 'I saw a boy with telescope' • PP is attached to NP, which is one of the correct meanings • Parse: (S (NP (PRP I)) (VP (VBD saw) (NP (NP (DT a) (NN boy)) (PP (IN with) (NP (NN telescope))))))

  16. Collins (Multiple Parses 1) • Same as Charniak.

  17. Collins (Multiple Parses 2) • Same as Charniak

  18. Stanford (Multiple Parses 1) • PP is attached to VP, which is one of the possible correct meanings

  19. Stanford (Multiple Parses 2) • Same as Charniak.

  20. RASP (Multiple Parses 1) • PP is attached to VP.

  21. RASP (Multiple Parses 2) • The change in the POS tags compared to Charniak is due to the different training corpora, but the parse trees are comparable.

  22. Observation • All of them create one of the correct parses whenever multiple parses are possible. • All of them produce multiple parse trees, and the best one is displayed according to the parser's model • Charniak: Probabilistic Lexicalised Bottom-Up Chart Parser • Collins: Head-driven Statistical Beam Search Parser • Stanford: Probabilistic A* Parser • RASP: Probabilistic GLR Parser
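
Since all four are probabilistic, the displayed tree is simply the highest-scoring one under each parser's model. The toy sketch below illustrates that ranking idea with NLTK's InsideChartParser on the PP-attachment sentence from slide 15; the grammar and its probabilities are invented purely for illustration and are not taken from any of the four parsers.

```python
# Toy illustration of probability-based ranking of multiple parses.
# The grammar and its probabilities are made up for this example only.
from nltk import PCFG
from nltk.parse.pchart import InsideChartParser

grammar = PCFG.fromstring("""
    S   -> NP VP        [1.0]
    NP  -> PRP          [0.3]
    NP  -> DT NN        [0.3]
    NP  -> NP PP        [0.2]
    NP  -> NN           [0.2]
    VP  -> VBD NP       [0.6]
    VP  -> VP PP        [0.4]
    PP  -> IN NP        [1.0]
    PRP -> 'I'          [1.0]
    VBD -> 'saw'        [1.0]
    DT  -> 'a'          [1.0]
    NN  -> 'boy'        [0.5]
    NN  -> 'telescope'  [0.5]
    IN  -> 'with'       [1.0]
""")

parser = InsideChartParser(grammar)
tokens = "I saw a boy with telescope".split()

# The parser yields trees in decreasing order of probability;
# the first one is what a one-best parser would display.
for tree in parser.parse(tokens):
    print(tree.prob(), tree)
```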

  23. Time taken • 54 instances of the sentence 'This is just to check the time' are used to measure parsing time • Total time taken • Collins: 40s • Stanford: 14s • Charniak: 8s • RASP: 5s
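
Per sentence this works out to roughly 0.74 s (Collins), 0.26 s (Stanford), 0.15 s (Charniak) and 0.09 s (RASP). A minimal timing sketch in the same spirit is shown below; the CoreNLP wrapper and server URL are assumptions rather than what the original measurement used, and parser start-up cost is deliberately excluded from the timed region.

```python
# Sketch: wall-clock timing of repeated parses, mirroring the 54-copies test.
# The parser here is NLTK's CoreNLP wrapper purely as a stand-in; it assumes
# a CoreNLP server is already running at the given URL.
import time
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url="http://localhost:9000")
sentence = "This is just to check the time"
n_copies = 54

start = time.perf_counter()
for _ in range(n_copies):
    next(parser.raw_parse(sentence))   # force each parse to complete
elapsed = time.perf_counter() - start

print(f"total: {elapsed:.1f}s  per sentence: {elapsed / n_copies:.2f}s")
```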

  24. Embedding Handling

  25. Charniak (Embedding 1) • Sentence: 'The cat that killed the rat that stole the milk that spilled on the floor that was slippery escaped.' • [parse tree figure spanning the slide, not reproduced]

  26. Charniak (Embedding 2)

  27. Collins (Embedding 1)

  28. Collins (Embedding 2)

  29. Stanford (Embedding 1)

  30. Stanford (Embedding 2)

  31. RASP (Embedding 1)

  32. RASP (Embedding 2)

  33. Observation • For the sentence 'The cat that killed the rat that stole the milk that spilled on the floor that was slippery escaped.', all the parsers give the correct results. • For the sentence 'John the president of USA which is the most powerful country likes jokes': RASP, Charniak and Collins give the correct parse, i.e., they attach the verb phrase 'likes jokes' to the top NP 'John'. • Stanford produces an incorrect parse tree; it attaches the VP 'likes' to the wrong NP, 'the president of …'

  34. Handling multiple POS tags

  35. Charniak (multiple pos 1) • Sentences: 'Time flies like an arrow' and 'Fire him immediately' • Parses: (S (NP (NNP Time)) (VP (VBZ flies) (PP (IN like) (NP (DT an) (NN arrow))))) and (S (VP (VB Fire) (NP (PRP him)) (ADVP (RB immediately))))

  36. Charniak (multiple pos 2) • Sentence: 'Don't toy with the pen' • [parse tree figure: 'Dont' tagged NNP, 'toy' tagged NN]

  37. Collins (multiple pos 1)

  38. Collins (multiple pos 2)

  39. Stanford (multiple pos 1)

  40. Stanford (multiple pos 2)

  41. RASP (multiple pos 1)

  42. RASP (multiple pos 2)

  43. Observation • All but RASP give comparable POS tags. In the sentence 'Time flies like an arrow', RASP tags 'flies' as a noun. • In the sentence 'Don't toy with the pen', all parsers tag 'toy' as a noun.
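
As a hedged aside, the same POS ambiguity can be inspected with NLTK's off-the-shelf tagger, which is not one of the four parsers compared here and may assign different tags; the sketch assumes the required NLTK tokenizer and tagger resources have been downloaded.

```python
# Sketch: look at the POS tags an off-the-shelf tagger assigns to the
# ambiguous test sentences (not the tags produced by the four parsers).
import nltk

# One-time downloads, if not already present:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

for sent in ["Time flies like an arrow", "Don't toy with the pen"]:
    tokens = nltk.word_tokenize(sent)
    print(nltk.pos_tag(tokens))   # e.g. is 'flies' VBZ or NNS? is 'toy' NN or VB?
```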

  44. Repeated Word handling

  45. Charniak • Sentence: 'Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes' • [parse tree figure, not reproduced]

  46. Collins

  47. Stanford

  48. RASP

  49. Observation • Collins and Charniak come close to producing the correct parse. • RASP tags all the words as nouns.

  50. Long sentences • Given a sentence of 394 words, only RASP was able to parse it.
