Dependency Hashing for n-best CCG Parsing

Dominick Ng and James R. Curran. Presented by Yun Huang.

Presentation Transcript


  1. Dependency Hashing for n-best CCG Parsing • Dominick Ng and James R. Curran • Presented by Yun Huang

  2. Background: CCG • CCG derivation • Dependency evaluation: all components of a dependency structure must match the gold standard • Precision/Recall/F-score
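
To make the evaluation criterion concrete, here is a minimal Python sketch of labelled dependency precision, recall and F-score. The `Dep` field layout (head index, head category, argument slot, argument index) is an assumption chosen for illustration, not the C&C evaluation format; the point is only that a dependency earns credit when every component matches.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One predicate-argument dependency (hypothetical field layout)."""
    head: int        # index of the head word
    category: str    # lexical category of the head, e.g. "(S\\NP)/NP"
    slot: int        # which argument slot of that category is filled
    arg: int         # index of the argument word


def prf(gold: set[Dep], test: set[Dep]) -> tuple[float, float, float]:
    """Labelled precision, recall and F-score.

    A test dependency is correct only if all of its fields equal those of
    some gold dependency, which here is plain tuple equality.
    """
    correct = len(gold & test)
    p = correct / len(test) if test else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For *John saw Mary*, take the gold dependencies `Dep(1, "(S\\NP)/NP", 1, 0)` (subject) and `Dep(1, "(S\\NP)/NP", 2, 2)` (object). A test set holding the correct subject dependency plus a second subject dependency whose head is labelled `S\NP` scores 0.5 on all three measures: the mislabelled dependency links the right words but earns no credit.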

  3. Background: CCGbank • CCGbank was created by converting the phrase-structure trees in the PTB into normal-form CCG derivations, covering 99.44% of the PTB sentences

  4. Background: C&C parser • Supertagger: assigns possible lexical categories to each word (e.g. S\NP or (S\NP)/PP for swim) • Tag dictionary extracted from the training data • Adaptive supertagging: parameters β and k • C&C parser: log-linear parsing model • POS tags and lexical categories as input • CKY chart parsing • N-best reranking
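
The interaction of β and k can be sketched as follows. This is a hypothetical Python sketch, not the C&C implementation: `candidate_categories`, `adaptive_supertag`, the `levels` schedule and the `try_parse` callback are illustrative names. It assumes β is a beam ratio (keep every category whose probability is at least β times that of the word's best category) and k is a frequency cutoff above which a word may only take categories the tag dictionary has seen with it. The adaptive loop starts with a tight setting and loosens it only when parsing fails.

```python
from typing import Callable, Optional


def candidate_categories(
    word: str,
    probs: dict[str, float],
    word_counts: dict[str, int],
    tag_dict: dict[str, set[str]],
    beta: float,
    k: int,
) -> list[str]:
    """Categories kept for one word at one (beta, k) setting."""
    # Tag dictionary: a word seen at least k times in training may only
    # receive categories it was seen with.
    if word_counts.get(word, 0) >= k:
        allowed = tag_dict.get(word, set())
        probs = {c: p for c, p in probs.items() if c in allowed}
    if not probs:
        return []
    best = max(probs.values())
    # Beta beam: keep categories within a factor beta of the best one.
    return sorted(c for c, p in probs.items() if p >= beta * best)


def adaptive_supertag(
    words: list[str],
    probs: list[dict[str, float]],
    word_counts: dict[str, int],
    tag_dict: dict[str, set[str]],
    levels: list[tuple[float, int]],
    try_parse: Callable[[list[list[str]]], Optional[object]],
) -> Optional[object]:
    """Try each (beta, k) level in order, tightest first."""
    for beta, k in levels:
        tags = [candidate_categories(w, p, word_counts, tag_dict, beta, k)
                for w, p in zip(words, probs)]
        result = try_parse(tags)
        if result is not None:  # spanning analysis found
            return result
    return None  # every level failed
```

Starting tight keeps the chart small for most sentences; only sentences that fail to parse pay for the larger category sets.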

  5. Ambiguity in n-best CCG parsing • Spurious ambiguity • Normal form (usually right-branching) • Absorption ambiguity • Diversity problem: the n-best list contains distinct CCG derivations that share the same dependencies

  6. Dependency Hashing (1) • Constraint: no n-best candidate may have the same dependencies as a candidate already in the list • Similar to SMT, where n-best lists are filtered for duplicate strings • Which duplicate to delete: the one inserted later, or the one with the lower score?
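
The constraint, and the open question of which duplicate to delete, can be illustrated with a small n-best list that refuses duplicate dependency sets. In this hypothetical Python sketch, `Candidate` and `add_candidate` are illustrative names, dependency sets are compared exactly as frozensets (no hashing yet), and the `keep_higher` flag switches between the two deletion policies.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    score: float      # model score of the derivation
    deps: frozenset   # the derivation's dependency set


def add_candidate(nbest: list[Candidate], cand: Candidate, n: int,
                  keep_higher: bool = True) -> None:
    """Insert cand into nbest (descending score), keeping at most n entries
    and never two entries with the same dependency set."""
    for i, old in enumerate(nbest):
        if old.deps == cand.deps:
            if keep_higher and cand.score > old.score:
                del nbest[i]  # newcomer wins: evict the lower-scoring duplicate
                break
            # Drop the newcomer: it arrived later, or scores no higher.
            return
    nbest.append(cand)
    nbest.sort(key=lambda c: c.score, reverse=True)
    del nbest[n:]  # trim to the n best
```

With `keep_higher=True` the list always holds the best-scoring derivation for each dependency set; with `keep_higher=False` the first derivation found wins, which is cheaper but order-dependent.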

  7. Dependency Hashing (2) • Implementation: • A 32-bit hash value for each dependency • Bitwise XOR combines the hashes of sub-derivations • Only the hash value is stored; no hash table is used • Collisions: some useful derivations, and their dependencies, may be wrongly discarded
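
A minimal sketch of the hashing scheme, under stated assumptions: CRC-32 from Python's `zlib` stands in for whatever 32-bit hash function the parser actually uses, dependencies are plain tuples, and `dep_hash`, `combine`, `derivation_hash` and `is_duplicate` are hypothetical names. Because XOR is commutative and associative, two spuriously ambiguous derivations that create the same dependencies in a different order end up with the same 32-bit value, so duplicates are detected by comparing integers alone.

```python
import zlib
from functools import reduce


def dep_hash(dep: tuple) -> int:
    """32-bit hash of one dependency (CRC-32 as a stand-in hash)."""
    return zlib.crc32(repr(dep).encode("utf-8"))  # value in [0, 2**32)


def combine(left: int, right: int, new_deps: list[tuple]) -> int:
    """Hash of a chart node built from two sub-derivations: the XOR of both
    children's hashes and of the dependencies this combination creates."""
    h = left ^ right
    for dep in new_deps:
        h ^= dep_hash(dep)
    return h


def derivation_hash(deps: list[tuple]) -> int:
    """XOR of all dependency hashes; independent of derivation order."""
    return reduce(lambda h, dep: h ^ dep_hash(dep), deps, 0)


def is_duplicate(h: int, nbest_hashes: list[int]) -> bool:
    """Only integers are compared; no table of dependencies is kept."""
    return h in nbest_hashes
```

Leaf nodes start at hash 0 here, since a single word has no dependencies yet. One property of plain XOR worth keeping in mind: a dependency that occurred an even number of times would cancel itself out.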

  8. Diversity experiments • Dependency evaluation • Grammatical relation evaluation

  9. Parsing Results • Oracle • Reranking upper bound • Reranking Gap
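
The oracle and the reranking upper bound rest on picking, for each sentence, the n-best candidate closest to the gold standard. A hypothetical Python sketch, assuming dependencies are hashable items in Python sets and scoring each sentence separately (a corpus-level oracle would instead sum correct, gold and test counts across sentences):

```python
def f_score(gold: set, test: set) -> float:
    """F-score of one candidate's dependencies against the gold set."""
    correct = len(gold & test)
    if correct == 0:
        return 0.0
    precision = correct / len(test)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def oracle(gold: set, nbest: list[set]) -> tuple[int, float]:
    """Index and F-score of the candidate a perfect reranker would pick.

    Assumes nbest is non-empty; ties go to the earlier candidate.
    """
    scores = [f_score(gold, deps) for deps in nbest]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Duplicates lower the oracle directly: with gold `{1, 2, 3}` and a 2-best list `[{1, 2, 4}, {1, 2, 4}]`, the oracle F-score is 2/3, whereas the deduplicated list `[{1, 2, 4}, {1, 2, 3}]` reaches 1.0.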

  10. Three types of error • Grammar error: only a subset of CCGbank rules is used (seen-rule constraint) • Supertagger error: categories restricted by frequency cutoff; probability threshold β and cutoff k • Model error: a suboptimal parse is chosen

  11. Grammar Error • Given gold-standard categories, the parser achieves an F-score of 99.49% with 95.61% coverage • Grammar error therefore accounts for about 0.5% of F-score and a 4.4% drop in coverage

  12. Supertagger and model error • Supertagger error: the gap between the gold-category F-score and the oracle F-score • Model error: the gap between the oracle and the baseline F-score
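
One plausible reading of slides 11 and 12, consistent with the percentages in the conclusion, treats each error type as an F-score gap between successive systems. The symbols are our own labels, not the authors' notation: $F_{\text{gold}}$ is the F-score with gold-standard categories, $F_{\text{oracle}}$ the n-best oracle F-score, and $F_{\text{base}}$ the baseline parser's F-score.

```latex
\begin{aligned}
E_{\text{grammar}}     &= 100 - F_{\text{gold}} = 100 - 99.49 = 0.51 \\
E_{\text{supertagger}} &= F_{\text{gold}} - F_{\text{oracle}} \\
E_{\text{model}}       &= F_{\text{oracle}} - F_{\text{base}} \\
E_{\text{grammar}} + E_{\text{supertagger}} + E_{\text{model}} &= 100 - F_{\text{base}}
\end{aligned}
```

Under this reading the three components telescope to the baseline's total F-score loss, and the grammar-induced coverage drop is 100 − 95.61 = 4.39, the 4.4% quoted on slide 11.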

  13. More experiments • Speed/accuracy trade-off • Gold vs. automatic POS tags

  14. Conclusion • Dependency hashing for n-best CCG parsing • Avoids derivations with identical dependencies • Increases diversity in the n-best list • Comprehensive error analysis • Grammar error: 0.5% • Supertagger error: 5% • Model error: 7.5%

  15. Thank you Q & A