
Training dependency parsers by jointly optimizing multiple objectives



  1. Training dependency parsers by jointly optimizing multiple objectives • Keith Hall, Ryan McDonald, Jason Katz-Brown, Michael Ringgaard

  2. Evaluation • Intrinsic • How well does system replicate gold annotations? • Precision/recall/F1, accuracy, BLEU, ROUGE, etc. • Extrinsic • How useful is system for some downstream task? • High performance on one doesn’t necessarily mean high performance on the other • Can be hard to evaluate extrinsically

  3. Dependency Parsing • Given a sentence, label the dependencies between its words • (example dependency graph from nltk.org) • Output is useful for downstream tasks like machine translation • Also of interest to NLP researchers

  4. Overview of paper • Optimize parser for two metrics • Intrinsic evaluation • Downstream task (here, a reranker in a machine translation system) • Algorithm to do this • Experiments

  5. Perceptron Algorithm • Takes: a set of labeled training examples; a loss function • For each example, predicts an output and updates the model if the prediction is incorrect • Rewards features that fire in the gold standard output • Penalizes those that fire in the predicted output
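
A minimal sketch of this mistake-driven update, assuming dict-based feature weights; `predict` and `features` are hypothetical stand-ins for the parser's decoder and feature extractor, not the paper's implementation:

```python
# Structured perceptron: reward gold features, penalize predicted features.
def perceptron_train(examples, predict, features, epochs=5):
    """examples: list of (sentence, gold_parse) pairs."""
    weights = {}
    for _ in range(epochs):
        for sentence, gold in examples:
            pred = predict(sentence, weights)  # 1-best under current model
            if pred != gold:                   # update only on mistakes
                for f in features(sentence, gold):
                    weights[f] = weights.get(f, 0.0) + 1.0  # fires in gold
                for f in features(sentence, pred):
                    weights[f] = weights.get(f, 0.0) - 1.0  # fires in prediction
    return weights
```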

  6. Augmented Loss Perceptron Algorithm • Like the perceptron, except it takes: multiple loss functions; multiple datasets (one for each loss function); a scheduler to weight the loss functions • The perceptron is an instance of ALP with one loss function, one dataset, and a trivial scheduler • Will look at ALP with 2 loss functions • Can use an extrinsic evaluator as a loss function
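
A rough sketch of the augmented-loss loop, reusing a perceptron-style step; the weighted random scheduler is my assumption (the paper allows an arbitrary scheduling policy), and `update` is a stand-in for an update against the chosen loss:

```python
import random

# ALP: several (dataset, loss) pairs share one weight vector; a scheduler
# decides which objective drives each update step.
def augmented_loss_train(datasets, losses, schedule_weights, update, steps=100_000):
    weights = {}
    for _ in range(steps):
        # hypothetical scheduler: pick an objective with probability
        # proportional to its schedule weight
        i = random.choices(range(len(losses)), weights=schedule_weights)[0]
        example = random.choice(datasets[i])
        update(weights, example, losses[i])  # perceptron-style update w.r.t. loss i
    return weights
```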

  7. Reranker loss function • Takes the k-best output from the parser • Assigns a cost to each parse • Takes the lowest-cost parse to be the “correct” parse • If the 1-best parse is lowest cost, do nothing • Otherwise update parameters based on the correct parse • The standard loss function is an instance of this in which the cost is always lowest for the 1-best
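
A sketch of that reranker-style step, under the same assumptions as above; `kbest` and `cost` are hypothetical hooks for the parser's k-best decoder and the extrinsic cost function:

```python
def rerank_update(weights, sentence, kbest, cost, features, k=8):
    parses = kbest(sentence, weights, k)  # model-ranked k-best list
    best = min(parses, key=cost)          # lowest extrinsic cost = "correct" parse
    if best != parses[0]:                 # update only if the 1-best isn't cheapest
        for f in features(sentence, best):
            weights[f] = weights.get(f, 0.0) + 1.0
        for f in features(sentence, parses[0]):
            weights[f] = weights.get(f, 0.0) - 1.0
```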

  8. Experiment 1 • English to Japanese MT system, specifically word reordering step • Given a parse, reorder the English sentence into Japanese word order • Transition-based and graph-based dependency parsers • 17,260 manually annotated word reorderings • 10,930 training, 6,338 test • These are cheaper to produce than dependency parses

  9. Experiment 1 • 2nd loss function based on METEOR • Score = 1 – (#chunks – 1) / (#unigrams matched – 1) • Cost = 1 – score • Matched unigrams are those appearing in both reference and hypothesis • Chunks are maximal runs of matched unigrams that are adjacent in both reference and hypothesis • Vary weights of primary and secondary loss
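
A worked version of this cost, assuming the chunk and match counts have already been computed from the reference and hypothesis orderings (and that at least two unigrams matched, so the denominator is nonzero):

```python
def reorder_cost(num_chunks, num_matched):
    # score = 1 - (#chunks - 1) / (#unigrams matched - 1); cost = 1 - score
    score = 1.0 - (num_chunks - 1) / (num_matched - 1)
    return 1.0 - score

assert reorder_cost(num_chunks=1, num_matched=10) == 0.0   # perfect order: one chunk
assert reorder_cost(num_chunks=10, num_matched=10) == 1.0  # fully scrambled: cost 1
```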

  10. Experiment 1 • As the ratio of extrinsic loss to intrinsic loss increases, performance on the reordering task improves • (results shown for the transition-based parser)

  11. Experiment 2 • Semi-supervised adaptation: Penn Treebank (PTB) to Question Treebank (QTB) • A PTB-trained parser performs poorly on QTB • A QTB-trained parser does much better on QTB • Ask annotators a simple question about QTB sentences: What is the main verb? • ROOT usually attaches to the main verb • Use the answers plus PTB to adapt to QTB

  12. Experiment 2 • Augmented loss data set: QTB data with ROOT attached to main verb • No other labels on QTB data • Loss function: 0 if ROOT dependency correct, 1 otherwise • Secondary loss function looks at k-best, chooses highest ranked parse with correct ROOT dependency
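
A small sketch of both pieces, assuming a parse maps each token index to its head index, with 0 standing for ROOT (my encoding, not the paper's):

```python
def root_loss(parse, main_verb):
    # 0 if ROOT attaches to the annotated main verb, 1 otherwise
    return 0 if parse.get(main_verb) == 0 else 1

def pick_root_correct(kbest_parses, main_verb):
    # highest-ranked parse with the correct ROOT arc; fall back to the 1-best
    for parse in kbest_parses:
        if root_loss(parse, main_verb) == 0:
            return parse
    return kbest_parses[0]
```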

  13. Experiment 2 • Results for transition parser • Huge improvement with data that is very cheap to collect • Cheaper to get Turkers to annotate main verbs than grad students to manually parse sentences

  14. Experiment 3 • Improving accuracy on labeled and unlabeled dependency parsing (all intrinsic) • Use labeled attachment score as primary loss function • Secondary loss function weights lengths of incorrect and correct arcs • One version uses labeled arcs, the other unlabeled • Idea is to have model account for arc length • Parsers tend to do poorly on long dependencies (McDonald and Nivre, 2007)

  15. Experiment 3 • Weighted Arc Length Score (ALS) • Sum of the lengths of all correct arcs divided by the sum of the lengths of all arcs • In the unlabeled version only the head and dependent need to match • In the labeled version the arc label must match too
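
A sketch of ALS under two assumptions of mine: arcs are (head, dependent, label) triples over token indices, and the denominator sums the gold arc lengths:

```python
def arc_length_score(gold, predicted, labeled=True):
    match = (lambda a: a) if labeled else (lambda a: a[:2])  # drop label if unlabeled
    gold_arcs = {match(a) for a in gold}
    correct = sum(abs(h - d) for h, d, l in predicted if match((h, d, l)) in gold_arcs)
    total = sum(abs(h - d) for h, d, l in gold)  # assumed normalizer: gold lengths
    return correct / total
```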

  16. Experiment 3 • Results with the transition-based parser • Small improvement, likely due to the fact that ALS is similar to LAS and UAS

  17. Conclusions • Possible to train tools for particular downstream tasks • Might not want to use the same parses for MT as for information extraction • Can leverage cheap(er) data to improve task performance • Japanese translations/word orderings for MT • Main-verb identification instead of dependency parses for domain adaptation • Not necessarily easy to define the task or a good extrinsic evaluation metric • Here, MT was reduced to a word reordering score • And scored with a METEOR-based metric
