
CS 479, section 1: Natural Language Processing

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #31: Dependency Parsing.


Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #31: Dependency Parsing. Thanks to Joakim Nivre and Sandra Kuebler for many of the materials used in this lecture, with additions by Dan Roth.

  2. Announcements • Final Project • Three options: • Propose (possibly as a team) • No proposal – just decide • Project #4 • Project #5 • Proposals • Early: today • Due: Friday • Note: must discuss with me before submitting written proposal

  3. Objectives • Become acquainted with dependency parsing, in contrast to constituent parsing • See the relationship between the two approaches • Understand an algorithm for non-projective dependency parsing • Have a starting point to understand the rest of the dependency parsing literature • Think about uses of dependency parsing

  4. Your Questions

  5. Big Ideas from McDonald et al., 2006?

  6. Big Ideas from McDonald et al., 2006? • Dependency parsing • Non-projective vs. projective parse trees • Generalization to other languages • Labeled vs. unlabeled dependencies • Problem: Maximum Spanning Tree • Algorithm: Chu-Liu-Edmonds • Edge scores • Machine Learning: MIRA • Large Margin Learners • Online vs. Batch learning • Feature engineering

  7. Outline • Dependency Parsing: • Formalism • Dependency Parsing algorithms • Semantic Role Labeling • Dependency Formalism

  8. Formalization by Lucien Tesniere [Tesniere, 1959] • The idea was known long before (e.g., Panini, India, >2000 years ago) • Studied extensively in the Prague School approach to syntax • (in the US, research focused more on the constituent formalism)

  9. Phrase Structure (or Constituent Structure)

  10. Constituent vs. Dependency • There are advantages to dependency structures: • better suited to free (or semi-free) word order languages • easier to convert to predicate-argument structure • ... • But there are drawbacks too... • You can try to convert one representation into the other • but, in general, these formalisms are not equivalent

  11. Dependency structures for NLP tasks • Most approaches have focused on constituent tree-based features • But now dependency parsing is in the spotlight: • Machine Translation (e.g., Menezes & Quirk, 07) • Summarization and sentence compression (e.g., Filippova & Strube, 08) • Opinion mining (e.g., Lerman et al., 08) • Information extraction, Question Answering (e.g., Bouma et al., 06)

  12. All these conditions will be violated by the semantic dependency graphs we will consider later

  13. You can think of it (projectivity) as related to planarity
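Projectivity can be checked directly from the head array: the tree is projective exactly when no two arcs cross when drawn above the sentence. A minimal sketch, assuming words are indexed 1..n with an artificial ROOT at index 0 (the representation and function name are illustrative, not from the slides):

```python
def is_projective(heads):
    """heads[d] = head of word d for d = 1..n; index 0 is the artificial ROOT.
    A dependency tree is projective iff no two of its arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # two arcs cross when exactly one endpoint of the second arc
            # falls strictly inside the span of the first
            if l1 < l2 < r1 < r2:
                return False
    return True

print(is_projective([0, 2, 0, 2]))     # John -> saw <- Mary       : True
print(is_projective([0, 3, 3, 0, 2]))  # arc (2, 4) crosses (1, 3) : False
```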

  14. Algorithms • Global inference algorithms: • graph-based approaches • transition-based approaches • We will not consider • rule-based systems • constraint satisfaction

  15. Converting to Constituent Formalism • Idea: • Convert dependency structures to constituent structures • easy for projective dependency structures (see the sketch below) • Apply algorithms for constituent parsing to them • e.g., CKY / PCKY
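One simple version of the easy (projective) direction: every head word projects a constituent that covers exactly its subtree's contiguous span, so a bracketing can be built recursively from the head array. A minimal sketch under that assumption; the flat (non-binarized) brackets and the data format are illustrative choices, not necessarily the transformation used in the lecture:

```python
from collections import defaultdict

def dep_to_brackets(words, heads):
    """Turn a projective dependency tree into a nested bracketing in which
    each head projects one constituent over its subtree.
    words[1..n] are the tokens; heads[d] is the head of word d, 0 = ROOT."""
    children = defaultdict(list)
    for d, h in enumerate(heads):
        if d > 0:
            children[h].append(d)

    def build(h):
        deps = sorted(children[h])
        if not deps:
            return words[h]
        left  = [build(d) for d in deps if d < h]
        right = [build(d) for d in deps if d > h]
        return "(" + " ".join(left + [words[h]] + right) + ")"

    return build(children[0][0])   # the single child of ROOT

print(dep_to_brackets(["<ROOT>", "John", "saw", "a", "movie"],
                      [0, 2, 0, 4, 2]))
# -> (John saw (a movie))
```

For actual CKY parsing the resulting brackets would additionally have to be binarized and labeled (e.g., with the head word or its POS tag).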

  16. Converting to Constituent Formalism • Different independence assumptions lead to different statistical models • both accuracy and parsing time (dynamic programming) vary

  17. Features f(i, j) can include dependence on any words in the sentence, i.e., f(i, j, sent) • But the score still decomposes over edges in the graph • A strong independence assumption
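In code form, the edge-factored model looks like this: each candidate arc (i, j) gets a feature vector that may inspect the whole sentence, an edge score is the dot product with the weight vector, and a tree's score is just the sum of its edge scores; decoding then searches for the highest-scoring tree (MST / Eisner's algorithm). A minimal sketch with made-up feature templates and sparse-dict weights (both are illustrative assumptions):

```python
def edge_features(i, j, sent):
    """Features for a candidate arc head i -> dependent j.
    They may condition on any word in the sentence, not just words i and j."""
    return {
        f"head_word={sent[i]}": 1.0,
        f"dep_word={sent[j]}": 1.0,
        f"word_pair={sent[i]}_{sent[j]}": 1.0,
        f"distance={j - i}": 1.0,
        # a feature looking at the words strictly between i and j
        f"between_words={','.join(sent[min(i, j) + 1:max(i, j)])}": 1.0,
    }

def edge_score(w, i, j, sent):
    """Dot product of the weight vector with the arc's feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in edge_features(i, j, sent).items())

def tree_score(w, heads, sent):
    """Score of a full tree: the sum of its edge scores
    (this is exactly the edge-factored independence assumption)."""
    return sum(edge_score(w, heads[d], d, sent) for d in range(1, len(sent)))
```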

  18. Online Learning: Structured Perceptron • Joint feature representation: • we will talk about it more later • Algorithm: features over edges only; the decoding step runs the MST or Eisner's algorithm
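A minimal sketch of that training loop: decode the best tree under the current weights and, whenever it differs from the gold tree, add the gold tree's features and subtract the predicted tree's. The `decode` argument stands in for the MST (Chu-Liu-Edmonds) or Eisner decoder; the data format and function names are assumptions, and `edge_features` refers to the sketch above:

```python
from collections import defaultdict

def tree_features(heads, sent):
    """Joint feature representation of (sentence, tree): sum of edge features."""
    phi = defaultdict(float)
    for d in range(1, len(sent)):
        for f, v in edge_features(heads[d], d, sent).items():
            phi[f] += v
    return phi

def train_perceptron(data, decode, epochs=5):
    """Structured perceptron.
    data:   list of (sent, gold_heads) pairs
    decode: function (weights, sent) -> predicted heads, e.g. an MST
            (Chu-Liu-Edmonds) or Eisner decoder over the edge scores."""
    w = defaultdict(float)
    for _ in range(epochs):
        for sent, gold in data:
            pred = decode(w, sent)
            if pred != gold:
                for f, v in tree_features(gold, sent).items():
                    w[f] += v
                for f, v in tree_features(pred, sent).items():
                    w[f] -= v
    return w
```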

  19. Parsing Algorithms • Here, when we say parsing algorithm (= derivation order) we often mean a mapping: • Given a tree, map it to the sequence of actions which creates this tree • Tree T is equivalent to this sequence of actions: • d1, ..., dn • Therefore, P(T) = P(d1, ..., dn) • P(T) = P(d1, ..., dn) = P(d1) P(d2|d1) ... P(dn|dn-1, ..., d1) • Ambiguous: sometimes "parsing algorithm" refers to the decoding algorithm that finds the most likely sequence • You can use classifiers here and search for the most likely sequence.
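As one concrete instance (not necessarily the exact system on the slide), the arc-standard transition system builds a projective tree with SHIFT, LEFT-ARC, and RIGHT-ARC actions, and a simple static oracle maps a gold tree to the action sequence d1, ..., dn whose per-action probabilities are then multiplied to give P(T). A minimal oracle sketch, assuming words 1..n with ROOT = 0:

```python
def arc_standard_oracle(heads):
    """Map a projective dependency tree to an arc-standard action sequence.
    heads[d] = head of word d for d = 1..n; index 0 is the artificial ROOT."""
    n = len(heads) - 1
    pending = [0] * (n + 1)             # dependents of each head not yet attached
    for d in range(1, n + 1):
        pending[heads[d]] += 1

    stack, queue, actions = [0], list(range(1, n + 1)), []
    while queue or len(stack) > 1:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            if s1 != 0 and heads[s1] == s0:              # LEFT-ARC:  s0 -> s1
                actions.append("LEFT-ARC")
                stack.pop(-2)
                pending[s0] -= 1
                continue
            if heads[s0] == s1 and pending[s0] == 0:     # RIGHT-ARC: s1 -> s0
                actions.append("RIGHT-ARC")
                stack.pop()
                pending[s1] -= 1
                continue
        if not queue:
            raise ValueError("tree is not projective")
        actions.append("SHIFT")                          # SHIFT: push next word
        stack.append(queue.pop(0))
    return actions

print(arc_standard_oracle([0, 2, 0, 2]))   # "John saw Mary"
# ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'RIGHT-ARC', 'RIGHT-ARC']
```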

  20. Most algorithms are restricted to projective structures, but not all

  21. It can handle only projective structures

  22. How to learn in this case? • Your training examples are collections of parsing contexts (configurations) paired with the correct actions • You want to predict the correct action for each context • How to define a feature representation of a context? • You can think of a context in terms of: • the partial tree built so far • the current contents of the queue (Q) and the stack (S) • The most important features are the top of S and the front of Q (only between them can you potentially create links) • Inference: • greedily • with beam search
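Concretely, the feature representation of a context can be read off the top of the stack (S), the front of the queue (Q), and the partial tree built so far, and greedy inference just applies the classifier's best legal action until the configuration is terminal. A minimal sketch; the feature templates and the `classifier` / `transition_system` interfaces are assumptions for illustration:

```python
def config_features(stack, queue, arcs, words, pos):
    """Features of a parser configuration: top of stack S0, front of queue Q0,
    and a little of the partial tree (arcs attached so far)."""
    s0 = stack[-1] if stack else None
    q0 = queue[0] if queue else None
    feats = {
        "S0_word=" + (words[s0] if s0 is not None else "<none>"): 1.0,
        "S0_pos="  + (pos[s0]   if s0 is not None else "<none>"): 1.0,
        "Q0_word=" + (words[q0] if q0 is not None else "<none>"): 1.0,
        "Q0_pos="  + (pos[q0]   if q0 is not None else "<none>"): 1.0,
        "S0_n_children=" + str(sum(1 for (h, d) in arcs if h == s0)): 1.0,
    }
    if s0 is not None and q0 is not None:
        feats["S0_Q0_pair=" + words[s0] + "_" + words[q0]] = 1.0
    return feats

def greedy_parse(words, pos, classifier, transition_system):
    """Greedy inference: at each step take the classifier's highest-scoring
    action that is legal in the current configuration (assumed interfaces)."""
    stack, queue, arcs = [0], list(range(1, len(words))), []
    while queue or len(stack) > 1:
        feats = config_features(stack, queue, arcs, words, pos)
        action = classifier.best_legal_action(feats, stack, queue)
        stack, queue, arcs = transition_system.apply(action, stack, queue, arcs)
    return arcs
```

Beam search keeps the k best partial action sequences instead of only the single greedy one, at the cost of k times more classifier calls.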

  23. Results: Transition-Based vs. Graph-Based • CoNLL-2006 Shared Task, average over 12 languages (Labeled Attachment Score) • McDonald et al. (MST): 80.27 • Nivre et al. (Transitions): 80.19 • Results are essentially the same • A lot of research in both directions • e.g., Latent Variable Models for Transition-Based Parsing (Titov and Henderson, 07) – best single-model system in CoNLL-2007 (third overall)

  24. Non-Projective Parsing • Graph-Based Algorithms (McDonald) • Post-Processing of Projective Algorithms (Hall and Novak, 05) • Transition-Based Algorithms which handle non-projectivity (Attardi, 06; Titov et al., 08; Nivre et al., 08) • Pseudo-Projective Parsing: lifting non-projective (crossing) links and encoding the original attachment in the arc labels (Nivre and Nilsson, 05)


  26. Next • Document Clustering • Unsupervised learning • Expectation Maximization (EM) • Machine Translation! • Word alignment • Phrase alignment • Semantics • Co-reference
