1 / 36

Parsing: computing the grammatical structure of English sentences

School of Computing FACULTY OF ENGINEERING . Parsing: computing the grammatical structure of English sentences. COMP3310 Natural Language Processing Eric Atwell, Language Research Group (with thanks to Katja Markert, Marti Hearst, and other contributors).

anisa
Download Presentation

Parsing: computing the grammatical structure of English sentences

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. School of Computing FACULTY OF ENGINEERING Parsing: computing the grammatical structure of English sentences COMP3310 Natural Language Processing Eric Atwell, Language Research Group (with thanks to Katja Markert, Marti Hearst, and other contributors)

  2. Reminder: Outline for Grammar/Parsing • Context-Free Grammars and Constituency • Some common CFG phenomena for English • Sentence-level constructions • NP, PP, VP • Coordination • Subcategorization • Top-down and Bottom-up Parsing • Chart Parsing

  3. S -> NP VP NP -> Det NOMINAL NOMINAL -> Noun VP -> Verb Det -> a Noun -> flight Verb -> left Alternatively… S -> NP VP NP -> A N VP -> V A -> a N -> flight V -> left CFG example

  4. Derivations • A derivation is a sequence of rules applied to a string that accounts for that string • Covers all the elements in the string • Covers only the elements in the string

  5. S NP VP NP Nom Pro Verb Det Noun Noun I prefer a morning flight Bracketed Notation • [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight] ] ] ] ]

  6. CFGs: a summary • CFGs appear to be just about what we need to account for a lot of basic syntactic structure in English. • But there are problems • That can be dealt with adequately, although not elegantly, by staying within the CFG framework. • There are simpler, more elegant, solutions that take us out of the CFG framework (beyond its formal power). Syntactic theories: TG, HPSG, LFG, GPSG, etc.

  7. Other syntactic stuff • Grammatical relations or functions • Subject • I booked a flight to New York • The flight was booked by my agent • Object • I booked a flight to New York • Complement • I said that I wanted to leave

  8. Dependency parsing • Word to word links instead of constituency • Based on the European rather than American traditions • But dates back to ancient Greek and Arab scholars • Eg see Quranic Arabic Corpus • The original notions of Subject, Object and the progenitor of subcategorization (called ‘valence’) came out of Dependency theory. • Dependency parsing is quite popular as a computational model since relationships between words are quite useful

  9. S submitted VP NP agent nsubjpass auxpass VP VBD PP NP VBN PP Brownback Bills were IN NP prep_on nn NP IN NNS NN CC NNS ports NNP NNP Senator conj_and Bills on ports and immigration were submitted by Senator Brownback immigration Dependency parsing Parse tree: Nesting of multi-word constituents Typed dep parse: Grammatical relations between individual words

  10. Why are dependency parses useful? • Example: multi-document summarization Need to identify sentences from different documents that each say roughly the same thing: phrase structure trees of paraphrasing sentences which differ in word order can be significantly different but dependency representations will be very similar

  11. Parsing • Parsing: assigning correct trees to input strings • Correct tree: a tree that covers all and only the elements of the input and has an S at the top • For now: enumerate all possible trees • A further task: disambiguation: means choosing the correct tree from among all the possible trees.

  12. Treebanks • Parsed corpora in the form of trees • The Penn Treebank • The Brown corpus • The WSJ corpus • Tgrep http://www.ldc.upenn.edu/ldc/online/treebank/ • Tregex http://www-nlp.stanford.edu/nlp/javadoc/javanlp/

  13. Parsing involves search • As with everything of interest, parsing involves a search which involves the making of choices • We’ll start with some basic (meaning bad) methods before moving on to the one or two that you need to know

  14. For Now • Assume… • You have all the words already in some buffer • The input isn’t pos tagged • We won’t worry about morphological analysis • All the words are known

  15. Top-Down Parsing • Since we’re trying to find trees rooted with an S (Sentences) start with the rules that give us an S. • Then work your way down from there to the words.

  16. S S S VP NP VP Aux NP VP S S S S S S Aux Aux NP VP NP VP NP VP NP VP VP VP NP Top Down Space

  17. Bottom-Up Parsing • Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way. • Then work your way up from there.

  18. Book that flight Book that flight Book that flight Book that flight Book that flight Book that flight Book that flight Bottom-Up Space

  19. Control • Of course, in both cases we left out how to keep track of the search space and how to make choices • Which node to try to expand next • Which grammar rule to use to expand a node

  20. Top-Down, Depth-First, Left-to-Right Search

  21. Example

  22. [flight] [flight] Example

  23. flight flight Example

  24. Top-Down and Bottom-Up • Top-down • Only searches for trees that can be answers (i.e. S’s) • But also suggests trees that are not consistent with the words • Bottom-up • Only forms trees consistent with the words • Suggest trees that make no sense globally

  25. So Combine Them • There are a million ways to combine top-down expectations with bottom-up data to get more efficient searches • Most use one kind as the control and the other as a filter • As in top-down parsing with bottom-up filtering

  26. Adding Bottom-Up Filtering

  27. 3 problems with TDDFLtR Parser • Left-Recursion • Ambiguity • Inefficient reparsing of subtrees

  28. Left-Recursion • What happens in the following situation • S -> NP VP • S -> Aux NP VP • NP -> NP PP • NP -> Det Nominal • … • With the sentence starting with • Did the flight…

  29. Ambiguity “One morning I shot an elephant in my pajamas. How he got into my pajamas I don’t know.” Groucho Marx

  30. Lots of ambiguity • VP -> VP PP • NP -> NP PP Show me the meal on flight 286 from SF to Denver • 14 parses!

  31. Lots of ambiguity • Church and Patil (1982) • Number of parses for such sentences grows at rate of number of parenthesizations of arithmetic expressions • Which grow with Catalan numbers PPs Parses 1 2 2 5 3 14 4 132 5 469 6 1430

  32. Chart Parser:Avoiding Repeated Work • Parsing is hard, and slow. It’s wasteful to redo stuff over and over and over. • A CHART PARSER maintains a CHART – a table of partial parses found so far, to “re-use” if required. • Consider an attempt to top-down parse the following as an NP: A flight from Indianapolis to Houston on TWA

  33. flight flight

  34. flight flight

  35. flight flight

  36. Grammars and Parsing • Context-Free Grammars and Constituency • Some common CFG phenomena for English • Basic parsers: Top-down and Bottom-up Parsing • Chart Parser – keep a CHART of partial parses

More Related