
Parsing context-free grammars

Parsing context-free grammars. Context-free grammars specify structure, not process. There are many different ways to parse input in accordance with a given context-free grammar. We will review a top-down parsing algorithm and a bottom-up parsing algorithm, and then present the Earley algorithm.



  1. Parsing context-free grammars • Context-free grammars specify structure, not process. • There are many different ways to parse input in accordance with a given context-free grammar. • We will review • a top-down parsing algorithm • a bottom-up parsing algorithm • We will present the Earley algorithm

  2. A simple grammar (Figure 10.2)
     S → NP VP
     S → Aux NP VP
     S → VP
     NP → Det Nominal
     NP → ProperNoun
     VP → Verb
     VP → Verb NP
     PP → Prep NP
     Nominal → Nominal PP
     Nominal → Noun
     Nominal → Noun Nominal
     Det → that | this | a
     Noun → book | flight | meal | money
     Verb → book | include | prefer
     Aux → does
     Prep → from | to | on
     ProperNoun → Houston | TWA
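One way to make the grammar concrete is to write it down as plain data. The sketch below is only an illustration: the names `GRAMMAR`, `LEXICON` and `pos_tags` are hypothetical, and the slide's `PP → P NP` is read as `PP → Prep NP` on the assumption that `P` abbreviates the `Prep` category.

```python
# Phrase-structure rules: left-hand side -> list of possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],  # assumption: the slide's "P" means "Prep"
    "Nominal": [["Nominal", "PP"], ["Noun"], ["Noun", "Nominal"]],
}

# Lexicon: part-of-speech (POS) category -> words in that category.
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "ProperNoun": {"Houston", "TWA"},
}


def pos_tags(word):
    """Return the sorted list of POS categories the word can belong to."""
    return sorted(
        pos for pos, words in LEXICON.items()
        if word in words or word.lower() in words
    )


print(pos_tags("Book"))  # ['Noun', 'Verb']
```

The lookup for "Book" returns two categories, which is the lexical ambiguity that the bottom-up example has to deal with.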

  3. Bottom-up parsing • Yngve (1955) presented a bottom-up algorithm • Example (figure 10.4): Book that flight.

  4. Look up words in lexicon • Book is ambiguous: there are two possible POS tags for the word “Book”. • [Noun Book] [Det that] [Noun flight] • [Verb Book] [Det that] [Noun flight]

  5. Build structure from bottom up • [NOM [Noun Book]] [Det that] [NOM [Noun flight]] • [Verb Book] [Det that] [NOM [Noun flight]]

  6. Build structure from bottom up • Now we have three possible structures: • [NOM [Noun Book]] [NP [Det that] [NOM [Noun flight]]] • [VP [Verb Book]] [Det that] [NOM [Noun flight]] • [Verb Book] [NP [Det that] [NOM [Noun flight]]]

  7. Build structure from bottom up • The Noun interpretation of Book leads to a dead end, so only two parse trees survive: • [VP [Verb Book]] [NP [Det that] [NOM [Noun flight]]] • [VP [Verb Book] [NP [Det that] [NOM [Noun flight]]]]

  8. Build structure from bottom up • There is no way to combine a VP and an NP to form an S, so only one parse tree survives: • [S [VP [Verb Book] [NP [Det that] [NOM [Noun flight]]]]]
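The bottom-up walk-through above can be imitated with a small exhaustive search. This is a minimal sketch and not the algorithm from the cited figure: it starts from every possible POS tagging and keeps replacing any run of symbols that matches a rule's right-hand side with the rule's left-hand side, until a lone S appears. The names `RULES`, `LEXICON` and `bottom_up_recognize` are hypothetical, and `PP → Prep NP` is an assumed reading of the slide's `PP → P NP`.

```python
from itertools import product

# (left-hand side, right-hand side) pairs for the grammar of Figure 10.2.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("NP", ("ProperNoun",)),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
    ("PP", ("Prep", "NP")),  # assumption: "P" on the slide means "Prep"
    ("Nominal", ("Nominal", "PP")), ("Nominal", ("Noun",)),
    ("Nominal", ("Noun", "Nominal")),
]
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "ProperNoun": {"Houston", "TWA"},
}


def bottom_up_recognize(words):
    """Return True if some sequence of reductions turns the words into S."""
    tag_choices = [
        [pos for pos, ws in LEXICON.items() if w in ws or w.lower() in ws]
        for w in words
    ]
    # Every combination of POS tags is a separate starting point.
    agenda = [tuple(tags) for tags in product(*tag_choices)]
    seen = set(agenda)
    while agenda:
        seq = agenda.pop()
        if seq == ("S",):
            return True
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(seq) - n + 1):
                if seq[i:i + n] == rhs:
                    reduced = seq[:i] + (lhs,) + seq[i + n:]
                    if reduced not in seen:  # avoid re-exploring a sequence
                        seen.add(reduced)
                        agenda.append(reduced)
    return False


print(bottom_up_recognize("Book that flight".split()))  # True
```

The search also explores the Noun reading of "Book", which never reduces to S. That wasted work is the kind of dead end the slides point out.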

  9. Build structure from top down • When parsing top-down, we start with the grammar’s start symbol and apply productions to try to match input: • Tree so far: [S] • Input: Book that flight

  10. Build structure from top down • Here we show only the successful choices: • Tree so far: [S [VP]] • Input: Book that flight

  11. Build structure from top down • Here we show only the successful choices: • Tree so far: [S [VP Verb NP]] • Input: Book that flight

  12. Build structure from top down • Here we show only the successful choices: • Tree so far: [S [VP Verb NP]] • Input: Book that flight

  13. Build structure from top down • Here we show only the successful choices: • Tree so far: [S [VP Verb [NP Det NOM]]] • Input: Book that flight

  14. Build structure from top down • Here we show only the successful choices: • Tree so far: [S [VP Verb [NP Det NOM]]] • Input: Book that flight

  15. Build structure from top down • Here we show only the successful choices: • Tree so far: [S [VP Verb [NP Det [NOM Noun]]]] • Input: Book that flight

  16. Build structure from top down • Here we show only the successful choices: • Tree so far: [S [VP Verb [NP Det [NOM Noun]]]] • Input: Book that flight
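The top-down steps above can be sketched as a recursive-descent parser with backtracking, written here with Python generators. This is an illustrative sketch under stated assumptions: the function names are hypothetical, `PP → Prep NP` is an assumed reading of the slide's `PP → P NP`, and the left-recursive rule `Nominal → Nominal PP` is deliberately left out because a naive top-down parser would expand it forever.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
    # Assumption: "Nominal -> Nominal PP" is omitted (left-recursive).
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "ProperNoun": {"Houston", "TWA"},
}


def expand(symbol, words, i):
    """Yield (tree, next_position) for each way symbol can start at words[i]."""
    if symbol in LEXICON:  # POS category: try to match the current word
        if i < len(words) and (
            words[i] in LEXICON[symbol] or words[i].lower() in LEXICON[symbol]
        ):
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each production in turn
        yield from expand_sequence(symbol, rhs, words, i, [])


def expand_sequence(lhs, rhs, words, i, children):
    """Expand the symbols of rhs left to right, backtracking on failure."""
    if not rhs:
        yield (lhs, *children), i
        return
    for child, j in expand(rhs[0], words, i):
        yield from expand_sequence(lhs, rhs[1:], words, j, children + [child])


def parse_top_down(words):
    """Return every S tree that covers the whole input."""
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


for tree in parse_top_down("Book that flight".split()):
    print(tree)
```

Before it finds the successful analysis, the parser tries `S → NP VP` and `S → Aux NP VP`, both of which fail on the first word. The slides show only the successful choices, so those attempts are not drawn.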

  17. Top-down versus bottom-up approaches • Top-down advantages • Doesn’t explore trees which cannot be S • Subtrees fit under S • Top-down disadvantages • Many fruitless trees are explored: trees explored may have no hope of matching input • Bottom-up advantages • All trees explored are consistent with input • Bottom-up disadvantages • Builds structure even if S cannot be formed • Builds neighboring structures which can never combine

  18. Approaches to dealing with ambiguity • parallel exploration • depth-first strategy with backtracking

  19. Improving top-down parsing • Make top-down parser pay attention to input with bottom-up filtering (left-corner parsing) • “The parser should not consider any grammar rule if the current input cannot serve as the first word along the left edge of some derivation from this rule.” [pg. 369] • Left corners are pre-compiled.
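Pre-compiling left corners can be sketched as a fixed-point computation over the grammar. The sketch below is an assumption-laden illustration, not the textbook's procedure: `left_corners` is a hypothetical name, the table maps each non-terminal to the POS categories that can begin one of its derivations, `PP → Prep NP` is an assumed reading of `PP → P NP`, and the grammar is assumed to have no empty right-hand sides.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
    "Nominal": [["Nominal", "PP"], ["Noun"], ["Noun", "Nominal"]],
}


def left_corners(grammar):
    """Map each non-terminal to the POS categories that can start it."""
    table = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for lhs, expansions in grammar.items():
            for rhs in expansions:
                first = rhs[0]
                # A non-terminal contributes its own left corners;
                # a POS category contributes itself.
                new = table[first] if first in grammar else {first}
                if not new <= table[lhs]:
                    table[lhs] |= new
                    changed = True
    return table


table = left_corners(GRAMMAR)
print(sorted(table["S"]))  # ['Aux', 'Det', 'ProperNoun', 'Verb']
```

With this table, a top-down parser facing the word "Book" (Noun or Verb) could skip any NP expansion straight away, because neither tag is a left corner of NP.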

  20. Problems with top-down parsers • left recursion: X ⇒* X α leads to an infinite loop in the derivation! • ambiguity is not handled efficiently • recomputation: subtrees can be built multiple times (built, then thrown away during backtracking)
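Left recursion can be detected before parsing by checking whether a non-terminal can reach itself through left-most symbols. The following is a minimal sketch with a hypothetical function name, `left_recursive`; it assumes no empty right-hand sides and reads the slide's `PP → P NP` as `PP → Prep NP`.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
    "Nominal": [["Nominal", "PP"], ["Noun"], ["Noun", "Nominal"]],
}


def left_recursive(grammar):
    """Return the non-terminals X for which X =>* X alpha is possible."""
    # reach[X]: non-terminals that can appear as the left-most symbol
    # of something derived from X (direct relation first).
    reach = {
        nt: {rhs[0] for rhs in expansions if rhs and rhs[0] in grammar}
        for nt, expansions in grammar.items()
    }
    changed = True
    while changed:  # transitive closure by repeated propagation
        changed = False
        for nt in reach:
            for mid in list(reach[nt]):
                extra = reach[mid] - reach[nt]
                if extra:
                    reach[nt] |= extra
                    changed = True
    return {nt for nt in reach if nt in reach[nt]}


print(left_recursive(GRAMMAR))  # {'Nominal'}
```

For the grammar of Figure 10.2 only Nominal is flagged, because of the rule `Nominal → Nominal PP`.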

  21. Earley’s algorithm • Earley’s algorithm employs the dynamic programming technique to address the weaknesses of general top-down parsing. • Dynamic programming involves storing of results so they don’t ever need to be recomputed. • Dynamic programming reduces exponential time requirement to polynomial time requirement: O(N^3), where N is length of input in words.

  22. Data structure • Earley’s algorithm uses a data structure called a chart to store information about the progress of the parse. • A chart contains an entry for each position in the input • A position occurs before the first word, between words, and after the last word: · word1 · word2 · … · wordN · • A position is represented by a number; positions in the input are numbered from 0 (at the left) to N (at the right).

  23. Chart details • A chart entry consists of a sequence of states. • A state represents • a subtree corresponding to a single grammar rule • information about how much of a rule has been processed • information about the span of the subtree w.r.t. the input • A state is represented by an annotated grammar rule • a dot (·) is used to show how much of the rule has been processed • a pair of positions, [x,y], indicates the span of the subtree w.r.t. the input; x is the position of the left edge of the subtree, and y is the position of the dot.
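A state as described above can be modelled as a small immutable record. This is one possible representation, not the one used in the text: the class name `State` and its field names are hypothetical, and the dot is stored as an index into the rule's right-hand side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the grammar rule
    rhs: tuple    # right-hand side symbols of the rule
    dot: int      # how many right-hand side symbols have been processed
    start: int    # x: position of the left edge of the subtree
    end: int      # y: position of the dot in the input

    def next_symbol(self):
        """Symbol immediately to the right of the dot, or None at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        """True when the dot has reached the end of the rule."""
        return self.dot == len(self.rhs)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "·")
        return f"{self.lhs} → {' '.join(symbols)} [{self.start},{self.end}]"


# A chart for an N-word input has N + 1 entries, one per position.
words = "Book that flight".split()
chart = [[] for _ in range(len(words) + 1)]
chart[1].append(State("VP", ("Verb", "NP"), 1, 0, 1))
print(chart[1][0])  # VP → Verb · NP [0,1]
```

Making the record frozen keeps states hashable and comparable, which makes it easy to avoid adding the same state to a chart entry twice.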

  24. Three operators on a chart • Predictor • applies when the NonTerminal to the right of the dot in a state is not a POS category (i.e. is not a pre-terminal) • adds states to current chart entry • Scanner • applies when the NonTerminal to the right of the dot in a state is a POS category (i.e. is a pre-terminal) • adds states to next chart entry • Completer • applies when there is no NonTerminal (and hence no Terminal) to the right of the dot in a state (i.e. the dot is at the end) • adds states to current chart entry

  25. Predictor • Suppose rule to which Predictor applies is: X → α · NT β [x,y] • Predictor adds, to the current chart entry, a new state for each possible expansion of NT • For each expansion EX of NT, state added is NT → · EX [y,y]

  26. Scanner • Suppose rule to which Scanner applies is: X → α · POS β [x,y] • Scanner adds, to the next chart entry, a new state for each possible expansion of POS • The new state added is X → α POS · β [x,y+1]

  27. Completer • Suppose rule to which Completer applies is: X → γ · [x,y] • Completer adds, to the current chart entry, a new state for each possible reduction using the (now completed) state • For each state (from any earlier chart entry) of the form Y → α · X β [w,x] a new state of the following form is added Y → α X · β [w,y]
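The three operators can be put together into a small recognizer. This is a sketch under stated assumptions rather than a definitive implementation: states are plain tuples `(lhs, rhs, dot, x, y)`, the function names are hypothetical, the scanner advances the dot only when the next input word belongs to the POS category, `PP → Prep NP` is an assumed reading of `PP → P NP`, and the chart is seeded with a dummy state `GAMMA → · S [0,0]`, a common convention that the slides do not spell out.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
    "Nominal": [["Nominal", "PP"], ["Noun"], ["Noun", "Nominal"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "ProperNoun": {"Houston", "TWA"},
}


def matches(word, pos):
    """True if the word can be tagged with the POS category."""
    entries = LEXICON.get(pos, ())
    return word in entries or word.lower() in entries


def earley_recognize(words, start="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # one entry per input position

    def add(state, k):
        if state not in chart[k]:  # never store the same state twice
            chart[k].append(state)

    add(("GAMMA", (start,), 0, 0, 0), 0)  # assumed dummy start state
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while we process it
            lhs, rhs, dot, x, y = chart[k][i]
            i += 1
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                # Predictor: NT -> · EX [y,y] for each expansion EX
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], tuple(expansion), 0, y, y), k)
            elif dot < len(rhs):
                # Scanner: move the dot over a POS that fits the next word
                if k < n and matches(words[k], rhs[dot]):
                    add((lhs, rhs, dot + 1, x, y + 1), k + 1)
            else:
                # Completer: advance every state that was waiting for lhs
                for l2, r2, d2, w, _ in list(chart[x]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, w, y), k)
    return ("GAMMA", (start,), 1, 0, n) in chart[n]


print(earley_recognize("Book that flight".split()))  # True
```

Because duplicate states are never added, predicting Nominal a second time adds nothing new. This is why the left-recursive rule `Nominal → Nominal PP` does not cause an infinite loop here.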

  28. Completer (modification) • In order to recover parse tree information from the chart once parsing is complete, we need to modify the completer slightly. • Each state in the chart must be given a unique identifier (N for state N) • Each time the completer adds a state, it also adds the unique identifier of the state completed to the list of previous states for that new state (which is a copy of an already existing state, waiting for the category which the current state just completed).

  29. Initial state of chart

  30. Example (from text) • (work through on board)
