Basic Parsing with Context-Free Grammars

Analyzing Linguistic Units

  • Morphological parsing:

    • analyze words into morphemes and affixes

    • rule-based, FSAs, FSTs

  • Ngrams for Language Modeling

  • POS Tagging

  • Syntactic parsing:

    • identify constituents and their relationships

    • to see if a sentence is grammatical

    • to assign an abstract representation of meaning

Syntactic Parsing

  • Declarative formalisms such as CFGs and FSAs define the legal strings of a language, but they only tell you ‘this is a legal string of language X’

  • Parsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses

  • Parsing is useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition…almost every task in NLP…but…

Parsing as a Form of Search

  • Searching FSAs

    • Finding the right path through the automaton

    • Search space defined by structure of FSA

  • Searching CFGs

    • Finding the right parse tree among all possible parse trees

    • Search space defined by the grammar

  • Constraints provided by the input sentence and the automaton or grammar

CFG for Fragment of English

  • S --> Aux NP VP

  • S --> VP (1)

  • VP --> V

  • VP --> V NP (2)

  • NP --> Det Nom (3)

  • NP --> PropN

  • Nom --> N Nom

  • Nom --> N (4)

  • Nom --> Nom PP

  • PP --> Prep NP

  • Det --> that | this | a

  • N --> book | flight | meal | money

  • V --> book | include | prefer

  • Aux --> does

  • Prep --> from | to | on

  • PropN --> Houston | TWA

Parse Tree for ‘Book that flight’ for Prior CFG

    [S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]

  • Rule expansion: the numbers (1) to (4) above mark the rules expanded, in order, to build this tree
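One way to make the grammar and the parse tree concrete is to write both down as plain data. The sketch below is illustrative only: the names GRAMMAR, LEXICON, TREE, licensed and leaves are assumptions made for this example, and the lexicon is stored in lowercase so that ‘Book’ at the start of a sentence still matches.

```python
# The grammar fragment as a dict: left-hand side -> list of right-hand sides.
GRAMMAR = {
    "S":   [["Aux", "NP", "VP"], ["VP"]],
    "VP":  [["V"], ["V", "NP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "PP":  [["Prep", "NP"]],
}
# Lexical rules: POS category -> words (lowercased; an assumption of this sketch).
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}

# A tree is a nested tuple (label, child, ...); a POS node holds a single word.
TREE = ("S",
        ("VP",
         ("V", "Book"),
         ("NP", ("Det", "that"), ("Nom", ("N", "flight")))))


def licensed(tree):
    """Return True if every local tree is licensed by GRAMMAR or LEXICON."""
    label, *children = tree
    if isinstance(children[0], str):  # POS node over a word
        return children[0].lower() in LEXICON.get(label, set())
    rhs = [child[0] for child in children]
    return rhs in GRAMMAR.get(label, []) and all(licensed(c) for c in children)


def leaves(tree):
    """Return the words at the leaves, left to right."""
    label, *children = tree
    if isinstance(children[0], str):
        return [children[0]]
    return [word for child in children for word in leaves(child)]


print(licensed(TREE), leaves(TREE))
```

Checking a tree against the rules is the easy direction; the parsing strategies discussed next address the harder one, finding such a tree given only the words.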




Top-Down Parser

  • Builds from the root S node to the leaves

  • Assuming we build all trees in parallel:

    • Find all trees with root S (or all rules w/lhs S)

    • Next expand all constituents in these trees/rules

    • Continue until the leaves are parts of speech (POS)

    • Candidate trees whose POS fail to match the input string are rejected (e.g. Book that flight matches only one subtree)
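The parallel top-down procedure above can be sketched as a breadth-first recognizer. This is a minimal illustration, not the lecture's implementation: each candidate tree is reduced to its frontier (a tuple of symbols), the function name recognize_parallel is hypothetical, and two shortcuts are assumptions of the sketch. Candidates are rejected as soon as their POS prefix disagrees with the input, and frontiers longer than the input are dropped, which is safe here only because no rule in this grammar rewrites to the empty string.

```python
GRAMMAR = {
    "S":   [["Aux", "NP", "VP"], ["VP"]],
    "VP":  [["V"], ["V", "NP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "PP":  [["Prep", "NP"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}


def pos_tags(word):
    """All POS categories the lexicon allows for a word."""
    return {pos for pos, vocab in LEXICON.items() if word in vocab}


def recognize_parallel(words):
    """Breadth-first top-down recognition: expand all candidates ply by ply."""
    words = [w.lower() for w in words]
    frontier = [("S",)]  # one tuple of leaf symbols per candidate tree
    while frontier:
        next_frontier = []
        for form in frontier:
            # Index of the leftmost leaf that is still a non-terminal.
            i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
            prefix = form if i is None else form[:i]
            # Reject candidates whose POS leaves disagree with the input.
            if any(pos not in pos_tags(words[k]) for k, pos in enumerate(prefix)):
                continue
            if i is None:  # all leaves are POS categories
                if len(form) == len(words):
                    return True
                continue
            for rhs in GRAMMAR[form[i]]:
                new_form = form[:i] + tuple(rhs) + form[i + 1:]
                if len(new_form) <= len(words):  # length bound (assumption)
                    next_frontier.append(new_form)
        frontier = next_frontier
    return False


print(recognize_parallel("Book that flight".split()))
```

Without the length bound, the left-recursive rule Nom --> Nom PP would keep producing longer candidates on every ply.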

Top-Down Search Space for CFG (expanding only leftmost leaves)

[Figure: top-down search space, each ply expanding the leftmost leaf of the trees above it. The surviving labels show NP expanded as Det Nom or PropN, and VP expanded as V NP or V.]


Bottom-Up Parsing

  • Parser begins with words of input and builds up trees, applying grammar rules whose rhs match

    • Book that flight

      N    Det  N            V    Det  N

      Book that flight       Book that flight

    • ‘Book’ is ambiguous (two POS for it appear in the grammar: N and V)

    • Parse continues until an S root node reached or no further node expansion possible
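A bottom-up recognizer along these lines can be sketched as follows. It is a simplified illustration rather than a practical shift-reduce parser: it starts from every POS sequence the lexicon allows, then repeatedly replaces any substring that matches a rule's right-hand side with the rule's left-hand side, until a lone S remains or nothing more can be reduced. The names RULES, tag_sequences and recognize_bottom_up are assumptions of this sketch.

```python
from itertools import product

# Each rule is (left-hand side, right-hand side as a tuple).
RULES = [
    ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("VP", ("V",)), ("VP", ("V", "NP")),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("N", "Nom")), ("Nom", ("N",)), ("Nom", ("Nom", "PP")),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}


def tag_sequences(words):
    """Every POS sequence the lexicon allows for the input words."""
    choices = [[pos for pos, vocab in LEXICON.items() if w.lower() in vocab]
               for w in words]
    return [tuple(tags) for tags in product(*choices)]


def recognize_bottom_up(words):
    """Reduce right-hand sides to left-hand sides until only S is left."""
    agenda = tag_sequences(words)
    seen = set(agenda)
    while agenda:
        form = agenda.pop()
        if form == ("S",):
            return True
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    reduced = form[:i] + (lhs,) + form[i + n:]
                    if reduced not in seen:  # never revisit a state
                        seen.add(reduced)
                        agenda.append(reduced)
    return False


print(tag_sequences("Book that flight".split()))
print(recognize_bottom_up("Book that flight".split()))
```

The sketch happily builds NP and Nom over the N reading of ‘Book’ even though that reading can never reach S, which is the wasted effort attributed to bottom-up parsers.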

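Two Candidates: One Successful Parse

[Figure: two candidate bottom-up trees for ‘Book that flight’, each with V Det N above the words and ‘flight’ under Nom. The figure notes that the grammar has no rule S --> VP NP, which rules out the candidate that leaves VP and NP as sisters.]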

What’s right/wrong with….

  • Top-Down parsers – they never explore illegal parses (e.g. which can’t form an S) -- but waste time on trees that can never match the input

  • Bottom-Up parsers – they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root)

  • For both: find a control strategy -- how to explore the search space efficiently?

    • Pursuing all parses in parallel or backtrack or …?

    • Which rule to apply next?

    • Which node to expand next?

A Possible Top-Down Parsing Strategy

  • Depth-first search:

    • Agenda of search states: expand search space incrementally, exploring most recently generated state (tree) each time

    • When you reach a state (tree) inconsistent with input, backtrack to most recent unexplored state (tree)

  • Which node to expand?

    • Leftmost or rightmost

  • Which grammar rule to use?

    • Order in the grammar? How?

Top-Down, Depth-First, Left-Right Strategy

  • Initialize agenda with ‘S’ tree and ptr to first word (cur)

  • Loop: Until successful parse or empty agenda

    • Apply next applicable grammar rule to leftmost unexpanded node (n) of current tree (t) on agenda and push resulting tree (t’) onto agenda

      • If n is a POS category and matches the POS of cur, push new tree (t’’) onto agenda and increment cur

      • Else pop t’ from agenda

    • Final agenda contains history of successful parse

  • Does this flight include a meal?
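The agenda loop above can be sketched roughly as below. Several details are assumptions made for the sketch and are not taken from the slide: a search state holds the still-unexpanded symbols, the pointer cur, and the rules applied so far (in place of a full tree); rules are tried in the order they are listed in GRAMMAR; and states with more unexpanded symbols than remaining words are dropped, which keeps the left-recursive rule Nom --> Nom PP from looping forever.

```python
GRAMMAR = {
    "S":   [["Aux", "NP", "VP"], ["VP"]],
    "VP":  [["V"], ["V", "NP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"], ["Nom", "PP"]],
    "PP":  [["Prep", "NP"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}


def parse_top_down_dfs(words):
    """Return the rules of the first successful derivation, or None."""
    words = [w.lower() for w in words]
    # State: (unexpanded symbols left to right, cur, rules applied so far).
    agenda = [(("S",), 0, ())]
    while agenda:
        symbols, cur, history = agenda.pop()  # most recent state first
        if not symbols:
            if cur == len(words):
                return list(history)
            continue  # input left over: backtrack
        if len(symbols) > len(words) - cur:
            continue  # too many symbols for the remaining words (assumption)
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # Push in reverse so the first-listed rule is explored first.
            for rhs in reversed(GRAMMAR[first]):
                step = (first, tuple(rhs))
                agenda.append((tuple(rhs) + rest, cur, history + (step,)))
        elif words[cur] in LEXICON[first]:
            step = (first, (words[cur],))
            agenda.append((rest, cur + 1, history + (step,)))
    return None


for lhs, rhs in parse_top_down_dfs("Does this flight include a meal".split()):
    print(lhs, "-->", " ".join(rhs))
```

On this sentence the search first tries VP --> V for ‘include’, finds two words left over, and backtracks to VP --> V NP.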

Left Corners: Top-Down Parsing with Bottom-Up Filtering

  • We saw: Top-Down, depth-first, L2R parsing

    • Expands non-terminals along the tree’s left edge down to leftmost leaf of tree

    • Moves on to expand down to next leftmost leaf…

    • Note: In a successful parse, the current input word will be the first word in the derivation of the node the parser is currently processing

    • So….look ahead to left-corner of the tree

      • B is a left-corner of A if A =*=> Bα

      • Build table with left-corners of all non-terminals in grammar and consult before applying rule
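A left-corner table of this kind might be built as in the sketch below. The function names left_corner_table and rule_is_viable are hypothetical, and the table here covers derivations of one or more steps, so a category appears as its own left corner only when the grammar is left recursive (as Nom is). The filter applies a rule only if the current word's POS is the rule's first symbol or one of that symbol's left corners.

```python
GRAMMAR = {
    "S":   [["Aux", "NP", "VP"], ["VP"]],
    "VP":  [["V"], ["V", "NP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "PP":  [["Prep", "NP"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}


def left_corner_table(grammar):
    """Map each non-terminal A to every B with A =>+ B alpha."""
    # Start with the first symbol of each right-hand side.
    table = {a: {rhs[0] for rhs in rules} for a, rules in grammar.items()}
    changed = True
    while changed:  # add left corners of left corners until nothing changes
        changed = False
        for corners in table.values():
            for b in list(corners):
                new = table.get(b, set()) - corners
                if new:
                    corners |= new
                    changed = True
    return table


def rule_is_viable(rhs, word, table, lexicon):
    """Bottom-up filter: can the current word start this right-hand side?"""
    tags = {pos for pos, vocab in lexicon.items() if word.lower() in vocab}
    first = rhs[0]
    return first in tags or bool(table.get(first, set()) & tags)


TABLE = left_corner_table(GRAMMAR)
print(TABLE["S"])
print(rule_is_viable(["VP"], "Does", TABLE, LEXICON))
```

For ‘Does this flight include a meal?’ the filter rejects S --> VP before any expansion, because Aux is not a left corner of VP.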

Left Recursion vs. Right Recursion

  • Depth-first search will never terminate if grammar is left recursive (e.g. NP --> NP PP)

  • Solutions:

    • Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive

      e.g. The man {on the hill with the telescope…}

      NP --> NP PP (wanted: Nom plus a sequence of PPs)

      NP --> Nom PP

      NP --> Nom

      Nom --> Det N

      becomes

      NP --> Nom NP’

      Nom --> Det N

      NP’ --> PP NP’ (wanted: a sequence of PPs)

      NP’ --> ε (the empty string)

      • Not so obvious what these rules mean…

  • Fix depth of search explicitly

  • Rule ordering: non-recursive rules first

    • NP --> Det Nom

    • NP --> NP PP
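The grammar rewrite can be automated for the simple case of immediate left recursion. The sketch below uses the textbook transformation (A --> A α | β becomes A --> β A', A' --> α A' | ε); the function name is hypothetical and an empty list stands for ε. Applied to the three NP rules above it keeps both non-recursive alternatives, so its output is slightly larger than the hand-written version, which folds NP --> Nom PP into NP --> Nom NP’.

```python
def eliminate_immediate_left_recursion(lhs, alternatives):
    """Rewrite A --> A alpha | beta as A --> beta A', A' --> alpha A' | epsilon.

    alternatives is a list of right-hand sides (lists of symbols);
    the empty list [] stands for the empty string.
    """
    new = lhs + "'"  # name for the fresh non-terminal, e.g. NP'
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == lhs]
    others = [alt for alt in alternatives if not alt or alt[0] != lhs]
    if not recursive:
        return {lhs: alternatives}  # nothing to rewrite
    return {
        lhs: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],
    }


rewritten = eliminate_immediate_left_recursion(
    "NP", [["NP", "PP"], ["Nom", "PP"], ["Nom"]]
)
for lhs, alternatives in rewritten.items():
    for rhs in alternatives:
        print(lhs, "-->", " ".join(rhs) or "ε")
```

Indirect left recursion (A --> B …, B --> A …) needs a more general procedure and is not handled here.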

An Exercise: The city hall parking lot in town

  • NP --> NP NP PP

  • NP --> Det Nom

  • NP --> Adj Nom

  • NP --> Nom Nom

  • Nom --> NP Nom

  • Nom --> N

  • PP --> Prep NP

  • N --> city | hall | lot | town

  • Adj --> parking

  • Prep --> to | for | in

Another Problem: Structural Ambiguity

  • Multiple legal structures

    • Attachment (e.g. I saw a man on a hill with a telescope)

    • Coordination (e.g. younger cats and dogs)

    • NP bracketing (e.g. Spanish language teachers)

  • Solution?

    • Return all possible parses and disambiguate using “other methods”
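To get a feel for how quickly "all possible parses" grows, the NP bracketing case can be enumerated directly. The sketch below assumes every bracketing is binary, which is a simplification rather than a claim about the grammar, and the function name bracketings is hypothetical.

```python
def bracketings(words):
    """Return every binary bracketing of a word sequence as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for split in range(1, len(words)):  # choose where the top-level split falls
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append((left, right))
    return results


for tree in bracketings(["Spanish", "language", "teachers"]):
    print(tree)

# The count follows the Catalan numbers as the compound gets longer.
print([len(bracketings(list("abcdefg")[:n])) for n in range(1, 8)])
```

Three words give two structures, but seven already give 132, so a parser that returns every analysis needs a compact representation or a way to rank the candidates.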

Summing Up

  • Parsing is a search problem which may be implemented with many control strategies

    • Top-Down or Bottom-Up approaches each have problems

      • Combining the two solves some but not all issues

    • Left recursion

    • Syntactic ambiguity

  • Next time: Making use of statistical information about syntactic constituents

    • Read Ch 12