Lin6932 topics in computational linguistics l.jpg
This presentation is the property of its rightful owner.
Sponsored Links
1 / 44

LIN6932 Topics in Computational Linguistics PowerPoint PPT Presentation


  • 120 Views
  • Uploaded on
  • Presentation posted in: General

LIN6932 Topics in Computational Linguistics. Lecture 6: Grammar and Parsing (I) February 15, 2007 Hana Filip. Outline for Grammar/Parsing. Context-Free Grammars (CFG) and Constituency Some common CFG phenomena for English Sentence-level constructions NP, PP, VP Coordination

Download Presentation

LIN6932 Topics in Computational Linguistics

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Lin6932 topics in computational linguistics l.jpg

LIN6932 Topics in Computational Linguistics

Lecture 6: Grammar and Parsing (I)

February 15, 2007

Hana Filip


Outline for grammar parsing l.jpg

Outline for Grammar/Parsing

  • Context-Free Grammars (CFG) and Constituency

  • Some common CFG phenomena for English

    • Sentence-level constructions

    • NP, PP, VP

    • Coordination

    • Subcategorization

  • Levels of complexity in grammars and automata:

    The Chomsky hierarchy

  • Top-down and Bottom-up Parsing

  • Earley Parsing

  • Quick sketch of probabilistic parsing


Review l.jpg

Review

  • Parts of Speech

    • Basic syntactic/morphological categories that words belong to

  • Part of Speech tagging

    • Assigning parts of speech to all the words in a sentence


Syntax l.jpg

Syntax

  • Syntax: from Greek syntaxis, “setting out together, arrangement’

  • Refers to the way words are arranged together, and their relationships

  • Distinction:

    • Prescriptive grammar: how people ought to talk

    • Descriptive grammar: how they do talk

  • Goal of syntax is to model the knowledge that people unconsciously have about the grammar of their native language


Syntax5 l.jpg

Syntax

  • Why should you care?

    • Grammar checkers

    • Question answering

    • Information extraction

    • Machine translation


4 key ideas of syntax l.jpg

4 key ideas of syntax

  • Constituency

  • Grammatical relations

  • Subcategorization

  • Lexical dependencies

    Plus one part we won’t have time for:

  • Movement/long-distance dependency


Context free grammars l.jpg

Context-Free Grammars

  • Capture constituency and ordering

    • Ordering:

      What are the rules that govern the ordering of words and bigger units in the language?

    • Constituency:

      How words group into units and how the various kinds of units behave


Constituency l.jpg

Constituency

  • Noun phrases (NPs)

    • Three parties from Brooklyn

    • A high-class spot such as Mindy’s

    • The Broadway coppers

    • They

    • Harry the Horse

    • The reason he comes into the Hot Box

  • How do we know these form a constituent?

    • They can all appear before a verb:

      • Three parties from Brooklyn arrive…

      • A high-class spot such as Mindy’s attracts…

      • The Broadway coppers love…

      • They sit…


  • Constituency ii l.jpg

    Constituency (II)

    • They can all appear before a verb:

      • Three parties from Brooklyn arrive…

      • A high-class spot such as Mindy’s attracts…

      • The Broadway coppers love…

      • They sit

    • But individual words can’t always appear before verbs:

      • *from arrive…

      • *as attracts…

      • *the is

      • *spot is…

    • Must be able to state generalizations like:

      • Noun phrases occur before verbs


    Constituency iii l.jpg

    Constituency (III)

    • Preposing and postposing:

      • On September 17th, I’d like to fly from Atlanta to Denver

      • I’d like to fly on September 17th from Atlanta to Denver

      • I’d like to fly from Atlanta to Denver on September 17th.

    • But not:

      • *On September, I’d like to fly 17th from Atlanta to Denver

      • *On I’d like to fly September 17th from Atlanta to Denver


    Cfg examples l.jpg

    CFG Examples

    • S -> NP VP

    • NP -> Det NOMINAL

    • NOMINAL -> Noun

    • VP -> Verb

    • Det -> a

    • Noun -> flight

    • Verb -> left


    Slide12 l.jpg

    CFGs

    • S -> NP VP

      • This says that there are units called S, NP, and VP in this language

      • That an S consists of an NP followed immediately by a VP

      • Doesn’t say that that’s the only kind of S

      • Nor does it say that this is the only place that NPs and VPs occur


    Generativity l.jpg

    Generativity

    • As with FSAs you can view these rules as either analysis or synthesis machines

      • Generate strings in the language

      • Reject strings not in the language

      • Impose structures (trees) on strings in the language


    Derivations l.jpg

    Derivations

    • A derivation is a sequence of rules applied to a string that accounts for that string

      • Covers all the elements in the string

      • Covers only the elements in the string


    Derivations as trees l.jpg

    Derivations as Trees


    Cfgs more formally l.jpg

    CFGs more formally

    • A context-free grammar has 4 parameters (“is a 4-tuple”)

      • A set of non-terminal symbols (“variables”) N

      • A set of terminal symbols  (disjoint from N)

      • A set of productions P, each of the form

        • A -> 

        • Where A is a non-terminal and  is a string of symbols from the infinite set of strings (  N)*

      • A designated start symbol S


    Defining a cf language via derivation l.jpg

    Defining a CF language via derivation

    • A string A derives a string B if

      • A can be rewritten as B via some series of rule applications

    • More formally:

      • If A -> is a production of P

      •  and  are any strings in the set (  N)*

      • Then we say that

        • A directly derives 

        • Or A

      • Derivation is a generalization of direct derivation

      • Let 1, 2, … m be strings in (  N)*, m>= 1, s.t.

        • 12, 23… m-1m

        • We say that 1derives m or 1*m

      • We then formally define language LG generated by grammar G

        • As set of strings composed of terminal symbols derived from S

        • LG = {w | w is in * and S * w}


    Parsing l.jpg

    Parsing

    • Parsing is the process of taking a string and a grammar and assigning correct trees to input strings

    • Correct tree: a tree that covers all and only the elements of the input and has an S node at the top


    Context l.jpg

    Context?

    • The notion of context in CFGs has nothing to do with the ordinary meaning of the word context in language.

    • All it means is that the non-terminal on the left-hand side of a rule is out there all by itself (free of context)

      A -> B C

      Means that I can rewrite an A as a B followed by a C regardless of the context in which A is found


    Key constituents english l.jpg

    Key Constituents (English)

    • Sentences

    • Noun phrases

    • Verb phrases

    • Prepositional phrases


    Sentence types l.jpg

    Sentence-Types

    • Declaratives: A plane left

      S -> NP VP

    • Imperatives: Leave!

      S -> VP

    • Yes-No Questions: Did the plane leave?

      S -> Aux NP VP

    • WH Questions: When did the plane leave?

      S -> WH Aux NP VP


    Slide22 l.jpg

    NPs

    • NP -> Pronoun

      • I came, you saw it, they conquered

    • NP -> Proper-Noun

      • Los Angeles is west of Texas

    • NP -> Det Noun

      • The president

    • NP -> Nominal

    • Nominal -> Noun Noun

      • A morning flight to Denver


    Slide23 l.jpg

    PPs

    • PP -> Preposition NP

      • From LA

      • To Boston

      • On Tuesday

      • With lunch


    Recursion l.jpg

    Recursion

    • We’ll have to deal with rules such as the following where the non-terminal on the left also appears somewhere on the right (directly).

      NP -> NP PP[[The flight] [to Boston]]

      VP -> VP PP[[departed Miami] [at noon]]


    Recursion25 l.jpg

    Recursion

    • Of course, this is what makes syntax interesting

      flights from Denver

      Flights from Denver to Miami

      Flights from Denver to Miami in February

      Flights from Denver to Miami in February on a Friday

      Flights from Denver to Miami in February on a Friday under $300

      Flights from Denver to Miami in February on a Friday under $300 with lunch

      Etc.


    Recursion26 l.jpg

    Recursion

    • Of course, this is what makes syntax interesting

      [[flights] [from Denver]]

      [[[Flights] [from Denver]] [to Miami]]

      [[[[Flights] [from Denver]] [to Miami]] [in February]]

      [[[[[Flights] [from Denver]] [to Miami]] [in February]] [on a Friday]]

      Etc.


    Implications of recursion and context freeness l.jpg

    Implications of recursion and context-freeness

    • If you have a rule like

      • VP -> V NP

      • It only cares that the thing after the verb is an NP. It doesn’t have to know about the internal affairs of that NP


    The point l.jpg

    The Point

    • VP -> V NP

    • I hate

      flights from Denver

      Flights from Denver to Miami

      Flights from Denver to Miami in February

      Flights from Denver to Miami in February on a Friday

      Flights from Denver to Miami in February on a Friday under $300

      Flights from Denver to Miami in February on a Friday under $300 with lunch


    Bracketed notation l.jpg

    Bracketed Notation

    • [S [NP [PRO I] [VP [V prefer [NP [NP [Det a] [Nom [N morning] [N flight]]]]


    Coordination constructions l.jpg

    Coordination Constructions

    • S -> S and S

      • John went to NY and Mary followed him

    • NP -> NP and NP

    • VP -> VP and VP

    • In fact the right rule for English is

      X -> X and X


    Other syntactic stuff l.jpg

    Other Syntactic stuff

    • Grammatical Relations

      • Subject

        • I booked a flight to New York

        • The flight was booked by my agent.

      • Object

        • I booked a flight to New York

      • Complement

        • I said that I wanted to leave


    Problems l.jpg

    Problems

    • Agreement

    • Subcategorization

    • Movement (for want of a better term)


    Agreement l.jpg

    This dog

    Those dogs

    This dog eats

    Those dogs eat

    *This dogs

    *Those dog

    *This dog eat

    *Those dogs eats

    Agreement


    Possible cfg solution l.jpg

    S -> NP VP

    NP -> Det Nominal

    VP -> V NP

    SgS -> SgNP SgVP

    PlS -> PlNp PlVP

    SgNP -> SgDet SgNom

    PlNP -> PlDet PlNom

    PlVP -> PlV NP

    SgVP ->SgV Np

    Possible CFG Solution


    Cfg solution for agreement l.jpg

    CFG Solution for Agreement

    • It works and stays within the power of CFGs

    • But its ugly (loss of generalization)

    • And it doesn’t scale all that well


    Subcategorization l.jpg

    Subcategorization

    Verbs have preferences for the number and syntactic kinds of constituents they co-occur with

    • Sneeze: John sneezed intransitive verb

    • Find: Please find [a flight to NY]NP transitive verb

    • Give: Give [me]NP[a cheaper fare]NP ditransitive verb

    • Help: Can you help [me]NP[with a flight]PP DO IO/Oblique

    • Prefer: I prefer [to leave earlier]TO-VP VP complement

    • Said: You said [United has a flight]S sentential complement


    Subcategorization37 l.jpg

    Subcategorization

    But not:

    • *John sneezed the book

    • *I prefer United has a flight

    • *Give with a flight

    • Subcategorization expresses the constraints that a lexical predicate places on the number and syntactic kinds of arguments it wants to take (occur with)

    • So far we have only considerate one type of lexical predicate: verb

    • Nouns, adjectives and prepositions also take arguments


    Slide38 l.jpg

    So?

    • So the various CFG rules for VPs overgenerate.

      • They permit the presence of strings containing verbs and arguments that don’t go together

      • For example

        VP -> V NP

        predicts that

        Sneezed the book is a VP since “sneeze” is a verb and

        “the book” is a valid NP


    Forward pointer l.jpg

    Forward Pointer

    • It turns out that verb subcategorization facts will provide a key element for semantic analysis (determining who did what to who in an event).


    Possible cfg solution40 l.jpg

    Possible CFG Solution

    • VP -> V

    • VP -> V NP

    • VP -> V NP PP

    • VP -> IntransV

    • VP -> TransV NP

    • VP -> TransVwPP NP PP


    Movement l.jpg

    Movement

    • Core example

      • My travel agent booked the flight


    Movement42 l.jpg

    Movement

    • Core example

      • [[My travel agent]NP [booked [the flight]NP]VP]S

    • I.e. “book” is a straightforward transitive verb. It expects a single NP arg as the subject and a single NP arg within the VP


    Movement long distance dependencies l.jpg

    Movement - Long Distance Dependencies

    • What about?

      • Which flight do you want me to have the travel agent book __?

    • The direct object argument to “book” isn’t appearing in the right place. It is a long way from where it originally appeared

    • And note that it is separated from its verb by 2 other verbs.


    Cfgs a summary l.jpg

    CFGs: a summary

    • CFGs appear to be just about what we need to account for a lot of basic syntactic structure in English

    • But there are problems

      • That can be dealt with adequately, although not elegantly, by staying within the CFG framework.

    • There are simpler, more elegant, solutions that take us out of the CFG framework (beyond its formal power)

    • Syntactic theories: HPSG, LFG, CCG, Minimalism, etc


  • Login