
Syntax and Processing it



Presentation Transcript


  1. Syntax and Processing it John Barnden School of Computer Science University of Birmingham Natural Language Processing 1 2010/11 Semester 2

  2. Our Concerns • The general nature of syntax (grammatical structure of sentences) • Its significance • Context-free grammars: S → NP VP, etc. • Syntactic structure of a sentence • Parsing: using a grammar to compute a syntactic structure for a given sentence. • [Diagram: a small tree with S dominating NP and VP.]

  3. Overview • The motivation for the idea of syntax • Review of some types of phrase (syntactic categories) • Review of notion of context-free grammar • How separate/separable is syntax from other aspects of language? • Three methods for syntactic analysis: • Definite Clause Grammars [as an extension to Prolog] • Recursive transition networks [brief] • Active chart parsing • Other remarks.

  4. Motivation • Sentences seem to be able to be correctly or incorrectly structured in a way that seems largely independent of the meaning or the plausibility of the meaning. • The following seem correctly structured: • The monkey ate the banana. • The banana ate the monkey. • The car hit the building. • The building hit the car. • And moreover they seem to have the same structure. • The following seem incorrectly structured: • Banana the ate monkey the. • The the car building hit. • And we notice that incorrectness without considering meaning.

  5. Motivation, contd • At the same time, what particular structure we see in a sentence affects meaning in important ways. • Classic example: The girl saw the woman with the telescope in the park. • Some possible rough structures and interpretations: • [The girl] saw [the woman] [with the telescope] [in the park] • The seeing used the telescope and happened in the park • [The girl] saw [the woman] [with [the telescope in the park]] • The seeing used the telescope that was in the park • The girl [saw [the woman with the telescope] [in the park]] • The seeing happened in the park; the woman had a telescope • The girl [saw [the woman [with [the telescope in the park]]]] • The woman was with the telescope that was in the park. • Any others?

  6. Motivation, contd • We also see that phrases of certain sorts can be moved around in certain ways. • E.g., we can do this to • The girl saw the woman with the telescope in the park. • to get • The woman in the park with the telescope saw the girl. • But we cannot move a phrase just anywhere and we cannot move just any phrase around: e.g. cannot change the original to get • The with the telescope girl saw the woman in the park. • The girl saw the telescope the woman with in the park.

  7. Motivation, contd • Recursive structuring: phrases can be seen to have other phrases as constituents: • We can see • The woman with the telescope in the park eating sandwiches • as • [The woman [with [the telescope [in the park]]] [eating sandwiches]] • There is no in-principle limit to the depth of structuring or to its breadth. • Hence there is no in-principle limit to sentence length.

  8. Motivation, contd • Some sorts of constituent of a sentence generally require the presence (in suitable positions) of other constituents of particular types. E.g. • A preposition needs to be followed by a specific sort of phrase • A verb may need to have surrounding constituents playing the role of subject, direct object, etc. • Any phrase of an appropriate type, no matter how internally complex the phrase is, can play a given role.

  9. Motivation, contd: a Caveat • English grammar is very much based on contiguity (adjacency) of lexical forms and their order in the sentence. • This is partly because it has very little in the way of agreement between words (e.g. a plural subject requiring a plural form of the verb) and very little in the way of case endings (e.g. nominative versus accusative forms of a pronoun) or verb inflections. • But languages with more pervasive use of agreement, case endings, verb inflections, etc. can have widely separated words that form units. E.g., an adjective and a noun can be associated with each other by agreement rather than contiguity. • English arguably has examples of this: notably verb-particle phrases such as “take up” meaning “adopt”. The verb and the particle are often separated, as in “He took that marvellous plan of Julie’s up”. But they arguably form a syntactic unit. • We can even imagine a language where order and adjacency are completely irrelevant.

  10. a Caveat, contd. • So, in general, talk of “phrases” and “constituents” needs to be taken as encompassing non-contiguous sets of lexical forms, even though in the following they will always be contiguous sequences.

  11. Motivation: Summary • Language has constituent structure: the expressions of the language can be structured into constituents of particular types, where a constituent of a particular type can often have, or may need to have, sub-constituents of particular types. • Language has syntactic compositionality: there are composition rules that, whenever they are given constituents of particular types, produce constituents (or whole sentences) of particular types, where the resulting types are dependent only on the rule and the constituents given to it. • The compositionality is linked to the constituent structure: to at least a first approximation, the resulting constituent contains the original constituents as parts. • E.g. Putting a determiner and a noun together to get a (simple sort of) “noun phrase”; putting a preposition together with a noun phrase to get a “prepositional phrase”; putting a noun phrase and a prepositional phrase together to get a noun phrase. • So, the constituent structure is systematic by virtue of the compositionality. • Through the syntactic compositionality, language is (in principle) productive: the composition rules can be applied successively to any extent, leading to infinitely many possible constituents and sentences. • And constituents/sentences can be of any structural depth and length, in principle.

  12. Performance versus Competence • The ability to apply a grammar freely when speaking or understanding is often taken to be a sort of competence. • But practical considerations often place limits on what one can do: e.g. what amount of complexity one can tolerate in a sentence. • Notice the “in principle” bits on the previous slide. • And one can make mistakes. • These perturbations of competence are labelled as matters of performance. • But some performance matters can be built into grammars, so it’s ultimately rather arbitrary which effects are relegated to “performance.” • Performance also encompasses graded effects like the relative perceived naturalness of structures in specific cases. E.g.: although “He gave the crying boy a nice red pencil” and “He gave a nice red pencil to the crying boy” are equally OK, consider: • He gave the crying boy who had spots all over his face and a hole in his shirt a nice red pencil • He gave a nice red pencil to the crying boy who had spots all over his face and a hole in his shirt.

  13. In-Class Diagnostic Syntax Exercise (not assessed, no ID/name needed) • For each of the following, annotate each word with a POS, and provide a syntax tree. If you can’t do the latter, at least annotate some phrases as to what sort of thing they are: “noun phrase”, “verb phrase”, “prepositional phrase”, “relative clause” or whatever. • Dogs hate cats. • The dog bit three cats. • The dog bit the cat with the long tail. • The girl saw the woman with the telescope in the park. • The man who came to dinner was my uncle. • Shouting at a policeman is not a sensible thing to do. • For John to shout at a policeman is unusual. • The more I walk the thinner I get. • By and large, the British are confused about their identity. • Mary believes that Cadillacs are a sort of fruit. • Mary wants to eat a Cadillac, boiled then fried.

  14. Syntactic Categories • Syntactic Categories: the categories of constituent: e.g. Sentence (S), Noun Phrase (NP), Verb Phrase (VP), Prepositional Phrase (PP), etc., • and I include all the lexical categories (Noun, Adjective, etc.). • Sentence (S): sentences are usually the largest lexical-form sequences that are subject to syntactic characterization. • But people have proposed grammar-like structuring of discourse more broadly. • A special case of this is the pattern of “turn taking” in conversation. • Of course, we can trivially go beyond sentences, e.g. by defining a paragraph to be a sequence of sentences. • A sentence can in principle contain a smaller sentence as a constituent, although a grammar often includes it as a “clause” instead. • A clause is a sentence-like constituent; often it can’t stand as a sentence in its own right – e.g. the relative clause “that is in the park” following “the telescope”. • S needs to cover declarative (= statement-making), interrogative, and imperative sentences.

  15. Syntactic Categories: S contd • To cope with language in practice, we have to allow S to be broader than normally thought of, or allow a discourse to include complete utterances that are non-sentences. E.g., consider the following complete utterances: • Yes. / Hallo. / Hmm? / Oh, damn. / How about it. • Brilliant. / Or tomorrow, for that matter. / A beer, please. / You silly idiot. • [Could be considered to be cases of ellipsis.] • Abbreviated language of various other sorts, such as: • the expression in square brackets just above • newspaper headlines • road signs • telegrams

  16. Syntactic Categories: General Problems • Ellipsis more generally needs to be taken care of: either within the definition of S and other syntactic categories, or as a special side-issue that can distort constituents. E.g. • John wants to have snail and seaweed ice-cream on his cake, but Mary just seaweed. • Some idioms or other fixed phrases, and some special constructions, have strange syntax: • By and large, ... / By the by, ... • The more I walk the more relaxed I get. [Can also be just part of a larger unit.] • Punctuation: typically not handled in a grammar, partly because too inconsistent and too style-dependent. • Direct speech/thought is often not quoted, leading to special structuring, e.g.: • No! I’m not up for that, he said/thought.

  17. Syntactic Categories: Noun Phrase (NP) • An NP is a constituent that can act like a noun. • Examples [with illustrative context in square brackets]: • any noun, any proper name • any non-adjectival pronoun (I, him, mine, these [when pronoun], which [when pronoun], ...) • the horse / that horse / my horse / John’s horse • three huge red cars • one hundred and three huge, very red cars • the horse with a long tail and a glossy mane (includes a PP) • the horse that had a long tail and a glossy mane (includes a relative clause) • the man called Horse [lived with the Algonquins] • the baking (gerund) • baking a cake [is a tricky business] (gerund) • which horse [do you want to ride?] • for John to shout at a policeman [is unusual] (for-complement used as noun) • to shout at a policeman [is to run a huge risk] (to-complement used as noun) • that John shouted at a policeman [amazes me] (that-complement used as noun) • the Riviera firefighter arson scandal (noun-noun compound)

  18. Syntactic Categories: Prepositional Phrase (PP) • A PP is a preposition followed by an NP • on the table • with metal legs • The preposition may be more than one word (e.g. up to in “it can cost up to five pounds”). • The NP can itself include a PP (e.g.: on the table with the metal legs). • In that example, “with the metal legs” serves an adjectival function: • it’s a form of adjectival phrase • He broke the table on Saturday: the PP serves an adverbial function: • it’s a form of adverbial phrase • Should “in back of”, “by means of”, “with respect to” etc. be counted as multi-word prepositions? Or should “back”, “means” and “respect” be treated as separate nouns?

  19. Syntactic Categories: Verb Phrase (VP) • a VP is a verb possibly preceded by auxiliary verbs and possibly followed by (non-subject) complements and adjuncts • washed • washed the dishes • wash the dishes (could be a command) • do wash the dishes • don’t / do not wash the dishes • [we] wash the dishes • was putting the cloth on the table at 5pm • “the cloth” and “on the table” are complements of the verb: necessary parameter values • “at 5pm” is an adjunct: an additional, optional qualification • believed that John was laying the table • “that John was laying the table” is a that-complement, containing a clause/sentence • wanted to lay the table / Mary wanted John to lay the table: • here we have a to-complement, containing a VP or a clause (a sentence-like phrase)

  20. Syntactic Categories: Relative Clauses • A relative clause is a clause with an adjectival function, hence a form of adjectival clause. • A restrictive relative clause: a clause that has an adjectival function within a noun phrase, further restricting what is being talked about: • The goat that/which had long ears [ate my shoelaces] (pedantically which is held to be incorrect) • The goat what/wot ’ad long ears [ate my shoelaces] (incorrect!) • The man that/who had long ears [ate my lunch]. (pedantically who is held to be incorrect) • The man with whom I had lunch [had long ears]. • The man I had lunch with [had long ears] • The man that/who I had lunch with [had long ears] • A non-restrictive relative clause: a clause that is in apposition with (= accompanies) a noun phrase, and adds extra information to something already specified by the noun-phrase: • The goat, which had long ears, [ate my shoelaces] • The man, with whom I had lunch by the way, [had long ears] • The man, who I did have lunch with by the way, [had long ears] • Restrictive case: no commas. Non-restrictive case: commas.

  21. Syntactic Categories: Relative Clauses, etc. • The goat really did eat my shoelaces. [Needed for exam ☺] • [Mary knows] who laid the table: a who-complement, with the same internal syntax as a relative clause. • Apposition is a broader phenomenon than non-restrictive relative clauses: • Theodosia Kirkbride, a most refined landlady, never mentioned the rent to me. • Theodosia Kirkbride – landlady, mud wrestler and quantum physicist – ate my lunch. • Theodosia Kirkbride, the landlady, never mentioned the rent to me. • The landlady, Theodosia Kirkbride, never mentioned the rent to me. • The vile boy, singing an annoying little X Factor tune, kicked my dustbin over.

  22. What Now? • Particular syntactic structures as syntax trees. • NB: Somewhat restrictive notationally. Not the only method used, but is a standard tool. • Context-free grammar for capturing rules of syntactic structure (that is generally expressed as trees). • Definite Clause Grammars: CF grammar in Prolog. • As recognition mechanism • As parsing mechanism • Other parsing mechanisms.

  23. A Syntax Tree • Notation: XY = non-terminal syntactic categories; Xyz = terminal syntactic categories; abc = wordforms. • [Tree diagram for “The dog sat on a big mat with dignity”: S → NP VP; NP → Det Noun (“the dog”); VP → Verb PP PP, where the first PP → Prep NP with NP → Det Adj Noun (“on a big mat”) and the second PP → Prep NP with NP → Noun (“with dignity”); both PPs attach to the VP.]

  24. Ambiguous Syntax: Another Parse • [Alternative tree for the same sentence: S → NP VP; VP → Verb PP; PP → Prep NP, where NP → Det Adj Noun PP, so the second PP “with dignity” attaches inside the noun phrase “a big mat with dignity” rather than to the VP.]

  25. Context-Free Grammars (CFGs) • A CF grammar consists of productions (a form of rule). A production is of form: • <non-terminal symbol> → <seq of one or more symbols (terminal or non-terminal)> • Non-CF grammars: more symbols allowed on LHS. Explained later. • Non-terminal symbols: non-terminal syntactic categories (NP, etc.) and special categories • (e.g., AdjSeq, PPseq – though these could be regarded as new syntactic categories). • Terminal symbols: lexical categories (Noun, etc.) and sometimes specific lexical forms that play a special syntactic function (e.g.: to {preceding infinitive verb}, by {in passive voice}). • Examples of productions: • S → NP VP S → NP VP PP S → VP • NP → Noun NP → Det AdjSeq Noun • AdjSeq → Adj AdjSeq AdjSeq → Adj

  26. Context-Free Grammars (CFGs), contd. • Some other refinements: • Kleene star for zero or more occurrences: S → NP VP PP* • Related + symbol for one or more occurrences: NNcompound → Noun+ • Square brackets for optional elements: NP → [Det] Adj* Noun • Upright bar for alternatives: MOD → PP | RELCLAUSE • Such refinements reduce the number of productions needed, but are not formally necessary (they don’t increase the coverage of the grammar). • But they can affect the precise syntactic structure: PP* causes possibly more branches at the parent node, • whereas PPseq leads to binary branching. • NB: Traditionally, “N”, “V” are used instead of our “Noun”, “Verb”.
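For concreteness, the PP* refinement can be eliminated in favour of a recursive PPseq category. A minimal sketch, written in the DCG notation introduced on the later slides (the grammar fragment and toy lexicon here are my own, purely illustrative):

```prolog
% "VP -> Verb PP*" rewritten with a recursive PPseq category.
% Binary branching: each PPseq has at most one PP child plus the rest.
s --> np, vp.
np --> det, noun.
vp --> verb, ppseq.
pp --> prep, np.

ppseq --> [].          % zero PPs
ppseq --> pp, ppseq.   % one PP, then possibly more

det --> [the].
noun --> [dog].  noun --> [mat].  noun --> [park].
verb --> [sat].
prep --> [on].  prep --> [in].

% ?- phrase(s, [the, dog, sat]).                               succeeds (zero PPs)
% ?- phrase(s, [the, dog, sat, on, the, mat, in, the, park]).  succeeds (two PPs)
```

The recursion gives the right-branching PPseq structure mentioned above; a Kleene-star rule would instead hang all the PPs directly off the VP node.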

  27. Tree view of productions • VP → Verb NP • [Diagram: a tree with a VP parent node dominating Verb and NP child nodes.]

  28. Definite Clause Grammars in Prolog

  29. DCGs: Introduction • A way of writing syntactic recognizers and parsers directly in Prolog. • We write Prolog rules of a special type. These look very much like CF grammar productions. • Recognition or parsing happens by the normal Prolog computation process. • Different structures can be recognized/created for the same sentence, by the normal alternative-answer process of Prolog: i.e., natural handling of syntactic ambiguity. • In the parsing case, syntax trees are produced. • Grammatical constraints such as agreement are also easy to include. • The rules could be translated by hand into ordinary Prolog, but with a lot of extra parameters that are tedious to write and that obscure the main information. • Instead, the compiler translates the rules into normal Prolog automatically. • Caution: DCGs provide only top-down depth-first parsing, because of Prolog’s approach to using rules. • But other strategies may be better. More on this later.

  30. DCGs, contd: Recognition • See link on Slides page to a toy recognizer in DCG that you can examine and play with. • Example DCG rules for recognition of non-terminal categories: • s --> np, vp. • np --> noun, pp. np --> det, adj, noun, pp. • Example DCG rules for recognition of terminal categories: • det --> [a]. det --> [an]. det --> [the]. • noun --> [cat]. noun --> [dog]. noun --> [dogs]. • verb --> [dogs]. • (There is another, more economical method.) • The program can be run in two ways: • s([a, dog, sits, on, a, mat], []). np([a, dog], []). • phrase(s, [a, dog, sits, on, a, mat]). phrase(np, [a, dog]). • The second argument for s, np, etc. is for catching extra words: • np([a, dog, sits, on, a], X). Gives X = [sits, on, a].
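Putting rules of this kind together gives a complete runnable toy recognizer. This is my own sketch, loadable in SWI-Prolog; the grammar and lexicon are illustrative and not the recognizer linked from the Slides page:

```prolog
% Toy DCG recognizer for a tiny English fragment.
s --> np, vp.
np --> det, noun.
np --> det, adj, noun.
vp --> verb, pp.
pp --> prep, np.

det --> [a].  det --> [the].
adj --> [big].
noun --> [dog].  noun --> [mat].
verb --> [sits].
prep --> [on].

% ?- phrase(s, [a, dog, sits, on, a, big, mat]).   succeeds
% ?- phrase(s, [a, dog, sits, on]).                fails (PP incomplete)
% ?- np([a, dog, sits], X).                        X = [sits]  (extra words caught)
```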

  31. Advantage of DCGs over ordinary Prolog • Consider the abstract grammar rules S → NP VP, NP → Det Noun. • Here’s how they could be implemented in ordinary Prolog (for just recognition, but syntax-tree constructing and grammatical-category checking [see later] can be added): • s(WordList, Residue) :- • np(WordList, Residue_to_pass_on), vp(Residue_to_pass_on, Residue). • np(WordList, Residue) :- • det(WordList, Residue_to_pass_on), noun(Residue_to_pass_on, Residue). • det([the | Residue], Residue). • noun([dog | Residue], Residue). • Can be called as in: s([a, dog, sits, on, a, dog], []). • Exercise: See ordinary-Prolog version of the recognizer linked from Slides page. • Compared to DCG form, we have the extra WordList and Residue arguments in every syntactic-category predicate. Tedious, error-prone.

  32. DCGs: Additions • Can embed ordinary Prolog within grammar rules. • Can use disjunction and cuts. • Can add arguments to the category symbols (np, det, etc.) so as to • Build syntax trees, i.e. do parsing, not just recognition • Include “grammatical categories” (used to enforce constraints such as agreement) • Build semantic structures. • Will see some of this in following slides.

  33. DCGs: Parsing • Add a parameter to each category symbol, delivering a node of the syntax tree: • vp(vp_node(Verb_node, PP_node)) --> verb(Verb_node), pp(PP_node). • verb(verb_node(sits)) --> [sits]. • The program can again be run in two ways: • s(ST, [a, dog, sits, on, a, mat], []). • phrase(s(ST), [a, dog, sits, on, a, mat]). • See links on Slides page to toy parsers in DCG that you can examine and play with. • So far: “basic” parser 1. • An initial exercise: add new words and new NP rules.
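Extending every rule of the toy recognizer with a tree-building argument in this way yields a small complete parser (again my own sketch, not the linked parser 1; the node-name conventions follow the slide above):

```prolog
% Toy DCG parser: each non-terminal carries an argument that
% delivers the syntax-tree node it recognizes.
s(s_node(NP, VP))   --> np(NP), vp(VP).
np(np_node(D, N))   --> det(D), noun(N).
vp(vp_node(V, PP))  --> verb(V), pp(PP).
pp(pp_node(P, NP))  --> prep(P), np(NP).

det(det_node(a))      --> [a].
det(det_node(the))    --> [the].
noun(noun_node(dog))  --> [dog].
noun(noun_node(mat))  --> [mat].
verb(verb_node(sits)) --> [sits].
prep(prep_node(on))   --> [on].

% ?- phrase(s(T), [a, dog, sits, on, the, mat]).
% T = s_node(np_node(det_node(a), noun_node(dog)),
%            vp_node(verb_node(sits),
%                    pp_node(prep_node(on),
%                            np_node(det_node(the), noun_node(mat))))).
```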

  34. DCGs: Syntactic Ambiguity • Suppose we add two extra rules: • vp( vp_node(Verb_node, PP_node1, PP_node2) ) --> • verb(Verb_node), pp(PP_node1), pp(PP_node2). • np( np_node(Det_node, N_node, PP_node) ) --> • det(Det_node), noun(N_node), pp(PP_node). • Then we get two different structures for • A dog sits on the mat with the flowers. • Exercise: • Work out by hand what structures you should get, both as drawn syntax trees and as Prolog forms. • Try it out using the relevant parser on the Slides page.

  35. Terminals: A Better Implementation • verb(verb_node(Word)) --> [Word], {verb_pred(Word)}. • The part in braces is ordinary Prolog. • Individual verbs are included as follows: • verb_pred(sit). verb_pred(sits). verb_pred(hates). • This is less writing per individual verb, and concentrates the node-building into one place. • Looks possibly less efficient, because of the extra step. • BUT in modern Prologs it speeds up execution: • by making the DCG terminal symbol call (verb in top line above) deterministic • by making the call of the lexical predicates (verb_pred, etc.) deterministic. • Exercise: amend one of the toy parsers by using the above method.
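As a complete sketch of this technique (the lexical items here are my own illustrative ones): the terminal rules look up words in ordinary Prolog lexicon predicates, so adding a new word is a single fact rather than a new DCG rule:

```prolog
% One terminal rule per category; the lexicon lives in plain predicates.
% The braces {...} embed ordinary Prolog goals in a DCG rule.
verb(verb_node(W)) --> [W], { verb_pred(W) }.
noun(noun_node(W)) --> [W], { noun_pred(W) }.

verb_pred(sit).  verb_pred(sits).  verb_pred(hates).
noun_pred(dog).  noun_pred(dogs).  noun_pred(cat).

% Adding a verb is now one fact, e.g.:  verb_pred(eats).
% ?- phrase(verb(T), [hates]).
% T = verb_node(hates).
```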

  36. Here

  37. Grammatical Categories • A grammatical category is a dimension along which (some) lexical or syntactic constituents can vary in limited, systematic ways, such as (in English): • Number: singular or plural: lexically, nouns, verbs, determiners, numerals • Person: first, second and third: lexically, only for verbs, nouns and some pronouns • Tense: present, past (various forms), future: lexically, only for verbs • Gender: M, F, N [neither/neuter]: lexically, only some pronouns and some nouns • Syntactic constituents can sometimes inherit grammatical category values from their components, e.g. (without showing all possible GC values): • the big dog: 3rd person M/F/N singular NP // the big dogs: 3rd person M/F/N plural NP • we in the carpet trade: 1st person M/F plural NP // you silly idiot: 2nd person M/F singular NP • eloped with the gym teacher: past-tense VP // will go: future-tense VP • the woman with the long hair: female NP // the radio with the red knobs: neuter NP • A lexical or syntactic constituent can be ambiguous as to a GC value: • e.g. sheep: singular/plural; manage: singular/plural, 1st/2nd person

  38. Grammatical Categories, contd • Why worry about grammatical categories? Because there needs to be agreement of certain sorts. We can’t have the following (the * indicates incorrectness): • Within NPs as to number: • * a flowers // * this flowers // * many flower // * three big flower • Between VPs and their NP subjects as to number & person, and case when the NP is a pronoun: • * he eagerly eat [when not subjunctive] • * the women in the theatre eats • * all the women was • * I is // * us are // * you idiot am // * them went • Between VPs and their pronoun direct objects as to case: • * the guy robbed we • Between NPs and pronouns referring to them, as to number and gender: • * the woman took out their gun and shot his dog [when it’s her gun and her dog]

  39. Grammatical Categories in DCGs • We can add grammatical categories into lexical entries, via a second type of argument: • noun(n_node(dog), gcs(numb(singular), person(third)) ) --> [dog]. • noun(n_node(dog), gcs(numb(plural), person(third)) ) --> [dogs]. • det(det_node(a), gcs(numb(singular), person(third)) ) --> [a]. • det(det_node(the), gcs(numb(_), person(third)) ) --> [the]. • verb(v_node(sits), gcs(numb(singular), person(third)) ) --> [sits]. • verb(v_node(sit), gcs(numb(plural), person(third)) ) --> [sit]. • verb(v_node(sit), gcs(numb(_), person(first)) ) --> [sit]. • verb(v_node(sit), gcs(numb(_), person(second)) ) --> [sit].

  40. Grammatical Categories in DCGs, contd • Or, using the better lexicon representation: • noun(n_node(Word), gcs(numb(Numb), person(third)) ) • --> [Word], {noun_pred(Word, Numb)}. • noun_pred(dog, singular). • noun_pred(dogs, plural).

  41. Grammatical Categories in DCGs, contd • Enforcing agreement in an NP syntax rule: • np(np_node(Det_node, N_node), gcs(Number_gc, Person_gc)) • --> det(Det_node, gcs(Number_gc, Person_gc)), • noun(N_node, gcs(Number_gc, Person_gc)). • OR more simply, if we don’t need to enforce a particular shape for gcs(...): • np(np_node(Det_node, N_node), GCs) • --> det(Det_node, GCs), noun(N_node, GCs). • Enforcing subject-NP / VP agreement (NB: doesn’t handle the case GC): • s(s_node(NP_node, VP_node), GCs) • --> np(NP_node, GCs), vp(VP_node, GCs).
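Combining the lexical entries of slide 39 with the agreement-enforcing rules above gives a minimal runnable example (my own cut-down sketch, without PPs): because subject NP and VP share one GC term, a number mismatch simply fails to unify.

```prolog
% Agreement-enforcing toy grammar: the same gcs(...) term is shared
% between determiner and noun, and between subject NP and VP.
s(s_node(NP, VP), GCs)  --> np(NP, GCs), vp(VP, GCs).
np(np_node(D, N), GCs)  --> det(D, GCs), noun(N, GCs).
vp(vp_node(V), GCs)     --> verb(V, GCs).

det(det_node(a),   gcs(numb(singular), person(third))) --> [a].
det(det_node(the), gcs(numb(_),        person(third))) --> [the].
noun(n_node(dog),  gcs(numb(singular), person(third))) --> [dog].
noun(n_node(dogs), gcs(numb(plural),   person(third))) --> [dogs].
verb(v_node(sits), gcs(numb(singular), person(third))) --> [sits].
verb(v_node(sit),  gcs(numb(plural),   person(third))) --> [sit].

% ?- phrase(s(T, G), [a, dog, sits]).     succeeds
% ?- phrase(s(T, G), [the, dogs, sit]).   succeeds ("the" is number-neutral)
% ?- phrase(s(T, G), [a, dogs, sit]).     fails: "a" demands singular
```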

  42. Grammatical Categories in DCGs, contd • Not enforcing agreement within part of a VP rule: • vp(vp_node(Verb_node, PP_node), GCs) • --> verb(Verb_node, GCs), pp(PP_node). • OR if you needed PP to return some GCs that didn’t matter: • vp(vp_node(Verb_node, PP_node), GCs) • --> verb(Verb_node, GCs), pp(PP_node, _). • Exercise: understand and play around with the GC version of the parser linked from Slides page. • The program can again be run in two ways: • s(ST, GCs, [a, dog, sits, on, a, mat], []). • phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
