
LIN6932: Topics in Computational Linguistics



Presentation Transcript


  1. LIN6932: Topics in Computational Linguistics Hana Filip LIN 6932

  2. Parsing with context-free grammars

  3. Grammar Equivalence and Chomsky Normal Form • Weak equivalence • Strong equivalence

  4. Grammar Equivalence and Chomsky Normal Form (CNF) • Many proofs in the field of languages and computability make use of the Chomsky Normal Form. • There are algorithms that decide whether a given string can be generated by a given grammar and that use the Chomsky Normal Form, e.g. the CYK (Cocke–Younger–Kasami) algorithm.
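The CYK algorithm mentioned above can be sketched in a few lines of Python. This is a minimal recognizer, not the lecture's own implementation; the toy CNF rules below are hypothetical illustrations (note that a CNF grammar for the imperative 'Book that flight' needs a rule like S → V NP directly, since CNF allows no unit rule S → VP):

```python
from itertools import product

# Toy grammar in Chomsky Normal Form (hypothetical rules for illustration):
# every production is either A -> B C or A -> terminal.
BINARY = {                    # (B, C) -> set of A with A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP", "S"},  # S -> V NP covers imperatives in CNF
}
LEXICAL = {                   # terminal -> set of A with A -> terminal
    "book": {"V", "N"},
    "that": {"Det"},
    "flight": {"N"},
}

def cyk(words):
    """CYK recognition: table[i][j] holds the nonterminals that derive
    words[i:j].  Returns True iff 'S' derives the whole input."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(w, set()))
    for span in range(2, n + 1):              # width of the substring
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((B, C), set())
    return "S" in table[0][n]
```

Because the grammar is in CNF, every cell is filled by combining exactly two smaller cells, which is what makes the O(n^3) dynamic program possible.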

  5. Grammar Equivalence and Chomsky Normal Form (CNF) • Chomsky Normal Form (CNF) is one of the most basic normal forms (roughly: in the context of computing and rewriting systems, a form that cannot be further reduced to a simpler one). In CNF each production (rewriting rule) has the form A → B C or A → α, where • A, B and C are nonterminal symbols • α is a terminal symbol (i.e., a symbol that represents a constant value) • productions (rewriting rules) are expansive: throughout the derivation of a string, each string of terminals and nonterminals is always either the same length as or one element longer than the previous such string
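The two allowed rule shapes are easy to check mechanically. A minimal sketch (the rule encoding as a dict of right-hand-side tuples is an assumption, not the lecture's notation):

```python
def is_cnf(rules, nonterminals):
    """Check that every production has the form A -> B C (two
    nonterminals) or A -> a (a single terminal).  `rules` maps each
    nonterminal to a list of right-hand sides (tuples of symbols)."""
    for lhs, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) == 2 and all(s in nonterminals for s in rhs):
                continue                      # A -> B C
            if len(rhs) == 1 and rhs[0] not in nonterminals:
                continue                      # A -> a (terminal)
            return False
    return True

nts = {"S", "NP", "VP", "Det", "N", "V"}
good = {"S": [("NP", "VP")], "Det": [("that",)]}
bad = {"NP": [("Det", "Adj", "N")]}          # three symbols: not CNF
```

Here `is_cnf(good, nts)` holds while `is_cnf(bad, nts)` fails, matching the "each step grows the string by at most one element" property above.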

  6. Grammar Equivalence and Chomsky Normal Form (CNF) • For grammars in Chomsky Normal Form the parse tree is always a binary tree. • We can talk about the relationship between: • the depth of the parse tree, and • the length of its yield.

  7. Grammar Equivalence and Chomsky Normal Form (CNF) • If a parse tree for a word string w is generated by a grammar in CNF and the parse tree • has a path length of at most i, • then the length of w is at most 2^(i−1).

  8. Grammar Equivalence and Chomsky Normal Form (CNF)

  9. Grammar Equivalence and Chomsky Normal Form (CNF) Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be efficiently transformed into an equivalent one which is in Chomsky normal form.

  10. Grammar Equivalence and Chomsky Normal Form (CNF)

  11. Grammar Equivalence and Chomsky Normal Form (CNF)

  12. CFG for Fragment of English: G0
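The grammar G0 itself appeared as a figure on this slide and is not reproduced in the transcript. A small fragment in the spirit of such a textbook grammar (the exact rule set here is hypothetical, chosen only so that the running examples 'Book that flight' and 'Does this flight include a meal?' have plausible categories) might be written down as:

```python
# Hypothetical CFG fragment in the spirit of G0 (the slide's figure is
# not reproduced here); each key maps to its list of RHS alternatives.
G0 = {
    "S":       [["NP", "VP"], ["VP"]],       # declaratives and imperatives
    "NP":      [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
    "Det":     [["that"], ["this"], ["a"]],
    "Noun":    [["book"], ["flight"], ["meal"]],
    "Verb":    [["book"], ["include"]],
}
```

Note that 'book' is listed under both Noun and Verb, which is the lexical ambiguity the bottom-up parsing slides below exploit.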

  13. Parse Tree for ‘Book that flight’ using G0

  14. FSA and Syntactic Parsing with CFGs (see previous lecture: the types of formal grammar in the Chomsky Hierarchy, the classes of languages they generate, and the types of finite state automata that recognize each class) CFG rule: NP → (Det) Adj* N

  15. Parsing as a Search Problem • Parsing (linguistics: syntax analysis) is the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given formal grammar.

  16. Parsing as a Search Problem • Searching FSAs • Finding the right path through the automaton • Search space defined by structure of FSA • Searching CFGs • Finding the right parse tree among all possible parse trees • Search space defined by the grammar • Constraints provided by • the input sentence and • the automaton or grammar

  17. Two Search Strategies How can we use G0 to assign the correct parse tree(s) to a given string of words? • Constraints provided by • the input sentence and • the automaton or grammar give rise to two search strategies: • Top-Down (Hypothesis-Directed) Search • Search for a tree starting from S until the input words are covered. • Bottom-Up (Data-Directed) Search • Start with the words and build upwards toward S

  18. Two Search Strategies Search strategies and epistemology (the study of knowledge and justified belief; philosophy of science) • Top-Down (Hypothesis-Directed) Search • Search for a tree starting from S until all input words are covered • Rationalist tradition: emphasizes the use of prior knowledge • Bottom-Up (Data-Directed) Search • Start with the words and build upwards toward S • Empiricist tradition: emphasizes the data The rationalist vs. empiricist controversy concerns the extent to which we are dependent upon sense experience in our effort to gain knowledge.

  19. Top-Down Parser • Builds from the root S node down to the leaves • Assuming we build all trees in parallel: • Find all trees with root S • Next expand all constituents in these trees/rules • Continue until leaves are part of speech categories (pos) • Candidate trees failing to match pos of input string are rejected • Top-Down: Rationalist Tradition • Expectation- or Theory-driven • Goal: Build tree for input starting with S

  20. Top-Down Search Space for G0

  21. Bottom-Up Parsing • The earliest known parsing algorithm (suggested by Yngve 1955) • Parser begins with the words of the input and builds up trees, applying G0 rules whose right-hand sides match • ‘Book that flight’ can be tagged N Det N or V Det N: ‘Book’ is ambiguous • Parsing continues until an S root node is reached or no further node expansion is possible • Bottom-Up: Empiricist Tradition • Data driven • Primary consideration: the lowest sub-trees of the final tree must hook up with the words in the input.

  22. Expanding Bottom-Up Search Space for ‘Book that flight’

  23. Comparing Top-Down and Bottom-Up • Top-Down parsers: never explore illegal parses (e.g. parses that can’t form an S) -- but waste time on trees that can never match the input • Bottom-Up parsers: never explore trees inconsistent with input -- but waste time exploring illegal parses (no S root) • For both: how to explore the search space? • Pursuing all parses in parallel or …? • Which node to expand next? • Which rule to apply next?

  24. A Possible Top-Down Parsing Strategy • Depth-first search: • start at the root (selecting some node as the root in the graph case) and expand as far as possible; when • you reach a state (tree) inconsistent with the input, backtrack to the most recent unexplored state (tree) • Which node to expand? • Leftmost • Which grammar rule to use? • Order in the grammar

  25. Basic Algorithm for Top-Down, Depth-First, Left-to-Right Strategy • Initialize the agenda with an ‘S’ tree pointing to the first word, and make this the current search state (cur) • Loop until a successful parse or an empty agenda • Apply the next applicable grammar rule to the leftmost unexpanded node (n) of the current tree (t) on the agenda and push the resulting tree (t’) onto the agenda • If n is a POS category and matches the POS of cur, push the new tree (t’’) onto the agenda • Else pop t’ from the agenda • The final agenda contains the history of the successful parse

  26. Example: Does this flight include a meal?

  27. Example continued …

  28. Augmenting Top-Down Parsing with Bottom-Up Filtering • We saw: Top-Down, depth-first, L-to-R parsing • Expands non-terminals along the tree’s left edge down to leftmost leaf of tree • Moves on to expand down to next leftmost leaf… • In a successful parse, the current input word will be the first word in the derivation of the unexpanded node that the parser is currently processing • So … look ahead to the left-corner of the tree • B is a left-corner of A if A ==>* B … (some derivation from A begins with B) • Build a table with the left-corners of all non-terminals in the grammar and consult it before applying a rule

  29. Left Corners Pre-compute all POS that can serve as the leftmost POS in the derivations of each non-terminal category
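That pre-computation is a transitive closure over the "can begin" relation. A minimal sketch, over a hypothetical grammar fragment:

```python
# Sketch: pre-compute the left-corner relation of a CFG by iterating
# to a fixpoint (grammar fragment hypothetical).
GRAMMAR = {
    "S":  [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
}

def left_corners(grammar):
    """Map each nonterminal A to every category B with A ==>* B ...,
    i.e. every symbol that can begin some derivation from A."""
    lc = {a: {rhs[0] for rhs in rhss} for a, rhss in grammar.items()}
    changed = True
    while changed:                        # close under "left corner of
        changed = False                   #  a left corner"
        for a in lc:
            for b in list(lc[a]):
                for c in lc.get(b, ()):
                    if c not in lc[a]:
                        lc[a].add(c)
                        changed = True
    return lc
```

With this table, a top-down parser expanding S can reject any rule whose left corner cannot yield the current input word's POS, before building the tree at all.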

  30. Previous Example: Left-Corner Table for G0

  31. Summing Up Parsing Strategies • Parsing is a search problem which may be implemented with many search strategies • Top-Down vs. Bottom-Up Parsers • Both generate too many useless trees • Combine the two to avoid over-generation: Top-Down Parsing with Bottom-Up look-ahead • Left-corner table provides more efficient look-ahead • Pre-compute all POS that can serve as the leftmost POS in the derivations of each non-terminal category

  32. Three Critical Problems in Parsing • Left Recursion • Ambiguity • Repeated Parsing of Sub-trees

  33. Left Recursion • A long-standing issue regarding algorithms that manipulate context-free grammars (CFGs) in a "top-down" left-to-right fashion is that left recursion can lead to nontermination, i.e. to an infinite loop. • Direct left recursion happens when you have a rule that calls itself before anything else. Examples: NP → NP PP, NP → NP and NP, VP → VP PP, S → S and S • Indirect left recursion: Example: NP → Det Nominal, Det → NP ’s

  34. Left Recursion • Indirect Left Recursion: Example: NP → Det Nominal, Det → NP ’s [tree diagram: an NP whose Det daughter expands back to NP ’s]

  35. Solutions to Left Recursion • Don't use recursive rules • Rule ordering • Limit depth of recursion in parsing to some analytically or empirically set limit • Don't use top-down parsing

  36. Solution: Grammar Rewriting • Rewrite a left-recursive grammar to a weakly equivalent one which is not left-recursive. • How? • By Hand (ick) or … • Automatically

  37. Solution: Grammar Rewriting I saw the man on the hill with a telescope. (category sequence: N V NP PP PP) NP: noun phrase; PP: prepositional phrase; Phrase: characterized by its head (N, V, P); Ambiguous: 5 possible parses

  38. Solution: Grammar Rewriting I saw the man on the hill with a telescope. [parse tree (1) over S, NP, VP, V, NP, N, PP, PP]

  39. Solution: Grammar Rewriting I saw the man on the hill with a telescope. [parse tree (2) over S, NP, VP, PP, V, NP, N, PP]

  40. Solution: Grammar Rewriting I saw the man on the hill with a telescope. [parse tree (3) over S, NP, VP, PP, PP, V, NP]

  41. Solution: Grammar Rewriting I saw the man on the hill with the telescope… NP → NP PP (recursive) NP → N PP (non-recursive) NP → N …becomes… NP → N NP’ NP’ → PP NP’ NP’ → ε • Not so obvious what these rules mean…
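The rewriting step on this slide, eliminating direct left recursion by introducing a primed helper category, can be done mechanically. A sketch of the standard transformation (rule encoding as lists of symbol lists is an assumption; `[]` stands for the ε production):

```python
def remove_direct_left_recursion(nt, rhss):
    """Rewrite  NT -> NT alpha | beta   as
                NT  -> beta NT'
                NT' -> alpha NT' | []    ([] is the epsilon production).
    Returns a dict of rewritten rules; the helper is named nt + "'"."""
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nt]
    others    = [rhs for rhs in rhss if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: rhss}                 # nothing to rewrite
    new = nt + "'"
    return {
        nt:  [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],  # [] = epsilon
    }
```

Applied to the slide's rules NP → NP PP | N PP | N, this yields NP → N PP NP’ | N NP’ and NP’ → PP NP’ | ε, a weakly equivalent grammar that a top-down parser can handle, at the price of less transparent constituent structure ("not so obvious what these rules mean").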

  42. Rule Ordering • Bad: • NP → NP PP • NP → Det N • Rule ordering: non-recursive rules first • First: NP → Det N • Then: NP → NP PP

  43. Depth Bound • Set an arbitrary bound • Set an analytically derived bound • Run tests and derive reasonable bound empirically

  44. Ambiguity • Lexical Ambiguity • Leads to hypotheses that are locally reasonable but eventually lead nowhere • “Book that flight” • Structural Ambiguity • Leads to multiple parses for the same input

  45. Lexical Ambiguity: Word Sense Disambiguation (WSD) as Text Categorization • Each sense of an ambiguous word is treated as a category. • “play” (verb) • play-game • play-instrument • play-role • “pen” (noun) • writing-instrument • enclosure • Treat current sentence (or preceding and current sentence) as a document to be classified. • “play”: • play-game: “John played soccer in the stadium on Friday.” • play-instrument: “John played guitar in the band on Friday.” • play-role: “John played Hamlet in the theater on Friday.” • “pen”: • writing-instrument: “John wrote the letter with a pen in New York.” • enclosure: “John put the dog in the pen in New York.”

  46. Structural ambiguity • Multiple legal structures • Attachment (e.g. I saw a man on a hill with a telescope) • Coordination (e.g. younger cats and dogs) • NP bracketing (e.g. Spanish language teachers)

  47. Two Parse Trees for Ambiguous Sentence

  48. Humor and Ambiguity • Many jokes rely on the ambiguity of language: • Groucho Marx: One morning I shot an elephant in my pajamas. How he got into my pajamas, I’ll never know. • She criticized my apartment, so I knocked her flat. • Noah took all of the animals on the ark in pairs. Except the worms, they came in apples. • Policeman to little boy: “We are looking for a thief with a bicycle.” Little boy: “Wouldn’t you be better using your eyes?” • Why is the teacher wearing sun-glasses? Because the class is so bright.

  49. Ambiguity is Explosive • Ambiguities compound to generate enormous numbers of possible interpretations. • In English, a sentence ending in n prepositional phrases has over 2^n syntactic interpretations. • “I saw the man with the telescope”: 2 parses • “I saw the man on the hill with the telescope.”: 5 parses • “I saw the man on the hill in Texas with the telescope”: 14 parses • “I saw the man on the hill in Texas with the telescope at noon.”: 42 parses
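The counts on this slide (2, 5, 14, 42) are Catalan numbers; taking the number of attachment parses for n trailing PPs to be C(n+1) (an assumption that matches the slide's figures) gives a quick arithmetic check:

```python
from math import comb

def num_parses(n_pps):
    """Number of attachment parses for a sentence ending in n PPs,
    assumed to be the Catalan number C(n+1) = comb(2k, k) // (k + 1)
    with k = n + 1, which matches the counts 2, 5, 14, 42 above."""
    k = n_pps + 1
    return comb(2 * k, k) // (k + 1)
```

Catalan numbers grow roughly as 4^k / k^(3/2), so they indeed exceed the slide's 2^n lower bound and make exhaustive enumeration of parses rapidly infeasible.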

  50. What’s the solution? Return all possible parses and disambiguate using “other methods”
