
LING 408/508: Computational Techniques for Linguists

LING 408/508: Computational Techniques for Linguists. Lecture 21, 10/10/2012. Outline: Parsing; Parsing arithmetic exprs. in prefix notation; Parsing arithmetic exprs. in postfix notation; Short assignment #13; Long assignment #6.

vidar

Presentation Transcript


  1. LING 408/508: Computational Techniques for Linguists Lecture 21 10/10/2012

  2. Outline • Parsing • Parsing arithmetic exprs. in prefix notation • Parsing arithmetic exprs. in postfix notation • Short assignment #13 • Long assignment #6

  3. Previously: string representations of arithmetic expressions • Infix: (5 / 7) + ((2 * 4) - 1) • Parentheses have been inserted for disambiguation; they are not represented in the original tree • Prefix: + / 5 7 - * 2 4 1 • Postfix: 5 7 / 2 4 * 1 - + • Different string representations of the same tree [Figure: expression tree with root +; left child / over 5 and 7; right child - over (* over 2 and 4) and 1]
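As a minimal sketch of how one tree yields all three strings, the functions below walk a tree in the (value, left, right) tuple form used later in these slides. The names TREE, to_prefix, to_postfix, and to_infix are illustrative assumptions, not part of the lecture code, and to_infix parenthesizes every operator node, including the root.

```python
# Tree nodes follow the slides' convention: (value, left, right),
# with (n, None, None) for an integer leaf.
TREE = ('+',
        ('/', (5, None, None), (7, None, None)),
        ('-', ('*', (2, None, None), (4, None, None)), (1, None, None)))

def to_prefix(node):
    value, left, right = node
    if left is None:                      # leaf: just the number
        return str(value)
    return f"{value} {to_prefix(left)} {to_prefix(right)}"

def to_postfix(node):
    value, left, right = node
    if left is None:
        return str(value)
    return f"{to_postfix(left)} {to_postfix(right)} {value}"

def to_infix(node):
    value, left, right = node
    if left is None:
        return str(value)
    # Parenthesize every operator node so the string is unambiguous.
    return f"({to_infix(left)} {value} {to_infix(right)})"
```

Only the position of the operator relative to its two operands changes between the three functions.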

  4. Parsing arithmetic expressions • Given a string representation of an arithmetic expression, construct a binary tree • Example: • Input: '5 7 /' • Output: ('/', (5,None,None), (7,None,None)) • Parsing algorithms for different notations • Prefix: recursion • Postfix: iteration; shift-reduce parsing with a stack • Infix: recursion; recursive-descent parsing

  5. Parsing algorithms operate upon tokenized input string
      # input: a space-separated string
      #        representing an expression
      # output: a list of strings
      #
      # EXAMPLE:
      # input: '+ 3 * 2 5'
      # output: ['+', '3', '*', '2', '5']
      def tokenize(s):
          return s.split(' ')
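One detail worth illustrating: splitting on a single space produces empty-string tokens when the input has a trailing or doubled space, and those tokens would later fail in int(). The sketch below contrasts the slide's tokenizer with a whitespace-tolerant variant; the name tokenize_ws is an assumption introduced here for comparison only.

```python
def tokenize(s):
    # The slide's version: split on every single space character.
    return s.split(' ')

def tokenize_ws(s):
    # Hypothetical variant: split on runs of whitespace,
    # which drops leading, trailing, and repeated spaces.
    return s.split()

clean = tokenize('+ 3 * 2 5')
trailing = tokenize('5 7 / ')      # note the trailing space
tolerant = tokenize_ws('5 7 / ')
```

For well-formed, single-spaced input the two functions return the same list.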

  6. Parsing natural language syntax • Given a sentence, return all parse trees for that sentence.

  7. Outline • Parsing • Parsing arithmetic exprs. in prefix notation • Parsing arithmetic exprs. in postfix notation • Short assignment #13 • Long assignment #6

  8. Prefix notation • Operator occurs before left and right operands / 5 7 • Operands may be recursively constructed expressions + / 5 7 - * 2 4 1

  9. Parsing prefix notation • Recursive case: • Read operator op • Recursively parse left operand lnode • Recursively parse right operand rnode • Construct node: (op, lnode, rnode) • Base case: • Read an integer (need to convert string to integer) • Construct node: (value, None, None)

  10. Attempt #1 (doesn’t work)
      def parse_prefix(s):  # s is a list of strings
          operators = {'+', '-', '*', '/'}
          if s[0] not in operators:  # base case: integer
              return (int(s[0]), None, None)  # leaf node
          else:  # recursive case
              op = s[0]
              lnode = parse_prefix(s)
              rnode = parse_prefix(s)
              return (op, lnode, rnode)  # parent node

  11. Variable position for reading operator • Two lines of code refer to s[0]: "if s[0] not in operators:" and "op = s[0]" • But an operator can be in many positions in the string • Solution: specify starting index for parsing an operand from the input string: parse_prefix(s, idx)

  12. Attempt #2 (doesn’t work): specify starting index
      def parse_prefix(s, idx=0):
          operators = {'+', '-', '*', '/'}
          if s[idx] not in operators:
              return (int(s[idx]), None, None)
          else:
              op = s[idx]
              lnode = parse_prefix(s, idx+1)
              rnode = parse_prefix(s, idx+1)
              return (op, lnode, rnode)

      T = parse_prefix(s)  # s is a list of strings

  13. Doesn’t work: right operand index • Input: [operator][left operand][right operand] • Calling function:
      op = s[idx]
      lnode = parse_prefix(s, idx+1)
      rnode = parse_prefix(s, idx+1)
      return (op, lnode, rnode)
  • Left operand begins immediately after operator • Index idx+1 in input string • Where does right operand begin? • Second argument should be greater than idx+1 • Need to know how large the left operand is
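To make the failure concrete, the sketch below runs the second attempt on the tokens of '+ 3 * 2 5'. The function is renamed parse_prefix_attempt2 here (an assumed name) so it is not confused with the working version. Because both recursive calls start at idx+1, the right operand is parsed from the same position as the left one.

```python
def parse_prefix_attempt2(s, idx=0):
    operators = {'+', '-', '*', '/'}
    if s[idx] not in operators:
        return (int(s[idx]), None, None)
    op = s[idx]
    lnode = parse_prefix_attempt2(s, idx + 1)
    rnode = parse_prefix_attempt2(s, idx + 1)   # bug: same index as the left operand
    return (op, lnode, rnode)

tokens = ['+', '3', '*', '2', '5']
result = parse_prefix_attempt2(tokens)
# The left operand 3 is duplicated and '* 2 5' is never read.
```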

  14. Right operand index: use size of left subtree • Instead of just returning the node corresponding to a subtree for an operand: lnode = parse_prefix(s, idx) also return the size of the subtree: (lnode, lsz) = parse_prefix(s, idx) • Now the calling function will know where to begin parsing the right operand in the input string [operator][left operand][right operand]: the operator is at idx, the left operand begins at idx+1, and the right operand begins at idx+1 + size(left)

  15. Solution: also return size of subtree
      def parse_prefix(s, idx=0):
          operators = {'+', '-', '*', '/'}
          if s[idx] not in operators:  # base case: integer
              leaf = (int(s[idx]), None, None)
              return (leaf, 1)  # size of subtree
          else:
              op = s[idx]
              (lnode, lsz) = parse_prefix(s, idx+1)
              (rnode, rsz) = parse_prefix(s, idx+1 + lsz)
              parent = (op, lnode, rnode)
              return (parent, 1 + lsz + rsz)

      T = parse_prefix(s)[0]

  16. A complete program, with tokenization
      def tokenize(s):
          return s.split(' ')

      def parse_prefix(s, idx=0):
          operators = {'+', '-', '*', '/'}
          if s[idx] not in operators:  # base case: integer
              leaf = (int(s[idx]), None, None)
              return (leaf, 1)  # size of subtree
          else:
              op = s[idx]
              (lnode, lsz) = parse_prefix(s, idx+1)
              (rnode, rsz) = parse_prefix(s, idx+1 + lsz)
              parent = (op, lnode, rnode)
              return (parent, 1 + lsz + rsz)

      s = '+ 3 * 2 5'
      T = parse_prefix(tokenize(s))[0]

  17. Example: sequence of function calls for input s = tokenize('+ 3 * 2 5')
      parse_prefix(s, 0)          # parses + 3 * 2 5
          parse_prefix(s, 1)      # parses 3
          parse_prefix(s, 2)      # parses * 2 5
              parse_prefix(s, 3)  # parses 2
              parse_prefix(s, 4)  # parses 5

  18. Example: function calls and return values (nodes and size) for s = tokenize('+ 3 * 2 5')
      parse_prefix(s, 0)          # parses + 3 * 2 5
          parse_prefix(s, 1)      # parses 3
              returns ((3,None,None),1)
          parse_prefix(s, 2)      # parses * 2 5
              parse_prefix(s, 3)  # parses 2
                  returns ((2,None,None),1)
              parse_prefix(s, 4)  # parses 5
                  returns ((5,None,None),1)
              returns (('*',(2,None,None),(5,None,None)),3)
          returns (('+',(3,None,None),('*',(2,None,None),(5,None,None))),5)
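The call sequence above can be reproduced with a small instrumented copy of the parser. This is only a sketch: the name parse_prefix_logged and the extra log parameter are assumptions added for illustration, and the parsing logic is otherwise the same as the size-returning solution.

```python
OPERATORS = {'+', '-', '*', '/'}

def parse_prefix_logged(tokens, idx, log):
    # Record the starting index of every call, in call order.
    log.append(idx)
    if tokens[idx] not in OPERATORS:
        return ((int(tokens[idx]), None, None), 1)
    lnode, lsz = parse_prefix_logged(tokens, idx + 1, log)
    rnode, rsz = parse_prefix_logged(tokens, idx + 1 + lsz, log)
    return ((tokens[idx], lnode, rnode), 1 + lsz + rsz)

tokens = '+ 3 * 2 5'.split(' ')
log = []
tree, size = parse_prefix_logged(tokens, 0, log)
```

For well-formed input, the returned size equals len(tokens), which gives a simple check that the whole input was consumed.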

  19. Outline • Parsing • Parsing arithmetic exprs. in prefix notation • Parsing arithmetic exprs. in postfix notation • Short assignment #13 • Long assignment #6

  20. Parsing postfix • Postfix: 5 7 / 2 4 * 1 - + • Operator occurs after left and right operands • Requires shift-reduce parsing • Shift: create node according to position in input, advance one position • Reduce: when we see an operator in the input, construct a parent node for the two previous operands

  21. Example • Input: 3 5 + Tokenize as ['3', '5', '+'] • Sequence of steps: 1. idx = 0 Shift: read 3, construct (3, None, None) 2. idx = 1 Shift: read 5, construct (5, None, None) 3. idx = 2 Reduce: read +, construct parent node: ('+', (3,None,None), (5,None,None))

  22. Example • Input: 3 5 + 6 8 / * • Sequence of steps (some omitted): • Reduce: read +, construct parent node: ('+', (3, None, None), (5, None, None)) • Reduce: read /, construct parent node: ('/', (6, None, None), (8, None, None)) • Reduce: read *, construct parent node: ('*', ('+',(3,None,None),(5,None,None)), ('/',(6,None,None),(8,None,None)))

  23. Reduce operation applies to the 2 most recently shifted items • 3 5 + Apply + to a leaf and a leaf • 3 5 + 6 8 / * Apply * to a parent and a parent • 5 7 / 2 4 * 1 - + Apply - to a parent ('*',2,4) and a leaf (1,None,None)

  24. Can accumulate arbitrarily many operands before reducing • 1 2 3 4 5 + + + + = 1+(2+(3+(4+5))) in infix • Need a data structure to hold these operands • Keep track of their order • Operator applies to two most recent operands

  25. Stack data structure • Stores a sequence of items • Example: stack of 1, 2, 3 (bottom to top: 1, 2, 3; 3 is on top) • Example: empty stack

  26. Stack operations • Push: put an item on the top of the stack • Pop: take an item off the top of the stack • Example (stacks listed bottom to top): start with 1, 2, 3 • Push 4: stack is 1, 2, 3, 4 • Pop: removes 4, stack is 1, 2, 3

  27. Analogy to stack of plates: only (put on / take off of) the top http://www.gettyimages.com/detail/200131588-001/The-Image-Bank http://blog.timesunion.com/advocate/files/2008/09/stack_of_plates.jpg

  28. The stack in shift-reduce parsing • Shift: • Push an item on top of the stack • Reduce: • Pop an item from the top (right operand) • Pop another item from the top (left operand) • Perform computation with operator • Push new item onto the stack

  29. The stack in shift-reduce parsing • End result: after reading the entire input string, the result of the computation is the single item on the stack • (Assume that the input string is well-formed) • A stack is the memory of a pushdown automaton (an abstract machine, a finite-state control plus a stack, that recognizes context-free languages)

  30. Example: parse this postfix expression: 5 7 / 2 4 * 1 - + • Initially: empty stack

  31. Postfix: 5 7 / 2 4 * 1 - + • Read 5 • Push 5 • Stack (bottom to top): 5 • In Python, this is [(5,None,None)]

  32. Postfix: 5 7 / 2 4 * 1 - + • Read 7 • Push 7 • Stack (bottom to top): 5, 7 • In Python, this is [(5,None,None), (7,None,None)]

  33. Postfix: 5 7 / 2 4 * 1 - + • Read / • Pop 7 • Pop 5 • Construct node • Push node • Stack (bottom to top): ( /, 5, 7 ) • In Python, this is [('/', (5,None,None), (7,None,None))]

  34. Postfix: 5 7 / 2 4 * 1 - + • Read 2 • Push 2 • Stack (bottom to top): ( /, 5, 7 ), 2 • In Python, this is [('/', (5,None,None), (7,None,None)), (2,None,None)]

  35. Postfix: 5 7 / 2 4 * 1 - + • Read 4 • Push 4 • Stack (bottom to top): ( /, 5, 7 ), 2, 4

  36. Postfix: 5 7 / 2 4 * 1 - + • Read * • Pop 4 • Pop 2 • Construct node • Push node • Stack (bottom to top): ( /, 5, 7 ), ( *, 2, 4 )

  37. Postfix: 5 7 / 2 4 * 1 - + • Read 1 • Push 1 • Stack (bottom to top): ( /, 5, 7 ), ( *, 2, 4 ), 1

  38. Postfix: 5 7 / 2 4 * 1 - + • Read - • Pop 1 • Pop ( *, 2, 4 ) • Construct node • Push node • Stack (bottom to top): ( /, 5, 7 ), (-, (*,2,4), 1)

  39. Postfix: 5 7 / 2 4 * 1 - + • Read + • Pop (-, (*, 2, 4), 1) • Pop ( /, 5, 7 ) • Construct node • Push node • Stack (bottom to top): (+, ( /, 5, 7 ), (-, (*,2,4), 1))

  40. Postfix: 5 7 / 2 4 * 1 - + • End of input string • Stop • Result on top of stack: (+, ( /, 5, 7 ), (-, (*, 2, 4), 1)) • In Python, this is [('+', ('/', (5,None,None), (7,None,None)), ('-', ('*', (2,None,None), (4,None,None)), (1,None,None)))]
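The walk-through above can be checked mechanically. The sketch below records the stack depth after each token of 5 7 / 2 4 * 1 - +; the function name trace_postfix and the depths list are assumptions introduced for this illustration, and well-formed input is assumed, as on the slides.

```python
def trace_postfix(tokens):
    operators = {'+', '-', '*', '/'}
    stack = []
    depths = []                      # stack depth after each token is processed
    for tok in tokens:
        if tok in operators:         # reduce: pop right, then left, push parent
            rnode = stack.pop()
            lnode = stack.pop()
            stack.append((tok, lnode, rnode))
        else:                        # shift: push a leaf node
            stack.append((int(tok), None, None))
        depths.append(len(stack))
    return stack, depths

stack, depths = trace_postfix('5 7 / 2 4 * 1 - +'.split())
```

Each shift grows the stack by one and each reduce shrinks it by one, so a well-formed expression ends with exactly one item.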

  41. Implementing a stack in Python • Want: • Sequence of items • Push: add to end of sequence • Pop: remove from end of sequence • Use a list • Push is list.append • Pop is list.pop
      >>> help(list.pop)
      Help on method_descriptor:

      pop(...)
          L.pop([index]) -> item -- remove and return item at index (default last).
          Raises IndexError if list is empty or index is out of range.
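A short sketch of a list used as a stack, mirroring the earlier push 4 / pop example. The variable names are illustrative; the last lines show the IndexError that list.pop raises on an empty list.

```python
stack = [1, 2, 3]        # bottom of the stack is index 0, top is the end
stack.append(4)          # push 4
after_push = list(stack) # copy of the stack after the push
top = stack.pop()        # pop: removes and returns the last item

empty = []
try:
    empty.pop()
    popped_empty = True
except IndexError:
    popped_empty = False  # popping an empty list raises IndexError
```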

  42. Code for parsing postfix
      def parse_postfix(s):
          stack = []
          operators = {'+', '-', '*', '/'}
          for x in s:
              if x not in operators:
                  leaf = (int(x), None, None)
                  stack.append(leaf)  # push on top of stack
              else:
                  rnode = stack.pop()
                  lnode = stack.pop()
                  parent = (x, lnode, rnode)
                  stack.append(parent)
          return stack[0]  # single node on stack
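The code above assumes well-formed input. As a hedged sketch of what dropping that assumption might look like, the variant below reports two kinds of malformed input: an operator with fewer than two operands available, and leftover items at the end. The name parse_postfix_checked and the choice of ValueError are assumptions, not part of the lecture.

```python
def parse_postfix_checked(tokens):
    operators = {'+', '-', '*', '/'}
    stack = []
    for tok in tokens:
        if tok in operators:
            if len(stack) < 2:
                raise ValueError(f"operator {tok!r} needs two operands")
            rnode = stack.pop()      # right operand was pushed last
            lnode = stack.pop()
            stack.append((tok, lnode, rnode))
        else:
            stack.append((int(tok), None, None))
    if len(stack) != 1:
        raise ValueError(f"expected 1 item on the stack, found {len(stack)}")
    return stack[0]

tree = parse_postfix_checked(['3', '5', '+'])
```

Without the final length check, an input such as 3 5 would silently return the leaf for 3 and ignore the 5.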

  43. Can use stack to evaluate an expression directly, without constructing a tree first • Instead of: read 5, push 5; read 7, push 7; read /, pop 7, pop 5, construct node ( /, 5, 7 ), push node • Do: read 5, push 5; read 7, push 7; read /, pop 7, pop 5, compute 5 / 7, push 0 (the quotient under integer division)
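A minimal sketch of direct evaluation follows. The name eval_postfix and the OPS table are assumptions for illustration. To match the slide's result of 0 for 5 / 7, '/' is mapped to floor division; this is a choice made here, since in Python 3 the / operator would give a float instead.

```python
import operator

# Assumption: '/' means integer (floor) division, so that 5 / 7 gives 0.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.floordiv,
}

def eval_postfix(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()          # right operand was pushed last
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))       # push the number itself, not a node
    return stack[0]

value = eval_postfix('5 7 / 2 4 * 1 - +'.split())
```

The control flow is the same as in tree construction; only the reduce step differs, computing a number instead of building a parent node.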

  44. Later: parsing infix • Infix: (5 / 7) + ((2 * 4) - 1) • Prefix: + / 5 7 - * 2 4 1 • Postfix: 5 7 / 2 4 * 1 - + [Figure: expression tree with root +; left child / over 5 and 7; right child - over (* over 2 and 4) and 1]

  45. Outline • Parsing • Parsing arithmetic exprs. in prefix notation • Parsing arithmetic exprs. in postfix notation • Short assignment #13 • Long assignment #6

  46. Due 10/12 • Convert this prefix expression to infix: - * / 8 3 * + 7 4 2 9 • Draw the sequence of stack operations for parsing this postfix expression: 1 2 3 + 4 5 6 * / - +

  47. Outline • Parsing • Parsing arithmetic exprs. in prefix notation • Parsing arithmetic exprs. in postfix notation • Short assignment #13 • Long assignment #6

  48. Due 10/19
