
Parsing




  1. Parsing

  2. Goals of Parsing • Check the input for syntactic accuracy • Return appropriate error messages • Recover if possible • Produce, or at least traverse, a complete parse tree • Parse tree (or trace) is basis for translation

  3. Top-down Parsers • Parse tree is built from the root down to the leaves • Builds parse tree in preorder • Corresponds to a leftmost derivation • Parsing decision problem: choosing correct rule • Two most common algorithms: • Recursive Descent – implemented in code • Table driven implementation • Both are LL algorithms (left-to-right scan, left-most derivation)

  4. Bottom-up • Parse tree is built from the leaves up to the root • Builds parse in reverse of a rightmost derivation • Requires finding a handle, that is, a correct RHS • Most common algorithms are LR (left-to-right, rightmost derivation)

  5. Complexity • Parsing algorithms that work for any unambiguous grammar are complicated and inefficient: O(n^3) in the length of the input • Practical parsers trade generality for efficiency • Parsers in commercial compilers run in O(n) time

  6. Recursive Descent • Parser is made up of a collection of subprograms • One for each non-terminal • Each subprogram is responsible for generating the parse tree rooted at its non-terminal • It pulls tokens from the tokenizer and leaves the first token that is not part of its rule in nextToken • If multiple rules are associated with the current non-terminal, the correct rule must be determined first

  7. Function Factor

         // <factor> -> id | ( <expr> )
         void factor() {
             if (nextToken == ID_CODE)
                 lex();
             else if (nextToken == LEFT_PAREN_CODE) {
                 lex();
                 expr();
                 if (nextToken == RIGHT_PAREN_CODE)
                     lex();
                 else
                     error();
             } else
                 error();    /* Neither RHS matches */
         }

  8. <ifstmt> ::= if ( <boolexpr> ) <stmt> [else <stmt>]

         void ifstmt() {
             if (nextToken != IF_CODE)
                 error();
             else {
                 lex();
                 if (nextToken != LEFT_PAREN)
                     error();
                 else {
                     lex();
                     boolexpr();
                     if (nextToken != RIGHT_PAREN)
                         error();
                     else {
                         lex();
                         statement();
                         if (nextToken == ELSE_CODE) {   /* optional else part */
                             lex();
                             statement();
                         }
                     }
                 }
             }
         }

  9. Grammar Restrictions • Left-recursion is a problem • A ::= A + B • Parsing would never terminate! • In some cases, left-recursion can be eliminated by refactoring the grammar • E ::= E + T | T • E ::= T E’ • E’ ::= + T E’ | ε
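
A minimal sketch of how the rewritten grammar E ::= T E’, E’ ::= + T E’ | ε maps onto recursive-descent subprograms. The slides do not define T, so T ::= i is assumed here purely to make the example complete; the one-character token encoding and the names `expr_rest` and `parse_expr` are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-character tokens: 'i' stands for an identifier, '+' for
   the operator, '\0' for end of input. */
static const char *input;   /* remaining input */
static char nextToken;      /* current lookahead token */
static bool ok;             /* cleared on the first syntax error */

/* Load the next token into nextToken; stays on '\0' at end of input. */
static void lex(void)   { nextToken = *input; if (*input != '\0') input++; }
static void error(void) { ok = false; }

/* T ::= i   (assumed rule, not given in the slides) */
static void term(void) {
    if (nextToken == 'i') lex();
    else error();
}

/* E' ::= + T E' | ε */
static void expr_rest(void) {
    if (nextToken == '+') {
        lex();
        term();
        if (ok) expr_rest();
    }
    /* any other token selects the ε alternative: consume nothing */
}

/* E ::= T E' */
static void expr(void) {
    term();
    if (ok) expr_rest();
}

/* Returns true if the whole string can be derived from E. */
bool parse_expr(const char *s) {
    assert(s != NULL);
    input = s;
    ok = true;
    lex();
    expr();
    return ok && nextToken == '\0';
}
```

Under these assumptions, `parse_expr("i+i+i")` succeeds and `parse_expr("i+")` fails. A subprogram written directly from E ::= E + T would call itself before consuming any token, which is the non-termination the slide warns about.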

  10. Grammar restrictions continued • The parser must be able to choose the correct production based on a single next token • The pairwise disjointness test indicates whether this choice can be made • The test passes if, for each non-terminal, the FIRST sets of its alternative right-hand sides have no terminal in common • Example 1: A ::= aB | bAb | Bb and B ::= cB | d; the FIRST sets of the A rules are {a}, {b}, {c, d}: disjoint, so recursive-descent parsable • Example 2: A ::= aB | Bab and B ::= aB | b; the FIRST sets of the A rules are {a}, {a, b}: not disjoint, so not recursive-descent parsable
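
The test can be sketched in C for the two example grammars. This is an illustration under simplifying assumptions, not a general tool: symbols are single letters (uppercase for non-terminals, lowercase for terminals), no right-hand side is empty, and the names `struct rule`, `compute_first`, and `pairwise_disjoint` are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One production, e.g. {'A', "bAb"} for A ::= bAb. */
struct rule { char lhs; const char *rhs; };

/* Bit k of a set is on when terminal 'a' + k is in the set. */
typedef uint32_t set_t;

/* FIRST of a right-hand side is FIRST of its leading symbol, because the
   sketch assumes there are no empty (ε) right-hand sides. */
static set_t first_of_rhs(const char *rhs, const set_t first[26]) {
    char c = rhs[0];
    if (c >= 'a' && c <= 'z') return (set_t)1 << (c - 'a');
    if (c >= 'A' && c <= 'Z') return first[c - 'A'];
    return 0;   /* empty or unexpected symbol: contributes nothing */
}

/* Fills first[X - 'A'] for every non-terminal X by iterating until no
   set changes any more. */
static void compute_first(const struct rule *g, size_t n, set_t first[26]) {
    bool changed = true;
    for (int i = 0; i < 26; i++) first[i] = 0;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            set_t *f = &first[g[i].lhs - 'A'];
            set_t merged = *f | first_of_rhs(g[i].rhs, first);
            if (merged != *f) { *f = merged; changed = true; }
        }
    }
}

/* True when, for every non-terminal, the FIRST sets of its alternative
   right-hand sides are pairwise disjoint. */
bool pairwise_disjoint(const struct rule *g, size_t n) {
    set_t first[26];
    assert(n == 0 || g != NULL);
    compute_first(g, n, first);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (g[i].lhs == g[j].lhs &&
                (first_of_rhs(g[i].rhs, first) & first_of_rhs(g[j].rhs, first)))
                return false;
    return true;
}

/* The two example grammars from the slide. */
const struct rule grammar1[] = {
    {'A', "aB"}, {'A', "bAb"}, {'A', "Bb"}, {'B', "cB"}, {'B', "d"}
};
const struct rule grammar2[] = {
    {'A', "aB"}, {'A', "Bab"}, {'B', "aB"}, {'B', "b"}
};
```

With these definitions, `pairwise_disjoint(grammar1, 5)` holds, while `pairwise_disjoint(grammar2, 4)` does not, because both A rules of the second grammar can begin with a.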

  11. Table driven parsers • Encode production choice in a table • Rows indicate current top of the stack • Columns for each input token • Entry in matrix gives production number • Preferred for large grammars • Algorithm is fixed • Only table size grows
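
A rough sketch of such a table-driven (LL) parser in C, using the small grammar E ::= T E’, E’ ::= + T E’ | ε with an assumed rule T ::= i. The token encoding ('i', '+', and '$' as end marker), the letter 'R' standing in for E’, and the function name `ll1_parse` are all choices made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Right-hand sides indexed by production number. Number 0 is unused and
   marks an error entry in the table. 'R' is a stand-in name for E'. */
static const char *const rhs[] = {
    NULL,
    "TR",   /* 1: E  ::= T E'   */
    "+TR",  /* 2: E' ::= + T E' */
    "",     /* 3: E' ::= ε      */
    "i"     /* 4: T  ::= i   (assumed, to make the grammar complete) */
};

/* Row index of a non-terminal, or -1 for any other symbol. */
static int row(char x) {
    switch (x) {
    case 'E': return 0; case 'R': return 1; case 'T': return 2;
    default:  return -1;
    }
}

/* Column index of an input token, or -1 if it is not a known token. */
static int col(char a) {
    switch (a) {
    case 'i': return 0; case '+': return 1; case '$': return 2;
    default:  return -1;
    }
}

/* Rows: symbol on top of the stack. Columns: next input token. */
static const int table[3][3] = {
    /*         i  +  $ */
    /* E  */ { 1, 0, 0 },
    /* E' */ { 0, 2, 3 },
    /* T  */ { 4, 0, 0 }
};

/* Returns true if input (which must end with '$') is accepted. */
bool ll1_parse(const char *input) {
    char stack[64];
    size_t top = 0;
    assert(input != NULL);
    stack[top++] = '$';
    stack[top++] = 'E';
    while (top > 0) {
        char x = stack[--top];          /* pop the top of the stack */
        char a = *input;
        int r = row(x);
        if (r < 0) {                    /* terminal or '$': must match input */
            if (x != a) return false;
            if (a == '$') return true;
            input++;
        } else {                        /* non-terminal: consult the table */
            int c = col(a);
            int p = (c < 0) ? 0 : table[r][c];
            if (p == 0) return false;
            size_t len = strlen(rhs[p]);
            if (top + len > sizeof stack) return false;
            /* push the right-hand side reversed, leftmost symbol on top */
            for (size_t k = len; k > 0; k--) stack[top++] = rhs[p][k - 1];
        }
    }
    return false;
}
```

For example, `ll1_parse("i+i$")` accepts and `ll1_parse("i+$")` rejects. Changing the grammar means changing only `rhs` and `table`; the loop in `ll1_parse` stays the same, which is the point the slide makes about large grammars.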

  12. Expression Grammar Example

         S  ::= A $
         A  ::= i = E ;
         E  ::= T E’
         E’ ::= ε | AO T E’
         AO ::= + | -
         T  ::= F T’
         T’ ::= ε | MO F T’
         MO ::= * | /
         F  ::= F’ P
         F’ ::= ε | UO
         UO ::= - | !
         P  ::= i | l | ( E )

  13. Bottom-up Parsing • Often called shift-reduce algorithms • Integral piece of every bottom-up parser is a stack • Shift moves the next input token onto the stack • Reduce replaces a RHS on the top of the stack with the corresponding LHS • Most bottom-up parsing algorithms are variations of the LR process • Originally designed by Donald Knuth • Relatively small program and a parsing table

  14. Advantages of LR Parsers • Will work for nearly all grammars that describe programming languages. • Work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser. • Can detect syntax errors as soon as it is possible. • LR class of grammars is a superset of the class parsable by LL parsers

  15. Disadvantage • For anything but very small grammars, it is difficult to produce the parsing table by hand • But this is exactly what tools like yacc and bison can do for us automatically! • Original version was computationally intensive (both in terms of time and memory) • Variations developed: • Fewer computer resources required • Not as general

  16. Key Insight • A bottom-up parser can use the entire history of the parse, up to the current point, to make parsing decisions • There are only a finite and relatively small number of different parse situations that could have occurred, so the history can be stored in a parser state, on the parse stack

  17. Parser Configuration • Made up of both the stack and the input • For each state on the stack, there is an associated grammar symbol • E.g. (S_0 X_1 S_1 X_2 S_2 … X_m S_m, a_i a_{i+1} … a_n $), where S_i indicates a state and X_i indicates a grammar symbol • Initial configuration: (S_0, a_0 … a_n $)

  18. Table driven bottom up parsing • Table has two components: • ACTION table • Specifies the action of the parser, given the parser state and the next token • Rows are state names • Columns are terminals • GOTO table • Specifies state to put in the stack after a reduce operation • Rows are state names • Columns are non-terminals

  19. Structure of an LR parser

  20. Parser actions • If ACTION[S_m, a_i] = Shift S, the next configuration is (S_0 X_1 S_1 X_2 S_2 … X_m S_m a_i S, a_{i+1} … a_n $) • If ACTION[S_m, a_i] = Reduce A → β and S = GOTO[S_{m-r}, A], where r = the length of β, the next configuration is (S_0 X_1 S_1 X_2 S_2 … X_{m-r} S_{m-r} A S, a_i a_{i+1} … a_n $) • If ACTION[S_m, a_i] = Accept, the parse is complete and no errors were found • If ACTION[S_m, a_i] = Error, the parser calls an error-handling routine

  21. Example LR Parsing Table • 1. E ::= E + T • 2. E ::= T • 3. T ::= T * F • 4. T ::= F • 5. F ::= ( E ) • 6. F ::= id

  22. Trace of parse of id + id * id
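
The table and trace on these two slides were images and are not preserved in the transcript. As a stand-in, the following C sketch drives the shift, reduce, accept, and error actions with the SLR table that textbooks commonly give for this six-rule grammar; whether it matches the slide's table state for state is an assumption. The token encoding ('i' for id, '$' as end marker) and the name `lr_parse` are hypothetical, and the stack holds states only, since each state already implies its grammar symbol.

```c
#include <assert.h>
#include <stdbool.h>

/* ACTION encoding: n > 0 shift and go to state n, n < 0 reduce by rule -n,
   0 error, ACC accept. */
enum { ACC = 99 };

/* Columns: id  +  *  (  )  $ */
static const int action[12][6] = {
    /*  0 */ {  5,  0,  0,  4,  0,   0 },
    /*  1 */ {  0,  6,  0,  0,  0, ACC },
    /*  2 */ {  0, -2,  7,  0, -2,  -2 },
    /*  3 */ {  0, -4, -4,  0, -4,  -4 },
    /*  4 */ {  5,  0,  0,  4,  0,   0 },
    /*  5 */ {  0, -6, -6,  0, -6,  -6 },
    /*  6 */ {  5,  0,  0,  4,  0,   0 },
    /*  7 */ {  5,  0,  0,  4,  0,   0 },
    /*  8 */ {  0,  6,  0,  0, 11,   0 },
    /*  9 */ {  0, -1,  7,  0, -1,  -1 },
    /* 10 */ {  0, -3, -3,  0, -3,  -3 },
    /* 11 */ {  0, -5, -5,  0, -5,  -5 }
};

/* Columns: E T F. 0 means no entry (state 0 is never a GOTO target). */
static const int go_to[12][3] = {
    { 1, 2, 3 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 },
    { 8, 2, 3 }, { 0, 0, 0 }, { 0, 9, 3 }, { 0, 0, 10 },
    { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 }
};

/* For rules 1..6: length of the RHS and GOTO column of the LHS. */
static const int rhs_len[7] = { 0, 3, 1, 3, 1, 3, 1 };
static const int lhs_col[7] = { 0, 0, 0, 1, 1, 2, 2 };

/* ACTION column of a token, or -1 if it is not a known token. */
static int column(char t) {
    switch (t) {
    case 'i': return 0; case '+': return 1; case '*': return 2;
    case '(': return 3; case ')': return 4; case '$': return 5;
    default:  return -1;
    }
}

/* Returns true if input (which must end with '$') is accepted. */
bool lr_parse(const char *input) {
    int stack[128];
    int top = 0;
    stack[0] = 0;                           /* initial state S0 */
    for (;;) {
        int c = column(*input);
        if (c < 0) return false;
        int act = action[stack[top]][c];
        if (act == ACC) return true;
        if (act == 0) return false;         /* error entry */
        if (act > 0) {                      /* shift: push state, advance */
            if (top + 1 >= 128) return false;
            stack[++top] = act;
            input++;
        } else {                            /* reduce by rule -act */
            int rule = -act;
            top -= rhs_len[rule];           /* pop one state per RHS symbol */
            assert(top >= 0);
            int next = go_to[stack[top]][lhs_col[rule]];
            assert(next != 0);
            stack[++top] = next;            /* push GOTO[S_{m-r}, A] */
        }
    }
}
```

With this encoding the input id + id * id is written "i+i*i$", and `lr_parse("i+i*i$")` accepts it. The first steps are: shift to state 5, reduce by rule 6 (GOTO state 3), reduce by rule 4 (state 2), reduce by rule 2 (state 1), then shift + into state 6.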
