
Lesson 3

CDT301 – Compiler Theory, Spring 2011. Teacher: Linus Källberg.



  1. Lesson 3 • CDT301 – Compiler Theory, Spring 2011 • Teacher: Linus Källberg

  2. Outline • Introduction to parsing • Specifying language syntax using CFGs • Ambiguous grammars

  3. Introduction to parsing

  4. Why use regexps and grammars? • They give a clear understanding of the language • Most grammars and regexps can be used more or less directly as input to parser and scanner generators • Grammars can also be used to specify the semantics (e.g., generation of code) • A grammar serves as a clear and compact specification for a recursive top-down parser

  5. Overview of parsing • The lexical analyzer (or scanner or tokenizer) splits the input into tokens • Token = type + attribute • Examples: <id, 3>, <+>, <num, 1234> • This is done by determining membership of strings in regular languages
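
The "type + attribute" idea can be sketched as a small hand-written scanner in C. This is only an illustration: the names `TokenType`, `Token` and `NextToken` are hypothetical and not taken from the course material, and the attribute stored for an identifier here is simply the length of its name, whereas a real scanner would more likely store a symbol-table index such as the 3 in <id, 3>.

```c
#include <ctype.h>

/* Hypothetical token types for this sketch (not from the course labs). */
enum TokenType { TOK_ID, TOK_NUM, TOK_PLUS, TOK_END, TOK_ERROR };

struct Token {
    enum TokenType type;
    int attribute;  /* TOK_NUM: numeric value; TOK_ID: name length (assumption); else 0 */
};

/* Reads one token starting at *pos and advances *pos past it. */
struct Token NextToken(const char **pos)
{
    struct Token tok = { TOK_END, 0 };
    const char *p = *pos;

    while (isspace((unsigned char)*p))  /* whitespace only separates tokens */
        p++;

    if (*p == '\0') {
        tok.type = TOK_END;
    } else if (isdigit((unsigned char)*p)) {  /* regular language [0-9]+ */
        tok.type = TOK_NUM;
        while (isdigit((unsigned char)*p))
            tok.attribute = tok.attribute * 10 + (*p++ - '0');
    } else if (isalpha((unsigned char)*p)) {  /* [A-Za-z][A-Za-z0-9]* */
        tok.type = TOK_ID;
        while (isalnum((unsigned char)*p)) {
            tok.attribute++;
            p++;
        }
    } else if (*p == '+') {
        tok.type = TOK_PLUS;
        p++;
    } else {
        tok.type = TOK_ERROR;  /* character that starts no known token */
        p++;
    }
    *pos = p;
    return tok;
}
```

Calling `NextToken` repeatedly on the text `x1 + 1234` yields an identifier token, a plus token, a number token with attribute 1234, and finally the end-of-input token.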

  6. Overview of parsing • The parser uses the tokens as terminals to build a parse tree • Implicitly or explicitly • Most often, the parser repeatedly “asks” the scanner for the next token

  7. Overview of parsing • The parser tries to determine which grammar rules to apply to build the parse tree • No suitable rules found = syntax error • Two main strategies: top-down or bottom-up • Top-down parsing starts with the start symbol, i.e., the root of the parse tree • Bottom-up parsing starts with the terminals, i.e., the leaves of the parse tree

  8. Examples of grammars • Lists of space-separated digits like 1 9 7 4 5 • Possible solution, assuming non-empty lists: digit_list → digit digit_list | digit • Note: • digit is a terminal: the name of a token, of which the actual integer value is an attribute • The spaces are assumed to have been removed in the lexical analysis; therefore they are not present in the grammar

  9. Examples of grammars • Simple expressions, e.g., id + id + id and id + id:
       E → E + id
       E → id
     • Note: here '+' is a token (terminal), just like id

  10. Examples of grammars • Grammar for a “begin-end” block in the Pascal language:
       block → begin stmt_list end
       stmt_list → stmt_list ; stmt | stmt
       stmt → assign | if … … (more statement types)

  11. Exercise (1) Write a grammar for the language that allows declarations of a single integer array with initialization in C. The initialization list is not allowed to be empty. Example: int arr[2] = {1, 2, 42}; Note: do not worry about matching the number of elements in the initialization with the array size. What are suitable tokens? What change is needed in order to allow the initialization list to be empty?

  12. Top-down parsing • Also called predictive parsing • Works as follows: • Creates the root of the parse tree • Repeatedly expands non-terminal nodes in the parse tree, i.e., adds children to them, until the tree is finished or the parser gets stuck (syntax error) • Which grammar rule to apply is predicted by looking at the input • In lab 1 you will implement a variant known as recursive descent

  13. Recursive descent – example • Grammar:
       S → num C
       C → , S
       C → ;
     • Example strings:
       3;
       5, 7, 9;
       1, 2, 3, 4, 5;

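  14. Recursive descent – example
       /* Lookahead(), Consume() and the token constants NUM, COMMA and
          SEMICOLON are assumed to be provided by the scanner. */
       int ExpectS(void);
       int ExpectC(void);

       int main(void)
       {
           // 1 = OK
           // 0 = syntax error
           return ExpectS();
       }

       /* S → num C */
       int ExpectS(void)
       {
           if (Lookahead() == NUM) {
               Consume();
               return ExpectC();
           }
           else
               return 0;
       }

       /* C → , S | ; */
       int ExpectC(void)
       {
           switch (Lookahead()) {
           case COMMA:
               Consume();
               return ExpectS();
           case SEMICOLON:
               Consume();
               return 1;
           default:
               return 0;
           }
       }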
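
The slide leaves the scanner out. A minimal self-contained sketch of the same parser, reading from a string, might look as follows. The bodies of `Lookahead()` and `Consume()`, the extra `END` and `ERROR` tokens, and the `Parse()` wrapper are assumptions made for this illustration. `Parse()` also checks that no input remains after S, which the slide's `main` does not do.

```c
#include <ctype.h>

enum Token { NUM, COMMA, SEMICOLON, END, ERROR };

static const char *input;  /* remaining unread input (assumed scanner state) */

/* Classifies the next token without consuming it. */
static enum Token Lookahead(void)
{
    while (*input == ' ')
        input++;
    if (*input == '\0') return END;
    if (isdigit((unsigned char)*input)) return NUM;
    if (*input == ',') return COMMA;
    if (*input == ';') return SEMICOLON;
    return ERROR;
}

/* Skips past the token that Lookahead() reports. */
static void Consume(void)
{
    if (Lookahead() == NUM) {
        while (isdigit((unsigned char)*input))
            input++;
    } else if (*input != '\0') {
        input++;
    }
}

static int ExpectC(void);

/* S → num C */
static int ExpectS(void)
{
    if (Lookahead() == NUM) {
        Consume();
        return ExpectC();
    }
    return 0;
}

/* C → , S | ; */
static int ExpectC(void)
{
    switch (Lookahead()) {
    case COMMA:
        Consume();
        return ExpectS();
    case SEMICOLON:
        Consume();
        return 1;
    default:
        return 0;
    }
}

/* Returns 1 if the whole text is derivable from S, 0 on a syntax error. */
int Parse(const char *text)
{
    input = text;
    return ExpectS() && Lookahead() == END;
}
```

With this sketch, `Parse("5, 7, 9;")` accepts the input, while `Parse("5,;")` and `Parse("5 7;")` report a syntax error because no rule matches the lookahead token.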

  15. Using the recursive descent technique • The previous parser merely determines whether or not the input program is correct • However, by inserting semantic actions (code segments) into the parser, a syntax-directed translation can be performed during the parse • We will look at this later
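
As a small preview of what a semantic action can look like, the sketch below parses the list grammar S → num C, C → , S | ; and adds up the numbers while parsing. The choice of "sum the list" as the translation, and the names `Peek`, `ParseS`, `ParseC` and `SumList`, are assumptions made for illustration and do not come from the slides.

```c
#include <ctype.h>

/* Skips blanks and returns the next unread character ('\0' at end of input). */
static char Peek(const char **p)
{
    while (**p == ' ')
        (*p)++;
    return **p;
}

static int ParseC(const char **p, int *sum);

/* S → num C   with semantic action: sum += value of num */
static int ParseS(const char **p, int *sum)
{
    if (!isdigit((unsigned char)Peek(p)))
        return 0;
    int value = 0;
    while (isdigit((unsigned char)**p))
        value = value * 10 + (*(*p)++ - '0');
    *sum += value;  /* the semantic action, executed during the parse */
    return ParseC(p, sum);
}

/* C → , S | ; */
static int ParseC(const char **p, int *sum)
{
    switch (Peek(p)) {
    case ',':
        (*p)++;
        return ParseS(p, sum);
    case ';':
        (*p)++;
        return 1;
    default:
        return 0;
    }
}

/* Returns 1 and stores the sum in *sum on success, 0 on a syntax error. */
int SumList(const char *text, int *sum)
{
    *sum = 0;
    return ParseS(&text, sum) && Peek(&text) == '\0';
}
```

The parser structure is the same as in the pure recognizer; only the single line marked as the semantic action has been added, which is what makes the translation "syntax-directed".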

  16. Ambiguous grammars

  17. Writing parsers from context-free grammars • Different grammars may describe the same language. Example: S → e S | e and S → S e | e describe the same language, non-empty sequences of e's • The preferred form of the grammar depends on the parsing strategy used

  18. Ambiguous grammars • A grammar is ambiguous if it is possible to build more than one parse tree for some string that it produces • It is still a valid grammar for the language • This might make it hard to use the grammar to write a parser • The grammar doesn't guide the parsing algorithm in making decisions

  19. Exercise (2) Show that the following grammar is ambiguous by building two different parse trees for some string produced by the grammar: expr → expr + expr | expr - expr | num

  20. Handling ambiguity • Ignore it • Bad for the semantic analysis • Rewrite the grammar • Handle it carefully in the parser • Explicit directives to the parser generator • Which parse tree is preferred?

  21. Rewriting the expression grammar • The grammar can be rewritten to an unambiguous form, and still describe the same language • However, preferably the (unique) parse trees should reflect the order in which the operators (+ and -) are applied • Application order is specified by operator associativity and operator precedence (described later)

  22. Operator associativity • Binary operators are often left-associative, e.g., +, -, *, and / • This means that if an operand is surrounded by two operators of the same type, the left operator should be applied before the right one • Examples: 3 - 7 - 9 = (3 - 7) - 9 and a - (b + c) - d = (a - (b + c)) - d

  23. Rewriting the expression grammar • We rewrite the ambiguous grammar expr → expr + expr | expr - expr | num as expr → expr + num | expr - num | num • Both grammars describe exactly the same language, but the latter does so unambiguously and also reflects the left associativity
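
To see why left associativity matters for the result, the following sketch evaluates strings of the language expr → expr + num | expr - num | num. It is an illustration under stated assumptions: the left-recursive rule is realized as a loop in which everything parsed so far becomes the left operand, which is one common way to handle such a rule by hand rather than a method prescribed by the slides, and the names `SkipBlanks`, `ParseNum` and `EvalExpr` are hypothetical.

```c
#include <ctype.h>

static void SkipBlanks(const char **p)
{
    while (**p == ' ')
        (*p)++;
}

/* Reads an unsigned decimal number; returns 0 if none is present. */
static int ParseNum(const char **p, int *value)
{
    SkipBlanks(p);
    if (!isdigit((unsigned char)**p))
        return 0;
    *value = 0;
    while (isdigit((unsigned char)**p))
        *value = *value * 10 + (*(*p)++ - '0');
    return 1;
}

/* Returns 1 and stores the value in *result, or 0 on a syntax error. */
int EvalExpr(const char *text, int *result)
{
    int rhs;

    if (!ParseNum(&text, result))  /* expr → num */
        return 0;
    for (;;) {
        SkipBlanks(&text);
        char op = *text;
        if (op == '\0')
            return 1;
        if (op != '+' && op != '-')
            return 0;
        text++;
        if (!ParseNum(&text, &rhs))
            return 0;
        /* expr → expr op num: the value so far is the left operand. */
        *result = (op == '+') ? *result + rhs : *result - rhs;
    }
}
```

For the input `3 - 7 - 9` this computes (3 - 7) - 9 = -13. A right-associative reading, 3 - (7 - 9), would give 5 instead.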

  24. Rewriting the expression grammar • In this particular case the ambiguity could be resolved by using operator associativity • In general we do not aim to express semantics with the grammar • There is no general method for rewriting ambiguous grammars to unambiguous ones

  25. Operator precedence • In addition to associativity, operators have a precedence level • Example: * and / have higher precedence than + and -. This means that a + b * c = a + (b * c) although both + and * are left-associative • Operators with higher precedence are always applied before those with lower precedence • The application order for operators within the same precedence group is given by their associativity

  26. Operator precedence in C

  27. Exercise (3) The previous grammar contained only + and -, which have the same precedence. Let's add * and / to the grammar as well:
       expr → expr + num | expr - num | expr * num | expr / num | num
     Rewrite this grammar to reflect the operator precedence (it is already unambiguous, and the associativity is already reflected). Tip: operators on the same precedence level can be handled identically.

  28. “Dangling-else” • Grammar for if-else statements:
       stmt → if ( expr ) stmt else stmt | if ( expr ) stmt | other
     • Problematic program: if (expr) if (expr) other else other
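
That the program above really has two parse trees can be checked mechanically. The sketch below counts the parse trees that the if-else grammar allows for a token sequence. The one-letter token encoding ('i' for "if ( expr )", 'e' for "else", 'o' for "other") and the function names are assumptions made for this illustration. The counting is brute force, so it is only suitable for short inputs.

```c
#include <string.h>

/* Number of parse trees that derive tokens[i..j) from stmt. */
static int CountStmt(const char *tokens, int i, int j)
{
    if (i >= j)
        return 0;
    int count = 0;
    if (j - i == 1 && tokens[i] == 'o')
        count++;                               /* stmt → other */
    if (tokens[i] == 'i') {
        count += CountStmt(tokens, i + 1, j);  /* stmt → if ( expr ) stmt */
        /* stmt → if ( expr ) stmt else stmt: try every 'else' as the split. */
        for (int k = i + 2; k < j - 1; k++)
            if (tokens[k] == 'e')
                count += CountStmt(tokens, i + 1, k) *
                         CountStmt(tokens, k + 1, j);
    }
    return count;
}

/* Counts the parse trees for a whole token string such as "iioeo". */
int CountParseTrees(const char *tokens)
{
    return CountStmt(tokens, 0, (int)strlen(tokens));
}
```

The problematic program is encoded as `"iioeo"`, for which `CountParseTrees` returns 2: one tree attaches the else to the inner if, the other to the outer if. For `"ioeo"` (a single if with an else) the count is 1.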

  29. Conclusion • The parser builds a parse tree (or syntax tree), either explicitly or implicitly, by grouping tokens provided by the scanner using productions of the grammar • There can be several grammars for the same language • Ambiguous grammars can sometimes be rewritten as unambiguous grammars

  30. Next time • Recursive descent parsers • Left recursion • Left factoring
