1 / 28

Programming Languages Third Edition

Programming Languages Third Edition. Chapter 6 Syntax. Objectives. Understand the lexical structure of programming languages Understand context-free grammars and BNFs Become familiar with parse trees Understand ambiguity, associativity, and precedence Read Sections 6.1 – 6.4, pp. 204-220.

gasha
Download Presentation

Programming Languages Third Edition

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Programming LanguagesThird Edition Chapter 6 Syntax

  2. Objectives • Understand the lexical structure of programming languages • Understand context-free grammars and BNFs • Become familiar with parse trees • Understand ambiguity, associativity, and precedence • Read Sections 6.1 – 6.4, pp. 204-220 Programming Languages, Third Edition

  3. Introduction • Syntax is the structure of a language • Syntax rules are analogous to the grammar rules of a natural language • John Backus and Peter Naur developed a notational system for describing these grammars, now called Backus-Naur forms, or BNFs • First used to describe the syntax of Algol60 • Every modern computer scientist needs to know how to read, interpret, and apply BNF descriptions of language syntax Programming Languages, Third Edition

  4. Flowchart for Compilation Source Code (your program) Compiler Object Code (machine language) Programming Languages, Third Edition

  5. Flowchart for Compilation - Details Source Code (your program = char stream) Semantic analysis (analyses meaning) Scanner (lexical analysis) Intermediate Code Lexical items / Tokens Optimization Parser (syntactic analysis) Object Code (machine language) Parse tree Programming Languages, Third Edition

  6. Lexical Structure of Programming Languages • Lexical structure: the structure of the tokens, or words, of a language • Related to, but different than, the syntactic structure • Scanning phase: the phase in which a translator collects sequences of characters from the input program and forms them into tokens • Parsing phase: the phase in which the translator processes the tokens, determining the program’s syntactic structure Programming Languages, Third Edition

  7. Lexical Structure of Programming Languages (cont’d.) • Tokens generally fall into several categories: • Reserved words (or keywords) • Literals or constants • Special symbols, such as “;” “<=“ “+” • Identifiers Programming Languages, Third Edition

  8. Lexical Structure of Programming Languages (cont’d.) • Token delimiters (or white space): formatting that affects the way tokens are recognized • Indentation can be used to determine structure • Free-format language: one in which format has no effect on program structure other than satisfying the principle of longest substring • Fixed format language: one in which all tokens must occur in prespecified locations on the page • Tokens can be formally described by regular expressions Programming Languages, Third Edition

  9. ScanningRegular Expressions • Metalanguage for describing patterns for strings of characters – metasymbols are | means choice * means zero or more occurrences + means one of more occurrences ? means one optional occurrence [ ] choose one of list of chars in brackets can use a range . (period) means one of any character ( ) can be used for grouping \ can precede metasymbol with this to use metasymbol in string Programming Languages, Third Edition

  10. Regular Expressions (cont’d.) • Most modern text editors use regular expressions in text searches • Utilities such as lex can automatically turn a regular expression description of a language’s tokens into a scanner Programming Languages, Third Edition

  11. Regular Expressions (cont’d.) • Examples: (a|b)*c [ab]*c [aeiou] [aeiouAEIOU] [aeiouAEIOU]+ [aeiouAEIOU]* [A-Z][a-z]* [A-Z]+[a-z] [A-Za-z]* Programming Languages, Third Edition

  12. Regular Expressions (cont’d.) • Examples: [0-9]+ [0-9]+(\.[0-9]+) • Can test by making text file on Unix and using egrep –x “pattern” filename Programming Languages, Third Edition

  13. Regular Expressions (cont’d.) • Let’s try writing some: • Signed integers, sign not optional • Signed integers, sign optional • Signed integers, sign optional, no signed zero • Signed integers, allow leading zeros, but no signed zero Programming Languages, Third Edition

  14. Regular Expressions (cont’d.) • Let’s try writing some for license plates: • Start with VA, followed by zero or more digits • Start with VA, followed by one or more digits • Start with VA, followed by 2 digits, followed by zero or more lower case letters • Start with V or A, followed by -, followed by 2-4 digits • Start with VA, any case, followed by 2-3 digits or 2-3 letters (tried in class) • Start with VA, any case, followed by 2-3 digits, followed by 2-3 letters (meant to try) Programming Languages, Third Edition

  15. ParsingContext-Free Grammars and BNFs • Context-free grammar: consists of • a series of grammar rules (Productions) • Each rule has a single phrase structure name on the left, then a  metasymbol, followed by a sequence of symbols or other phrase structure names on the right • Nonterminals: names for phrase structures, since they are broken down into further phrase structures • Start symbol: one of the Nonterminals • Terminals: words or token symbols that cannot be broken down further Programming Languages, Third Edition

  16. Example 1: Unsigned Integers <num> ::= <digit> | <num> <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Terminals: 0, 1, … , 9 Nonterminals: <num> , <digit> Start Symbol: <num> Productions: there are 12 Metasymbols: “::=“ , “|” Programming Languages, Third Edition

  17. Example 1 (cont’d) • Derivation: the process of building in a language by beginning with the start symbol and replacing left-hand sides by choices of right-hand sides in the rules • Let’s derive the number 123 (on board) • Parse tree: graphical depiction of the replacement process in a derivation • Let’s draw parse tree for 123 (on board) Programming Languages, Third Edition

  18. Example 1 (cont’d) • Notice recursion in one of rules • Notice recursive symbol is on left • This is a left-recursive grammar • This is a left-associative grammar • Notice how parse tree cascades to left Programming Languages, Third Edition

  19. Example 2: Unsigned Integers <num> ::= <digit> | <digit> <num> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Only made one change, so now grammar is • Right-recursive • Right-associative • Let’s draw parse tree for 123 Programming Languages, Third Edition

  20. Ex 3: Simple Expression Grammar <expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num> <num> ::= <digit> | <num> <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Let’s derive parse tree for: 3 + 4 + 5 Programming Languages, Third Edition

  21. Ex 3 (cont’d) <expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num> <num> ::= <digit> | <num> <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Is there another parse tree for: 3 + 4 + 5 Programming Languages, Third Edition

  22. Ex 3 (cont’d) <expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num> <num> ::= <digit> | <num> <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 A grammar is ambiguous if there are two parse trees for the same string Programming Languages, Third Edition

  23. Ex 3 (cont’d) <expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num> <num> ::= <digit> | <num> <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Ambiguity is undesirable (but sometimes unavoidable) Let’s see why it’s undesirable: Derive parse trees for 3 + 4 * 5 Programming Languages, Third Edition

  24. Ex 3 (cont’d) <expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num> <num> ::= <digit> | <num> <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 So what was the problem? Which tree provides correct arithmetic interpretation? Programming Languages, Third Edition

  25. Ex 3 (cont’d) Can we modify the grammar to “fix” the problem? YES! Add more levels of productions: <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= ( <expr> ) | <num> <num> ::= <digit> | <num> <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Programming Languages, Third Edition

  26. Ex 3 (cont’d) <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= ( <expr> ) | <num> <num> ::= <digit> | <num> <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Redraw parse trees for 3 + 4 + 5 and 3 + 4 * 5 Programming Languages, Third Edition

  27. Chapter 6Final Thoughts • A grammar is context-free when nonterminals appear singly on the left sides of productions • There is no context under which only certain replacements can occur • Anything not expressible using context-free grammars is a semantic, not a syntactic, issue • BNF form of language syntax makes it easier to write translators • Parsing stage can be automated Programming Languages, Third Edition

  28. Chapter 6Final Thoughts • Syntax establishes structure, not meaning • But meaning is related to syntax • Syntax-directed semantics: process of associating the semantics of a construct to its syntactic structure • Must construct the syntax so that it reflects the semantics to be attached later Programming Languages, Third Edition

More Related