Chapter 2 Syntax

burke

Presentation Transcript


  1. Chapter 2 Syntax

  2. Syntax • The syntax of a programming language specifies the structure of the language • The lexical structure specifies how words can be constituted from characters • The syntactic structure specifies how sentences can be constituted from words

  3. Lexical Structure • The tokens of a programming language are the basic grammatical categories that form the building blocks of its syntax • A program is viewed as a stream of tokens

  4. Standard Token Categories • Keywords, such as if and while • Literals or constants, such as 42 (a numeric literal) or "hello" (a string literal) • Special symbols, such as ";", "<=", or "+" • Identifiers, such as x24, putchar, or monthly_balance

  5. White Spaces and Comments • White space and comments are ignored except where they function as delimiters between tokens • Typical white space: newlines, tabs, spaces • Comments: • /* … */, // … \n (C, C++, Java) • -- … \n (Ada, Haskell) • (* … *) (Pascal, ML) • ; … \n (Scheme)

  6. C Tokens • There are six classes of tokens: identifiers, keywords, constants, string literals, operators, and other separators • Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, "white space") are ignored except as they separate tokens • Some white space is required to separate otherwise adjacent identifiers, keywords, and constants • If the input stream has been separated into tokens up to a given character, the next token is the longest string of characters that could constitute a token
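The longest-match rule on this slide can be sketched in a few lines of C. This is only an illustration under stated assumptions: `next_token_len` is a hypothetical helper, and it knows just identifiers, integer constants, and a handful of operators rather than the full C token set.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the longest token that
   starts at s, for a tiny token set. Returns 0 at end of input or on
   a character this sketch does not know. */
size_t next_token_len(const char *s)
{
    /* Longer operators are listed before their one-character prefixes,
       so the first match is also the longest match. */
    static const char *ops[] = { "<=", ">=", "==", "++", "<", ">", "=", "+", ";" };
    size_t i, n = 0;

    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        /* identifier or keyword: consume letters, digits, underscores */
        while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
        return n;
    }
    if (isdigit((unsigned char)s[0])) {
        /* integer constant: consume all consecutive digits */
        while (isdigit((unsigned char)s[n])) n++;
        return n;
    }
    for (i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t len = strlen(ops[i]);
        if (strncmp(s, ops[i], len) == 0) return len;
    }
    return 0;
}
```

With this rule the input `<=10` starts with the single token `<=` rather than `<` followed by `=`, and `inti` is read as one identifier, which is why white space is required between `int` and `i`.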

  7. An Example
       /* This program counts from 1 to 10. */
       #include <stdio.h>

       int main(void)
       {
           int i;
           for (i = 1; i <= 10; i++) {
               printf("%d\n", i);
           }
           return 0;
       }

  8. Backus-Naur Form (BNF) • BNF is a notation widely used in the formal definition of syntactic structure • A BNF grammar consists of a set of rewriting rules R, a set of terminal symbols Σ, a set of nonterminal symbols N, and a start symbol S ∈ N • Each rule in R has the form A → ω, where A ∈ N and ω ∈ (N ∪ Σ)*

  9. Backus-Naur Form • The terminals in Σ form the basic alphabet (the tokens) from which programs are constructed • The nonterminals in N identify grammatical categories such as Identifier, Integer, Expression, Statement, Function, and Program • The start symbol S identifies the principal grammatical category being defined by the grammar

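  10. Examples
       1. binaryDigit → 0
          binaryDigit → 1
          or, combined into one rule: binaryDigit → 0 | 1
       2. Integer → Digit | Integer Digit
          Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
       The metasymbol | means "or"; writing symbols side by side, as in Integer Digit, means concatenation.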

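  11. Derivation • Integer ⇒ Integer Digit ⇒ Integer Digit Digit ⇒ Digit Digit Digit ⇒ 3 Digit Digit ⇒ 3 5 Digit ⇒ 3 5 2 • Every string in the derivation is a sentential form; the final string, 3 5 2, contains only terminals and is a sentence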

  12. Parse Tree • [Figure: parse tree, with a sentential form marked]

  13. Example: Expression
       Assignment → Identifier = Expression
       Expression → Term | Expression + Term | Expression - Term
       Term → Factor | Term * Factor | Term / Factor
       Factor → Identifier | Literal | ( Expression )

  14. Example: Expression x + 2 * y

  15. Syntax for a Subset of C
       Program → void main ( ) { Declarations Statements }
       Declarations → ε | Declarations Declaration
       Declaration → Type Identifiers ;
       Type → int | boolean
       Identifiers → Identifier | Identifiers , Identifier
       Statements → ε | Statements Statement
       Statement → ; | Block | Assignment | IfStatement | WhileStatement
       Block → { Statements }
       Assignment → Identifier = Expression ;
       IfStatement → if ( Expression ) Statement | if ( Expression ) Statement else Statement
       WhileStatement → while ( Expression ) Statement

  16. Syntax for a Subset of C
       Expression → Conjunction | Expression || Conjunction
       Conjunction → Relation | Conjunction && Relation
       Relation → Addition | Relation < Addition | Relation <= Addition | Relation > Addition | Relation >= Addition | Relation == Addition | Relation != Addition
       Addition → Term | Addition + Term | Addition - Term
       Term → Negation | Term * Negation | Term / Negation
       Negation → Factor | ! Factor
       Factor → Identifier | Literal | ( Expression )

  17. Example: Program • void main ( ) { int x; x = 1; }

  18. Ambiguity • A grammar is ambiguous if it permits some string to be parsed into two or more different parse trees • Example grammar: AmbExp → Integer | AmbExp - AmbExp • Example string: 2 - 3 - 4

  19. An Example • The string 2 - 3 - 4 has two parse trees, one grouping it as (2 - 3) - 4 and the other as 2 - (3 - 4)
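The two parse trees matter because they give different results. The sketch below, with hypothetical function names, evaluates each grouping directly:

```c
/* Each function hard-codes one of the two parse trees for a - b - c. */

/* Left grouping: (a - b) - c, the tree with the subtraction on the left. */
int left_grouping(int a, int b, int c)
{
    return (a - b) - c;
}

/* Right grouping: a - (b - c), the tree with the subtraction on the right. */
int right_grouping(int a, int b, int c)
{
    return a - (b - c);
}
```

For 2 - 3 - 4 the left grouping yields -5 and the right grouping yields 3. C itself treats binary minus as left-associative, so the expression `2 - 3 - 4` evaluates to -5.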

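  20. The Dangling Else Problem • if ( x < 0 ) if ( y < 0 ) y = y - 1; else y = 0; • The grammar lets the else attach to either if, so this statement has two parse trees

  21. The Dangling Else Problem • if ( x < 0 ) if ( y < 0 ) y = y - 1; else y = 0; • One reading attaches the else to the inner if ( y < 0 ); the other attaches it to the outer if ( x < 0 )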

  22. The Dangling Else Problem • Solution I: use a special closing keyword to explicitly end every if statement, such as fi in Algol 68 or end if in Ada. For example: IfStatement → if ( E ) S fi | if ( E ) S else S fi • Solution II: use an explicit rule outside the BNF syntax. For example, in C, every else clause is associated with the closest preceding unmatched if
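Solution II can be observed directly in C. In the sketch below (function names are hypothetical), `dangling` is written without braces as in the example statement, while the other two functions use braces to force each reading. Many compilers warn about the unbraced form, which is expected here.

```c
/* Unbraced form: C pairs the else with the nearest unmatched if,
   which is the inner if (y < 0). */
int dangling(int x, int y)
{
    if (x < 0)
        if (y < 0) y = y - 1;
        else y = 0;
    return y;
}

/* Braces make the inner-if reading explicit. */
int else_on_inner(int x, int y)
{
    if (x < 0) {
        if (y < 0) y = y - 1;
        else y = 0;
    }
    return y;
}

/* Braces force the other reading: else belongs to the outer if. */
int else_on_outer(int x, int y)
{
    if (x < 0) {
        if (y < 0) y = y - 1;
    } else {
        y = 0;
    }
    return y;
}
```

The two readings differ whenever exactly one condition holds. With x = 1 and y = 5, the inner reading leaves y at 5 while the outer reading sets it to 0.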

  23. Extended BNF (EBNF) • EBNF introduces three kinds of brackets: • { } denotes repetition (zero or more occurrences), which replaces many uses of recursion • [ ] denotes an optional part • ( ) is used for grouping

  24. An Example
       BNF:
       Expression → Term | Expression + Term | Expression - Term
       Term → Factor | Term * Factor | Term / Factor
       Factor → + number | - number | number
       EBNF:
       Expression → Term { ( + | - ) Term }
       Term → Factor { ( * | / ) Factor }
       Factor → [ + | - ] number
       Here ( ) is grouping, { } is zero or more occurrences, and [ ] is an optional part.
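One practical benefit of the EBNF form is that each { } maps onto a loop and each [ ] onto an optional branch. The following is a minimal sketch of that idea, not a definitive implementation: it evaluates a string instead of only parsing it, `evaluate` and the cursor `p` are hypothetical names, and it does no error reporting.

```c
#include <ctype.h>

/* Cursor into the input string (an assumption of this sketch). */
static const char *p;

static void skip_spaces(void) { while (*p == ' ') p++; }

/* Factor -> [ + | - ] number : the [ ] part becomes an optional branch. */
static long factor(void)
{
    long sign = 1, value = 0;
    skip_spaces();
    if (*p == '+' || *p == '-') {
        if (*p == '-') sign = -1;
        p++;
    }
    while (isdigit((unsigned char)*p)) value = value * 10 + (*p++ - '0');
    return sign * value;
}

/* Term -> Factor { ( * | / ) Factor } : the { } part becomes a loop. */
static long term(void)
{
    long value = factor();
    for (;;) {
        skip_spaces();
        if (*p == '*') { p++; value *= factor(); }
        else if (*p == '/') {
            long d;
            p++;
            d = factor();
            if (d != 0) value /= d;   /* sketch: a zero divisor is skipped */
        }
        else return value;
    }
}

/* Expression -> Term { ( + | - ) Term } */
static long expression(void)
{
    long value = term();
    for (;;) {
        skip_spaces();
        if (*p == '+') { p++; value += term(); }
        else if (*p == '-') { p++; value -= term(); }
        else return value;
    }
}

/* Hypothetical entry point: evaluate the expression in s. */
long evaluate(const char *s)
{
    p = s;
    return expression();
}
```

Because each loop folds operands in from left to right, `evaluate("2 - 3 - 4")` groups as (2 - 3) - 4, and placing the `*` and `/` loop one level below the `+` and `-` loop gives multiplication and division the higher precedence.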

  25. Abstract Syntax • The abstract syntax of a language identifies the essential syntactic elements in a program without describing how they are concretely constructed • Pascal: while i < n do begin i := i + 1 end • C: while (i < n) { i = i + 1; }

  26. Example: Loop • Viewed abstractly, the essential elements of a loop are a test expression that decides whether to continue and a body, which is the statement to be repeated • All other elements constitute nonessential "syntactic sugar" • The complete syntax is usually called the concrete syntax

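  27. Example: Loop • Pascal: while i < n do begin i := i + 1 end • C: while (i < n) { i = i + 1; } • Both have the same abstract syntax tree: a loop node whose test is < applied to i and n, and whose body is = applied to i and to + applied to i and 1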

  28. Example: Expression x + 2 * y

  29. Example: Expression x + 2 * y • Abstract syntax tree: a + node whose children are x and a * node; the * node's children are 2 and y
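An abstract syntax tree like the one for x + 2 * y can be represented with a small node type. The sketch below is one possible representation under stated assumptions: the `Node` struct, `eval`, and `eval_x_plus_2y` are hypothetical names, and variables are replaced by their values when the tree is built.

```c
#include <stddef.h>

/* Hypothetical AST node: a leaf holds a value, an interior node an operator. */
typedef struct Node {
    char op;                          /* '+', '*', or 0 for a leaf */
    int value;                        /* used only when op == 0 */
    const struct Node *left, *right;  /* children; NULL for a leaf */
} Node;

/* Evaluate a tree bottom-up: children first, then the operator. */
int eval(const Node *n)
{
    int l, r;
    if (n->op == 0) return n->value;
    l = eval(n->left);
    r = eval(n->right);
    return n->op == '+' ? l + r : l * r;
}

/* Build the tree for x + 2 * y with concrete values and evaluate it. */
int eval_x_plus_2y(int x, int y)
{
    Node nx  = { 0, x, NULL, NULL };
    Node n2  = { 0, 2, NULL, NULL };
    Node ny  = { 0, y, NULL, NULL };
    Node mul = { '*', 0, &n2, &ny };   /* 2 * y sits below the + node */
    Node add = { '+', 0, &nx, &mul };
    return eval(&add);
}
```

The tree shape alone encodes precedence: since the * node is a child of the + node, the multiplication is evaluated first and no parentheses or precedence rules are needed at this level.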

  30. Parser • A parser for a language accepts or rejects strings based on whether they are legal strings in the language • In a recursive-descent parser, each nonterminal is implemented as a function, and each terminal is handled by matching it against the current token

  31. Example: Calculator
       command → expr '\n'
       expr → term { '+' term }
       term → factor { '*' factor }
       factor → number | '(' expr ')'
       number → digit { digit }
       digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

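  32. Example: Calculator
       #include <ctype.h>
       #include <stdlib.h>
       #include <stdio.h>

       int token;      /* current input character */
       int pos = 0;    /* number of characters read so far */

       /* one function per nonterminal */
       void command(void);
       void expr(void);
       void term(void);
       void factor(void);
       void number(void);
       void digit(void);

       /* helpers, declared here because they are used before they are defined */
       void parse(void);
       void getToken(void);
       void match(char c);
       void error(void);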

  33. Example: Calculator
       int main(void)
       {
           parse();
           return 0;
       }

       /* Read the next character into token, skipping blanks. */
       void getToken(void)
       {
           token = getchar();
           pos++;
           while (token == ' ') {
               token = getchar();
               pos++;
           }
       }

       void parse(void)
       {
           getToken();
           command();
       }

  34. Example: Calculator
       command → expr '\n'

       void command(void)
       {
           expr();
           match('\n');
       }

       /* Consume the current token if it equals c; otherwise report an error. */
       void match(char c)
       {
           if (token == c)
               getToken();
           else
               error();
       }

  35. Example: Calculator
       expr → term { '+' term }
       term → factor { '*' factor }

       void term(void)
       {
           factor();
           while (token == '*') {
               match('*');
               factor();   /* the grammar calls for factor here, not term */
           }
       }

       void expr(void)
       {
           term();
           while (token == '+') {
               match('+');
               term();
           }
       }

  36. Example: Calculator
       factor → number | '(' expr ')'
       number → digit { digit }

       void factor(void)
       {
           if (token == '(') {
               match('(');
               expr();
               match(')');
           } else {
               number();
           }
       }

       void number(void)
       {
           digit();
           while (isdigit(token))
               digit();
       }

  37. Example: Calculator
       void digit(void)
       {
           if (isdigit(token))
               match(token);
           else
               error();
       }

       /* Report where parsing failed and stop. */
       void error(void)
       {
           printf("parse error: position %d: character %c\n", pos, token);
           exit(1);
       }
