1 / 31

3. Phase 2 : Syntax Analysis Part I

3. Phase 2 : Syntax Analysis Part I. Parsing. The symbol table. The abstract syntax tree. Representing statements & expressions. Recursive descent parsing. A bit of theory. Parsing. Given a grammar and a particular sequence of symbols, either :

renjiro
Download Presentation

3. Phase 2 : Syntax Analysis Part I

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 3. Phase 2 : Syntax Analysis Part I • Parsing. • The symbol table. • The abstract syntax tree. • Representing statements & expressions. • Recursive descent parsing. • A bit of theory.

  2. Parsing • Given a grammar and a particular sequence of symbols, either : • Find how that sequence might be derived from the sentence symbol by using the productions of the grammar. • Or, show that the sequence could not be derived from the sentence symbol and therefore is not a sentence. • In programming language terms, for ‘sentence’ read ‘source program’ and for ‘sequence’ read ‘contents of the file being parsed’. • The parser reads the file and checks that the contents could be derived from the production rules. If not it produces compile time error messages.

  3. Parsing II • If the source program is syntactically correct then parsing produces : • Symbol table (ST) from the declarations. • Abstract syntax tree (AST) from the statements. • If the source program is syntactically incorrect then parsing produces : • Error messages. • ST and AST data structures and error messages declared in syner.h in phase 2 unit directory : /usr/users/staff/aosc/cm049icp/phase2

  4. The Symbol Table • A data structure recording all the user defined symbols (i.e. identifiers) declared in the program. • Either a variable sized array or a linked list. • No variable sized arrays in C++ so we’ve got to use a linked list. • C++ does have dynamic arrays but they are not variable sized. • Each (user defined) identifier is stored in a struct with elements for its name, type and other things as desired. • Used to determine what identifiers refer to during syntax analysis and code generation.

  5. The Symbol Table II • For a real language the ST is a sequence of frames, one per subprogram. • Frame : Conceptually, a sequence of entries, one per variable / constant. • C-- has no subprograms so the ST is simply a sequence of entries. • Example C-- source : const string s = “Hello” ; int n = 0 ; bool b ; { ... }

  6. ident : “b” type : BOOLDATA constFlag : false initialise : NULL address : 0 next : ident : “n” type : INTDATA constFlag : false initialise : address : 0 next : 0 ident : “s” type : STRINGDATA constFlag : true initialise : address : 0 next : NULL “Hello” Symbol Table Entries • Schematic diagram : Stored in reverse of declaration order.

  7. C++ Data Structures For The Symbol Table • In syner.h : enum DataType { VOIDDATA, BOOLDATA, STRINGDATA, INTDATA } ; struct SymTab { string ident ; DataType type ; bool constflag ; Factor *initialise ; int address ; SymTab *next ; } ; struct Factor { ... // Later. } ;

  8. if > = = b 6 x 0 y x The Abstract Syntax Tree • An abstract representation of the program. • An n-ary tree. • Example C-- source : if (b > 6) x = 0 ; else y = x ; • Of course it’s nowhere near as simple as this diagram suggests.

  9. C++ Data Structures For The Abstract Syntax Tree • In syner.h : enum StatType { NULLST, ASSIGNST, IFST, WHILEST, CINST, COUTST } ; struct AST { StatType tag ; AssignSt *assignst ; IfSt *ifst ; WhileSt *whilest ; CinSt *cinst ; CoutSt *coust ; AST *next ; } ;

  10. tag : ASSIGNST next : assignst : tag : COUTST next : NULL coutst : AST for x = 23 + 4 AST for cout << x Schematic Diagram Of An Abstract Syntax Tree Built in reverse order. Must then be reversed. • Example C-- program : void main() { x = 23 + 4 ; cout << x ; } Unused fields set to NULL.

  11. target expr AST for y + z * 34 ST entry for x Assignment Statements • Example C-- source : x = y + z * 34 ; • C++ data structure : struct AssignSt { SymTab *target ; Expression *expr ; } ; • The Expression data structure is quite complicated. • Must handle arbitrary expressions. • Later.

  12. if Statements • Example C-- source : if (a > b) { cout << a ; } else { cout << b ; } ; • C++ data structure : struct IfSt { Expression *condition ; AST *thenstats ; AST *elsestats ; int elselabel ; int endlabel ; } ; The { and } are obligatory in C--. So is the ;. elselabel and endlabel are used in code generation.

  13. AST for a > b condition : thenstats : elsestats : elselabel : 0 thenlabel : 0 AST for cout << a AST for cout << b if Statements II • Schematic diagram :

  14. while Statements • Example C-- source : while (a > 0) { a = a - 1 ; cout << a ; } ; • C++ data structure : struct WhileSt { Expression *condition ; AST *stats ; int startlabel ; int endlabel ; } ; The { and } are obligatory in C--. So is the ;. startlabel and endlabel are used in code generation.

  15. AST for a > 0 condition : stats : startlabel : 0 endlabel : 0 AST for a = a - 1 ; cout << a ; while statements II • Schematic diagram :

  16. cin statements • Example C-- source : cin >> a ; • C++ data structure : struct CinSt { SymTab *invar ; } ; • Schematic diagram : a must be an intvariable in C--. Symbol table entry for a. invar :

  17. Symbol table entry for a. outvar : cout statements • Example C-- source : cout >> a ; • C++ data structure : struct CoutSt { SymTab *outvar ; } ; • Schematic diagram : a must be an int variable or constant or a string constant in C--.

  18. Expressions • C-- syntax : expression ::= basic_exp [ rel_op basic_exp ] basic_exp ::= term { add_op basic_exp } term ::= factor {mul_op term } factor ::= literal | identifier | ‘(‘ expression ‘)’ | ‘!’ factor • Example expressions : 22 + 4 < b z != y a !true

  19. Expressions II • In syner.h : struct Expression { BasicExp *be1 ; string relOp ; BasicExp *be2 ; } ; struct BasicExp { Term *term ; string addOp ; BasicExp *bexp ; } ; struct Term { Factor *fact ; string mulOp ; Term *term ; } Unused fields set to ““ or NULL as appropriate.

  20. Factors • In syner.h : struct Factor { bool literal ; // Tag field DataType type ; // Type field string litBool ; // Boolean literal string litString ; // String literal int litInt ; // Integer literal SymTab *ident ; // Identifier Expression *bexp ; // Bracket expression Factor *nFactor ; // Negated factor } ; • Literal is short for literal constant. • Note that C-- does not allow negative integer literals. • Makes things a lot simpler. • Unused fields are ignored. • Set to ““ or 0 or NULL as appropriate.

  21. be1 : relOp : ““ be2 : NULL term : addOp : ““ bexp : NULL fact : mulOp : ““ term : NULL literal : true type : intData litBool : ““ litString : ““ litInt : 42 ident : NULL bexp : NULLnFactor : NULL Example Expressions • Example C-- expression : 42 • Schematic diagram :

  22. be1 : * relOp : “<“ be2 : term : addOp : ““ bexp : NULL fact : mulOp : ““ term : NULL literal : false type : intData litBool : ““ litString : ““ litInt : 0 ident : bexp : NULLnFactor : NULL Symbol table entry for a Example Expressions II * as from be1 on last slide. • Example C-- expression : 42 < a • Schematic diagram :

  23. Recursive Descent Parsing • Notice that the AST is a recursive data structure. • Definition : for recursion see recursion. • This is necessary because C-- has a recursive syntax : • while and if statements may include other statements. • Expressions include basic expressions which include terms which include factors which include expressions. • The simplest way to syntax analyse a recursive syntax is via a method known as recursive descent syntax analysis. • Usually called recursive descent parsing although technically parsing is syntax analysis plus lexical analysis.

  24. Recursive Descent Parsing II • In a recursive descent parser we write a subprogram to recognise each of the different syntactic constructs. • One subprogram per production rule (more or less). • Each analysis subprogram calls the lexAnal to get tokens as required. • Each analysis subprogram may call other functions to recognise component syntactic constructs. • synStatement will call synAssign, synIf, synWhile etc. • synIf and synWhile will call synStatementrecursively. • Lots more on how to do recursive descent parsing in the next lecture.

  25. Chomsky Grammars • Noam Chomsky is an American linguist (among other things). • Chomsky’s theories about the syntactic structure of languages are very popular with Computists. • Much less popular with Linguists. • Chomsky identified 4 types of grammar. • Increasing levels of restrictions on the production rules. • In a Chomsky type 0 grammar all production rules are of the form A ::= alpha where A is a non-empty string of non-terminal symbols and alpha is a string of terminal and/or non-terminal symbols. • Type 0 is the most general type of grammar. • Often called free grammars. • Natural language grammars.

  26. Chomsky Grammars II • A Chomsky type 1 grammar contains only production rules of the form beta A gamma ::= beta alpha gamma where A is a single non-terminal symbol and alpha, beta and gamma are strings of terminal and/or non-terminal symbols. • i.e. each production replaces a single non-terminal symbol in a particular context. • Often called context-sensitive grammars.

  27. Chomsky Grammars III • A Chomsky type 2 grammar contains only production rules of the form A ::= alpha where A is a single non-terminal symbol and alpha is a string of terminal and/or non-terminal symbols. • Often called context-free grammars. • A Chomsky type 3 grammar contains only production rules of the forms A ::= a and A ::= bB in which A and B are single non-terminal symbols and a and b are single terminal symbols. • Often called regular-expression grammars.

  28. Programming Languages And Chomsky Grammars • In programming languages the syntax of statements, expressions etc. is defined by a type 2 grammar. • Syntax analysis works on type 2 grammars. • Type 3 grammars are used to define the micro-syntaxof lexical items such as identifiers, literals and strings. • Lexical analysis works on type 3 grammars. • Note that a type 3 grammar is also a type 2 grammar. • A type 2 grammar is also a type 1 grammar. • A type 1 grammar is also a type 0 grammar. • In general type 0 grammars cannot be analysed by a computer. • The other 3 types can be analysed by a computer.

  29. Backtracking • Consider the following grammar : S ::= ‘c’ A ‘d’ A ::= ‘a’ ‘b’ | ‘a’ • Suppose the input sequence is “cad”. • Parser consumes ‘c’. • Chooses first alternative for A and consumes ‘a’. • Parser now looking for a ‘b’ which is not present. • Parser must backtrack and try the second alternative for A. • Parser consumes ‘a’. • Parser consumes ‘d’. • Backtracking is inefficient and makes the parser more complex to write.

  30. Predictive Parsing • It is reasonably straightforward to construct a grammar which can be parsed without backtracking. • Simplest type is the LL(1) grammar : • An LL(1) grammar can be parsed by reading the input left to right. • At each step the leftmost derivation is produced. • Only the leftmost non-terminal in the sequence is replaced. • Which production rule should be used at each step can be uniquely determined from the tokens already consumed, the current input token and a peek at the next input token. • 1 token lookahead. • C-- has an LL(1) grammar. • Ideal for recursive descent predictive parsing.

  31. Summary • Syntax analysis : determining whether or not a file contains a syntactically correct program. • Recursive descent parsing : One subprogram to syntax analyse each (major) syntactic construct. • Symbol table holds information about user defined identifiers for use in syntax analysis and code generation. • Abstract syntax tree holds an abstract representation of the statements for use in syntax analysis and code generation. • Chomsky grammars are used to define the syntax of programming languages. • C-- has an LL(1) Chomsky type 2 grammar.

More Related