## Coverage

• Programming Language Syntax: Syntax Specifications
• Stages in Translation: Processing Programs, Syntax Analysis, Semantic Analysis, Lexical Analyzer, Code Generation
• Regular Expressions
• Finite Automata
• Grammar Types: Unrestricted, Context-Free, Context-Sensitive, Regular, BNF, EBNF
• Derivation: Parse Tree
• Grammar Issues: Ambiguous Grammars, Grammar Transformations, Syntax Diagram
• Recursive Descent Process, Shift-Reduce Parsing
• Concrete and Abstract Syntax
• LL Grammar
• LR Grammar: SLR, LALR
• Programming the Scanner and Parser

Syntax Analysis

**Programming Language Syntax**
• Syntax defines the structure of the language
• Syntax helps in:
  • Language design and language comprehension
  • Implementing the compiler, the software specification and the language system as a whole
  • Verifying program correctness
• Definitions
  • Constructs: Strings that belong to the language
  • Syntax: The form or structure of the expressions, statements, and the program unit as a whole
  • Semantics: Semantics considers what happens while executing a program segment.
Thus, it provides the meaning of the statements, expressions and program unit
  • Pragmatics: Tools provided by the translator to help in debugging and interacting with the operating system

**Programming Language Syntax**
• Lexeme: Lowest-level syntactic unit of a language (e.g., sum, begin)
• Token: Category of lexemes (e.g., identifiers)
• Any compiler needs recognizers for the syntax of the language
• Notations of Expressions
  • Infix notation: the operator symbol appears between the operands
  • Prefix or Polish notation: the operator symbol appears before the operands
  • Postfix, Suffix or Reverse Polish notation: the operator symbol appears after the operands
  • Mixfix notation: operations that don't fit the previous notations, like if-then-else

**Programming Language Syntax**
• Associativity in Expressions
  • Left-associative: Operators of the same precedence are grouped from left to right.
    • Example: +, -, * and /
  • Right-associative: Operators of the same precedence are grouped from right to left.
    • Example: Assignment and exponentiation
• Expression Trees and their Evaluation
  • An expression is represented as a tree whose root yields the result of the expression
  • Traversing a tree can be done in many ways:
    • In-order traversal: All the nodes in the left subtree are visited first, then the root node, and finally the nodes in the right subtree.
    • Post-order traversal: All the nodes in the left and right subtrees are visited before the root node.

**Programming Language Syntax**
• Expression Trees and their Evaluation
  • Traversing a tree can be done in many ways:
    • Pre-order traversal: The root node is visited first, then the nodes of the left and right subtrees.
    • Breadth-first traversal: The tree is traversed level by level.
Finish visiting the nodes at one level before moving to the next level. It is also called level-order traversal.
    • Depth-first traversal: The traversal goes deep into one subtree before moving to the next subtree. The order in which depth-first traversal visits the nodes is similar to pre-order traversal.

**Programming Language Syntax**
• Evaluation of Expressions
  • Applicative Order Evaluation (strict or eager evaluation): Evaluation is bottom-up, which means processing starts from the leaves and moves towards the root
  • Normal Order Evaluation: An expression is evaluated only when it is needed in computing the result
    • Addition(5+2)
    • Addition(Y) {int Y; Y = Y + 2;}
    • Here, Y is replaced with 5+2 instead of doing the addition first
  • Lazy Evaluation (delayed evaluation): Evaluation is postponed until the value is really needed
    • Frequently used in functional languages.
  • Block Order Evaluation: Evaluation of an expression that contains a declaration.
    • Example: In Pascal, a block expression in a function can include variable declarations

**Programming Language Syntax**
• Evaluation of Expressions
  • Short-Circuit Evaluation: A Boolean (logical) expression can be evaluated partially when the operands seen so far already determine the result
    • AND (X AND Y): If both X and Y are "1", then the result is "1". Otherwise, the result is "0". If X is "0", Y need not be evaluated.
    • OR (X OR Y): If either or both X and Y are "1", then the result is "1". Otherwise, the result is "0". If X is "1", Y need not be evaluated.
    • XOR (X XOR Y): If exactly one of X and Y is "1", then the result is "1". Otherwise, the result is "0". Both operands are always needed.
    • NOT (X): If X is "1", then the result is "0". If X is "0", then the result is "1".

**Compilation Process**
• Syntax analysis has a low-level part and a high-level part.
  • Low-level part (scanner or lexical analyzer):
    • Mostly done using finite automata
    • Input symbols are scanned and grouped into meaningful units called tokens.
    • Tokens are formed by the principle of longest substring (maximum match), using a lookahead pointer
  • High-level part (parser or syntax analyzer):
    • Done using Backus-Naur Form (BNF) or a context-free grammar
    • Tokens are grouped into syntactic units like expressions, statements and declarations, which are checked for whether they conform to the grammatical rules of the language
• Identification of reserved words: Use a lookup table (symbol table)
  • if statement: "if" "(" "y" "<" "5" ")" …
  • y is a variable, < is an operator, …
• Tokens are classified as keywords, operators, identifiers, literals, etc.

**Compilation Process**
• Parser
  • The parser should find all syntax errors and produce the parse tree
  • Parsing algorithms:
    • Top-down: Recursive descent (a hand-coded implementation) and LL parser (a table-driven implementation)
    • Bottom-up: LR parser
• Why separate syntax analysis into scanner and parser?
  • Simplicity: Separating them makes the parser simpler.
  • Efficiency: The separation makes it possible to optimize the lexical analyzer on its own.
  • Portability: Even though parts of the lexical analyzer might not be portable, the parser can always be made portable

**Compilation Process**
• Semantic analysis (contextual analysis) is required to make sure that the data types match
• Semantic analysis works in synchronization with the syntax analysis
• Contextual analysis answers questions such as:
  • Has the variable been declared earlier?
  • Does the declared type match the type with which the variable is used?
  • Has the variable been initialized before use?
  • Is the array reference within the bounds of the array?
  • …
• Code generation
  • Converting the program into executable machine code
  • Stages: intermediate code generation and code generation

**Regular Expressions**
• Regular expressions represent the information required by the lexical analyzer
• Regular Expression Definitions: A regular expression E describes the language L(E) over the alphabet of the language.
  • Alternation: If a and b are regular expressions, then (a+b) is also a regular expression.
  • Concatenation (or sequencing): If a and b are regular expressions, then (a.b) is also a regular expression.
  • Kleene Closure: If a is a regular expression, then a* means zero or more occurrences of a.
  • Positive Closure: If a is a regular expression, then a+ means one or more occurrences of a.
  • Empty: The empty expression denotes a language with no strings.
  • Atom: An atom denotes a language with exactly one string.

**Regular Expressions**
• Regular expression to match integers and floating point numbers
  • To match a digit: [0-9]
  • To match one or more digits: [0-9]+
  • To support both signed and unsigned integers: -?[0-9]+
    • -? indicates the presence or absence of a minus sign
  • Floating point representation: Digits may appear before the dot, and at least one digit follows it
    • ([0-9]*\.[0-9]+)
  • Exponent part: The character "e", in lower or upper case
    • "e" is followed by an optional + or - sign and then an integer.
    • ([eE][-+]?[0-9]+)?
    • The question mark at the end indicates that the exponent part is optional.
  • -?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)

**Finite Automata**
• Finite automata are computing devices that accept or recognize the language represented by a given regular expression
• Finite Automata Definitions
  • Alphabet (Σ): An alphabet is a finite, non-empty set of symbols.
Symbols are written as lowercase Latin letters. Symbols are atoms that cannot be subdivided further. Ex. Σ = {a, b, c}
  • String or Word: A string is a finite sequence of symbols drawn from a single alphabet.
    • Given the alphabet Σ = {a, b, c}, some strings that can be formed are: a, abc, aa, abcabcabc
  • Empty String (ε): The empty string is composed of zero symbols. It is a string over every alphabet, but it is not itself a symbol of the alphabet.
  • Size of a String: The number of symbols present in the string.
    • |ab| = 2
    • |ε| = 0 and |b| = 1

**Finite Automata**
• Finite Automata Definitions
  • Concatenation of Strings: Strings can be combined to form a new string.
    • S1 = abc and S2 = def: S1S2 = abcdef and S2S1 = defabc
    • Concatenating the empty string: S1ε = εS1 = abc = S1
    • The empty string is the identity element for string concatenation.
  • Languages (L): A language is a set of strings, possibly infinite, over a given alphabet. Σ = {a, b, c}, Language L = {aⁿbⁿcⁿ | n ≥ 0}
    • In this example, the numbers of a's, b's and c's are the same.
  • Power of an alphabet:
    • Written Σⁿ, where n is the order
    • Σⁿ is the set of all strings of length n over the alphabet
    • For the alphabet Σ = {a, b, c}
      • Σ⁰ = {ε}
      • Σ¹ = {a, b, c}
      • Σ² = {aa, bb, cc, ab, ba, ac, ca, bc, cb}
      • Σ³ = {aaa, bbb, ccc, aab, bba, aac, cca, …}

**Finite Automata**
• Finite Automata Definitions
  • Closure of an alphabet:
    • Reflexive-transitive closure (Kleene closure):
      • Strings of zero or more symbols.
      • Σ* = Σ⁰ ∪ Σ¹ ∪ Σ² ∪ Σ³ ∪ … = {ε, a, b, c, aa, bb, cc, ab, …}
    • Transitive closure (positive closure):
      • Strings of one or more symbols.
      • Σ+ = Σ¹ ∪ Σ² ∪ Σ³ ∪ … = {a, b, c, aa, bb, cc, ab, …}
    • Any language defined on the given alphabet is a subset of the reflexive-transitive closure of the alphabet.
      • ∀L, L ⊆ Σ*
  • Empty Language:
    • The empty language has no strings in it.
    • L = {} is the empty language.
    • L = {ε} is not an empty language because it contains one string, the empty string.

**Finite Automata**
Figure 2.2. NFA for ε
Figure 2.3. NFA for t
Figure 2.4. NFA for XY
Figure 2.5. NFA for X|Y
• Finite Automata Representation
  • Circle: state; Arrow: transition; Double circle: final state
  • States are labeled with numbers
  • Arrows are labeled with an input symbol or ε

**Finite Automata**
Figure 2.6. NFA for X*
• DFA (Deterministic Finite Automaton) vs NFA (Non-deterministic Finite Automaton)
  • In a DFA, empty transitions (ε) are not allowed. Also, from any state s there is at most one edge labeled with a given symbol a.
• Convert from NFA to DFA
  • Find the ε-closure of s:
    • Add s (the node itself) to its ε-closure, i.e. ε-closure(s) = {s}
    • Reachable with empty transitions: If there is a node t in ε-closure(s), and there is an edge labeled ε from t to u, then u is also added to ε-closure(s) if it is not there already. Continue until no more nodes can be added to ε-closure(s)

**Finite Automata**
• Convert from NFA to DFA
  • State transition:
    • From the initial ε-closure, find the transitions on each terminal present in the given regular expression
    • Example: If there is a node t in ε-closure(s) and there is an edge labeled a (non-empty) from t to u, then u is added to the target set for input a if it is not there already. From u, add all the nodes that can be reached using ε-transitions.
    • A transition table is drawn based on the states and inputs.
  • Optimization of the transition table can be done as follows:
    • Partition the set of states into non-final and final states.
    • With the non-final states:
      • A state whose transition goes outside the group is separated from the group.
      • If several states have the same transitions on all inputs, keep one of them and replace the other entries with the preserved one.
    • Check for a dead state. A dead state is a non-final state whose transitions lead back to itself irrespective of the input.

**Finite Automata - Example**
• Transitions for (m | n)*mnn
  • Find the ε-closure: Starting from 0, using ε-transitions, we can reach 0, 1, 2, 4 and 7. A = {0, 1, 2, 4, 7}.
  • Transition of m on set A gives {3, 8}. From node 3, we can reach 6, 7, 1, 2 and 4 using ε-transitions. From node 8, no ε-transition is possible.
  • Thus ε-closure({3, 8}) = B = {1, 2, 3, 4, 6, 7, 8}.
  • Transition of n on set A gives C = {1, 2, 4, 5, 6, 7}
  • Transition of n on set B gives D = {1, 2, 4, 5, 6, 7, 9}
  • Transition of n on set D gives E = {1, 2, 4, 5, 6, 7, 10}
  • Applying the transition of m on set C gives B again. So, we stop here because any further transition only repeats the sets already found.

**Finite Automata - Example**
• Transition Table
  • Non-final states (ABCD); final state (E).
  • With the non-final states
    • On input m, all of them go to B, so they stay in one group.
    • On input n, states A, B, and C move to members of group (ABCD) but D goes to E. So, split (ABCD) into (ABC) and (D).
    • In (ABC), with input n, states A and C go to C but B goes to D. So, split them into (AC) and (B).
    • In (AC), both states have the same transitions. Thus, keep only one of them (A).
  • Check for a dead state. In our example, there is no dead state.

**Grammar Types - Definitions**
• Terminal Symbols: Atomic or non-divisible symbols in any language
• Non-terminal Symbols (variable symbols, syntactic categories, syntactic variables or abstractions): A single non-terminal symbol can have more than one right-hand side (RHS), separated by a divider (|).
• Distinguished symbol (start symbol): The basic category that is being defined
• Production or Rewriting Rules: Rules that are used to define the structure of the constructs.
A rule defines how to rewrite a variable symbol using terminal and non-terminal symbols. A rule has a left-hand side (LHS) that derives a right-hand side (RHS) made up of terminal and non-terminal symbols.

**Grammar Types - Definitions**
• Grammar: A grammar is a finite non-empty set of rules.
• Syntactic lists: Lists of a syntactic nature can be represented using recursion.
  • <ident_list> → ident | ident, <ident_list>
• Derivation: The process of repeatedly applying the rules, starting from the start symbol, until there are no more non-terminal symbols to expand.

**Grammar Types**
• Unrestricted Grammar:
  • Also called Recursively Enumerable, Phrase Structure or Type 0 grammar.
  • There is no restriction on the right-hand side of a production rule.
  • The left-hand side of a production rule must contain at least one non-terminal symbol
  • α → β, where α ∈ (V ∪ T)+ and β ∈ (V ∪ T)*
    • V: finite set of variable symbols.
    • T: finite set of terminal symbols.
  • Example: S → ACaB; Ca → aaC

**Grammar Types**
• Context-Sensitive Grammar:
  • Also called Type 1 grammar
  • The right side of a production rule must not have fewer symbols than the left side
  • Called context-sensitive because any replacement of a variable depends on what surrounds it
  • α → β, with α = δ1Aδ2 and β = δ1ωδ2
    • where A ∈ V, δ1, δ2 ∈ (V ∪ T)* and ω ∈ (V ∪ T)+
  • Example: Things b → b Thing; Thing c → Other b c

**Grammar Types**
• Context-Free Grammar:
  • Also called Type 2 grammar
  • Developed by Noam Chomsky during the mid-1950s
  • The left side of a production rule is a single variable symbol and the right side is a combination of terminal and variable symbols
  • A production rule takes the form A → α where A ∈ V, α ∈ (V ∪ T)*
  • Example: Fraction → Digit; Fraction → Digit Fraction

**Grammar Types**
• Regular Grammar:
  • Also called Restrictive Grammar or Type 3 grammar
  • Each production rule has only one terminal, or one terminal and one variable, on the right side
  • Regular grammars are classified as right-linear or left-linear.
    • Right-linear grammar: A → xB or A → x where A ∈ V, B ∈ V, and x ∈ T
    • Left-linear grammar: A → Bx or A → x where A ∈ V, B ∈ V, and x ∈ T
• Regular expressions vs context-free grammar:
  • Lexical rules are simple in nature and do not need a notation as powerful as a context-free grammar
  • Regular expressions can be used to build recognizers for any regular language.

**Grammar Types**
• Backus-Naur Form (BNF):
  • Invented by John Backus to describe Algol 58
  • Described as a metalanguage because it is a language used to describe another language
  • Considered equivalent to context-free grammar
  • Abstractions are used to represent various classes of syntactic structures, which act like non-terminal symbols.
  • To represent a while statement:
    • <while_stmt> → while ( <logic_expr> ) <stmt>
  • Reasons for using BNF to describe syntax:
    • BNF provides a clear and concise syntax description.
    • The parser can be based directly on the BNF.
    • Parsers based on BNF are easier to handle.

**Grammar Types**
• Extended BNF (EBNF):
  • BNF notation plus regular-expression operators
  • Different notations coexist:
    • Optional parts: Denoted with a subscript "opt" or enclosed in square brackets.
      • <proc_call> → ident ( <expr_list> )opt
      • <proc_call> → ident [ ( <expr_list> ) ]
    • Alternative parts:
      • A pipe (|) indicates an either-or choice
      • The choices are grouped with square brackets or parentheses.
      • <term> → <term> [+ | -] const
      • <term> → <term> (+ | -) const
    • Repetitions (0 or more) are put in braces ({ })
      • An asterisk indicates zero or more occurrences of the item.
      • The asterisk is optional here, because the braces by themselves already indicate zero or more occurrences.
      • <ident> → letter {letter | digit}*
      • <ident> → letter {letter | digit}

**Grammar Types**
• Differences between BNF and EBNF notations
  • BNF:
    • <expr> → <expr> + <term> | <expr> - <term> | <term>
    • <term> → <term> * <factor> | <term> / <factor> | <factor>
  • EBNF:
    • <expr> → <term> {[+ | -] <term>}*
    • <term> → <factor> {[ * | / ] <factor>}*
  • The EBNF form starts from the final replacement of <expr> by <term>, so its right-hand side contains no <expr> entry.

**Derivation**
• Apply the grammar to the start symbol <program> and continue to expand until no non-terminal symbol is left on the right-hand side
• Methods of Derivation
  • Leftmost derivation expands the leftmost non-terminal in each sentential form
• Parse tree or derivation tree
  • A top-down parser keeps the start symbol as the root of the tree. Then, it replaces every variable symbol until only strings of terminal symbols remain.
  • A bottom-up parser begins with the terminal symbols.
These terminal symbols are matched with the right-hand side of a production rule and replaced with the variable symbol on the left-hand side of that rule.
  • Parse trees can be used to attach the semantics of a construct to its syntactic structure; this is called syntax-directed semantics

**Derivation - Example**
• Given the regular grammar S ::= aS | bS | a | b, check whether the grammar can derive strings of the form aⁿbⁿ.
  • For a¹b¹: S ⇒ aS ⇒ ab
  • For a²b²: S ⇒ aS ⇒ aaS ⇒ aabS ⇒ aabb
  • For a³b³: S ⇒ aS ⇒ aaS ⇒ aaaS ⇒ aaabS ⇒ aaabbS ⇒ aaabbb
  • The grammar can derive strings of the required form. It also derives other strings such as ba, so it does not define exactly the language aⁿbⁿ.

**Grammar Issues**
• Ambiguities in Grammar
  • A grammar is ambiguous if it generates a sentential form that has two or more distinct parse trees.
  • Ex. If statement with dangling else.

**Grammar Transformations**
• Left Factorization:
  • The alternatives on the right side of a rule begin with the same element
  • N → XY | XZ becomes N → X (Y | Z)
• Elimination of Left Recursion:
  • The first element on the right-hand side refers back to the left-hand side of the rule
  • N → X | NY becomes N → XY*
  • The expansion of NY terminates only when N is replaced with X.
  • If N → X is used without first using N → NY, then there is no Y.
  • N ⇒ NY ⇒ NYY ⇒ XYY
• Substitution of Non-terminal Symbols:
  • A non-terminal symbol on the right-hand side of a rule can be replaced using another rule.
  • N → X and M → N can be changed to N → X and M → X

**Syntax Diagram**
• Also called Syntax Charts or Railroad Diagrams
• Developed by Niklaus Wirth in 1970
• Used to visualize rules in the form of diagrams
• Used to represent EBNF notations and not BNF notations
• Variables are represented by rectangles and terminal symbols by circles (sometimes ovals)
• Each production rule is represented as a directed graph whose vertices are symbols

**Recursive Descent Parsing**
• There is a subprogram for each non-terminal in the grammar, which parses the sentences generated by that non-terminal
• To proceed with the correct grammatical rule, each terminal symbol on the right-hand side is matched with the next input token.
  • If there is a match, we continue.
  • Otherwise, an error is generated or other rules are tried
• If a non-terminal has more than one RHS, we determine which one to parse using:
  • Choose the correct RHS based on the next token (lookahead).
  • The next token is compared with the first token that can be generated by each RHS until a match is found.
  • If there is no match, it is a syntax error.
• Shift-Reduce Parsing: With the given grammar and input string, we repeatedly reduce substrings that match the right-hand side of a rule until the start symbol of the grammar is reached

**Concrete and Abstract Syntax**
• Concrete Syntax:
  • Defines the structure of all the parts of a program such as arithmetic expressions, assignments, loops, functions, definitions, etc.
  • Context-free grammars, BNF, EBNF, etc. describe concrete syntax.
  • Assignment → Identifier = Expression;
  • Expression → Term | Expression + Term
• Abstract Syntax:
  • Generated by the parser and used to link the syntax and semantics of a program
  • Unlike concrete syntax, abstract syntax provides only the essential syntactic elements and does not describe how they are structured
  • Statement = Assignment | Loop
  • Assignment = Variable target; Expression source
• Ambiguity occurs in concrete syntax but not in abstract syntax

**Symbol Table**
• Identification Tables
  • Also called symbol tables.
  • A dictionary-type data structure that stores identifier names along with their attributes
  • The organization of the identification table depends on the "block structure" used by the language
    • Monolithic block structure: e.g. BASIC, COBOL
    • Flat block structure: e.g. Fortran
    • Nested block structure is used in the modern "block-structured" programming languages (e.g. Algol, Pascal, C, C++, Scheme, Java, …)
• Monolithic Block Structure:
  • A single block is used for the entire program
  • Every identifier is visible throughout the entire program
  • The scope of each identifier is the whole program, and an identifier cannot be declared twice

**Symbol Table**
• Flat Block Structure:
  • The program is divided into several disjoint blocks
  • Declarations can be local or global
  • Identifiers can be redefined in another block
  • A local declaration takes priority over a global declaration
• Nested Block Structure:
  • Blocks may be nested one within another
  • The scope of an identifier depends on the level of nesting
  • An identifier cannot be defined more than once at the same level within the same block

**Symbol Table Structure**
• Unordered list: Data can be stored in an array or a linked list.
• Ordered list:
  • Entries in the list are ordered
  • Searching is faster
  • Inserting data into the list is expensive
• Binary Search Tree:
  • With a balanced binary search tree, searching takes O(log(n)) time.
• Hash Table:
  • Most commonly used option
  • Data can be accessed in constant time on average
  • Storing data is not time consuming

**LL Grammar**
• The first L in LL specifies a left-to-right scan of the input
• The second L specifies that a leftmost derivation is generated
• The first step towards using an LL grammar is the elimination of common prefixes. Note: α and β can match zero or more elements.
  • The form is B → αβ1 | αβ2 | … | αβm | Xm+1 | Xm+2 | … | Xm+n
  • Replace it with
    • B → αB1 | Xm+1 | Xm+2 | … | Xm+n
    • B1 → β1 | β2 | … | βm
• Convert the grammar into an unambiguous one
  • Make sure the rules obey precedence and associativity
  • Start from the terminals and move from high precedence to low precedence
  • Consider the grammar: E → E + E | E * E | (E) | id
    • Select the terminals and give them a separate name.
    • Factor → (E) | id
    • The * operator has higher priority than the + operator. So, E → E * E is considered first.

**LL Grammar**
• Convert the grammar into an unambiguous one
  • Consider the grammar: E → E + E | E * E | (E) | id
    • * has higher priority than +. So, select E → E * E next
    • To provide the link between E * E and Factor, use the pipe (|) operator.
    • With no link, the non-terminal will never become a terminal.
    • Give the new name "Term" to the element.
    • Term → Term * Factor | Factor
    • Then, consider E → E + E and change it in the same way.
    • Expression → Expression + Term | Term
    • So, F → (E) | id; T → T * F | F; E → E + T | T
• Remove left recursion
  • If A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
    • where no βi begins with an A. For E → E + T | T, A is E, α is +T and β is T
  • Replace the above with:
    • A → β1A' | β2A' | … | βnA'
    • A' → α1A' | α2A' | … | αmA' | ε

**LL Grammar**
• Consider the grammar
  • E → TE'; E' → +TE' | ε; T → FT'; T' → *FT' | ε; F → (E) | id
• FIRST & FOLLOW
  • FIRST:
    • If X is a terminal, then FIRST(X) is {X}.
    • If X is a non-terminal and X → aα is a production, then add a to FIRST(X). If X → ε is a production, then add ε to FIRST(X).
    • If X → Y1Y2…Yk is a production, then for all i such that all of Y1, …, Yi-1 are non-terminals and FIRST(Yj) contains ε for j = 1, 2, …, i-1, add every non-ε symbol in FIRST(Yi) to FIRST(X). If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
    • The third rule of FIRST applies to E → TE' where T → FT' and F → (E) | id. Thus, what is in FIRST(F) will be in FIRST(E) and FIRST(T).
    • FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
    • FIRST(E') = {+, ε}
    • FIRST(T') = {*, ε}

**LL Grammar**
• FIRST & FOLLOW
  • FOLLOW: (α is any string of grammar symbols; α can also be ε.)
    • $ is in FOLLOW(X), where X is the start symbol.
    • If there is a production A → αBβ, β ≠ ε, then everything in FIRST(β) except ε is in FOLLOW(B).
    • If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
    • In FOLLOW, apply the first rule to the whole grammar, then the second rule to the whole grammar, and so on.
    • Note: Refer to notes for verbal explanation for FIRST & FOLLOW rules

**LL Grammar**
• FIRST & FOLLOW
  • FOLLOW(E) = FOLLOW(E') = {), $}
  • FOLLOW(T) = FOLLOW(T') = {+, ), $}
  • FOLLOW(F) = {+, *, ), $}
• Generating the parsing table
  • A grammar whose parsing table has no multiply-defined entries is said to be LL(1). α is any string of grammar symbols; α can also be ε.
  1. For each production A → α of the grammar, do steps 2 & 3.
  2. For each terminal a in FIRST(α), add A → α to M[A, a].
  3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
  4. Make each undefined entry of M an error.
  • Note: Here, M[A, b] is the cell of the table whose row corresponds to the non-terminal A and whose column corresponds to the terminal b.
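
The FIRST, FOLLOW and parsing-table rules above can be sketched in Python for the expression grammar used in these slides. This is a minimal illustration, not a reference implementation: the names `GRAMMAR`, `first_of`, `compute_first`, `compute_follow` and `build_table` are hypothetical, and the dictionary encoding of the grammar is an assumption made for the example.

```python
EPS, END = "ε", "$"

# Expression grammar from the slides after left-recursion removal.
# Keys are non-terminals; every other symbol is treated as a terminal.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"


def first_of(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}  # FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result   # this symbol cannot vanish, so stop here
    result.add(EPS)         # every symbol can derive ε (or the sequence is empty)
    return result


def compute_first():
    """Apply the FIRST rules repeatedly until no set changes."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first):
    """Apply the FOLLOW rules repeatedly until no set changes."""
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)                      # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}          # rule 2: FIRST(β) without ε
                    if EPS in rest:
                        new |= follow[head]     # rule 3: β is empty or can vanish
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(first, follow):
    """Build M[A, a]; a doubly-defined cell means the grammar is not LL(1)."""
    table = {}
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            body_first = first_of(body, first)
            targets = body_first - {EPS}
            if EPS in body_first:
                targets |= follow[head]
            for terminal in targets:
                if (head, terminal) in table:
                    raise ValueError(f"conflict at M[{head}, {terminal}]")
                table[(head, terminal)] = body
    return table
```

Calling `compute_first()`, then `compute_follow(first)` and `build_table(first, follow)` reproduces the FIRST and FOLLOW sets listed above and fills 13 table cells without a conflict, which is what makes this grammar LL(1). Cells missing from the dictionary play the role of the error entries in step 4.
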
**LR Grammar**
• LR: left-to-right scan of the input, producing a rightmost derivation in reverse
• Most powerful shift-reduce parsing technique
• Non-backtracking shift-reduce parsing that detects a syntactic error as soon as possible
• Represented as LR(k) where k is the number of look-ahead symbols
  • LR(1) means one symbol of look-ahead: only the next input symbol is considered, not anything that follows it.
• Can parse all grammars that can be parsed with predictive parsers such as LL(1)
• Types of LR parsers:
  • SLR: Simple LR parser.
  • LR: Most general LR parser.
  • LALR: Intermediate LR parser (look-ahead LR parser).
• All the types use the same algorithm but with different parsing tables

**LR Grammar**
• LR parser configuration: (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $), which consists of the stack contents and the rest of the input
  • Xi is a grammar symbol
  • Si is a state
  • ai is an input symbol
• The initial stack contains just S0
Figure 2.11. LR Parsing

**LR Grammar**
• The parser takes an action using Sm and ai
  • shift s: pushes the next input symbol ai and the state s onto the stack
    • (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $) becomes (S0 X1 S1 ... Xm Sm ai s, ai+1 ... an $)
  • reduce A → β (or rn where n is a production number)
    • pop r symbol-state pairs from the stack (r is the length of β), so that the right-hand side can be replaced with the left-hand side of the rule.
    • then push A and s where s = goto[Sm-r, A]. Here, m-r indicates that r pairs have been taken off the stack.
    • (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $) becomes (S0 X1 S1 ... Xm-r Sm-r A s, ai ... an $)
    • The output is the reducing production rule, reduce A → β
  • Accept: Parsing is successfully completed.
  • Error: The parser has detected an error. This happens when the corresponding entry in the action table is empty.
• GOTO takes a state and a grammar symbol as arguments and produces a state.
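
The shift, reduce, accept and error actions described above can be sketched as a small table-driven driver in Python. The slides do not give a parsing table, so the `ACTION` and `GOTO` tables below are an assumption: they are the usual textbook SLR tables for the grammar E → E + T | T; T → T * F | F; F → (E) | id. The names `PRODUCTIONS`, `_row` and `lr_parse` are hypothetical. The sketch also keeps only states on the stack, since each state already determines the grammar symbol paired with it.

```python
# Numbered productions, matching the "rn" reduce actions: head and body.
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}


def _row(spec):
    """Expand a compact row such as "id:s5 (:s4" into {terminal: action}."""
    return dict(item.split(":") for item in spec.split())


# Assumed SLR action table: sN = shift to state N, rN = reduce by production N.
ACTION = {
    0: _row("id:s5 (:s4"),
    1: _row("+:s6 $:acc"),
    2: _row("+:r2 *:s7 ):r2 $:r2"),
    3: _row("+:r4 *:r4 ):r4 $:r4"),
    4: _row("id:s5 (:s4"),
    5: _row("+:r6 *:r6 ):r6 $:r6"),
    6: _row("id:s5 (:s4"),
    7: _row("id:s5 (:s4"),
    8: _row("+:s6 ):s11"),
    9: _row("+:r1 *:s7 ):r1 $:r1"),
    10: _row("+:r3 *:r3 ):r3 $:r3"),
    11: _row("+:r5 *:r5 ):r5 $:r5"),
}

# Assumed goto table: (state, non-terminal) -> state.
GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}


def lr_parse(tokens):
    """Return the production numbers used for reductions, in order.

    Raises SyntaxError when the action-table entry is empty.
    """
    stack = [0]                          # initial stack holds just S0
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        state, tok = stack[-1], tokens[pos]
        action = ACTION[state].get(tok)
        if action is None:               # empty entry: error
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if action == "acc":              # accept
            return output
        if action[0] == "s":             # shift: push the new state, consume input
            stack.append(int(action[1:]))
            pos += 1
        else:                            # reduce A -> β
            n = int(action[1:])
            head, body = PRODUCTIONS[n]
            del stack[len(stack) - len(body):]        # pop |β| states
            stack.append(GOTO[(stack[-1], head)])     # push goto[S(m-r), A]
            output.append(n)
```

For example, `lr_parse(["id", "*", "id", "+", "id"])` returns the reduction sequence `[6, 4, 6, 3, 2, 6, 4, 1]`, which read in reverse is a rightmost derivation of the input. An input such as `["id", "+"]` hits an empty action entry and raises `SyntaxError`.
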