Chapter 4 syntax analysis part 1 grammar concepts
This presentation is the property of its rightful owner.
Sponsored Links
1 / 61

Chapter 4: Syntax Analysis Part 1: Grammar Concepts PowerPoint PPT Presentation


  • 57 Views
  • Uploaded on
  • Presentation posted in: General

Chapter 4: Syntax Analysis Part 1: Grammar Concepts. Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155. [email protected] http://www.engr.uconn.edu/~steve (860) 486 - 4818.

Download Presentation

Chapter 4: Syntax Analysis Part 1: Grammar Concepts

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Chapter 4: Syntax AnalysisPart 1: Grammar Concepts

Prof. Steven A. Demurjian

Computer Science & Engineering Department

The University of Connecticut

371 Fairfield Way, Unit 2155

Storrs, CT 06269-3155

[email protected]

http://www.engr.uconn.edu/~steve

(860) 486 - 4818

Material for course thanks to:

Laurent Michel

Aggelos Kiayias

Robert LeBarre


Syntax Analysis - Parsing

  • An overview of parsing : Functions & Responsibilities

  • Context Free Grammars: Concepts & Terminology

  • Writing and Designing Grammars

  • Resolving Grammar Problems / Difficulties

  • Top-Down Parsing

    • Recursive Descent & Predictive LL

  • Bottom-Up Parsing

    • LR & LALR

  • Key Issues in Error Handling

  • Concluding Remarks/Looking Ahead


An Overview of Parsing

  • Why are Grammars to formally describe Languages Important ?

    • Precise, easy-to-understand representations

    • Compiler-writing tools can take grammar and generate a compiler

    • allow language to be evolved (new statements, changes to statements, etc.) Languages are not static, but are constantly upgraded to add new features or fix “old” ones

    • ADA  ADA9x, C++ Adds: Templates, exceptions,

How do grammars relate to parsing process ?


Parsing During Compilation

errors

token

lexical analyzer

rest of front end

source program

parse tree

parser

get next token

symbol table

regular expressions

interm repres

  • also technically part or parsing

  • includes augmenting info on tokens in source, type checking, semantic analysis

  • uses a grammar to check structure of tokens

  • produces a parse tree

  • syntactic errors and recovery

  • recognize correct syntax

  • report errors


Parsing Responsibilities

  • Syntax Error Identification / Handling

  • Recall typical error types:

    • Lexical : Misspellings

    • Syntactic : Omission, wrong order of tokens

    • Semantic : Incompatible types

    • Logical : Infinite loop / recursive call

  • Majority of error processing occurs during syntax analysis

  • NOTE: Not all errors are identifiable !! Which ones?


Key issues in Error Handling

  • Detection

    • Actual Detection

    • Finding position at which they occur

  • Reporting

    • Clear / accurate presentation

  • Recovery

    • How to skip overto continue and find later errors

    • Cannot impact compilation of correct programs


What are some Typical Errors ?

#include<stdio.h>

int f1(int v)

{ int i,j=0;

for (i=1;i<5;i++)

{ j=v+f2(i) }

return j; }

int f2(int u)

{ int j;

j=u+f1(u*u)

return j; }

int main()

{ int i,j=0;

for (i=1;i<10;i++)

{ j=j+i*I; printf(“%d\n”,i);

printf("%d\n",f1(j));

return 0;

}

As reported by MS VC++

'f2' undefined; assuming extern returning intsyntax error : missing ';' before ‘}‘syntax error : missing ';' before ‘return‘fatal error : unexpected end of file found

Which are “easy” to recover from? Which are “hard” ?


Error Recovery Strategies

Panic Mode – Discard tokens until a “synchro” token is

found ( end, “;”, “}”, etc. )

-- Decision of designer

-- Problems:

skip input miss declaration – causing more errors

miss errors in skipped material

-- Advantages:

simple suited to 1 error per statement

Phase Level – Local correction on input

-- “,” ”;” – Delete “,” – insert “;”

-- Also decision of designer

-- Not suited to all situations

-- Used in conjunction with panic mode to

allow less input to be skipped


Error Recovery Strategies – (2)

Error Productions:

-- Augment grammar with rules

-- Augment grammar used for parser

construction / generation

-- Add a rule for

:= in C assignment statements

Report error but continue compile

-- Self correction + diagnostic messages

-- What are other rules ? (supported by yacc)

Global Correction:

-- Adding / deleting / replacing symbols is

chancy – may do many changes !

-- Algorithms available to minimize changes

costly - key issues

What do you think of each approach? What approach do compilers you’ve used take ?


Motivating Grammars

Reg. Lang.

CFLs

  • Regular Expressions

  •  Basis of lexical analysis

  •  Represent regular languages

  • Context Free Grammars

  •  Basis of parsing

  •  Represent language constructs

  •  Characterize context free languages

EXAMPLE: anbn , n  1 : Is it regular ?


Context Free Languages

Context Free

Languages

Regular Languages

Context Free Languages

Reg. Languages

Token Structure

Sentence Structure

Parser Generation

Scanner Generation


Context Free Grammars : Concepts & Terminology

Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where:

T: Terminals / tokens of the language

NT: Non-terminals to denote sets of strings generatable by the grammar & in the language

S: Start symbol, SNT, which defines all strings of the language

PR: Production rules to indicate how T and NT are combines to generate valid strings of the language.

PR: NT  (T | NT)*

Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model !


Context Free Grammars : A First Look

assign_stmt id := expr ;

expr  term operator term

term id

term real

term integer

operator +

operator -

What do “BLUE” symbols represent?

What do “BLACK” symbols represent?

Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-term into a collection of terminals / tokens.

Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax.


How is Grammar Used ?

Given the rules on the previous slide, suppose

id := real + int;

is input. Is it syntactically correct? How do we know?

expr is represented as: expr term operator term

Is this accurate / complete?

expr  expr operator term

expr  term

How does this affect the derivations which are possible?


Derivation and Parse Tree

Let’s derive: id := id + real – integer ;

What’s its parse tree ?


Derivation Example

  • Let’s derive: id := id + real – integer ;

assign_stmt assign_stmt → id := expr ;

→ id := expr ;expr → expr operator term

→ id := expr operator term;expr → expr operator term

→ id := expr operator term operator term; expr → term

→ id := term operator term operator term; term → id

→ id := id operator term operator term; operator→ +

→ id := id + term operator term; term→ real

→ id := id + real operator term; operator→ -

→ id := id + real - term;term→ integer

→ id := id + real - integer;

assign_stmt → id := expr ;

expr → expr operator term

expr → term

term → id

term → real

term → integer

operator → +

operator → -


Example Grammar

expr  expr op expr

expr ( expr )

expr - expr

expr id

op +

op -

op *

op /

op 

Black : NT

Blue : T

expr : S

9 Production rules

To simplify / standardize notation, we offer a synopsis of terminology.


Example Grammar - Terminology

Terminals: a,b,c,+,-,punc,0,1,…,9, blue strings

Non Terminals: A,B,C,S, black strings

T or NT: X,Y,Z

Strings of Terminals: u,v,…,z in T*

Strings of T / NT:  ,  in ( T  NT)*

Alternatives of production rules:

A 1; A 2; …; A k;  A  1 | 2 | … | 1

First NT on LHS of 1st production rule is designated as start symbol !

E  E A E | ( E ) | -E | id

A  + | - | * | / | 


Grammar Concepts

A step in a derivation is zero or one action that replaces a NT with the RHS of a production rule.

EXAMPLE: E  -E (the  means “derives” in one step) using the production rule: E  -E

EXAMPLE: E  E A E  E * E  E * ( E )

DEFINITION:  derives in one step

 derives in  one step

 derives in  zero steps

+

*

EXAMPLES:  A      if A  is a production rule

1  2 …  n  1  n ;    for all 

If    and    then   

*

*

*

*


How does this relate to Languages?

+

Let G be a CFG with start symbol S. Then S W (where W has no non-terminals) represents the language generated by G, denoted L(G). So WL(G)  S W.

+

W : is a sentence of G

When S  (and  may have NTs) it is called a sentential form of G.

EXAMPLE: id * id is a sentence

Here’s the derivation:

E  E A E E * E  id * E  id * id

E id * id

Sentential forms

*


Leftmost and Rightmost Derivations

lm

rm

Leftmost: Replace the leftmost non-terminal symbol

E  E A E  id A E  id * E  id * id

Rightmost: Replace the leftmost non-terminal symbol

E  E A E  E A id  E *id  id * id

lm

lm

lm

lm

rm

rm

rm

rm

Important Notes: A  

If A    , what’s true about  ?

If A    , what’s true about  ?

Derivations: Actions to parse input can be represented pictorially in a parse tree.


Examples of LM / RM Derivations

E  E A E | ( E ) | -E | id

A  + | - | * | / | 

A leftmost derivation of : id + id * id

A rightmost derivation of : id + id * id


Derivations & Parse Tree

E

E

E

E

*

E

E

E

E

A

A

A

A

E

E

E

E

id

*

id

id

*

E  E A E

 E * E

 id * E

 id * id


Parse Trees and Derivations

E

E

E

E

E + E  id+ E

E

E

E

E

+

+

*

+

E

E

E

E

id+ E  id+ E * E

id

id

Consider the expression grammar:

E  E+E | E*E | (E) | -E | id

Leftmost derivations of id + id * id

E  E + E


Parse Tree & Derivations - continued

id+ E * E  id+id* E

id

id

id

id

id

E

E

E

E

E

E

E

E

+

*

*

+

E

E

E

E

id+id* E  id+id* id


Alternative Parse Tree & Derivation

E

E

E

*

E

E

+

E

id

id

id

E  E * E

 E + E * E

 id + E * E

 id + id * E

 id + id * id

WHAT’S THE ISSUE HERE ?


Example Pascal Grammar in Yacc

%term

T_AND T_ARRAY T_BEGIN T_CASE

T_CONST T_DIV T_DO T_RANGE

T_TO T_ELSE T_END T_FILE

T_FOR T_FUNCTION /* T_GOTO */ T_IDENTIFIER

T_IF T_IN T_INTEGER T_LABEL

T_MOD T_NOT T_REAL T_OF

T_OR /* T_PACKED */ T_NIL T_PROCEDURE

T_PROGRAM T_RECORD T_REPEAT T_SET

T_STRING T_THEN T_DOWNTO T_TYPE

T_UNTIL T_VAR T_WHILE /* T_WITH */

T_COMMA T_PERIOD T_ASSIGN T_COLON

T_LPAREN T_RPAREN T_LBRACK T_RBRACK

T_UPARROW T_SEMI

%binary T_LT T_EQ T_GT T_GE T_LE T_NE T_IN

%left T_PLUS T_MINUS T_OR

%left UNARYSIGN

%left T_MULT T_RDIV T_DIV T_MOD T_AND

%left T_NOT

%%


Example Pascal Grammar in Yacc

Start : Program_Header Declarations Block T_PERIOD

Program_Header : T_PROGRAM T_IDENTIFIER T_LPAREN Id_List T_RPAREN T_SEMI

Block : T_BEGIN Stt_List T_END

Declarations : Const_Def_Part

Type_Def_Part

Var_Decl_Part

Routine_Decl_Part

Const_Def_Part : T_CONST Const_Def_List

| /* epsilon */ ;

Const_Def_List : Const_Def

| Const_Def_List Const_Def ;

Const_Def : T_IDENTIFIER T_EQ Constant T_SEMI

Type_Def_Part : T_TYPE Type_Def_List

| /* epsilon */

Type_Def_List : Type_Def

| Type_Def_List Type_Def


Example Pascal Grammar in Yacc

Type_Def : T_IDENTIFIER T_EQ Type T_SEMI

Var_Decl_Part : T_VAR Var_Decl_List

| /* epsilon */

Var_Decl_List : Var_Decl_List Var_Decl

| Var_Decl

Var_Decl : Id_List T_COLON Type T_SEMI

Routine_Decl_Part : Routine_Decl_Part Routine_Decl

| /* epsilon */

Routine_Decl : Routine_Head T_IDENTIFIER T_SEMI

| Routine_Head Declarations Block T_SEMI

Routine_Head : T_PROCEDURE T_IDENTIFIER Params T_SEMI

| T_FUNCTION T_IDENTIFIER Params Function_Type T_SEMI

Params : T_LPAREN Params_List T_RPAREN

| /* epsilon */

Param_Group : Id_List T_COLON Type

| T_VAR Id_List T_COLON Type


Example Pascal Grammar in Yacc

Function_Type : T_COLON Type

| /* epsilon */

Params_List : Param_Group

| Params_List T_SEMI Param_Group

Constant : T_STRING | Number | T_PLUS Number | T_MINUS Number

Number : T_IDENTIFIER | T_INTEGER | T_REAL

Const_List : Constant | Const_List T_COMMA Constant

Type : Simple_Type | T_UPARROW T_IDENTIFIER | Structured_Type

Simple_Type : T_IDENTIFIER

| T_LPAREN Id_List T_RPAREN

| Constant T_RANGE Constant

Structured_Type : T_ARRAY T_LBRACK Simple_Type_List T_RBRACK T_OF Type

| T_SET T_OF Simple_Type

| T_RECORD Field_List T_END

Simple_Type_List : Simple_Type

| Simple_Type_List T_COMMA Simple_Type


Example Pascal Grammar in Yacc

Field_List : Fixed_Part /* Variant_part */

Fixed_Part : Field | Fixed_Part T_SEMI Field

Field : /* epsilon */ | Id_List T_COLON Type

Variant_part : / * epsilon * /

| T_CASE T_IDENTIFIER T_OF Variant_List

| T_CASE T_IDENTIFIER T_COLON T_IDENTIFIER T_OF Variant_List

Variant_List : Variant | Variant_List T_SEMI Variant

Variant : / * epsilon * /

| Const_List T_COLON T_LPAREN Field_List T_RPAREN

Stt_List : Statement | Stt_List T_SEMI Statement

Case_Stt_List : Case_Statement | Case_Stt_List T_SEMI Case_Statement

Case_Statement : Const_List T_COLON Statement | /* epsilon */


Example Pascal Grammar in Yacc

Statement : /* epsilon */

| T_IDENTIFIER | T_IDENTIFIER T_LPAREN Expr_List T_RPAREN

| Variable T_ASSIGN Expression | T_BEGIN Stt_List T_END

| T_CASE Expression T_OF Case_Stt_List T_END

| T_WHILE Expression T_DO Statement

| T_REPEAT Stt_List T_UNTIL Expression

| T_FOR Variable T_ASSIGN Expression T_TO Expression T_DO Statement

| T_FOR Variable T_ASSIGN Expression T_DOWNTO Expression T_DO Statement

| T_IF Expression T_THEN Statement

| T_IF Expression T_THEN Statement T_ELSE Statement

Expression : Expression relop Expression %prec T_LT

| T_PLUS Expression %prec UNARYSIGN

| T_MINUS Expression %prec UNARYSIGN

| Expression addop Expression %prec T_PLUS

| Expression divop Expression %prec T_MULT

| T_NIL | T_STRING | T_INTEGER | T_REAL

| Variable

| T_IDENTIFIER T_LPAREN Expr_List T_RPAREN

| T_LPAREN Expression T_RPAREN

| negop Expression %prec T_NOT

| T_LBRACK Element_List T_RBRACK

| T_LBRACK T_RBRACK


Example Pascal Grammar in Yacc

Element_List : Element | Element_List T_COMMA Element

Element : Expression | Expression T_RANGE Expression

Variable : T_IDENTIFIER

| Variable T_LBRACK Expr_List T_RBRACK

| Variable T_PERIOD T_IDENTIFIER

| Variable T_UPARROW

Expr_List : Expression | Expr_List T_COMMA Expression

relop : T_EQ | T_LT | T_GT | T_NE | T_LE | T_GE | T_IN

addop : T_PLUS | T_MINUS | T_OR

divop : T_MULT | T_RDIV | T_DIV | T_MOD | T_AND ;

negop : T_NOT


Resolving Grammar Problems/Difficulties

Reg. Lang.

CFLs

Regular Expressions : Basis of Lexical Analysis

Reg. Expr.  generate/represent regular languages

Reg. Languages  smallest, most well defined class

of languages

Context Free Grammars: Basis of Parsing

CFGs  represent context free languages

CFLs  contain more powerful languages

anbn – CFL that’s not regular!

Try to write DFA/NFA for it !


Resolving Problems/Difficulties – (2)

a

start

a

b

b

0

2

1

3

b

Since Reg. Lang.  Context Free Lang., it is possible to go from reg. expr. to CFGs via NFA.

Recall: (a | b)*abb


Resolving Problems/Difficulties – (3)

a

b

i

j

i

j

Construct CFG as follows:

  • Each State I has non-terminal Ai : A0, A1, A2, A3

  • If then Aia Aj

  • If then Ai Aj

  • If I is an accepting state, Ai  : A3  

  • If I is a starting state, Ai is the start symbol : A0

: A0 aA0, A0 aA1

: A0 bA0,A1 bA2

: A2 bA3

T={a,b}, NT={A0, A1, A2, A3}, S = A0

PR ={ A0 aA0 | aA1 | bA0 ;

A1  bA2 ;

A2  3A3 ;

A3   }


How Does This CFG Derive Strings ?

a

start

a

b

b

0

2

1

3

b

vs.

A0 aA0, A0 aA1

A0 bA0, A1 bA2

A2 bA3, A3 

How is abaabb derived in each ?


Regular Expressions vs. CFGs

  • Regular expressions for lexical syntax

    • CFGs are overkill, lexical rules are quite simple and straightforward

    • REs – concise / easy to understand

    • More efficient lexical analyzer can be constructed

    • RE for lexical analysis and CFGs for parsing promotes modularity, low coupling & high cohesion. Why?

CFGs : Match tokens “(“ “)”, begin / end, if-then-else, whiles, proc/func calls, …

Intended for structural associations between tokens !

Are tokens in correct order ?


Resolving Grammar Difficulties : Motivation

  • ambiguity

  • -moves

  • cycles

  • left recursion

  • left factoring

  • Humans write / develop grammars

  • Different parsing approaches have different needs

LL(k)

Recursive

LR(k)

LALR(k)

Top-Down vs. Bottom-Up

  • For: 1  remove “errors”

  • For: 2  put / redesign grammar

Grammar Problems


Resolving Problems: Ambiguous Grammars

Consider the following grammar segment:

stmt if exprthen stmt

| if exprthen stmtelse stmt

| other (any other statement)

What’s problem here ?

Consider the Program:

if e1

then if e2

then s1

else s2

Else must match to previous then.

Structure indicates parse subtree for expression.


Resulting Parse Tree

  • Easy case

    • Else must match to previous then.

if e1

then s1

else if e2

then s2

else s3


Example : What Happens with this string?

If E1then if E2then S1else S2

How is this parsed ?

if E1then

if E2then

S1

else

S2

if E1then

if E2then

S1

else

S2

vs.

What’s the issue here ?


Parse Trees for Example

if e1

then if e2

then s1

else s2

Form 1:

Form 2:

What’s the issue here ?


Removing Ambiguity

Take Original Grammar:

stmt if exprthen stmt

| if exprthen stmtelse stmt

| other (any other statement)

Revise to remove ambiguity:

stmt  matched_stmt | unmatched_stmt

matched_stmt if exprthen matched_stmt else matched_stmt | other

unmatched_stmt if exprthen stmt

| if exprthen matched_stmt else unmatched_stmt

How does this grammar work ?


Resolving Difficulties : Left Recursion

A left recursive grammar has rules that support the derivation : A  A, for some .

+

Top-Down parsing can’t reconcile this type of grammar, since it could consistently make choice which wouldn’t allow termination.

A  A  A  A … etc. A A | 

Take left recursive grammar:

A  A | 

To the following:

A’  A’

A’  A’ | 


Why is Left Recursion a Problem ?

Derive : id + id + id

E  E + T 

Consider:

E  E + T | T

T  T * F | F

F  ( E ) | id

How can left recursion be removed ?

E  E + T | T What does this generate?

E  E + T  T + T

E  E + T  E + T + T  T + T + T

How does this build strings ?

What does each string have to start with ?


Resolving Difficulties : Left Recursion (2)

For our example:

E  E + T | T

T  T * F | F

F  ( E ) | id

E  TE’ E’  + TE’ | 

T  FT’ T’  * FT’ | 

F  ( E ) | id

Informal Discussion:

Take all productions for A and order as:

A  A1 | A2 | … | Am | 1 | 2 | … | n

Where no i begins with A.

Now apply concepts of previous slide:

A  1A’ | 2A’ | … | nA’

A’  1A’ | 2A’| … | m A’ | 


Resolving Difficulties : Left Recursion (3)

S  Aa | b

A  Ac | Sd | 

S  Aa  Sda

Problem: If left recursion is two-or-more levels deep,

this isn’t enough

Algorithm:

  • Input: Grammar G with no cycles or -productions

  • Output: An equivalent grammar with no left recursion

  • Arrange the non-terminals in some order A1,A2,…An

  • for i := 1 to n do begin

  • for j := 1 to i – 1 do begin

  • replace each production of the form Ai  Aj

  • by the productions Ai  1 | 2 | … | k

  • where Aj  1|2|…|k are all current Aj productions;

  • end

  • eliminate the immediate left recursion among Ai productions

  • end


Using the Algorithm

Apply the algorithm to: A1  A2a | b

A2  A2c | A1d | 

i = 1

For A1 there is no left recursion

i = 2

for j=1 to 1 do

Take productions: A2 A1 and replace with

A2  1  | 2  | … | k |

where A1 1 | 2 | … | k are A1 productions

in our case A2  A1d becomes A2  A2ad | bd

What’s left: A1 A2a | b

A2  A2 c | A2ad | bd | 

Are we done ?


Using the Algorithm (2)

No ! We must still remove A2 left recursion !

A1 A2a | b

A2  A2 c | A2ad | bd | 

Recall:

A  A1 | A2 | … | Am | 1 | 2 | … | n

A  1A’ | 2A’ | … | nA’

A’  1A’ | 2A’| … | m A’ | 

Apply to above case. What do you get ?


Removing Difficulties : -Moves

Very Simple: A  and B uAv implies add B uv to the grammar G.

Why does this work ?

Examples:

E  TE’ E’  + TE’ | 

T  FT’ T’  * FT’ | 

F  ( E ) | id

A1 A2a | b

A2 bd A2’ | A2’

A2’  c A2’ | bd A2’ | 


Removing Difficulties : Cycles

How would cycles be removed ?

S  SS | ( S ) | 

Has: S  SS  S

S  


Removing Difficulties : Left Factoring

Problem : Uncertain which of 2 rules to choose:

stmt if exprthen stmt else stmt

| if exprthen stmt

When do you know which one is valid ?

What’s the general form of stmt ?

A  1 | 2  : if exprthen stmt

1: else stmt 2 : 

Transform to:

A   A’

A’  1 | 2

EXAMPLE:

stmt if exprthen stmt rest

rest else stmt | 


Resolving Grammar Problems : Another Look

Leads to:

  • Forcing Matching Within Grammar

    • - if-then-else else must go with most recent if

  • stmt  if expr then stmt | if expr then stmt else stmt | other

stmt  matched_stmt | unmatched_stmt

matched_stmt if exprthen matched_stmt else matched_stmt | other

unmatched_stmt if exprthen stmt

| if exprthen matched_stmt else unmatched_stmt


Resolving Grammar Problems : Another Look (2)

  • Left Factoring - Know correct choice in advance !

  • where_queries  where are_does declarations of symbol

  • | where are_does “search_option” appear

Rules given in Assignment or

where_queries  where rest_where

rest_where  are declarations of symbol

| does“search_option” appear


Resolving Grammar Problems : Another Look (3)

E  TE’ E’  + TE’ | 

E  E + T | T

T  T * F | F

F  ( E ) | id

T  FT’ T’  * FT’ | 

F  ( E ) | id

  • Left Recursion - Avoid infinite loop when choosing rules

A  A’

A’  A’ | 

A  A | 

EXAMPLE:

If we use the original production rules on input: id + id (with leftmost):

E  E + T  E + T + T  E + T + T + T  T + T + T + … + T

*

not assured

If we use the transformed production rules on input: id + id (with leftmost):

E  TE’  T + TE’  id+ TE’  id + id E’  id + id

*

*


Resolving Grammar Problems : Another Look (4)

  • Removing -Moves

E  TE’ E’  + TE’ | 

E  T

E  TE’

E’  + TE’

E’  + T

Is there still a problem ?


Resolving Grammar Problems

Reg. Lang.

CFLs

  • Note: Not all aspects of a programming language can be represented by context free grammars / languages.

  • Examples:

    • 1. Declaring ID before its use

    • 2. Valid typing within expressions (Type Rules)

    • 3. Parameters in definition vs. in call

    • 4. Method Overloading/hiding

  • These features are call context-sensitive and define yet another language class, CSL.

  • CSLs


    Context-Sensitive Languages - Examples

    • Examples:

      • L1 = { wcw | w is in (a | b)* } : Declare before use

      • L2 = { an bm cn dm | n  1, m  1 }

        • an bm : formal parameter

        • cn dm : actual parameter

    What does it mean when L is context-sensitive ?


    How do you show a Language is a CFL?

    L3 = { w c wR | w is in (a | b)* }

    L4 = { an bm cm dn | n  1, m  1 }

    L5 = { an bn cm dm | n  1, m  1 }

    L6 = { an bn | n  1 }


    Solutions

    L3 = { w c wR | w is in (a | b)* }

    L4 = { an bm cm dn | n  1, m  1 }

    L5 = { an bn cm dm | n  1, m  1 }

    L6 = { an bn | n  1 }

    S  a S a | b S b | c

    S  a S d | a A d

    A  b A c | bc

    S  XY

    X  a X b | ab

    Y  c Y d | cd

    S  a S b | ab


  • Login