Week 7

Week 7 • Questions / Concerns • What’s due: • Lab3 next Monday 5/19 • Coming up: • Lab2 & Lab3 check-off next week • Lab3: LL(1) • Bottom-Up Parsers • LR(1) parsers • Using tools

LL(1) • Another top-down parser • It’s a table-driven parser. • LL(1) • L – first L, the input is read from left to right • L – second L, leftmost derivation (top-down) • 1 – one token of lookahead • Grammar pre-req: • No left recursion • Unit productions are okay, but should be minimized • MUST left factor to ensure one-token lookahead. • Procedure: • Compute First and Follow sets from the grammar

LL(1) Example
E -> T X
X -> + T X | - T X | ε
T -> F Y
Y -> * F Y | / F Y | ε
F -> id | num | ( E )
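
The First sets named in the procedure can be computed with a fixed-point iteration over the productions. The block below is a minimal sketch of that idea for the example grammar, not necessarily the method used in class. The names Production, exampleGrammar, computeFirst and the "eps" marker for the empty production are assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One production: lhs -> rhs. An empty rhs stands for the empty production.
struct Production {
    std::string lhs;
    std::vector<std::string> rhs;
};

const std::string kEpsilon = "eps";  // hypothetical marker for the empty string

// The expression grammar from the LL(1) example slide.
std::vector<Production> exampleGrammar() {
    return {
        {"E", {"T", "X"}},
        {"X", {"+", "T", "X"}}, {"X", {"-", "T", "X"}}, {"X", {}},
        {"T", {"F", "Y"}},
        {"Y", {"*", "F", "Y"}}, {"Y", {"/", "F", "Y"}}, {"Y", {}},
        {"F", {"id"}}, {"F", {"num"}}, {"F", {"(", "E", ")"}},
    };
}

// Repeats a pass over all productions until no First set grows any more.
std::map<std::string, std::set<std::string>> computeFirst(
        const std::vector<Production>& grammar) {
    std::set<std::string> nonTerminals;
    for (const auto& p : grammar) nonTerminals.insert(p.lhs);

    std::map<std::string, std::set<std::string>> first;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& p : grammar) {
            std::set<std::string>& target = first[p.lhs];
            const std::size_t before = target.size();
            bool allNullable = true;  // true while every symbol so far can derive eps
            for (const auto& sym : p.rhs) {
                if (nonTerminals.count(sym) == 0) {  // terminal: it starts the string
                    target.insert(sym);
                    allNullable = false;
                    break;
                }
                const std::set<std::string> symFirst = first[sym];  // copy, sym may equal lhs
                for (const auto& t : symFirst)
                    if (t != kEpsilon) target.insert(t);
                if (symFirst.count(kEpsilon) == 0) {
                    allNullable = false;
                    break;
                }
            }
            if (allNullable) target.insert(kEpsilon);
            if (target.size() != before) changed = true;
        }
    }
    return first;
}
```

Calling computeFirst(exampleGrammar()) gives First(E) = First(T) = First(F) = { id, num, ( }, First(X) = { +, -, eps } and First(Y) = { *, /, eps }. Follow sets can be computed with a similar loop.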

In-Class Exercise #9
E -> T X
X -> + T X | - T X | ε
T -> F Y
Y -> * F Y | / F Y | ε
F -> id | num | ( E )
Parse “a * b + 3” using this table

In-Class Exercise #9
Parse “a * b + 3” using this table
Stack    Input          Action
$E       a * b + 3 $    [E, a] -> TX
$XT      a * b + 3 $
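
The stack / input / action steps of the exercise can be sketched as a small table-driven driver. The parse table from the slide is not reproduced in this transcript, so the table below is re-derived by hand from the First and Follow sets of the grammar (E -> T X, X -> + T X | - T X | ε, T -> F Y, Y -> * F Y | / F Y | ε, F -> id | num | ( E )) and is an assumption. The names llTable, isNonTerminal and parse are hypothetical, and the input is assumed to be already tokenized, so “a * b + 3” arrives as id * id + num.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Symbols = std::vector<std::string>;
using TableKey = std::pair<std::string, std::string>;  // (non-terminal, lookahead)

// Hand-derived LL(1) table; an empty right-hand side stands for epsilon.
const std::map<TableKey, Symbols>& llTable() {
    static const std::map<TableKey, Symbols> table = {
        {TableKey("E", "id"), Symbols{"T", "X"}},
        {TableKey("E", "num"), Symbols{"T", "X"}},
        {TableKey("E", "("), Symbols{"T", "X"}},
        {TableKey("X", "+"), Symbols{"+", "T", "X"}},
        {TableKey("X", "-"), Symbols{"-", "T", "X"}},
        {TableKey("X", ")"), Symbols{}},
        {TableKey("X", "$"), Symbols{}},
        {TableKey("T", "id"), Symbols{"F", "Y"}},
        {TableKey("T", "num"), Symbols{"F", "Y"}},
        {TableKey("T", "("), Symbols{"F", "Y"}},
        {TableKey("Y", "*"), Symbols{"*", "F", "Y"}},
        {TableKey("Y", "/"), Symbols{"/", "F", "Y"}},
        {TableKey("Y", "+"), Symbols{}},
        {TableKey("Y", "-"), Symbols{}},
        {TableKey("Y", ")"), Symbols{}},
        {TableKey("Y", "$"), Symbols{}},
        {TableKey("F", "id"), Symbols{"id"}},
        {TableKey("F", "num"), Symbols{"num"}},
        {TableKey("F", "("), Symbols{"(", "E", ")"}},
    };
    return table;
}

bool isNonTerminal(const std::string& s) {
    return s == "E" || s == "X" || s == "T" || s == "Y" || s == "F";
}

// Returns true (Yes) if the token sequence is accepted, false (No) otherwise.
// If actions is non-null, each table lookup that fires is recorded in it.
bool parse(Symbols input, std::vector<std::string>* actions = nullptr) {
    input.push_back("$");               // end-of-input marker
    Symbols stack = {"$", "E"};         // top of stack is the back of the vector
    std::size_t pos = 0;
    while (!stack.empty()) {
        const std::string top = stack.back();
        const std::string& look = input[pos];
        if (!isNonTerminal(top)) {
            if (top != look) return false;   // terminal mismatch
            stack.pop_back();
            if (top == "$") return true;     // both stack and input are exhausted
            ++pos;                           // consume the matched token
        } else {
            auto it = llTable().find(std::make_pair(top, look));
            if (it == llTable().end()) return false;  // empty table cell
            stack.pop_back();
            std::string action = "[" + top + ", " + look + "] ->";
            if (it->second.empty()) action += " eps";
            for (const auto& sym : it->second) action += " " + sym;
            if (actions) actions->push_back(action);
            // Push the right-hand side in reverse so its first symbol ends up on top.
            for (auto r = it->second.rbegin(); r != it->second.rend(); ++r)
                stack.push_back(*r);
        }
    }
    return false;
}
```

For the tokens id * id + num the first recorded action is "[E, id] -> T X", which corresponds to the first row of the exercise. Swapping in a different table (and a different isNonTerminal) is all it should take to handle another LL(1) language.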

Lab 3 • The purpose of this lab is to demonstrate that one table-driven LL(1) parser can parse any language for which it is given an LL(1) table. • Input: • Three LL(1) tables, one for each test language. • Input program / string • Output: • The parsing steps (stack, input, action) and the final result (Yes/No) • Lab 3 demo

Schedule • Week 7: • Wednesday / Friday : Back-end compilation steps, project info • Week 8: • Monday, 5/19, Lab3 due, Project Symposium • Tuesday, 5/20, Lab2 & lab3 check-off • 12 to 2pm, 3:30 to 5:30 • Wednesday, 5/21, Test#2, check-off continues for Lab2 & Lab3 • Friday, 5/23, project work day, no class. • Week 9: • Monday: holiday • Week 10: • Thursday: Final project due • Friday: Project check-off in class / pizza party

Output from Parser • Yes/No parser • Most compilers are one-pass. • The input file is read only once, and tokens are not revisited after they have been read. • To process the parsed statements later, the parser needs to save them in some sort of data structure. • A parse tree / list is the most common choice. • A simplified language is another choice, but it requires parsing again.

Parse Tree (figure: example parse tree with nodes Data Decl, if stmt, int x, Expr, &&, Assign, if stmt, Simple_expr)

Week 7 • Questions / Concerns • What’s due: • Lab3 next Monday 5/19 • Test #2 next Wednesday, 5/21 – covering recursive descent & LL(1) • Including grammar modifications • Information for the project • Additions to the symbol table • Semantic Analysis • Binding • Type binding / checking • Scope • Lifetime • Intermediate representation • Back-end compiler

Structure of Compilers • Front-end: skeletal source program -> preprocessor -> modified source program -> Lexical Analyzer (scanner) -> tokens -> Syntax Analysis (parser) -> syntactic structure -> Semantic Analysis -> intermediate representation • Back-end: Optimizer -> Code Generator -> target machine code • Symbol Table

Compiler revisions • What’s included with every new release? • Language changes are few and far between. • The C++11 standard (called C++0x during development) was finalized in 2011. • Compilers are finally catching up with some of the changes. • Examples: • auto keyword • Range-based for loop • Back-end optimization and code generation. • Most common

auto keyword in C++ • Before the C++0x standard, auto was a storage class specifier on variables. • Three of the specifiers: • static • auto - optional • extern
//File1.cpp
void f1() {
    extern int g_x;   //defined in File2.cpp
    static int y;
    auto int z;       //same as int z
}                     //z goes out of scope automatically

//File2.cpp
int g_x;

auto keyword in C++ • After C++0x • The old auto keyword has been replaced with a new usage (new grammar).
auto x = 5;   //auto binds a type to x based on the assigned value.
              //you can only do this once, however, since x can only
              //be declared once.

//without auto, the type of status must be spelled out:
map<pair<string, string>, vector<string> > table;
pair<map<pair<string, string>, vector<string> >::iterator, bool> status;
status = table.insert(make_pair(key, rule));

//with auto, the compiler works it out:
auto status = table.insert(…);
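
To check that auto really deduces the long pair type, a small self-contained sketch can compare the deduced type against the hand-written one. The aliases Key, Rule, Table and the function addRule are hypothetical names introduced for this example only.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

using Key = std::pair<std::string, std::string>;
using Rule = std::vector<std::string>;
using Table = std::map<Key, Rule>;

// Inserts a rule and reports whether the key was new.
bool addRule(Table& table, const Key& key, const Rule& rule) {
    auto status = table.insert(std::make_pair(key, rule));

    // The deduced type is exactly the one that had to be written by hand before.
    static_assert(
        std::is_same<decltype(status), std::pair<Table::iterator, bool>>::value,
        "auto deduces the return type of map::insert");

    return status.second;  // false when the key was already present
}
```

The static_assert is evaluated at compile time, which fits the point of the slide: auto is still static type binding, only the compiler writes the type.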

Range-based for loop • Before C++0x
void myFunction(int i) { cout << i; }   //must be declared before it is used below
for (int i = 0; i < MAX; i++) cout << someArray[i];
for_each(someArray, someArray + MAX, myFunction);
• After C++0x (not available in VS2010 but available from VS2012 on)
for (int i : someArray) cout << i;
for (int i : someVector) cout << i;

Symbol Table Revisited • Information about each symbol • Name • Type • Use (variable, function, parameters, typename, etc.) • Scope • Lifetime
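
One possible shape for a symbol table entry carrying these five pieces of information is sketched below. The type names (Use, Lifetime, SymbolInfo, SymbolTable), the choice of a string for the type, and the key of (name, scope number) are all assumptions for illustration; the project may call for a different layout.

```cpp
#include <map>
#include <string>
#include <utility>

enum class Use { Variable, Function, Parameter, TypeName };
enum class Lifetime { Static, Automatic, Dynamic };

// Everything the slide lists for one symbol.
struct SymbolInfo {
    std::string name;
    std::string type;   // e.g. "int"; kept as text in this sketch
    Use use;
    int scope;          // scope number, 0 = global
    Lifetime lifetime;
};

class SymbolTable {
public:
    // Returns false if the name is already declared in the same scope.
    bool declare(const SymbolInfo& info) {
        return entries_
            .insert(std::make_pair(std::make_pair(info.name, info.scope), info))
            .second;
    }

    // Looks a name up in exactly one scope; returns nullptr if it is not there.
    const SymbolInfo* find(const std::string& name, int scope) const {
        auto it = entries_.find(std::make_pair(name, scope));
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::pair<std::string, int>, SymbolInfo> entries_;
};
```

Keying on (name, scope) lets the same name be declared once per scope, which is what a redeclaration check needs.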

Binding • Binding is the association of a meaning/operation with a symbol • Static • It occurs before runtime and remains unchanged throughout program execution. • Dynamic • It occurs at runtime and can change in the course of program execution.

Type Binding • Before a variable can be referenced in a program, it must be bound to a data type. • Two important questions to ask: 1. How is the type specified? • Explicit Declaration • Implicit Declaration • FORTRAN: variable names that start with the letters ‘i’ - ‘n’ are integer, real otherwise • Perl: @name is an array, %something is a hash structure • Determined by context and value

Type Binding 2. When does the binding take place? • Explicit declaration (static) • Implicit declaration (static) • Determined by context and value (dynamic) • Dynamic Type Binding • When a variable gets a value, the type of the variable is determined at that moment.

Dynamic Type Binding • Specified through an assignment statement
(set x '(1 2 3))   <== x becomes a list
(set x 'a)         <== x becomes an atom
• Advantage: • flexibility (generic program units) • Disadvantages: 1. High cost (dynamic type checking and interpretation) 2. Type error detection by the compiler is difficult
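
The cost mentioned above comes from carrying a type tag with every value and checking it at run time. The sketch below imitates the two set statements in C++; it is only a model of the idea, not how a Lisp system is implemented. Value, Environment, setVar and listLength are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// A value carries its own type tag, so the type belongs to the value, not the name.
struct Value {
    enum class Type { Atom, List };
    Type type;
    std::string atom;                 // used when type == Atom
    std::vector<std::string> items;   // used when type == List
};

Value makeAtom(const std::string& a) {
    return Value{Value::Type::Atom, a, {}};
}

Value makeList(const std::vector<std::string>& items) {
    return Value{Value::Type::List, "", items};
}

using Environment = std::map<std::string, Value>;

// (set name value): rebinding a name may also change its type.
void setVar(Environment& env, const std::string& name, const Value& v) {
    env[name] = v;
}

// A list-only operation has to check the tag on every call (dynamic type checking).
std::size_t listLength(const Environment& env, const std::string& name) {
    const Value& v = env.at(name);
    if (v.type != Value::Type::List)
        throw std::runtime_error(name + " is not bound to a list");
    return v.items.size();
}
```

After setVar(env, "x", makeList({"1", "2", "3"})) the call listLength(env, "x") succeeds; after setVar(env, "x", makeAtom("a")) the same call throws, and nothing before run time could have reported that.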

Type Checking • Type checking is the activity of ensuring that the operands of an operator are of compatible types. • A compatible type is one that is either legal for the operator or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type. This automatic conversion is called a coercion. • Two kinds of type checking • Static Type Checking (compile time) • What we will be doing for the final project. • Dynamic Type Checking (run time) • Rarely done in compiled, statically typed languages because it is expensive and slow.

Static Type Checking & Strong Typing • Advantage of strong typing: • Allows the detection of the misuses of variables that result in type errors. • Which languages are strongly typed? • 1. FORTRAN 77 is not: parameters, EQUIVALENCE • 2. Pascal is not: variant records • 3. Modula-2 is not: variant records, WORD type • 4. C and C++ are not: parameter type checking can be avoided; unions are not type checked. • 5. Ada is, almost (UNCHECKED CONVERSION is a loophole) (Java is similar)

Type Coercion / Warning / Errors • Type Coercion / Promotion • Different types, but no info is lost by changing to the other type.
double x; x = 1;       //okay, no info lost
• Type Warning
int y; y = 3.5;        //type warning, lost information
• Type Errors
int z; z = "Hello";    //type error
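
For the project's static type checker, the three cases above can be folded into one function that classifies an assignment. This is a minimal sketch covering only int, double and string; the enum names Type and Verdict and the function checkAssignment are assumptions, not part of the project specification.

```cpp
enum class Type { Int, Double, String };
enum class Verdict { Ok, Coercion, Warning, Error };

// Classifies "target = source" the way the slide does.
Verdict checkAssignment(Type target, Type source) {
    if (target == source) return Verdict::Ok;
    // int -> double: different types, but no information is lost.
    if (target == Type::Double && source == Type::Int) return Verdict::Coercion;
    // double -> int: allowed, but the fractional part is lost.
    if (target == Type::Int && source == Type::Double) return Verdict::Warning;
    // Anything else (for example string -> int) has no sensible conversion.
    return Verdict::Error;
}
```

With this, x = 1 for a double x is a Coercion, y = 3.5 for an int y is a Warning, and z = "Hello" for an int z is an Error.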

Scope • Most programming languages use static scoping rules. • Scope of a variable is determined at compile time. • Very few programming languages use dynamic scoping rules. • Scope of a variable is determined at run time.

Scope
int x = 10;
void f1();
int main() {
    int x = 20;
    f1();
    cout << x;
}
void f1() {
    cout << x;
}
What’s the output?

Static scope • Based on program text • To connect a name reference to a variable, you (or the compiler) must find the declaration. • Search process: • search declarations, first locally, then in increasingly larger enclosing scopes, until one is found for the given name. • Static scoping is also block scoping.

Static Scope: C++
int global_x;
int main() {
    int x;
    ….
    {
        int y;
        …
    }
}
void f1() {
    int z;
    …
}
Scopes: global_x (global) • x (main) • y (inner block of main) • z (f1)

Static Scope: C++ • Scopes are easily marked with a number. • Each { introduces a new scope – higher number
int global_x;
int main() {
    int x;
    ….
    {
        int y;
        …
    }
}
void f1() {
    int z;
    …
}
Scope numbers: global_x (0) • x (1) in main • y (2) • z (1) in f1
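
The numbering scheme and the "search locally first, then outward" rule can be sketched with a stack of scopes. ScopeStack and its member names are hypothetical; the sketch stores only names, where a real symbol table would store full entries.

```cpp
#include <set>
#include <string>
#include <vector>

class ScopeStack {
public:
    ScopeStack() : scopes_(1) {}   // scope 0 (global) always exists

    void enterScope() { scopes_.emplace_back(); }   // call on every {
    void exitScope() {                              // call on every }
        if (scopes_.size() > 1) scopes_.pop_back();
    }

    // Number of the innermost open scope.
    int level() const { return static_cast<int>(scopes_.size()) - 1; }

    // Declares a name in the innermost scope; false means it was already there.
    bool declare(const std::string& name) {
        return scopes_.back().insert(name).second;
    }

    // Searches the innermost scope first, then each enclosing scope.
    // Returns the scope number where the name was found, or -1.
    int lookup(const std::string& name) const {
        for (int i = level(); i >= 0; --i)
            if (scopes_[i].count(name) != 0) return i;
        return -1;
    }

private:
    std::vector<std::set<std::string>> scopes_;
};
```

Replaying the slide: declare global_x, enter main and declare x, enter the inner block and declare y. At that point lookup returns 2 for y, 1 for x and 0 for global_x. After both blocks are closed and f1 is entered, z is found at level 1 and x is no longer visible.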

Dynamic Scope
What’s the output?
int x = 10;
void f1();
int main() {
    int x = 20;
    f1();
    cout << x;
}
void f1() {
    cout << x;
}
Dynamic scope uses the closest x on the call stack.
Which x? Call stack, top to bottom: f1 (dynamic link to main) • main: x = 20 • global: x = 10
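
C++ itself is statically scoped, so the dynamic rule can only be simulated. The sketch below models the call stack as a vector of frames and resolves x by walking from the most recent frame downward. Frame, lookupDynamic and runDynamicExample are hypothetical names for this illustration.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Frame = std::map<std::string, int>;   // variables declared by one activation

// Dynamic scoping: search the call stack from the newest frame to the oldest.
int lookupDynamic(const std::vector<Frame>& callStack, const std::string& name) {
    for (auto frame = callStack.rbegin(); frame != callStack.rend(); ++frame) {
        auto found = frame->find(name);
        if (found != frame->end()) return found->second;
    }
    throw std::out_of_range("unbound variable: " + name);
}

// Replays the program from the slide and returns what it would print.
std::string runDynamicExample() {
    std::vector<Frame> callStack;
    std::string output;
    callStack.push_back(Frame{{"x", 10}});   // global: int x = 10;
    callStack.push_back(Frame{{"x", 20}});   // main:   int x = 20;
    callStack.push_back(Frame{});            // f1 is called; it declares no x
    output += std::to_string(lookupDynamic(callStack, "x"));   // cout << x in f1
    callStack.pop_back();                    // f1 returns
    output += std::to_string(lookupDynamic(callStack, "x"));   // cout << x in main
    return output;
}
```

Under this simulated dynamic rule the program prints 2020, because f1 finds main's x first. Under static scoping, which is what a real C++ compiler uses, f1 sees the global x and the output is 1020.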

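Scope vs. Lifetime • Scope and lifetime are sometimes closely related, but are different concepts!! • Consider a static variable in a C or C++ function
void someFunction() {
    static int x;
    ...
}
• The scope of x is only the body of someFunction, but its lifetime is the whole run of the program.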
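
A short sketch makes the difference visible: the name x cannot be used outside the function, yet its value survives from one call to the next. The function name callCount is made up for this example.

```cpp
// x is in scope only inside callCount, but it lives for the whole program.
int callCount() {
    static int x;   // zero-initialized once, not on every call
    ++x;
    return x;       // number of calls made so far
}
```

Each call returns one more than the previous call did, which would be impossible if x were created and destroyed with each activation the way an ordinary local variable is.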

Structure of Compilers • skeletal source program -> preprocessor -> modified source program -> Lexical Analyzer (scanner) -> tokens -> Syntax Analysis (parser) -> syntactic structure -> Semantic Analysis -> intermediate representation -> Optimizer -> Code Generator -> target machine code • Symbol Table

Intermediate Representation • Almost no compiler produces code without first converting a program into some intermediate representation that is used just inside the compiler. • This intermediate representation is called by various names: • Internal Representation • Intermediate representation • Intermediate language

Intermediate Representation • Intermediate representations are also named after the form the intermediate language takes: • Tuples • Abstract syntax trees • Triples • Simplified language

Intermediate Form • In general, an intermediate form is kept around only until the compiler generates code; then it can be discarded. • Another difference in compilers is how much of the program is kept in intermediate form; this is related to the question of how much of the program the compiler looks at before it starts to generate code. • There is a wide spectrum of choices.

Abstract Syntax Tree
x = y + 3;
    =
   / \
  x   +
     / \
    y   3
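
One way to hold such a tree in memory is a node with a label and two children. The sketch below builds the tree for x = y + 3 and prints it in prefix form. Node, makeLeaf, makeNode, toPrefix and buildExample are hypothetical names, and the sketch assumes every node is either a leaf or has exactly two children.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Node {
    std::string label;            // operator or operand
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

using NodePtr = std::unique_ptr<Node>;

NodePtr makeLeaf(const std::string& label) {
    return NodePtr(new Node{label, nullptr, nullptr});
}

NodePtr makeNode(const std::string& label, NodePtr left, NodePtr right) {
    return NodePtr(new Node{label, std::move(left), std::move(right)});
}

// Prefix (operator first) rendering; assumes a node has zero or two children.
std::string toPrefix(const Node& n) {
    if (!n.left && !n.right) return n.label;
    return "(" + n.label + " " + toPrefix(*n.left) + " " + toPrefix(*n.right) + ")";
}

// The tree for: x = y + 3;
NodePtr buildExample() {
    return makeNode("=", makeLeaf("x"),
                    makeNode("+", makeLeaf("y"), makeLeaf("3")));
}
```

toPrefix(*buildExample()) yields "(= x (+ y 3))", which is the same tree read from the root down.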

Quadruples
y = a*(x+b)/(x-c);
Symbol table (index: name): 1: y • 2: a • 3: x • 4: b • 5: T1 • 6: T2 • 7: c • 8: T3 • 9: T4
T1 = x+b;      (+, 3, 4, 5)
T2 = a*T1;     (*, 2, 5, 6)
T3 = x-c;      (-, 3, 7, 8)
T4 = T2/T3;    (/, 6, 8, 9)
y = T4;        (=, 0, 9, 1)
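
The quadruples above use symbol-table indices as operands, with 0 standing for an unused slot. A minimal sketch that reproduces exactly these five quadruples is shown below. Quad, QuadBuilder and buildSlideExample are hypothetical names, and y and a are entered first as an assumption so that the indices match the slide.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Quad {
    char op;
    int arg1;     // symbol-table index, 0 = unused
    int arg2;
    int result;
};

class QuadBuilder {
public:
    // Returns the 1-based index of a name, adding it to the table if needed.
    int indexOf(const std::string& name) {
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == name) return static_cast<int>(i) + 1;
        names_.push_back(name);
        return static_cast<int>(names_.size());
    }

    // Emits (op, lhs, rhs, Tn) and returns the name of the new temporary Tn.
    std::string binary(char op, const std::string& lhs, const std::string& rhs) {
        const int a = indexOf(lhs);
        const int b = indexOf(rhs);
        const std::string temp = "T" + std::to_string(++tempCount_);
        quads_.push_back(Quad{op, a, b, indexOf(temp)});
        return temp;
    }

    // Emits (=, 0, source, target).
    void assign(const std::string& target, const std::string& source) {
        const int s = indexOf(source);
        const int t = indexOf(target);
        quads_.push_back(Quad{'=', 0, s, t});
    }

    const std::vector<Quad>& quads() const { return quads_; }

private:
    std::vector<std::string> names_;
    std::vector<Quad> quads_;
    int tempCount_ = 0;
};

// y = a*(x+b)/(x-c);
std::vector<Quad> buildSlideExample() {
    QuadBuilder b;
    b.indexOf("y");                                  // 1
    b.indexOf("a");                                  // 2
    const std::string t1 = b.binary('+', "x", "b");  // (+, 3, 4, 5)
    const std::string t2 = b.binary('*', "a", t1);   // (*, 2, 5, 6)
    const std::string t3 = b.binary('-', "x", "c");  // (-, 3, 7, 8)
    const std::string t4 = b.binary('/', t2, t3);    // (/, 6, 8, 9)
    b.assign("y", t4);                               // (=, 0, 9, 1)
    return b.quads();
}
```

In a full compiler the calls in buildSlideExample would come from a walk over the parse tree or AST rather than being written by hand.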
