Cwhsueh@csie ntu edu tw http www csie ntu edu tw cwhsueh 98 spring
Download
1 / 28

薛智文 [email protected] csie.ntu.tw/~cwhsueh/ 98 Spring - PowerPoint PPT Presentation


  • 195 Views
  • Uploaded on

Compiler TH 234, DTH 103. 薛智文 [email protected] http://www.csie.ntu.edu.tw/~cwhsueh/ 98 Spring. Why Study Compilers?. Excellent software-engineering example --- theory meets practice. Essential software tool. Influences hardware design, e.g., RISC, VLIW.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' 薛智文 [email protected] csie.ntu.tw/~cwhsueh/ 98 Spring' - arawn


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Cwhsueh@csie ntu edu tw http www csie ntu edu tw cwhsueh 98 spring

Compiler

TH 234, DTH 103

薛智文

[email protected]

http://www.csie.ntu.edu.tw/~cwhsueh/

98 Spring



Why study compilers
Why Study Compilers?

  • Excellent software-engineering example --- theory meets practice.

  • Essential software tool.

  • Influences hardware design, e.g., RISC, VLIW.

  • Tools (mostly “optimization”) for enhancing software reliability and security.

/27


Compilers architecture
Compilers & Architecture

  • Modern architectures have very complex structures, especially opportunities for parallel execution.

  • Sequential programs can only make effective use of these features via an optimizing compiler.

  • Hardware question: If we implemented this, could a compiler use it?

/27


Software reliability
Software Reliability

  • Optimization technology (data-flow analysis) used in:

    • Lock/unlock errors.

    • Buffers not range-checked.

    • Memory Leaks.

    • SQL injection bugs..

/27


What this course offers
What this Course Offers?

  • Compiler methodology for both compiler implementation and related applications.

  • Theoretical framework.

  • Key algorithms.

  • Hands-on experience.

  • Nongoal: build a complete optimizing compiler.

/27


Course outline
Course Outline

  • Part 1 --- Introduction.

  • Part 2 --- Scanner.

  • Part 3 --- Parser.

  • Part 4 --- Syntax-Directed Translation.

  • Part 5 --- Symbol Table.

  • Part 6 --- Intermediate Code Generation.

  • Part 7 --- Run Time Storage Organization.

  • Part 8 --- Optimization.

  • Part 9 --- How to Write a Compiler.

  • Part 10 --- A Simple Code Generation (PSEUDO) Example.

/27


Introduction
Introduction

source program

Compiler

target program

input

output

  • Compiler is one of language processors.

/27


What is a compiler
What is a Compiler?

source program

Compiler

target program

input

output

  • Definitions:

    • The software system translatesdescription of computations into a program executable by a computer.

    • Source and target must be equivalent!

  • Compiler writing spans:

    • programming languages;

    • machine architecture;

    • language theory;

    • algorithms and data structures;

    • software engineering.

  • History:

    • 1950: the first FORTRAN compiler took 18 man-years;

    • now: using software tools, can be done in a few months as a student’s project.

/27


An interpreter
An Interpreter

Interpreter

source program

output

input

/27


A hybrid compiler
A Hybrid Compiler

source program

Translator

intermediate program

Virtual

Machine

output

input

/27


A language processing system
A Language-Processing System

Preprocessor

Compiler

Assembler

Linker/Loader

modified source program

target assembly program

relocatable machine code

target machine code

source program

library files

relocatable object files

/27


Applications
Applications

3 + 5 – 6 * 6

3 5 + 6 6 * –

  • Computer language compilers.

  • Translator: from one format to another.

    • query interpreter

    • text formatter

    • silicon compiler

    • infix notation  postfix notation:

    • pretty printers

    • · · ·

  • Software productivity tools.

/27


Relations with computational theory
Relations with Computational Theory

  • Computational theory:

    • a set of grammar rules ≡ the definition of a particular machine.

      • also equivalent to a set of languages recognized by this machine.

    • a type of machines: a family of machines with a given set of operations, or capabilities;

    • power of a type of machines

      ≡ the set of languages that can be recognized by this type of machines.

/27


A language processing system1
A Language-Processing System

Preprocessor

Compiler

Assembler

Linker/Loader

modified source program

target assembly program

relocatable machine code

target machine code

source program

library files

relocatable object files

/27


Phases of a compiler
Phases of a Compiler

character stream

Lexical Analyzer (scanner)

tokenstream

Syntax Analyzer (parser)

abstract-syntax tree

Semantic Analyzer

annotated abstract-syntax tree

Intermediate Code Generator

intermediaterepresentation

Machine-Independent Code Optimizer

optimizedintermediaterepresentation

Code Generator

relocatablemachinecode

Machine-Dependent Code Optimizer

target-machinecode

front-end

Error

Handling

Symbol

Table

back-end

/27


Lexical analyzer scanner
Lexical Analyzer (Scanner)

  • Actions:

    • Reads characters from the source program;

    • Groups characters into lexemes , i.e., sequences of characters that “go together”, following a given pattern ;

    • Each lexeme corresponds to a token .

      • the scanner returns the next token, plus maybe some additional information, to the parser;

    • The scanner may also discover lexical errors, i.e., erroneous characters.

  • The definitions of what a lexeme, token or badcharacter is depend on the definition of the source language.

/27


Scanner example for c
Scanner Example for C

Symbol Table

  • Lexeme: C statement

    position = initial + rate* 60;

    (Lexeme) position = initial + rate* 60 ;

    < id,1> <=> <id,2> <+> <id,3> <*> <60> <;>

    (Token) ID ASSIGN ID PLUS ID TIME INT SEMI-COL

  • Arbitrary number of blanks (white spaces) between lexemes.

  • Erroneous sequence of characters, that are not parts of comments, for the C language:

    • control characters

    • @

    • 2abc

/27


Syntax analyzer parser
Syntax Analyzer (Parser)

  • Actions:

    • Group tokens into grammatical phrases , to discover the underlying structure of the source

    • Find syntax errors , e.g., the following C source line:

      (Lexeme) index = 12 * ;

      (Token) ID ASSIGN INT TIMES SEMI-COL

      Every token is legal, but the sequence is erroneous!

  • May find some static semantic errors , e.g., use of undeclared variables or multiple declared variables.

  • May generate code, or build some intermediate representation of the source program, such as an abstract-syntax tree.

/27


Parser example for c
Parser Example for C

=

+

< id,1>

<id,2>

<id,3>

60

  • Source code: position = initial + rate* 60

    < id,1> <=> <id,2> <+> <id,3> <*> <60>

  • Abstract-syntax tree:

    • interior nodes of the tree are OPERATORS;

    • a node’s children are its OPERANDS;

    • each subtree forms a logical unit .

    • the subtree with * at its root shows that * has higher precedence than +, the operation “rate * 60” must be performed as a unit, not “initial + rate”.

  • Where is ”;”?

Symbol Table

/27


Semantic analyzer
Semantic Analyzer

=

+

< id,1>

=

<id,2>

+

< id,1>

<id,3>

60

<id,2>

<id,3>

int_to_float

60

  • Actions:

    • Check for more static semantic errors, e.g., type errors .

    • May annotate and/or change the abstract syntax tree.

Symbol Table

/27


Intermediate code generator
Intermediate Code Generator

=

+

< id,1>

<id,2>

<id,3>

int_to_float

60

  • Actions: translate from abstract-syntax trees to intermediate codes.

  • One choice for intermediate code is 3-address code :

    • Each statement contains

      • at most 3 operands;

      • in addition to “=”, i.e., assignment, at most one operator.

    • An ”easy” and “universal” format that can be translated into most assembly languages.

t1 = int_to_float(60)

t2 = id3* t1

t3 = id2 + t2

id1 = t3

/27


Optimizer
Optimizer

  • Improve the efficiency of intermediate code.

  • Goal may be to make code run faster , and/or to use the least number of registers · · ·

  • Current trends:

    • to obtain smaller, but maybe slower, equivalent code for embedded systems;

    • to reduce power consumption.

t1 = int_to_float(60)

t2 = id3* t1

t3 = id2 + t2

id1 = t3

t1 = id3* 60.0

id1 = id2 + t1

/27


Code generation
Code Generation

  • A compiler may generate

    • pure machine codes (machine dependent assembly language) directly, which is rare now ;

    • virtual machine code.

  • Example:

    • PASCAL  compiler P-code interpreter execution

    • Speed is roughly 4 times slower than running directly generated machine codes.

  • Advantages:

    • simplify the job of a compiler;

    • decrease the size of the generated code: 1/3 for P-code ;

    • can be run easily on a variety of platforms

      • P-machine is an ideal general machine whose interpreter can be written easily;

      • divide and conquer;

      • recent example: JAVA and Byte-code.

/27


Code generation example
Code Generation Example

LDF R2, id3

MULF R2, R2, #60.0

LDF R1, id2

ADDF R1, R1, R2

STF id1, R1

t1 = id3* 60.0

id1 = id2 + t1

/27


Practical considerations 1 2
Practical Considerations (1/2)

  • Preprocessing phase:

    • macro substitution:

      • #define MAXC 10

    • rational preprocessing: add new features for old languages.

      • BASIC

      • C  C ++

    • compiler directives:

      • #include <stdio.h>

    • non-standard language extensions.

      • adding parallel primitives

/27


Practical considerations 2 2
Practical Considerations (2/2)

  • Passes of compiling

    • A pass reads an input file and writes an output file.

    • First pass reads the text file once.

    • May need to read the text one more time for any forward addressed objects, i.e., anything that is used before its declaration.

    • Example: C language

goto error_handling;

· · ·

error_handling:

· · ·

/27


Reduce number of passes
Reduce Number of Passes

  • Each pass takes I/O time.

  • Back-patching : leave a blank slot for missing information, and fill in the empty slot when the information becomes available.

  • Example: C language

    when a label is used

    • if it is not defined before, save a trace into the to-be-processed table

      • label name corresponds to LABEL TABLE[i]

    • code generated: GOTO LABEL TABLE[i]

      when a label is defined

    • check known labels for redefined labels

    • if it is not used before, save a trace into the to-be-processed table

    • if it is used before, then find its trace and fill the current address into the trace

  • Time and space trade-off !

/27


ad