CS450 Compiler Design. Overview: 1 Introduction, 2 Language processors (tombstone diagrams, bootstrapping), 3 Architecture of a compiler. Levels of Programming Languages. High-level program: class Triangle { ... float area() { return b*h/2; }. Low-level progr
1. CS450 Compiler Design 1 Compiler Design: an Overview
2. CS450 Compiler Design 2 Overview
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
3. CS450 Compiler Design 3 Levels of Programming Languages
4. CS450 Compiler Design 4 Levels of Programming Languages
Some high-level languages:
C, C++, Java, Pascal, Ada, Fortran, Cobol, Scheme, Prolog, Smalltalk, ...
Some low-level languages:
x86 assembly language, PowerPC assembly language, SPARC assembly language, MIPS assembly language, ARM assembly language, ...
5. CS450 Compiler Design 5 Levels of Programming Languages
What makes a high-level language different from a low-level language?
6. CS450 Compiler Design 6 Abstraction
A high-level language is more abstract than a low-level language.
What does “more abstract” mean?
Abstraction: separate the ‘how’ from the ‘what’,
that is, separate what is implemented from how it is implemented.
e.g. procedural abstraction = separate ‘what does it do’ from ‘how does it do it’
High-level languages abstract away from the underlying machine, which makes programs much more portable.
7. CS450 Compiler Design 7 Levels of Programming Languages
Q: How do the following make a high-level language more abstract?
8. CS450 Compiler Design 8 Language Processors: What are they?
Examples:
Editors
Translators (e.g. compiler, assembler, disassembler)
Interpreters
9. CS450 Compiler Design 9 Language Processors: Why do we need them?
10. CS450 Compiler Design 10 Programming Language Specification
Why?
A communication device between people who need to have a common understanding of the PL:
language designer, language implementer, user
What to specify?
Specify what is a ‘well formed’ program
syntax
contextual constraints (also called static semantics):
scoping rules
type rules
Specify what is the meaning of (well formed) programs
semantics (also called runtime semantics)
11. CS450 Compiler Design 11 Programming Language Specification
Why?
What to specify?
How to specify?
Formal specification: use some kind of precisely defined formalism
Informal specification: description in English.
Usually a mix of both (e.g. Java specification)
Syntax => formal specification using CFG/BNF
Contextual constraints and semantics => informal
12. CS450 Compiler Design 12 Syntax Specification
Syntax is specified using “Context-Free Grammars” (CFGs). A CFG consists of:
A finite set of terminal symbols
A finite set of non-terminal symbols
A start symbol (one of the non-terminals)
A finite set of production rules
CFGs are usually written in “Backus-Naur Form” (BNF) notation.
A production rule in BNF notation is written as:
N ::= α where N is a non-terminal and α is a sequence of terminals and non-terminals
N ::= α | β | ... is an abbreviation for several rules with N as left-hand side.
13. CS450 Compiler Design 13 Syntax Specification
A CFG defines a set of strings. This is called the language of the CFG.
Example:
Start ::= Letter
| Start Letter
| Start Digit
Letter ::= a | b | c | d | ... | z
Digit ::= 0 | 1 | 2 | ... | 9
Q: What is the “language” defined by this grammar?
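One way to explore the question is to test strings against the grammar. The following is a minimal sketch, not part of the original slides: the function name `in_language` is hypothetical, and it assumes Letter covers exactly the lowercase letters a to z. It mirrors the left-recursive rules by peeling symbols off the right end of the string.

```python
import string

LETTERS = set(string.ascii_lowercase)   # Letter ::= a | b | ... | z
DIGITS = set(string.digits)             # Digit  ::= 0 | 1 | ... | 9


def in_language(s: str) -> bool:
    """Return True if s can be derived from Start (hypothetical helper)."""
    # Start ::= Start Letter | Start Digit
    # Strip trailing letters/digits; what remains must still derive from Start.
    while len(s) > 1 and s[-1] in (LETTERS | DIGITS):
        s = s[:-1]
    # Start ::= Letter  (base case: exactly one letter is left)
    return len(s) == 1 and s in LETTERS


print(in_language("x1"))   # starts with a letter
print(in_language("1x"))   # starts with a digit
```

Under these assumptions the accepted strings are exactly the identifier-like names: a letter followed by any number of letters or digits.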
14. CS450 Compiler Design 14 Example: Syntax of “Mini-Triangle”
Mini-Triangle is a very simple Pascal-like programming language.
An example program:
15. CS450 Compiler Design 15 Example: Syntax of “Mini-Triangle”
16. CS450 Compiler Design 16 Example: Syntax of “Mini-Triangle” (continued)
17. CS450 Compiler Design 17 Example: Syntax of “Mini-Triangle” (continued)
18. CS450 Compiler Design 18 Syntax Trees
A syntax tree or parse tree is an ordered labeled tree such that:
a) terminal nodes (leaf nodes) are labeled by terminal symbols
b) non-terminal nodes (internal nodes) are labeled by non-terminal symbols
c) each non-terminal node labeled by N has children X1, X2, ... Xn (in this order) such that N ::= X1 X2 ... Xn is a production.
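The three conditions can be checked mechanically. The sketch below is illustrative only: the `Node` class, the `PRODUCTIONS` table and `is_syntax_tree` are hypothetical names, and the grammar used is the small letter/digit grammar from the Syntax Specification slide.

```python
from __future__ import annotations
from dataclasses import dataclass, field

LOWERCASE = "abcdefghijklmnopqrstuvwxyz"
TERMINALS = set(LOWERCASE) | set("0123456789")

# Each non-terminal maps to the list of its right-hand sides.
PRODUCTIONS = {
    "Start": [["Letter"], ["Start", "Letter"], ["Start", "Digit"]],
    "Letter": [[c] for c in LOWERCASE],
    "Digit": [[c] for c in "0123456789"],
}


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def is_syntax_tree(node: Node) -> bool:
    # (a) terminal nodes are leaves labeled by terminal symbols
    if node.label in TERMINALS:
        return not node.children
    # (b) every other node must be labeled by a non-terminal
    if node.label not in PRODUCTIONS:
        return False
    # (c) the children, in order, must match one production for this label
    child_labels = [child.label for child in node.children]
    return (child_labels in PRODUCTIONS[node.label]
            and all(is_syntax_tree(child) for child in node.children))


# Syntax tree for the string "x1": Start -> Start Digit, Start -> Letter
tree = Node("Start", [
    Node("Start", [Node("Letter", [Node("x")])]),
    Node("Digit", [Node("1")]),
])
print(is_syntax_tree(tree))
```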
19. CS450 Compiler Design 19 Syntax Trees Example:
20. CS450 Compiler Design 20 Concrete and Abstract Syntax
The previous grammar specified the concrete syntax of Mini-Triangle.
21. CS450 Compiler Design 21 Example: Concrete/Abstract Syntax of Commands
22. CS450 Compiler Design 22 Example: Concrete/Abstract Syntax of Commands
23. CS450 Compiler Design 23 Example: Concrete Syntax of Expressions
24. CS450 Compiler Design 24 Example: Abstract Syntax of Expressions
25. CS450 Compiler Design 25 Abstract Syntax Trees
Abstract Syntax Tree for: d:=d+10*n
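As an illustration, an AST for this command could be represented with a few node classes. This is a sketch under stated assumptions: the class names are hypothetical, and it assumes that Mini-Triangle expressions group strictly left to right with no operator precedence (the grammar is not reproduced in this text), so that d+10*n is grouped as (d+10)*n.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union


@dataclass
class VnameExpression:       # use of a variable
    name: str


@dataclass
class IntegerExpression:     # integer literal
    value: int


@dataclass
class BinaryExpression:      # left op right
    left: Expression
    op: str
    right: Expression


@dataclass
class AssignCommand:         # target := value
    target: str
    value: Expression


Expression = Union[VnameExpression, IntegerExpression, BinaryExpression]

# d := d + 10 * n, grouped left to right (assumption: no precedence)
tree = AssignCommand(
    "d",
    BinaryExpression(
        BinaryExpression(VnameExpression("d"), "+", IntegerExpression(10)),
        "*",
        VnameExpression("n"),
    ),
)


def evaluate(expr: Expression, env: dict) -> int:
    """Evaluate an expression tree; only + and * are handled in this sketch."""
    if isinstance(expr, IntegerExpression):
        return expr.value
    if isinstance(expr, VnameExpression):
        return env[expr.name]
    left, right = evaluate(expr.left, env), evaluate(expr.right, env)
    if expr.op == "+":
        return left + right
    if expr.op == "*":
        return left * right
    raise ValueError(f"unsupported operator: {expr.op}")


print(evaluate(tree.value, {"d": 1, "n": 2}))   # (1 + 10) * 2
```

Note that the tree contains no punctuation such as ":=" tokens or parentheses: only the phrase structure survives, which is the point of abstract syntax.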
26. CS450 Compiler Design 26 Contextual Constraints
27. CS450 Compiler Design 27 Scope Rules
28. CS450 Compiler Design 28 Type Rules
29. CS450 Compiler Design 29 Semantics
30. CS450 Compiler Design 30 Semantics
31. CS450 Compiler Design 31 Semantics
32. CS450 Compiler Design 32 Semantics
33. CS450 Compiler Design 33 Conclusion / Summary
This course is about compilers
Compilers are language processors
translate high-level language into low-level language
help bridge the semantic gap
Language specification
needed for communication between language designers, implementers, and users
Three “parts” we will study during this course
Syntax of the language: usually formal: Extended BNF
Contextual constraints: usually informal: scope rules and type rules (written in English)
Semantics: usually informal: descriptions in English
34. CS450 Compiler Design 34 Overview
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
35. CS450 Compiler Design 35 Compilers and other translators
Examples:
Chinese => English
Java => JVM byte codes
Scheme => C
C => Scheme
x86 Assembly Language => x86 binary codes
36. CS450 Compiler Design 36 Assemblers versus Compilers
An assembler and a compiler both translate a higher-level language into a lower-level one.
37. CS450 Compiler Design 37 Assemblers versus Compilers
38. CS450 Compiler Design 38 Terminology
39. CS450 Compiler Design 39 Tombstone Diagrams
What are they?
Tombstone diagrams consist of a set of “puzzle pieces” that we can use to reason about language processors and programs
different kinds of pieces
combination rules (not all diagrams are “well formed”)
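The idea of pieces with combination rules can be sketched in code. The model below is an assumption for illustration, not the notation of the slides: `Program`, `Translator`, `Machine` and `translate` are hypothetical names, and the two checks encode the usual reading that a translator must be expressed in the machine's language and must accept the program's language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str    # what the program does
    lang: str    # language the program is expressed in


@dataclass(frozen=True)
class Translator:
    source: str  # language it accepts
    target: str  # language it produces
    lang: str    # language the translator itself is expressed in


@dataclass(frozen=True)
class Machine:
    lang: str    # machine code the hardware executes


def translate(program: Program, translator: Translator, machine: Machine) -> Program:
    """Run a translator on a machine; reject ill-formed combinations."""
    if translator.lang != machine.lang:
        raise ValueError("translator is not expressed in the machine's language")
    if program.lang != translator.source:
        raise ValueError("translator does not accept the program's language")
    # Same program, now expressed in the translator's target language.
    return Program(program.name, translator.target)


x86 = Machine("x86")
java_compiler = Translator(source="Java", target="JVM", lang="x86")
print(translate(Program("sort", "Java"), java_compiler, x86))
```

In this model a cross compiler is simply a translator whose `target` differs from the `lang` of the machine it runs on.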
40. CS450 Compiler Design 40 Tombstone diagrams: Combination rules
41. CS450 Compiler Design 41 Compilation
42. CS450 Compiler Design 42 Cross compilation
43. CS450 Compiler Design 43 Two Stage Compilation
44. CS450 Compiler Design 44 Compiling a Compiler
45. CS450 Compiler Design 45 Interpreters
46. CS450 Compiler Design 46 Interpreters
47. CS450 Compiler Design 47 Interpreters
48. CS450 Compiler Design 48 Interpreters versus Compilers
Compilers typically offer more advantages when
programs are deployed in a production setting
programs are “repetitive”
the instructions of the programming language are complex
Interpreters typically are a better choice when
we are in a development/testing/debugging stage
programs are run once and then discarded
the instructions of the language are simple
49. CS450 Compiler Design 49 Interpretive Compilers
50. CS450 Compiler Design 50 Interpretive Compilers
51. CS450 Compiler Design 51 Portable Compilers
52. CS450 Compiler Design 52 Portable Compilers
In the previous example we have seen that portability is not all-or-nothing.
It is useful to talk about a “degree of portability”, expressed via the percentage of code that needs to be rewritten when moving to a dissimilar machine: the less rewriting, the higher the portability.
In practice 100% portability is not possible.
53. CS450 Compiler Design 53 Example: a “portable” compiler kit
54. CS450 Compiler Design 54 Example: a “portable” compiler kit
55. CS450 Compiler Design 55 Example: a “portable” compiler kit
56. CS450 Compiler Design 56 Bootstrapping
57. CS450 Compiler Design 57 Bootstrapping
58. CS450 Compiler Design 58 Bootstrapping an Interpretive Compiler to Generate M code
59. CS450 Compiler Design 59 Bootstrapping an Interpretive Compiler to Generate M code
60. CS450 Compiler Design 60 Bootstrapping an Interpretive Compiler to Generate M code
61. CS450 Compiler Design 61 Bootstrapping an Interpretive Compiler to Generate M code
62. CS450 Compiler Design 62 Bootstrapping an Interpretive Compiler to Generate M code
63. CS450 Compiler Design 63 Full Bootstrap
64. CS450 Compiler Design 64 Full Bootstrap
65. CS450 Compiler Design 65 Full Bootstrap
66. CS450 Compiler Design 66 Full Bootstrap
67. CS450 Compiler Design 67 Half Bootstrap
68. CS450 Compiler Design 68 Half Bootstrap
69. CS450 Compiler Design 69 Half Bootstrap
70. CS450 Compiler Design 70 Bootstrapping to Improve Efficiency
71. CS450 Compiler Design 71 Bootstrapping to Improve Efficiency
72. CS450 Compiler Design 72 Overview
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
73. CS450 Compiler Design 73 The Major “Phases” of a Compiler
74. CS450 Compiler Design 74 Different Phases of a Compiler
The different phases can be seen as successive transformation steps that turn source code into object code.
The different phases correspond roughly to the different parts of the language specification:
Syntax analysis <-> Syntax
Contextual analysis <-> Contextual constraints
Code generation <-> Semantics
75. CS450 Compiler Design 75 Example Program
We now look at each of the three different phases in a little more detail. We look at each of the steps in transforming an example Triangle program into TAM code.
76. CS450 Compiler Design 76 1) Syntax Analysis
77. CS450 Compiler Design 77 1) Syntax Analysis --> AST
78. CS450 Compiler Design 78 2) Contextual Analysis --> Decorated AST
79. CS450 Compiler Design 79 2) Contextual Analysis --> Decorated AST
80. CS450 Compiler Design 80 Contextual Analysis
Finds scope and type errors.
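A minimal sketch of what such a checker might look like follows. It is not the Triangle compiler's checker: the tuple-based expression encoding, the type names and the functions `type_of` and `check_assignment` are all assumptions made for illustration, and the only operators handled take Integer operands.

```python
def type_of(expr, env, errors):
    """Return the type of expr, or None if an error was found below it."""
    kind = expr[0]
    if kind == "int":                       # ("int", 10)
        return "Integer"
    if kind == "bool":                      # ("bool", True)
        return "Boolean"
    if kind == "var":                       # ("var", "n")
        name = expr[1]
        if name not in env:                 # scope rule: declare before use
            errors.append(f"scope error: {name} is not declared")
            return None
        return env[name]
    if kind == "binop":                     # ("binop", "+", left, right)
        _, op, left, right = expr
        left_type = type_of(left, env, errors)
        right_type = type_of(right, env, errors)
        if left_type is None or right_type is None:
            return None
        if left_type != "Integer" or right_type != "Integer":
            errors.append(f"type error: {op} needs Integer operands")
            return None
        return "Integer"
    raise ValueError(f"unknown expression kind: {kind}")


def check_assignment(target, expr, env):
    """Check `target := expr` against the declarations in env."""
    errors = []
    expr_type = type_of(expr, env, errors)
    if target not in env:
        errors.append(f"scope error: {target} is not declared")
    elif expr_type is not None and env[target] != expr_type:
        errors.append(
            f"type error: cannot assign {expr_type} to {target} of type {env[target]}")
    return errors


env = {"n": "Integer", "b": "Boolean"}      # declared identifiers and their types
print(check_assignment("n", ("binop", "+", ("var", "n"), ("int", 1)), env))
print(check_assignment("n", ("var", "m"), env))
print(check_assignment("b", ("int", 1), env))
```

A real checker would also record the inferred types and declaration links on the tree, which is what the decorated AST refers to.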
81. CS450 Compiler Design 81 3) Code Generation
Assumes that the program has been thoroughly checked and is well formed (scope & type rules)
Takes into account semantics of the source language as well as the target language.
Transforms source program into target code.
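To make the idea concrete, here is a small sketch of a code generator for a stack machine. The instruction names (LOADL, LOAD, ADD, MUL, STORE) are a hypothetical instruction set chosen for this example and should not be read as actual TAM code; the tuple encoding of expressions is likewise an assumption.

```python
OPCODES = {"+": "ADD", "*": "MUL"}


def gen_expr(expr):
    """Emit code that leaves the value of expr on top of the stack."""
    kind = expr[0]
    if kind == "int":                        # ("int", 10)
        return [("LOADL", expr[1])]          # push a literal
    if kind == "var":                        # ("var", "d")
        return [("LOAD", expr[1])]           # push a variable's value
    if kind == "binop":                      # ("binop", op, left, right)
        _, op, left, right = expr
        return gen_expr(left) + gen_expr(right) + [(OPCODES[op],)]
    raise ValueError(f"unknown expression kind: {kind}")


def gen_assign(target, expr):
    """Emit code for `target := expr`."""
    return gen_expr(expr) + [("STORE", target)]


def run(code, memory):
    """Tiny interpreter for the hypothetical instructions, used to test the output."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "LOADL":
            stack.append(instr[1])
        elif op == "LOAD":
            stack.append(memory[instr[1]])
        elif op == "STORE":
            memory[instr[1]] = stack.pop()
        else:                                # ADD or MUL
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return memory


# d := (d + 10) * n
expr = ("binop", "*", ("binop", "+", ("var", "d"), ("int", 10)), ("var", "n"))
code = gen_assign("d", expr)
for instr in code:
    print(*instr)
print(run(code, {"d": 1, "n": 2}))
```

The generator performs no checks of its own: it relies on the earlier phases having rejected ill-formed programs, as the slide states.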
82. CS450 Compiler Design 82 3) Code Generation
83. CS450 Compiler Design 83 Compiler Passes
A “pass” is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
A pass can correspond to a “phase” but it does not have to!
Sometimes a single pass corresponds to several phases that are interleaved in time.
Which passes a compiler makes over the source program, and how many, is an important design decision.
84. CS450 Compiler Design 84 Single Pass Compiler
85. CS450 Compiler Design 85 Multi Pass Compiler
86. CS450 Compiler Design 86 Example: Single Pass Compilation of ...
87. CS450 Compiler Design 87 Compiler Design Issues
88. CS450 Compiler Design 88 Language Issues
Example Pascal:
Pascal was explicitly designed to be easy to implement with a single pass compiler:
Every identifier must be declared before its first use.
89. CS450 Compiler Design 89 Language Issues
Example Pascal:
Every identifier must be declared before it is used.
How to handle mutual recursion then?
90. CS450 Compiler Design 90 Language Issues
Example Pascal:
Every identifier must be declared before it is used.
How to handle mutual recursion then?
91. CS450 Compiler Design 91 Example: The Triangle Compiler Driver