Presentation Transcript

  1. Introduction CPSC 388 Ellen Walker Hiram College

  2. Why Learn About Compilers? • Practical application of important computer science theory • Ties together computer architecture and programming • Useful tools for developing language interpreters • Not just programming languages!

  3. Computer Languages • Machine language • Binary numbers stored in memory • Bits correspond directly to machine actions • Assembly language • A “symbolic face” for machine language • Line-for-line translation • High-level language (our goal!) • Closer to human expressions of problems, e.g. mathematical notation

  4. Assembler vs. HLL
      • Assembler
          Ldi $r1, 2    -- put the value 2 in R1
          Sto $r1, x    -- store that value in x
      • HLL
          x = 2;

  5. Characteristics of HLLs • Easier to learn (and remember) • Machine independent • No knowledge of architecture needed • … as long as there is a compiler for that machine!

  6. Early Milestones • FORTRAN (Formula Translation) • IBM (John Backus) 1954-1957 • First high-level language, and first compiler • Chomsky Hierarchy (1950s) • Formal description of natural language structure • Ranks languages according to the complexity of their grammar

  7. Chomsky Hierarchy • Type 3: Regular languages • Too simple for programming languages • Good for tokens, e.g. numbers • Type 2: Context Free languages • Standard representation of programming languages • Type 1: Context Sensitive Languages • Type 0: Unrestricted
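The gap between Type 3 and Type 2 can be made concrete with a minimal Python sketch. The names `NUMBER`, `is_number`, and `is_balanced` are invented for illustration and are not part of the course material. A regular expression is enough to describe a number token, while nested parentheses need an unbounded count, which is the kind of memory a finite automaton lacks.

```python
import re

# Type 3 (regular): a number token can be described by a regular expression.
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")


def is_number(text: str) -> bool:
    """Return True if the whole string is an integer or decimal literal."""
    return NUMBER.fullmatch(text) is not None


def is_balanced(text: str) -> bool:
    """Return True if every '(' has a matching ')'.

    The counter plays the role of a one-symbol stack: nesting depth is
    unbounded, so a fixed, finite set of states is not enough.
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0
```

Nested constructs such as bracketed expressions and blocks are the reason programming language syntax is normally described at the context-free level rather than with regular expressions alone.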

  8. Another View of the Hierarchy • Nested sets (figure): RL ⊂ CFL ⊂ CSL

  9. Formal Language & Automata Theory • Machines to recognize each language class • Turing Machine (computable languages) • Push-down Automaton (context-free languages) • Finite Automaton (regular languages) • Use machines to prove that a given language belongs to a class • Formally prove that a given language does not belong to a class
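As an illustration of the finite automaton at the bottom of this list, the following sketch runs a small table-driven DFA that accepts unsigned integer and decimal literals. The state names, the `TRANSITIONS` table, and the function names are assumptions chosen for this example only.

```python
# Hypothetical DFA for literals such as 42 or 3.14.
# States: start -> int -> dot -> frac; accepting states are int and frac.
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}


def char_class(ch: str) -> str:
    """Map a character to the input class used by the transition table."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def accepts(text: str) -> bool:
    """Run the DFA over text; a missing table entry means rejection."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

The machine keeps only its current state, with no stack or tape, which is what limits it to regular languages.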

  10. Practical Applications of Theory • Translate from grammar to formal machine description • Implement the formal machine to parse the language • Tools: • Scanner Generator (RL / FA): LEX, FLEX • Parser Generator (CFL / PDA): YACC, Bison

  11. Beyond Parsing • Code generation • Optimization • Techniques to “mindlessly” improve code • Usually after code generation • Rarely “optimal”, simply better

  12. Phases of a Compiler • Scanner -> tokens • Parser -> syntax tree • Semantic Analyzer -> annotated tree • Source code optimizer -> intermediate code • Code generator -> target code • Target code optimizer -> better target code

  13. Additional Tables • Symbol table • Tracks all variable names and other symbols that will have to be mapped to addresses later • Literal table • Tracks literals (such as numbers and strings) that will have to be stored along with the eventual program
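One possible shape for these two tables is sketched below. The class names, the `slot` field, and the method names are hypothetical; a real compiler would record more (scope, size, source position) and would assign actual addresses later.

```python
class SymbolTable:
    """Maps each declared name to its type and a slot number (assumed layout)."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_name):
        # The first declaration wins; later ones return the existing entry.
        if name not in self._entries:
            self._entries[name] = {"type": type_name, "slot": len(self._entries)}
        return self._entries[name]

    def lookup(self, name):
        """Return the entry for name, or None if it was never declared."""
        return self._entries.get(name)


class LiteralTable:
    """Stores each distinct literal once and hands back its index."""

    def __init__(self):
        self._index = {}
        self.values = []

    def intern(self, value):
        # Key on the type as well, so that 1 and 1.0 stay separate entries.
        key = (type(value).__name__, value)
        if key not in self._index:
            self._index[key] = len(self.values)
            self.values.append(value)
        return self._index[key]
```

Storing each literal once means repeated constants or strings in the source take up a single entry in the eventual program image.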

  14. Scanner • Read a stream of characters • Perform lexical analysis to generate tokens • Update symbol and literal tables as needed • Example:
      Input:  a[j] = 4 + 2
      Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
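A hand-written scanner for this token set might look like the sketch below. It uses Python's `re` module with one named group per token kind; the `TOKEN_SPEC` list and the `scan` function are assumptions made for illustration, and table updates are left out to keep it short.

```python
import re

# Token kinds follow the slide's names; SKIP (whitespace) is an added helper.
TOKEN_SPEC = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Turn a character string into a list of (kind, text) token pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Applied to the slide's example statement, `scan` yields the token kinds ID Lbrack ID Rbrack EQL NUM PLUS NUM, in that order.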

  15. Parser • Performs syntax analysis • Relates the sequence of tokens to the grammar • Builds a tree that represents this relationship, the parse tree

  16. Partial Grammar • assign-expr -> expr = expr • array-expr -> ID [ expr ] • expr -> array-expr • expr -> expr + expr • expr -> ID • expr -> NUM

  17. Example Parse
      assign-expression
        expression
          array-expression
            ID
            [
            expression
              ID
            ]
        =
        expression
          add-expression
            expression
              NUM
            +
            expression
              NUM

  18. Abstract Syntax Tree
      assign-expression
        expression
          array-expression
            ID
            expression
              ID
        expression
          add-expression
            expression
              NUM
            expression
              NUM
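A small recursive-descent parser can build such a tree directly from the token list. The sketch below is an assumption-laden illustration, not the course's parser: it rewrites the left-recursive rule `expr -> expr + expr` as a loop over terms, represents tokens as `(kind, text)` pairs, and uses nested tuples such as `("add", left, right)` as a hypothetical AST format.

```python
def parse_assign(tokens):
    """Parse tokens for `expr = expr` and return a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        text = tokens[pos][1]
        pos += 1
        return text

    def term():
        # term -> NUM | ID | ID [ expr ]
        if peek() == "NUM":
            return ("num", int(take("NUM")))
        name = take("ID")
        if peek() == "Lbrack":
            take("Lbrack")
            index = expr()
            take("Rbrack")
            return ("index", ("id", name), index)
        return ("id", name)

    def expr():
        # expr -> term (PLUS term)*, which makes + left-associative
        node = term()
        while peek() == "PLUS":
            take("PLUS")
            node = ("add", node, term())
        return node

    target = expr()
    take("EQL")
    value = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} after statement")
    return ("assign", target, value)
```

Punctuation tokens (brackets, `=`, `+`) steer the parse but leave no nodes behind, which is the difference between the parse tree and the abstract syntax tree.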

  19. Semantic Analyzer • Determine the meaning (not structure) of the program • This is “compile-time” or static semantics only • Example: a[j] = 4 + 2 • a refers to an array location • a contains integers • j is an integer • j is in the range of the array (not checked in C) • Parse or syntax tree is “decorated” with this information
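The static checks listed here can be sketched as a recursive walk over a syntax tree. Everything in the example is an assumption for illustration: the tree is a nested tuple such as `("index", ("id", "a"), ("id", "j"))`, declared types live in a plain dictionary, and the only types are `"int"` and `"int[]"`. The range check on `j` is omitted because it generally cannot be decided at compile time.

```python
def check(node, symbols):
    """Return the static type of node, raising TypeError on a semantic error."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "id":
        if node[1] not in symbols:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return symbols[node[1]]
    if kind == "index":
        if check(node[1], symbols) != "int[]":
            raise TypeError("subscripted value is not an array of integers")
        if check(node[2], symbols) != "int":
            raise TypeError("array index must be an integer")
        return "int"  # element type of int[]
    if kind == "add":
        if check(node[1], symbols) != "int" or check(node[2], symbols) != "int":
            raise TypeError("operands of + must be integers")
        return "int"
    if kind == "assign":
        target_type = check(node[1], symbols)
        value_type = check(node[2], symbols)
        if target_type != value_type:
            raise TypeError(f"cannot assign {value_type} to {target_type}")
        return target_type
    raise ValueError(f"unknown node kind {kind!r}")
```

A fuller analyzer would store the computed type on each node instead of only returning it, which is the "decoration" the slide refers to.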

  20. Source Code Optimizer • Simplify and improve the source code by applying rules • Constant folding: replace “4+2” by 6 • Combine common sub-expressions • Reordering expressions (often prior to constant folding) • Etc. • Result: modified, decorated syntax tree or Intermediate Representation
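Constant folding is easy to show on a small tree. The sketch below assumes a hypothetical nested-tuple representation, for example `("add", ("num", 4), ("num", 2))`, and folds only integer addition; other rules from the slide, such as common sub-expression elimination, are not attempted.

```python
def fold(node):
    """Return a copy of the tree with constant additions evaluated."""
    kind = node[0]
    if kind == "add":
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "num" and right[0] == "num":
            return ("num", left[1] + right[1])  # both sides known at compile time
        return ("add", left, right)
    if kind == "index":
        return ("index", node[1], fold(node[2]))
    if kind == "assign":
        return ("assign", fold(node[1]), fold(node[2]))
    return node  # leaves ("num", "id") are already as simple as possible
```

Folding works bottom-up, so a constant sub-expression nested inside a larger expression is reduced even when the outer expression still depends on a variable.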

  21. Code Generator • Generates code for the target machine • Example:
      MOV R0, j      value of j into R0
      MUL R0, 2      2*j in R0 (int = 2 words)
      MOV R1, &a     address of a into R1
      ADD R1, R0     a+2*j in R1 (addr of a[j])
      MOV *R1, 6     6 into address in R1
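The instruction sequence on this slide follows a fixed template for storing a constant into an array element, which can be sketched as a function. The function name, its parameters, and the default element size of 2 are assumptions for illustration, and the mnemonics are the slide's own pseudo-assembly rather than a real instruction set.

```python
def gen_array_store(array, index, value, elem_size=2):
    """Emit pseudo-assembly for array[index] = value (constant value only)."""
    return [
        f"MOV R0, {index}",      # load the index variable
        f"MUL R0, {elem_size}",  # scale the index by the element size
        f"MOV R1, &{array}",     # load the base address of the array
        "ADD R1, R0",            # base + offset = address of the element
        f"MOV *R1, {value}",     # store the constant through the address
    ]
```

For example, `gen_array_store("a", "j", 6)` reproduces the five instructions listed on the slide. A real code generator would walk the intermediate representation and allocate registers instead of hard-coding R0 and R1.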

  22. Target Code Optimizer • Apply rules to improve machine code • Example:
      MOV R0, j
      SHL R0           (shift to multiply by 2)
      MOV &a[R0], 6    (use a more complex machine instruction to replace simpler ones)
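One of the rewrites shown here, replacing a multiply by 2 with a shift, can be sketched as a peephole pass over a list of instruction strings. The `peephole` function and the textual instruction format are assumptions for illustration; the second rewrite on the slide, which merges the address arithmetic into an indexed store, is not implemented.

```python
def peephole(instructions):
    """Replace `MUL Rn, 2` with `SHL Rn`; leave every other instruction alone."""
    optimized = []
    for line in instructions:
        opcode, _, operands = line.partition(" ")
        parts = [part.strip() for part in operands.split(",")]
        if opcode == "MUL" and len(parts) == 2 and parts[1] == "2":
            optimized.append(f"SHL {parts[0]}")  # shift left by one bit doubles the value
        else:
            optimized.append(line)
    return optimized
```

Each rule looks only at a small window of instructions, so the result is better code rather than provably optimal code.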

  23. Major Data Structures • Tokens • Syntax Tree • Symbol Table • Literal Table • Intermediate Code • Temporary files

  24. Structuring a Compiler • Analysis vs. Synthesis • Analysis = understanding the source code • Synthesis = generating the target code • Front end vs. Back end • Front end: parsing & intermediate code generation (target machine-independent) • Back end: target code generation • Optimization included in both parts

  25. Multiple Passes • Each pass processes the source code once • One pass per phase • One pass for several phases • One pass for entire compilation • Language definition can preclude one-pass compilation

  26. Runtime Environments • Static (e.g. FORTRAN 77) • No pointers, no dynamic allocation, no recursion • All memory allocation done prior to execution • Stack-based (e.g. C family) • Stack for nested allocation (call/return) • Heap for random allocation (new) • Fully dynamic (e.g. LISP) • Allocation is automatic (not in source code) • Garbage collection required

  27. Error Handling • Each phase finds and handles its own types of errors • Scanning: errors like 1o1 (invalid ID) • Parsing: syntax errors • Semantic Analysis: type errors • Runtime errors handled by the runtime environment • Exception handling by programmer often allowed

  28. Compiling the Compiler • Using machine language • Immediately executable, hard to write • Necessary for the first (FORTRAN) compiler • Using a language with an existing compiler and the same target machine • Using the language to be compiled (bootstrapping)

  29. Bootstrapping • Write a “quick & dirty” compiler for a subset of the language (using machine language or another available HLL) • Write a complete compiler in the language subset • Compile the complete compiler using the “quick & dirty” compiler