Building “Correct” Compilers

K. Vikram and S. M. Nazrul A.



Outline

  • Introduction: Setting the high level context

  • Motivation

  • Detours

    • Automated Theorem Proving

    • Compiler Optimizations thru Dataflow Analysis

  • Overview of the Cobalt System

  • Forward optimizations in cobalt

  • Proving Cobalt Optimizations Correct

  • Profitability Heuristics

  • Pure Analyses

  • Concluding Remarks



Introduction

The Seven Grand Challenges

  • In Vivo ⇔ In Silico

  • Science for Global Ubiquitous Computing

  • Memories for Life

  • Scalable Ubiquitous Computing Systems

  • The Architecture of the Brain and Mind

  • Dependable Systems Evolution

  • Journeys in Non-classical computations



Introduction

Dependable Systems Evolution

  • A long standing problem

    • Loss of financial resources, human lives

  • Compare with other engineering fields!

  • Non-functional requirements

    • Safety, Reliability, Availability, Security, etc.



Introduction

Why the sudden interest?

  • Was difficult so far, but now …

  • Greater Technology Push

    • Model checkers, theorem provers, programming theories and other formal methods

  • Greater Market Pull

    • Increased dependence on computing



Introduction

A small but significant step

Building Correct Compilers



Motivation

Why are correct compilers hard to build?

  • Bugs don’t manifest themselves easily

  • Where is the bug – program or compiler?

  • Possible solutions

    • Check semantic equivalence of the two programs (translation validation, etc.)

    • Prove compilers sound (manually)

  • Drawbacks?

    • Conservative, Difficult, Actual code not verified


Motivation

Testing

[Diagram: the compiler turns Source into Compiled Prog; the compiled program is run on an input, and its output is DIFFed against the expected output.]

  • To get benefits, must:

    • run over many inputs

    • compile many test cases

  • No correctness guarantees:

    • neither for the compiled prog

    • nor for the compiler


Motivation

Verify each compilation

[Diagram: the compiler turns Source into Compiled Prog; a Semantic DIFF compares the two.]

  • Translation validation [Pnueli et al 98, Necula 00]

  • Credible compilation [Rinard 99]

  • Compiler can still have bugs.

  • Compile time increases.

  • “Semantic Diff” is hard.


Motivation

Proving the whole compiler correct

[Diagram: the compiler turns Source into Compiled Prog; a Correctness checker is applied to the compiler itself.]

  • Option 1: Prove compiler correct by hand.

  • Proofs are long…

  • And hard.

  • Compilers are proven correct as written on paper. What about the implementation?

[Diagram: Proofs and the Correctness checker; the connection to the implemented compiler is labeled “Link?”]


Motivation

gcc-bugs mailing list

Searched for “incorrect” and “wrong” in the gcc-bugs mailing list.

Some of the results:

  • c/9525: incorrect code generation on SSE2 intrinsics

  • target/7336: [ARM] With -Os option, gcc incorrectly computes the elimination offset

  • optimization/9325: wrong conversion of constants: (int)(float)(int) (INT_MAX)

  • optimization/6537: For -O (but not -O2 or -O0) incorrect assembly is generated

  • optimization/6891: G++ generates incorrect code when -Os is used

  • optimization/8613: [3.2/3.3/3.4 regression] -O2 optimization generates wrong code

  • target/9732: PPC32: Wrong code with -O2 -fPIC

  • c/8224: Incorrect joining of signed and unsigned division

And this is only for February 2003!

On a mature compiler!



Motivation

Need for Automation

  • This approach: proves compiler correct automatically.

[Diagram: the compiler is passed to a Correctness checker built on an Automatic Theorem Prover.]


The Challenge

This seems really hard!

[Diagram: the complexity of the task of proving a compiler correct, compared with the complexity that an Automatic Theorem Prover can handle.]



Automated Theorem Proving

Brief detour thru ATP

  • Started with AI applications

  • Sound and complete reasoning about first-order logic (FOL)

    • 1965: Unification and Resolution

  • Combinatorial explosion: SAT is NP-complete, and FOL validity is undecidable (only semi-decidable)

  • Refinements of Resolution, Term Rewriting, Higher order Logics

  • Interactive Theorem Proving

  • Efficient Implementation Techniques

  • Coq, Nuprl, Isabelle, Twelf, PVS, Simplify, etc.



Optimizations

Focus on Optimizations

  • Optimizations are the most error prone

  • Only phase that performs transformations that can potentially change semantics

  • Front-end and back-end are relatively static



Optimizations

Common Optimizations

  • Constant Propagation: replace constant valued variables with constants

  • Common sub-expression elimination: avoid recomputing value if value has been computed earlier in the program

  • Loop invariant removal: move computations into less frequently executed portions of the program

  • Strength Reduction: replace expensive operations (multiplication) with simpler ones (addition)

  • Dead code removal: eliminate unreachable code and code that is irrelevant to the output of the program



Optimizations

Constant Propagation Examples
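A minimal sketch of constant propagation with folding on straight-line code. The tuple-based three-address IR and the function name `propagate_constants` are assumptions made for illustration; they are not taken from the presentation.

```python
# Hypothetical IR: (dest, op, arg1, arg2), where op is "const", "copy", "+" or "*".
def propagate_constants(program):
    """Constant propagation and folding over straight-line code (no branches)."""
    known = {}      # variable -> constant value known at the current point
    result = []
    for dest, op, a, b in program:
        # Replace operands that currently hold a known constant.
        a = known.get(a, a)
        b = known.get(b, b)
        if op in ("const", "copy") and isinstance(a, int):
            known[dest] = a
            result.append((dest, "const", a, None))
        elif op in ("+", "*") and isinstance(a, int) and isinstance(b, int):
            value = a + b if op == "+" else a * b
            known[dest] = value
            result.append((dest, "const", value, None))
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
            result.append((dest, op, a, b))
    return result

example = [
    ("x", "const", 3, None),    # x := 3
    ("y", "+", "x", 4),         # y := x + 4
    ("z", "*", "y", "w"),       # z := y * w   (w is unknown)
    ("x", "copy", "z", None),   # x := z       (x stops being constant)
    ("u", "copy", "x", None),   # u := x
]
optimized = propagate_constants(example)
```

Here `y := x + 4` folds to a constant, while the last use of `x` is left alone because `x` was redefined with a non-constant value.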



Optimizations

Constant Propagation Condition

  • Suppose x is used at program point p

  • If

    • on all possible execution paths from START of procedure to p

    • x has constant value c at p

    • then replace x by c



Optimizations

The Analysis Algorithm

  • Build the control flow graph (CFG) of the program

    • Make flow of control explicit

  • Perform symbolic evaluation to determine constants

  • Replace constant-valued variable uses by their values and simplify expressions and control flow



Optimizations

Building the CFG

  • Composed of Basic Blocks

    • Straight line code without any branches or merges of control flow

  • Nodes of CFG

    • Statements (basic blocks)/switches/merges

  • Edges of CFG

    • Possible control flow sequence
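The basic-block definition above can be sketched as a "leader" computation: a block starts at the first instruction, at every merge point (label), and right after every branch. The instruction encoding below is hypothetical, and the sketch only partitions the code; it does not build the edges.

```python
def split_basic_blocks(instrs):
    """Partition a list of instructions into basic blocks.

    Each instruction is a tuple in an assumed encoding:
    ("label", name), ("goto", name), ("if", cond, name) or ("assign", text).
    """
    leaders = {0} if instrs else set()
    for i, ins in enumerate(instrs):
        if ins[0] == "label":
            leaders.add(i)                  # a merge point starts a new block
        if ins[0] in ("goto", "if") and i + 1 < len(instrs):
            leaders.add(i + 1)              # the instruction after a branch starts a new block
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [instrs[s:e] for s, e in zip(starts, ends)]

loop = [
    ("assign", "i := 0"),
    ("label", "loop"),
    ("if", "i >= n", "done"),
    ("assign", "i := i + 1"),
    ("goto", "loop"),
    ("label", "done"),
    ("assign", "r := i"),
]
blocks = split_basic_blocks(loop)
```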



Optimizations

Symbolic Evaluation

  • Assign each variable the bottom value initially

  • Propagate changes in variable values as statements are executed

  • Based on the idea of Abstract Interpretation


  • Flow Functions

    • x := e: state@out = state@in{eval(e, state@in)/x}

    • in general: state@out = ƒ(state@in)

  • Confluence Operation

    • join over all incoming edges


Optimizations

The Dataflow analysis algorithm

  • Associate one state vector with each edge of CFG. Initialize all entries to ⊥

  • Set all entries on outgoing edge from START to ⊤

  • Evaluate the expression and update the output edge

  • Continue till a fixed point is reached
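The steps above can be sketched as a worklist loop. This is an illustration under assumptions, not the presenters' implementation: nodes hold at most one assignment, ⊥ means "no information yet", ⊤ means "not a constant", and the CFG encoding (`nodes`, `edges`) is hypothetical.

```python
from collections import deque

BOTTOM, TOP = "⊥", "⊤"

def join(a, b):
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP

def evaluate(expr, state):
    """expr: int literal, variable name, or ("+", e1, e2)."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return state[expr]
    lv, rv = evaluate(expr[1], state), evaluate(expr[2], state)
    if TOP in (lv, rv):
        return TOP
    if BOTTOM in (lv, rv):
        return BOTTOM
    return lv + rv

def constant_propagation(nodes, edges, variables):
    """nodes: name -> None or (var, expr); edges: list of (src, dst) pairs.
    Returns one state vector per edge."""
    state = {e: {v: BOTTOM for v in variables} for e in edges}
    for e in edges:
        if e[0] == "START":
            state[e] = {v: TOP for v in variables}
    work = deque(n for n in nodes if n != "START")
    while work:                                   # stop at the fixed point
        n = work.popleft()
        merged = {v: BOTTOM for v in variables}
        for e in edges:
            if e[1] == n:                         # confluence over incoming edges
                for v in variables:
                    merged[v] = join(merged[v], state[e][v])
        out = dict(merged)
        if nodes[n] is not None:                  # flow function for var := expr
            var, expr = nodes[n]
            out[var] = evaluate(expr, merged)
        for e in edges:
            if e[0] == n and state[e] != out:     # output changed: revisit successor
                state[e] = out
                if e[1] not in work:
                    work.append(e[1])
    return state

# Diamond-shaped CFG: y := 5; then x := 1 or x := 2; then z := y + 1.
nodes = {"START": None, "a": ("y", 5), "b": ("x", 1), "c": ("x", 2),
         "d": ("z", ("+", "y", 1)), "END": None}
edges = [("START", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "END")]
result = constant_propagation(nodes, edges, ["x", "y", "z"])
```

On the edge into END, `y` and `z` are constants while `x` is ⊤, because the two branches assign it different values.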



Optimizations

Termination Condition

  • If each flow function ƒ is monotonic

    • i.e. x ≤ y => ƒ (x) ≤ ƒ (y)

  • And if the lattice is of finite height

  • The dataflow algorithm terminates
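A short sketch of why the two conditions suffice, assuming every edge state starts at ⊥. The symbols $s_e^{(k)}$ (the state on edge $e$ after its $k$-th update), $h$ (the height of the lattice of state vectors) and $|E|$ (the number of CFG edges) are notation introduced here. Because every flow function and the join are monotonic, each edge state can only move up the lattice, so it changes at most $h$ times:

```latex
s_e^{(0)} \le s_e^{(1)} \le s_e^{(2)} \le \cdots
\quad\Longrightarrow\quad
\#\{\text{updates of } s_e\} \le h ,
\qquad
\#\{\text{updates overall}\} \le h \cdot |E| .
```

For the flat constant lattice (⊥ below every constant, ⊤ above) each variable can rise at most twice, so with $n$ variables $h = 2n$.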


Optimizations

Other Optimizations

[Table classifying optimizations along two axes: Forward Flow / Backward Flow and All Paths / Any Path]



Overview

Making the problem easier

[Diagram: the task of proving the compiler correct is narrowed to the task of proving the optimizer correct, set against what an Automatic Theorem Prover can handle.]

  • Only prove optimizer correct.

  • Trust front-end and code-generator.

  • Write optimizations in Cobalt, a domain-specific language.

  • Separate correctness from profitability.

  • Factor out the hard and common parts of the proof, and prove them once by hand.


Overview

The Design

[Diagram: a Cobalt Program is run by an Interpreter that maps Input to Output]

Overview

The Compiler

[Diagram: Source Code (if (…) { x := …; } else { y := …; } …;) passes through the Front End and the Back End to produce a Binary Executable (10011011 00010100 01101101)]



Overview

Results

  • Cobalt language

    • realistic C-like IL, operates on a CFG

    • implemented const prop and folding, branch folding, CSE, PRE, DAE, partial DAE, and simple forms of points-to analyses

  • Correctness checker for Cobalt opts

    • using the Simplify theorem prover

  • Execution engine for Cobalt opts

    • in the Whirlwind compiler


Overview

Cobalt → Rhodium → ?



Overview

Caveats

  • May not be able to express your opt in Cobalt:

    • no interprocedural optimizations for now.

    • optimizations that build complicated data structures may be difficult to express.

  • A sound Cobalt optimization may be rejected by the correctness checker.

  • Trusted computing base (TCB) includes:

    • front-end and code-generator, execution engine, correctness checker, proofs done by hand once



Forward Optimizations

Constant Prop (straight-line code)

[Diagram: statement y := 5, followed by statements that don’t define y, followed by statement x := y; REPLACE x := y with x := 5]


Forward Optimizations

Adding arbitrary control flow

[Diagram: a CFG in which several paths, each containing y := 5, reach statement x := y; REPLACE x := y with x := 5]

Constant prop in English

if statement y := 5
is followed by statements that don’t define y
until statement x := y
then transform statement to x := 5


Forward Optimizations

Constant prop in Cobalt

English version                                      Cobalt version
if statement y := 5                                  stmt(Y := C)
is followed by statements that don’t define y        followed by ¬ mayDef(Y)
until statement x := y                               until X := Y
then transform statement to x := 5                   ⇒ X := C

stmt(Y := C) and ¬ mayDef(Y) are boolean expressions evaluated at nodes in the CFG.
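To make the shape of such a rule concrete, here is a rough Python sketch that treats the three parts of the pattern as plain functions and applies them to straight-line code only. This is not Cobalt and not its execution engine: the `ForwardRule` class, the statement encoding, and the simplification to straight-line code are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical statement encoding: ("assign", dest, src), src is a variable name or an int.
@dataclass
class ForwardRule:
    establishes: Callable   # stmt -> binding dict or None        (role of stmt(Y := C))
    preserves: Callable     # (stmt, binding) -> bool             (role of ¬mayDef(Y))
    rewrite: Callable       # (stmt, binding) -> new stmt or None (role of X := Y ⇒ X := C)

def apply_rule(rule, stmts):
    active = []             # bindings whose region is still open
    out = []
    for s in stmts:
        new_s = s
        for binding in active:
            candidate = rule.rewrite(s, binding)
            if candidate is not None:
                new_s = candidate
                break
        # A binding survives only if this statement preserves it.
        active = [b for b in active if rule.preserves(s, b)]
        # The (possibly rewritten) statement may itself open a new region.
        opened = rule.establishes(new_s)
        if opened is not None:
            active.append(opened)
        out.append(new_s)
    return out

const_prop = ForwardRule(
    establishes=lambda s: {"Y": s[1], "C": s[2]} if isinstance(s[2], int) else None,
    preserves=lambda s, b: s[1] != b["Y"],    # toy mayDef: only a direct assignment defines Y
    rewrite=lambda s, b: ("assign", s[1], b["C"]) if s[2] == b["Y"] else None,
)

program = [
    ("assign", "y", 5),
    ("assign", "z", "w"),
    ("assign", "x", "y"),   # rewritten to x := 5
    ("assign", "y", "w"),   # redefines y and closes the region
    ("assign", "v", "y"),   # left unchanged
]
optimized = apply_rule(const_prop, program)
```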



Proving Optimizations Correct

Proving correctness automatically

[Diagram: CFG with y := 5 on each path leading to x := y, which is replaced by x := 5]

  • Witnessing region

  • Invariant: y == 5



Proving Optimizations Correct

Constant prop revisited

  • Ask a theorem prover to show:

  • A statement satisfying stmt(Y := C) establishes Y == C

  • A statement satisfying ¬mayDef(Y) maintains Y == C

  • The statements X := Y and X := C have the same semantics in a program state satisfying Y == C

stmt(Y := C)
followed by ¬ mayDef(Y)
until X := Y ⇒ X := C
with witness Y == C
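The three obligations can be illustrated by brute force on a toy language. This sketch enumerates every statement and state over a tiny finite domain, which stands in for the theorem prover only as an illustration; the toy semantics, `VARS`, `VALUES` and `check_obligations` are assumptions, not part of Cobalt.

```python
from itertools import product

VARS = ["a", "b"]
VALUES = [0, 1, 2]

def all_states():
    for vals in product(VALUES, repeat=len(VARS)):
        yield dict(zip(VARS, vals))

def all_statements():
    # A statement (dest, src) means dest := src, with src a variable or a constant.
    for dest in VARS:
        for src in VARS + VALUES:
            yield (dest, src)

def execute(stmt, state):
    dest, src = stmt
    new = dict(state)
    new[dest] = state[src] if src in VARS else src
    return new

def may_def(stmt, y):
    return stmt[0] == y         # toy language: only a direct assignment defines y

def check_obligations(may_def):
    for y, c in product(VARS, VALUES):
        def witness(state):
            return state[y] == c
        # 1. A statement satisfying stmt(Y := C) establishes Y == C.
        for st in all_states():
            if not witness(execute((y, c), st)):
                return False
        # 2. A statement satisfying ¬mayDef(Y) maintains Y == C.
        for stmt in all_statements():
            if may_def(stmt, y):
                continue
            for st in all_states():
                if witness(st) and not witness(execute(stmt, st)):
                    return False
        # 3. X := Y and X := C agree in every state satisfying Y == C.
        for x in VARS:
            for st in all_states():
                if witness(st) and execute((x, y), st) != execute((x, c), st):
                    return False
    return True
```

With the toy `may_def` all three checks pass; a deliberately wrong guard such as `lambda stmt, y: False` fails the second one, which is the kind of mistake the correctness checker is meant to catch.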



Proving Optimizations Correct

Generalize to any forward optimization

  • Ask a theorem prover to show:

  • A statement satisfying ψ1 establishes P

  • A statement satisfying ψ2 maintains P

  • The statements s and s’ have the same semantics in a program state satisfying P

ψ1
followed by ψ2
until s ⇒ s’
with witness P

We showed by hand once that these conditions imply correctness.



Profitability Heuristics

Profitability heuristics

  • Optimization correct ⇒ safe to perform any subset of the matching transformations.

  • So far, all transformations were also profitable.

  • In some cases, many transformations are legal, but only a few are profitable.



Profitability Heuristics

The two pieces of an optimization

  • Transformation pattern:

    • defines which transformations are legal.

ψ1
followed by ψ2
until s ⇒ s’
with witness P
filtered through choose

  • Profitability heuristic:

    • describes which of the legal transformations to actually perform.

    • does not affect soundness.

    • can be written in a language of the user’s choice.

  • This way of factoring an optimization is crucial to our ability to prove optimizations sound automatically.
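A small sketch of why the heuristic cannot affect soundness: whatever `choose` returns is intersected with the set of legal rewrites before anything is applied. The function `apply_optimization` and its data shapes are hypothetical, chosen only to illustrate the factoring.

```python
def apply_optimization(stmts, legal_rewrites, choose):
    """legal_rewrites: index -> replacement statement, for every legal transformation.
    choose: an untrusted heuristic returning the indices it would like to rewrite.
    Indices outside the legal set are ignored, so choose cannot break correctness."""
    selected = set(choose(stmts, legal_rewrites)) & set(legal_rewrites)
    return [legal_rewrites[i] if i in selected else s
            for i, s in enumerate(stmts)]

stmts = ["x := y", "z := y", "w := y"]
legal = {0: "x := 5", 2: "w := 5"}

# This heuristic also asks for index 1, which is not legal; the request is dropped.
picky = apply_optimization(stmts, legal, lambda s, l: [0, 1])
greedy = apply_optimization(stmts, legal, lambda s, l: list(l))
```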


Profitability Heuristics

Profitability heuristic example: PRE

  • PRE as code duplication followed by CSE

a := ...;
b := ...;
if (...) {
  a := ...;
  x := a + b;
} else {
  ...
}
x := a + b;

  • Code duplication: x := a + b; is copied into the else branch

  • CSE: the final x := a + b; becomes x := x;

  • self-assignment removal: x := x; is deleted
[Figure: the same code annotated with the legal placements of x := a + b and the profitable placement]



Pure Analyses

The Cobalt Language

  • Operates on a Control Flow Graph

  • A rewrite rule

  • A guard to ensure appropriate conditions

  • A predicate condition

  • Filtered thru the choose function



  • Pure analyses also possible

    • Verify properties

    • For use by other transformations


Pure Analyses

Constant prop revisited (again)

stmt(Y := C)
followed by ¬ mayDef(Y)
until X := Y ⇒ X := C
with witness Y == C

mayDef in Cobalt

  • Very conservative!

  • Can we do better?

  • mayPntTo is a pure analysis.

  • It computes dataflow info, but performs no transformations.

Pure Analyses

mayPntTo in Cobalt

stmt(decl X)
followed by ¬ stmt(... := &X)
defines addrNotTaken(X)
with witness “no location in the store points to X”

mayPntTo(X,Y) ≜ ¬ addrNotTaken(Y)
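A rough sketch of what this pure analysis computes, restricted to straight-line code. The statement encoding and the function names are assumptions for illustration; the analysis only labels program points and rewrites nothing.

```python
def addr_not_taken_facts(stmts):
    """For each statement, the set of variables X for which addrNotTaken(X)
    holds just after it. Assumed encoding: ("decl", X), ("addr", dest, X)
    for dest := &X, and ("assign", dest, src) for anything else."""
    holds = set()
    facts = []
    for s in stmts:
        if s[0] == "decl":
            holds = holds | {s[1]}      # stmt(decl X) opens the region
        elif s[0] == "addr":
            holds = holds - {s[2]}      # ... := &X ends it
        facts.append(holds)
    return facts

def may_point_to(x, y, holds):
    """mayPntTo(X, Y) is ¬addrNotTaken(Y): x may point to y unless
    y's address is known never to have been taken."""
    return y not in holds

program = [
    ("decl", "a"),
    ("decl", "b"),
    ("assign", "a", "1"),
    ("addr", "p", "b"),     # p := &b
    ("assign", "c", "a"),
]
facts = addr_not_taken_facts(program)
```

A less conservative mayDef can then consult `may_point_to` to decide whether a store through a pointer could modify a given variable.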



Concluding Remarks

Expressiveness of Cobalt

  • Constant propagation, folding

  • Copy propagation

  • Common Subexpression Elimination

  • Branch Folding

  • Partial Redundancy Elimination

  • Loop invariant code motion

  • Partial Dead Assignment Elimination



Concluding Remarks

Future work

  • Improving expressiveness

    • interprocedural optimizations

    • one-to-many and many-to-many transformations

  • Inferring the witness

  • Generate specialized compiler binary from the Cobalt sources.



Concluding Remarks

Summary and Conclusion

  • Optimizations written in a domain-specific language can be proven correct automatically.

  • The correctness checker found several subtle bugs in Cobalt optimizations.

  • A good step towards proving compilers correct automatically.

