### Building “Correct” Compilers

K. Vikram and S. M. Nazrul A.

Outline

- Introduction: Setting the high level context
- Motivation
- Detours
- Automated Theorem Proving
- Compiler Optimizations thru Dataflow Analysis
- Overview of the Cobalt System
- Forward optimizations in cobalt
- Proving Cobalt Optimizations Correct
- Profitability Heuristics
- Pure Analyses
- Concluding Remarks

The Seven Grand Challenges

- In Vivo In Silico
- Science for Global Ubiquitous Computing
- Memories for Life
- Scalable Ubiquitous Computing Systems
- The Architecture of the Brain and Mind
- Dependable Systems Evolution
- Journeys in Non-classical computations

Dependable Systems Evolution

- A long standing problem
- Loss of financial resources, human lives
- Compare with other engineering fields!
- Non-functional requirements
- Safety, Reliability, Availability, Security, etc.

Why the sudden interest?

- Was difficult so far, but now …
- Greater Technology Push
- Model checkers, theorem provers, programming theories and other formal methods
- Greater Market Pull
- Increased dependence on computing

Why are correct compilers hard to build?

- Bugs don’t manifest themselves easily
- Where is the bug – program or compiler?
- Possible solutions
- Check semantic equivalence of the two programs (translation validation, etc.)
- Prove compilers sound (manually)
- Drawbacks?
- Conservative, Difficult, Actual code not verified

Motivation: Verify each compilation

- Translation validation [Pnueli et al 98, Necula 00]
- Credible compilation [Rinard 99]
- Compiler can still have bugs.
- Compile time increases.
- “Semantic Diff” is hard.

[Figure: Source, compiler, Compiled Prog; a Correctness checker performs a DIFF]

Motivation: Proving the whole compiler correct

- Option 1: Prove compiler correct by hand.
- Proofs are long…
- And hard.
- Compilers are proven correct as written on paper. What about the implementation?

[Figure: Correctness checker, Proof, “Link?”]

gcc-bugs mailing list

Searched for “incorrect” and “wrong” in the gcc-bugs mailing list.

Some of the results:

- c/9525: incorrect code generation on SSE2 intrinsics
- target/7336: [ARM] With -Os option, gcc incorrectly computes the elimination offset
- optimization/9325: wrong conversion of constants: (int)(float)(int) (INT_MAX)
- optimization/6537: For -O (but not -O2 or -O0) incorrect assembly is generated
- optimization/6891: G++ generates incorrect code when -Os is used
- optimization/8613: [3.2/3.3/3.4 regression] -O2 optimization generates wrong code
- target/9732: PPC32: Wrong code with -O2 -fPIC
- c/8224: Incorrect joining of signed and unsigned division
- …

And this is only for February 2003!

On a mature compiler!

Need for Automation

- This approach: proves compiler correct automatically.

[Figure: compiler, Correctness checker, Automatic Theorem Prover]

The Challenge

This seems really hard!

- Complexity of proving a compiler correct.
- Complexity that an automatic theorem prover can handle.

[Figure: Task of proving compiler correct]

Brief detour thru ATP

- Started with AI applications
- Reasoning about FOL: sound and complete proof procedures exist
- 1965: Unification and Resolution
- Combinatorial Explosion. SAT is NP-complete; FOL validity is undecidable (only semi-decidable)
- Refinements of Resolution, Term Rewriting, Higher order Logics
- Interactive Theorem Proving
- Efficient Implementation Techniques
- Coq, Nuprl, Isabelle, Twelf, PVS, Simplify, etc.

Focus on Optimizations

- Optimizations are the most error prone
- Only phase that performs transformations that can potentially change semantics
- Front-end and back-end are relatively static

Common Optimizations

- Constant Propagation: replace constant valued variables with constants
- Common sub-expression elimination: avoid recomputing value if value has been computed earlier in the program
- Loop invariant removal: move computations into less frequently executed portions of the program
- Strength Reduction: replace expensive operations (multiplication) with simpler ones (addition)
- Dead code removal: eliminate unreachable code and code that is irrelevant to the output of the program

Constant Propagation Condition

- Suppose x is used at program point p
- If
- on all possible execution paths from START of procedure to p
- x has constant value c at p
- then replace x by c

The Analysis Algorithm

- Build the control flow graph (CFG) of the program
- Make flow of control explicit
- Perform symbolic evaluation to determine constants
- Replace constant-valued variable uses by their values and simplify expressions and control flow

Building the CFG

- Composed of Basic Blocks
- Straight line code without any branches or merges of control flow
- Nodes of CFG
- Statements (basic blocks)/switches/merges
- Edges of CFG
- Possible control flow sequence
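
As a rough illustration of the node and edge structure described above, the sketch below splits a toy instruction list into basic blocks and records successor edges. The tuple-based IR (`assign`, `goto`, `branch`, `halt`), the `BasicBlock` class and the `build_cfg` function are hypothetical names invented for this example; they are not taken from the presentation or from any real compiler.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    start: int                                  # index of the first instruction
    instrs: list                                # straight-line instructions
    succs: list = field(default_factory=list)   # start indices of successor blocks


def build_cfg(program):
    """Partition a toy program into basic blocks (hypothetical IR).

    Instructions: ("assign", x, expr), ("goto", target),
    ("branch", cond, target), ("halt",). Targets are instruction indices.
    """
    if not program:
        return {}
    n = len(program)
    leaders = {0}                               # the first instruction starts a block
    for i, ins in enumerate(program):
        if ins[0] in ("goto", "branch"):
            leaders.add(ins[-1])                # a jump target starts a block
        if ins[0] in ("goto", "branch", "halt") and i + 1 < n:
            leaders.add(i + 1)                  # so does the instruction after a jump
    order = sorted(leaders)
    blocks = {}
    for k, s in enumerate(order):
        end = order[k + 1] if k + 1 < len(order) else n
        blocks[s] = BasicBlock(s, program[s:end])
    for k, s in enumerate(order):
        last = blocks[s].instrs[-1]
        nxt = order[k + 1] if k + 1 < len(order) else None
        if last[0] == "goto":
            blocks[s].succs = [last[1]]
        elif last[0] == "branch":               # taken edge, then fall-through edge
            blocks[s].succs = [last[2]] + ([nxt] if nxt is not None else [])
        elif last[0] != "halt" and nxt is not None:
            blocks[s].succs = [nxt]             # plain fall-through
    return blocks


prog = [
    ("assign", "y", "5"),
    ("branch", "c", 4),
    ("assign", "x", "y"),
    ("goto", 5),
    ("assign", "x", "1"),
    ("halt",),
]
cfg = build_cfg(prog)
for start, block in cfg.items():
    print(start, block.instrs, "->", block.succs)
```

The branch at index 1 ends the first block and produces two outgoing edges, which is the "switch" node of the slide; block 5 has two incoming edges and plays the role of a "merge".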

Symbolic Evaluation

- Assign each variable the bottom value initially
- Propagate changes in variable values as statements are executed
- Based on the idea of Abstract Interpretation

Symbolic Evaluation

- Flow Functions
- x := e: state@out = state@in{eval(e, state@in)/x}
- Confluence Operation
- join over all incoming edges

Symbolic Evaluation

- Flow Functions
- x := e: state@out = ƒ(state@in)
- Confluence Operation
- join over all incoming edges
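
A minimal sketch of these two ingredients for constant propagation, assuming a three-level lattice in which `BOT` means "no information yet", an integer means "known constant", and `TOP` means "not a constant". The names `BOT`, `TOP`, `join`, `join_states`, `eval_expr` and `flow_assign`, and the tuple encoding of expressions, are assumptions made for this example.

```python
BOT = "BOT"   # bottom: no information yet
TOP = "TOP"   # top: not a constant


def join(a, b):
    """Confluence of two lattice values."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP


def join_states(states, variables):
    """Join over all incoming edges, variable by variable."""
    out = {v: BOT for v in variables}
    for s in states:
        for v in variables:
            out[v] = join(out[v], s[v])
    return out


def eval_expr(e, state):
    """Symbolically evaluate e: an int literal, a variable name, or ("+", e1, e2)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return state[e]
    _, left, right = e
    a, b = eval_expr(left, state), eval_expr(right, state)
    if a == BOT or b == BOT:
        return BOT
    if a == TOP or b == TOP:
        return TOP
    return a + b   # only "+" is modelled in this sketch


def flow_assign(x, e, state_in):
    """Flow function for x := e: the input state with x rebound to eval(e)."""
    state_out = dict(state_in)
    state_out[x] = eval_expr(e, state_in)
    return state_out


print(flow_assign("x", ("+", "y", 1), {"x": BOT, "y": 5}))
print(join_states([{"x": 5, "y": 1}, {"x": 5, "y": 2}], ["x", "y"]))
```

Two incoming edges that agree on `x` keep it constant after the join, while disagreement on `y` pushes it to `TOP`.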

The Dataflow analysis algorithm

- Associate one state vector with each edge of CFG. Initialize all entries to ⊥
- Set all entries on outgoing edge from START to ⊤
- Evaluate the expression and update the output edge
- Continue till a fixed point is reached

Termination Condition

- If each flow function ƒ is monotonic
- i.e. x ≤ y => ƒ (x) ≤ ƒ (y)
- And if the lattice is of finite height
- The dataflow algorithm terminates
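
The iteration to a fixed point can be sketched with a worklist, under some stated assumptions: one statement (or none) per CFG node, a single output state per node standing in for the state on its outgoing edges, a START node with no incoming edges, and every variable treated as "not a constant" on leaving START. The function `const_prop` and the node encoding are hypothetical. Each update can only move a value up the three-level lattice, which is why the loop stops.

```python
from collections import deque

BOT, TOP = "BOT", "TOP"   # bottom: no information yet; top: not a constant


def join(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP


def ev(e, st):
    """e is an int literal, a variable name, or ("+", e1, e2)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return st[e]
    _, left, right = e
    a, b = ev(left, st), ev(right, st)
    if a == BOT or b == BOT:
        return BOT
    if a == TOP or b == TOP:
        return TOP
    return a + b


def const_prop(nodes, succs, entry, variables):
    """nodes: {id: None or (x, expr)}; succs: {id: [successor ids]}."""
    preds = {n: [] for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)
    out = {n: {v: BOT for v in variables} for n in nodes}
    out[entry] = {v: TOP for v in variables}    # assumption: nothing known at START
    work = deque(n for n in nodes if n != entry)
    while work:
        n = work.popleft()
        st = {v: BOT for v in variables}
        for p in preds[n]:                      # join over all incoming edges
            st = {v: join(st[v], out[p][v]) for v in variables}
        if nodes[n] is not None:                # apply the flow function
            x, e = nodes[n]
            st[x] = ev(e, st)
        if st != out[n]:                        # changed: revisit successors
            out[n] = st
            work.extend(s for s in succs.get(n, []) if s != entry)
    return out


# y := 5; i := 0; loop { i := i + 1 }; x := y + 1
nodes = {0: None, 1: ("y", 5), 2: ("i", 0), 3: None,
         4: ("i", ("+", "i", 1)), 5: ("x", ("+", "y", 1))}
succs = {0: [1], 1: [2], 2: [3], 3: [4, 5], 4: [3], 5: []}
result = const_prop(nodes, succs, 0, ["i", "x", "y"])
print(result[5])
```

In this example `i` is pushed to `TOP` by the loop, while `y` stays at 5 on every path, so `x := y + 1` is found to compute the constant 6.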

Overview: Making the problem easier

- Only prove optimizer correct.
- Trust front-end and code-generator.
- Write optimizations in Cobalt, a domain-specific language.
- Separate correctness from profitability.
- Factor out the hard and common parts of the proof, and prove them once by hand.

[Figure: Task of proving optimizer correct, Theorem Prover]

Overview: The Compiler

[Figure: Source Code, Front End, Back End, Binary Executable]

Results

- Cobalt language
- realistic C-like IL, operates on a CFG
- implemented const prop and folding, branch folding, CSE, PRE, DAE, partial DAE, and simple forms of points-to analyses
- Correctness checker for Cobalt opts
- using the Simplify theorem prover
- Execution engine for Cobalt opts
- in the Whirlwind compiler

Caveats

- May not be able to express your optimization in Cobalt:
- no interprocedural optimizations for now.
- optimizations that build complicated data structures may be difficult to express.
- A sound Cobalt optimization may be rejected by the correctness checker.
- Trusted computing base (TCB) includes:
- front-end and code-generator, execution engine, correctness checker, proofs done by hand once

Forward Optimizations: Constant Prop (straight-line code)

    y := 5        statement y := 5
    ...           statements that don’t define y
    x := y        statement x := y, rewritten to x := 5

Forward Optimizations: Adding arbitrary control flow

- if statement y := 5
- is followed by statements that don’t define y
- until statement x := y
- then transform statement to x := 5

[Figure: CFG with several y := 5 statements on the paths reaching x := y, which is rewritten to x := 5]

Constant prop in English

if statement y := 5 is followed by statements that don’t define y until statement x := y, then transform statement to x := 5

Constant prop in Cobalt

stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C

- “statement y := 5” becomes stmt(Y := C)
- “statements that don’t define y” becomes ¬mayDef(Y)
- “statement x := y” becomes X := Y
- “transform statement to x := 5” becomes ⇒ X := C
- stmt(Y := C) and ¬mayDef(Y) are boolean expressions evaluated at nodes in the CFG
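
One way to picture how such a pattern could be executed is as a forward dataflow analysis over the CFG: a node satisfying stmt(Y := C) generates the fact (Y, C), a node that may define Y kills it, and facts are intersected where paths merge. The sketch below assumes a toy IR with one statement per node, no pointers, and every node reachable from the entry. The function names and statement encoding are hypothetical, and this is not the actual Cobalt execution engine.

```python
def defs(stmt):
    """Variables a toy statement may define (no pointers in this sketch)."""
    return {stmt[1]} if stmt[0] in ("const", "copy") else set()


def run_const_prop(nodes, succs, entry):
    """nodes: {id: stmt}; stmt is ("const", y, c), ("copy", x, y) or ("skip",)."""
    preds = {n: [] for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)
    all_facts = {(s[1], s[2]) for s in nodes.values() if s[0] == "const"}
    facts_in = {n: set() for n in nodes}
    facts_out = {n: set(all_facts) for n in nodes}   # optimistic start
    changed = True
    while changed:
        changed = False
        for n, stmt in nodes.items():
            if n == entry or not preds[n]:
                inn = set()                          # nothing holds at the start
            else:                                    # must hold on every incoming path
                inn = set.intersection(*(facts_out[p] for p in preds[n]))
            # statements satisfying "not mayDef(Y)" keep the fact (Y, C)
            out = {(y, c) for (y, c) in inn if y not in defs(stmt)}
            if stmt[0] == "const":                   # stmt(Y := C) generates (Y, C)
                out.add((stmt[1], stmt[2]))
            if inn != facts_in[n] or out != facts_out[n]:
                facts_in[n], facts_out[n] = inn, out
                changed = True
    rewritten = dict(nodes)
    for n, stmt in nodes.items():
        if stmt[0] == "copy":                        # X := Y  =>  X := C
            for (y, c) in facts_in[n]:
                if y == stmt[2]:
                    rewritten[n] = ("const", stmt[1], c)
    return rewritten


succs = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
safe = {0: ("const", "y", 5), 1: ("skip",), 2: ("const", "z", 1),
        3: ("const", "w", 2), 4: ("copy", "x", "y")}
unsafe = dict(safe)
unsafe[3] = ("const", "y", 7)                        # y redefined on one branch
print(run_const_prop(safe, succs, 0)[4])
print(run_const_prop(unsafe, succs, 0)[4])
```

In the first program neither branch defines `y`, so the copy is rewritten to `x := 5`; in the second, one branch redefines `y`, the fact does not survive the merge, and the statement is left alone.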

Constant prop revisited

- Ask a theorem prover to show:
- A statement satisfying stmt(Y := C) establishes Y == C
- A statement satisfying ¬mayDef(Y) maintains Y == C
- The statements X := Y and X := C have the same semantics in a program state satisfying Y == C

stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C

Generalize to any forward optimization

ψ1 followed by ψ2 until s ⇒ s’ with witness P

- Ask a theorem prover to show:
- A statement satisfying ψ1 establishes P
- A statement satisfying ψ2 maintains P
- The statements s and s’ have the same semantics in a program state satisfying P

We showed by hand once that these conditions imply correctness.
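
To make the three obligations concrete for constant propagation, the sketch below checks them exhaustively over a tiny finite domain. This is only a stand-in for intuition: the real checker hands such conditions to a theorem prover, which reasons about all states rather than enumerating a few. The statement encoding, `exec_stmt`, `may_def` and `check_obligations` are all hypothetical names.

```python
from itertools import product

VALUES = range(-2, 3)   # small value domain, an assumption of this sketch


def exec_stmt(stmt, state):
    """Toy semantics: ("const", x, k) is x := k, ("copy", x, y) is x := y."""
    kind, target, arg = stmt
    new = dict(state)
    new[target] = arg if kind == "const" else state[arg]
    return new


def may_def(stmt, var):
    return stmt[1] == var


def check_obligations(x, y, variables, may_def_fn=may_def):
    states = [dict(zip(variables, vals))
              for vals in product(VALUES, repeat=len(variables))]
    stmts = ([("const", v, k) for v in variables for k in VALUES]
             + [("copy", a, b) for a in variables for b in variables])
    for c in VALUES:
        for st in states:
            # 1. a statement satisfying stmt(Y := C) establishes Y == C
            if exec_stmt(("const", y, c), st)[y] != c:
                return False
            if st[y] != c:
                continue                 # remaining checks assume the witness
            # 2. a statement satisfying "not mayDef(Y)" maintains Y == C
            for s in stmts:
                if not may_def_fn(s, y) and exec_stmt(s, st)[y] != c:
                    return False
            # 3. X := Y and X := C agree in any state where Y == C
            if exec_stmt(("copy", x, y), st) != exec_stmt(("const", x, c), st):
                return False
    return True


print(check_obligations("x", "y", ["x", "y", "z"]))
print(check_obligations("x", "y", ["x", "y", "z"],
                        may_def_fn=lambda s, v: False))
```

The second call plugs in a deliberately wrong `mayDef` that claims no statement ever defines `Y`; the "maintains" obligation then fails, which is the kind of mistake the checker is meant to catch.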

Profitability heuristics

- Optimization correct ⇒ safe to perform any subset of the matching transformations.
- So far, all transformations were also profitable.
- In some cases, many transformations are legal, but only a few are profitable.

The two pieces of an optimization

ψ1 followed by ψ2 until s ⇒ s’ with witness P filtered through choose

- Transformation pattern:
- defines which transformations are legal.
- Profitability heuristic:
- describes which of the legal transformations to actually perform.
- does not affect soundness.
- can be written in a language of the user’s choice.
- This way of factoring an optimization is crucial to our ability to prove optimizations sound automatically.
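
The factoring can be sketched as follows, assuming the legal transformations have already been computed as a map from CFG node to replacement statement. The engine intersects whatever the heuristic returns with the legal set, so an arbitrary `choose` function cannot introduce an unsound rewrite. The names `apply_optimization`, `choose_all` and `choose_cheap` are hypothetical.

```python
def apply_optimization(program, legal, choose):
    """program: {node: stmt}; legal: {node: replacement stmt}.

    choose(legal, program) returns the nodes the heuristic wants rewritten.
    """
    chosen = set(choose(legal, program)) & set(legal)   # drop anything not legal
    return {n: (legal[n] if n in chosen else s) for n, s in program.items()}


def choose_all(legal, program):
    return list(legal)


def choose_cheap(legal, program):
    # A made-up heuristic: only rewrite low-numbered nodes, and also ask
    # for node 99, which was never shown legal.
    return [n for n in legal if n < 2] + [99]


program = {0: "y := 5", 1: "x := y", 2: "z := y", 99: "w := u"}
legal = {1: "x := 5", 2: "z := 5"}
print(apply_optimization(program, legal, choose_all))
print(apply_optimization(program, legal, choose_cheap))
```

Whatever subset the heuristic picks, the result only ever contains rewrites drawn from `legal`, which mirrors the claim that the heuristic does not affect soundness.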

Profitability heuristic example: PRE

- PRE as code duplication followed by CSE

    a := ...;
    b := ...;
    if (...) {
      a := ...;
      x := a + b;
    } else {
      ...
    }
    x := a + b;

- Code duplication: x := a + b; is inserted in the else branch

Profitability heuristic example: PRE

- PRE as code duplication followed by CSE

    a := ...;
    b := ...;
    if (...) {
      a := ...;
      x := a + b;
    } else {
      x := a + b;
    }
    x := x;

- Code duplication: x := a + b; now appears in the else branch
- CSE: the final x := a + b; becomes x := x;
- Self-assignment removal: x := x; is then deleted

Profitability heuristic example: PRE

    a := ...;
    b := ...;
    if (...) {
      a := ...;
      x := a + b;
    } else {
      ...
    }
    x := a + b;

[Figure: the code above annotated with the legal placements of x := a + b and the profitable placement]

The Cobalt Language

- Operates on a Control Flow Graph
- A rewrite rule
- A guard to ensure appropriate conditions
- A predicate condition
- Filtered thru the choose function

The Cobalt Language

- Pure analyses also possible
- Verify properties
- For use by other transformations

Constant prop revisited (again)

stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C

mayDef in Cobalt

stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C

- Very conservative!
- Can we do better?

mayDef in Cobalt

stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C

- mayPntTo is a pure analysis.
- It computes dataflow info, but performs no transformations.

mayPntTo in Cobalt

stmt(decl X) followed by ¬stmt(... := &X) defines addrNotTaken(X) with witness “no location in the store points to X”

mayPntTo(X,Y) ≜ ¬addrNotTaken(Y)
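
A straight-line sketch of how such a pure analysis might feed a less conservative mayDef: the analysis only labels program points with addrNotTaken facts, and a store through a pointer is then treated as defining Y only when the pointer may point to Y. The statement encoding and the function names are hypothetical, and the rule used for pointer stores is an assumption of this example rather than something stated on the slides.

```python
def addr_not_taken(stmts):
    """For each statement index, the set of X with addrNotTaken(X) on entry."""
    labels, current = [], set()
    for s in stmts:
        labels.append(set(current))
        if s[0] == "decl":                # stmt(decl X) establishes the fact
            current.add(s[1])
        elif s[0] == "addr":              # p := &x destroys it for x
            current.discard(s[2])
    return labels


def may_pnt_to(labels, i, x, y):
    """mayPntTo(X, Y) is defined as not addrNotTaken(Y)."""
    return y not in labels[i]


def may_def(stmts, labels, i, y):
    s = stmts[i]
    if s[0] in ("decl", "assign", "addr"):
        return s[1] == y                  # direct definition of the target
    if s[0] == "store":                   # *p := ...
        return may_pnt_to(labels, i, s[1], y)
    return True                           # unknown statement: stay conservative


stmts = [
    ("decl", "y"),
    ("decl", "z"),
    ("decl", "p"),
    ("assign", "y", 5),     # y := 5
    ("addr", "p", "z"),     # p := &z
    ("store", "p", 1),      # *p := 1
    ("assign", "x", "y"),   # x := y
]
labels = addr_not_taken(stmts)
print(may_def(stmts, labels, 5, "y"))
print(may_def(stmts, labels, 5, "z"))
```

Because the address of `y` is never taken, the store through `p` is known not to define `y`, so the fact `y == 5` can survive it; the same store is still treated as a possible definition of `z`.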

Expressiveness of Cobalt

- Constant propagation, folding
- Copy propagation
- Common Subexpression Elimination
- Branch Folding
- Partial Redundancy Elimination
- Loop invariant code motion
- Partial Dead Assignment Elimination

Future work

- Improving expressiveness
- interprocedural optimizations
- one-to-many and many-to-many transformations
- Inferring the witness
- Generate specialized compiler binary from the Cobalt sources.

Summary and Conclusion

- Optimizations written in a domain-specific language can be proven correct automatically.
- The correctness checker found several subtle bugs in Cobalt optimizations.
- A good step towards proving compilers correct automatically.
