The SPA Project GOLF and ESP. Manuvir Das Microsoft Research (joint work with Manuel Fahndrich, Jakob Rehof). SPA Group Mentor. Software Productivity Tools. Jim Larus runs the group research.microsoft.com/spt SLAM, Vault, Behave, PipelineServer … Focus on software reliability.

The SPA Project: GOLF and ESP

Manuvir Das

Microsoft Research

(joint work with Manuel Fahndrich, Jakob Rehof)



Software Productivity Tools

  • Jim Larus runs the group

  • research.microsoft.com/spt

  • SLAM, Vault, Behave, PipelineServer …

  • Focus on software reliability


What’s wrong with analysis?

  • A: We don’t write or look at real code

  • B: We don’t solve real problems


Why does this happen?

  • Analysis is a mix of theory and practice

  • But

    • Math and theory are elegant

    • experimentation needs infrastructure

    • engineering is boring


Today we’ll talk about …

  • Doing analysis research the right way

  • My day job

    • Slicing and Partial Evaluation

    • Pointer analysis

    • Error detection


Slicing and Partial Evaluation

  • PE: Which computations depend only on known inputs? Do these early.

  • Or, which computations may depend on unknown inputs? Don’t do these early.

  • Insight: If a computation depends on unknown input, there must be an unknown input in its slice.


Forward slicing and BTA

  • Binding-time analysis

    • identify static computations

  • BTA via slicing

    • mark all unknown input nodes

    • forward slice from marked nodes and mark

    • all unmarked nodes are static computations


Why is this interesting?

  • Slicing incorporates control dependence

  • Previous work used reaching definitions

read(y);
x = 0;
while (y != 0) { y--; x++; }
z = x;

  • We can now prove correctness
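The marking procedure can be sketched on the four-line example above. This is a minimal illustration, not the original implementation: the node numbering and the hand-written dependence edges are assumptions, and the data-only run only approximates the reaching-definitions style of earlier work.

```python
from collections import deque

# Nodes: 1: read(y)  2: x = 0  3: while (y != 0)  4: y--  5: x++  6: z = x
# An edge u -> v means "v depends on u". Edges are written by hand here.
DATA_DEPS = {
    1: [3, 4],   # y from read(y) reaches the loop test and y--
    2: [5, 6],   # x = 0 reaches x++ and z = x
    4: [3, 4],   # y-- feeds the next loop test and itself
    5: [5, 6],   # x++ feeds itself and z = x
}
CONTROL_DEPS = {
    3: [4, 5],   # the loop body runs only when the test holds
}

def forward_slice(sources, *edge_maps):
    """Return every node reachable from `sources` along the given edges."""
    marked = set(sources)
    work = deque(sources)
    while work:
        node = work.popleft()
        for edges in edge_maps:
            for succ in edges.get(node, []):
                if succ not in marked:
                    marked.add(succ)
                    work.append(succ)
    return marked

def static_nodes(all_nodes, unknown_inputs, *edge_maps):
    """Nodes left unmarked after slicing forward from the unknown inputs."""
    return set(all_nodes) - forward_slice(unknown_inputs, *edge_maps)

NODES = range(1, 7)
data_only = static_nodes(NODES, [1], DATA_DEPS)
with_control = static_nodes(NODES, [1], DATA_DEPS, CONTROL_DEPS)
print(sorted(data_only), sorted(with_control))
```

With data dependences alone, x++ and z = x are left unmarked even though the number of loop iterations depends on the unknown y. Adding control dependence marks them, leaving only x = 0 as static.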


This project had flaws …

  • A: We don’t write or look at real code

    • cubic algorithm, ran on 2k lines in 30 minutes

    • only one benchmark (ray tracer)

  • B: We don’t solve real problems

    • who uses PE in practice?

    • was the lack of safety critical?

    • why not use a timer?


Then I visited MSR …

  • Daniel Weise – 1.5 million lines of real code

  • Real problems – software reliability

  • I was hooked!

    • find buffer overflows using static analysis

    • oops, need pointer analysis


Papers don’t tell the whole truth!

  • Implemented Ste96, engineered it

    • lightning fast, but poor results

  • Lots of papers on how to improve

    • structures, signatures, SH97

  • Tried it all, nothing worked on real code

  • Needed Andersen (subtyping) on real code


Frameworks are good

  • A spectrum from Ste96 to And94

    • DGC POPL 98 : unification vs flow

    • SH POPL 97 : buckets within ECRs

  • Frameworks

    • give us a way of tuning precision vs efficiency

    • help us understand the problem


Frameworks are bad

  • The real issue: how do you find the best trade-off point in a principled manner?

  • What if the parameter being varied is not the key concept?

    • CFA varies control depth rather than data

    • SH 97 picks random categories

    • DGC 98 alters the behaviour of the same statement


Back to pointer analysis …

  • No way to run Andersen on MLOC


So, I hid in my office …

  • Stared at SPEC code, wrote perl scripts

    • every feature is used

    • code is idiomatic

    • pointers are never assigned, except heap

    • most pointers arise through parameter passing

    • some code is just too hard for any analysis

  • Result: new algorithm driven by real code


Pointer Analysis Landscape

  • FICI: flow-insensitive, context-insensitive

  • FSCI: flow-sensitive, context-insensitive

  • FICS: flow-insensitive, context-sensitive

  • FSCS: flow-sensitive, context-sensitive

[Figure: the four classes arranged along Precision and Cost axes]


FICI Pointer Analysis

  • Steensgaard (almost linear): cheap but imprecise

    • 1.5 MLOC in 1 minute, 100 MB

  • Andersen (cubic): precise but expensive

    • 500 KLOC in several minutes, 2GB

  • One level flow (quadratic)


Andersen’s Algorithm

p = &q;
p = q;

[Figure: points-to graphs over nodes p, q, r1, r2, r3 for each statement]


Andersen’s Algorithm

p = *q;
*p = q;

[Figure: points-to graphs over nodes p, q, r1, r2, s1, s2, s3 for each statement]
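The four statement forms above can be handled by a small inclusion-based solver. This is a sketch under stated assumptions: it iterates naively to a fixed point rather than using the cubic worklist formulation, and the statement encoding and the sample program are invented for illustration.

```python
from collections import defaultdict

def andersen(statements):
    """Naive fixed-point solver for inclusion (subset) points-to constraints.

    Each statement is (kind, lhs, rhs) with kind in:
      "addr":  lhs = &rhs      "copy":  lhs = rhs
      "load":  lhs = *rhs      "store": *lhs = rhs
    """
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":
                updates = [(lhs, {rhs})]
            elif kind == "copy":
                updates = [(lhs, set(pts[rhs]))]
            elif kind == "load":
                updates = [(lhs, set(pts[t])) for t in list(pts[rhs])]
            elif kind == "store":
                updates = [(t, set(pts[rhs])) for t in list(pts[lhs])]
            else:
                raise ValueError(kind)
            for target, new in updates:
                if not new <= pts[target]:
                    pts[target] |= new
                    changed = True
    return {v: s for v, s in pts.items() if s}

# Hypothetical program: p = &a; q = &b; b = &c; r = p; s = *q; *q = p;
program = [
    ("addr", "p", "a"), ("addr", "q", "b"), ("addr", "b", "c"),
    ("copy", "r", "p"), ("load", "s", "q"), ("store", "q", "p"),
]
result = andersen(program)
print(result)
```

Because the analysis is flow-insensitive, the store *q = p also affects the earlier load s = *q: both b and s end up pointing to a and c.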


Steensgaard’s Algorithm

p = q;

[Figure: points-to graphs over p and q for the statement]
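A unification-based treatment of p = q merges the targets of p and q into one equivalence class. The sketch below uses a plain union-find; the class name `Unifier`, its methods, and the four-statement sample program are illustrative assumptions, and only address-of and copy statements are covered.

```python
class Unifier:
    """Union-find over abstract locations; each class has at most one target class."""

    def __init__(self):
        self.parent = {}
        self.pointee = {}   # class representative -> a member of its target class

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def target(self, x):
        """Representative of the class x points to, created on demand."""
        r = self.find(x)
        if r not in self.pointee:
            self.pointee[r] = self.find(("deref", r))
        return self.find(self.pointee[r])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        self.parent[b] = a
        ta, tb = self.pointee.pop(a, None), self.pointee.pop(b, None)
        if ta is not None and tb is not None:
            self.pointee[a] = ta
            self.union(ta, tb)          # targets of merged classes merge too
        elif ta is not None or tb is not None:
            self.pointee[a] = ta if ta is not None else tb

    def addr(self, p, x):   # p = &x
        self.union(self.target(p), x)

    def copy(self, p, q):   # p = q
        self.union(self.target(p), self.target(q))

    def points_to(self, p, locations):
        t = self.target(p)
        return {x for x in locations if self.find(x) == t}

u = Unifier()
u.addr("p", "s1")
u.addr("p", "s2")
u.addr("q", "s3")
u.copy("q", "p")
print(u.points_to("p", ["s1", "s2", "s3"]))
```

After q = p, both p and q point to the single merged class containing s1, s2, and s3. That loss of precision is the price of near-linear running time.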


Motivation for One Level Flow

foo(&s1);

foo(&s2);

bar(&s3);

void foo(struct s *p) { p->a = 3; bar(p); }

void bar(struct s *q) { q->b = 4; }


Simplified Example

p = &s1;
p = &s2;
q = &s3;
q = p;
p->a = 3;
q->b = 4;

[Figure: points-to graphs over p and q, with targets s1, s2, s3 and a merged node s1,s2,s3]


One Level Flow

p = q;

[Figure: graphs over p and q for the statement]


Simplified Example

p = &s1;
p = &s2;
q = &s3;
q = p;
p->a = 3;
q->b = 4;

[Figure: graph over p, q, s1, s2, s3, built up one statement at a time]


OLF: Simple Reachability

Single query: Linear

All queries: Quadratic
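The cost figures above follow from answering each points-to query by graph traversal. The sketch below is deliberately simplified and is not the OLF algorithm itself: it models top-level pointers only and ignores the unification OLF applies below the top level. The `FlowGraph` class and the sample statements are assumptions for illustration.

```python
from collections import defaultdict

class FlowGraph:
    """Top-level pointers only: copies become directed flow edges."""

    def __init__(self):
        self.labels = defaultdict(set)   # pointer -> locations assigned directly
        self.preds = defaultdict(set)    # pointer -> pointers whose contents flow in

    def addr(self, p, x):    # p = &x
        self.labels[p].add(x)

    def copy(self, p, q):    # p = q: the contents of q flow into p
        self.preds[p].add(q)

    def points_to(self, p):
        """Single query: one backward traversal, linear in the graph size."""
        seen, stack, result = {p}, [p], set()
        while stack:
            node = stack.pop()
            result |= self.labels[node]
            for pred in self.preds[node]:
                if pred not in seen:
                    seen.add(pred)
                    stack.append(pred)
        return result

g = FlowGraph()
g.addr("p", "s1")
g.addr("p", "s2")
g.addr("q", "s3")
g.copy("q", "p")

# All queries: one traversal per pointer, hence quadratic in the worst case.
all_sets = {v: g.points_to(v) for v in ["p", "q"]}
print(all_sets)
```

Because the edge from p to q is directed, q sees s1, s2, and s3 while p keeps only s1 and s2.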


OLF: Cached Reachability

[Figure: reachability graph with nodes x, y, and MAX]

MS Word: from 1 hour to 30 seconds for all queries




This project had flaws too …

  • B: We don’t solve problems

    • solved an open problem in pointer analysis

  • But

    • never got around to buffer overflow

    • didn’t use PTA for optimization

  • addressed these issues later, but

  • should have been driven by the problem


Since then …

  • Others have made And94 fast

    • Heintze PLDI 01

    • suggested by OLF results

  • But what about context-sensitivity?

    • crucial for value flow analysis

  • GOLF (DLFR SAS 01)

    • combines OLF and one level of instantiation constraints (Rehof’s lecture)

    • context-sensitive value flow on MLOC


OLF: Call Example

id(r) {return r;}

p = id(&x);

q = id(&y);

*p = 3;

r = &x;

p = r;

r = &y;

q = r;

*p = 3;


OLF: Call Example

[Figure: graph over r, p, q, x, y and the dereference nodes *r, *p, *q]

r = &x;

p = r;

r = &y;

q = r;

*p = 3;


GOLF: Call Example

[Figure: graph over r, p, q, x, y, *r, *p, *q with edges labelled ( ) and [ ]]

id(r) {return r;}

p = id(&x);

q = id(&y);

*p = 3;
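The effect of labelling edges per call site can be sketched as matched-parenthesis reachability over value-flow edges. This illustrates the idea only and does not reproduce GOLF's instantiation constraints. The edge encoding, the numbered labels, and the `reaches` helper are assumptions, and the search assumes no recursive call cycles.

```python
# Value-flow edges for:  id(r) { return r; }   p = id(&x);   q = id(&y);
# Each edge is (source, target, label): "(1" enters id at call site 1,
# ")1" returns to call site 1, and likewise for site 2.
EDGES = [
    ("&x", "r", "(1"), ("r", "p", ")1"),
    ("&y", "r", "(2"), ("r", "q", ")2"),
]

def reaches(source, target, match_calls):
    """Depth-first search over EDGES.

    With match_calls=True, a return edge may only be followed if it matches
    the most recently entered call site (or if no call is pending).
    """
    start = (source, ())
    stack, seen = [start], {start}
    while stack:
        node, calls = stack.pop()
        if node == target:
            return True
        for src, dst, label in EDGES:
            if src != node:
                continue
            nxt = calls
            if match_calls:
                site = label[1:]
                if label[0] == "(":
                    nxt = calls + (site,)
                elif not calls:
                    nxt = calls              # unmatched return is allowed
                elif calls[-1] == site:
                    nxt = calls[:-1]
                else:
                    continue                 # returns to the wrong call site
            state = (dst, nxt)
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return False

print(reaches("&y", "p", match_calls=False), reaches("&y", "p", match_calls=True))
```

Ignoring the labels, &y appears to flow to p, so *p = 3 might write y. With matching enabled, only &x reaches p.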


We have an analysis that is …

  • fast enough to run on MLOC

  • good enough for static optimization

    • who cares; leave it to the chip makers!

  • not good enough for dynamic optimization (MDCE PASTE 01)

  • not good enough to track interesting correctness properties in real code


Correctness: the killer app

  • Hardware can

    • speed up programs

    • enforce correctness at run-time

  • Hardware cannot

    • enforce correctness before product is shipped

  • Testers can

    • find errors on some paths

  • Testers cannot

    • find errors on all paths

  • So, use static analysis to find errors


ESP Vision

  • Error Detection via Scalable Program Analysis

  • Must be driven by real code

  • Must be sound (report all errors)

  • Must report few false positives

  • Use knowledge of tradeoffs in analysis

  • Let user help the analysis


Step 1: Identify the problem

  • Solve a realistic problem:

    • partial correctness

    • user specified, finite-state properties

  • Solve a non-trivial problem:

    • don’t check uninits, NULL pointers

    • check locking protocols, resource usage


Parameterized Protocol Tracking

[Figure: FSM over states INIT(l), LOCKED(l), ERROR(l) with transitions labelled Lock(l), Unlock(l), and Ret]

  • User specified

    • FSM with parameterized actions

    • patterns

  • Rest is automatic


Step 2: Examine real code

  • Find common idioms

  • Understand level of precision needed

  • Windows device drivers

    • mostly control dominated protocols

    • global data flow needs CS, but not FS/PS

    • path feasibility seems to matter


Sample driver code

STATUS Initialize(Object o)
{
    Object p = o;
    if (p->needLock)
        KeAcquireSpinLock(p);
    p->data = 0;
    if (p->needLock)
        KeReleaseSpinLock(p);
    return OK;
}


Step 3: Break up the problem

  • Three distinct entities to be tracked

    • the temporal sequence of actions along a particular control flow path

    • the data involved in the actions

    • the data involved in path feasibility

  • Can use different levels of static analysis to track each entity


Data analysis vs control analysis

  • RHS 95: Cost is O(ED^3). What is D?

    • dataflow: D is generally related to program size

    • program size grows because of pointers, globals

  • What if there is only a single global FSM?

    • D is just the #states in the FSM!

  • Control is cheap, data is expensive


Step 4: Design static analyses

  • track the temporal sequence of actions along a particular control flow path

    • cannot use flow-insensitive analysis

    • RHS95 is too expensive

  • eliminate the data involved in the actions

    • use GOLF value flow

  • now we have a control property, use RHS95

  • both analyses are context-sensitive


Data elimination

STATUS Initialize(Object o)
{
    Object p = o;
    if (p->needLock)
        KeAcquireSpinLock(p);
    p->data = 0;
    if (p->needLock)
        KeReleaseSpinLock(p);
    return OK;
}


Data elimination

Initialize()
{
    if (*)
        Lock;
    if (*)
        Unlock;
}

[Figure: FSM states I, L, E]
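Once the data is eliminated, the abstracted program can be analysed by pushing a set of FSM states through the control flow. The sketch below does this by hand for the two `if (*)` statements. The transition table is partly assumed (in particular the Unlock-in-INIT error edge), and `step`, `maybe`, and `analyze` are hypothetical helpers.

```python
# Transition table read off the slide's FSM. The Unlock-in-INIT and
# Ret-in-LOCKED error edges are assumptions.
TRANSITIONS = {
    ("INIT", "Lock"): "LOCKED",
    ("LOCKED", "Unlock"): "INIT",
    ("LOCKED", "Lock"): "ERROR",
    ("LOCKED", "Ret"): "ERROR",
    ("INIT", "Ret"): "INIT",
    ("INIT", "Unlock"): "ERROR",
}

def step(states, action):
    """Apply one action to every possible FSM state (ERROR is absorbing)."""
    return {s if s == "ERROR" else TRANSITIONS[(s, action)] for s in states}

def maybe(states, action):
    """Abstracted `if (*) action;`: either branch may run, so join both outcomes."""
    return states | step(states, action)

def analyze():
    states = {"INIT"}
    states = maybe(states, "Lock")      # after the first if
    states = maybe(states, "Unlock")    # after the second if
    return states

final = analyze()
print(sorted(final))
```

All three states are possible at the end, so an error is reported. In the original driver both tests read the same needLock flag, so the offending paths are infeasible, which is why path feasibility matters.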


Do we need context-sensitivity?

  • What if GOLF cannot provide MUST info?

void Initialize(Object o1, Object o2) {
    LockWrapper(o1);
    LockWrapper(o2);
    KeReleaseSpinLock(o1);
    KeReleaseSpinLock(o2);
}

void LockWrapper(Object p) {
    KeAcquireSpinLock(p);
}


Interface nodes

  • Limit scope of value flow to interface nodes

  • Produce RHS summaries for interface nodes

void LockWrapper(Object p) {
    KeAcquireSpinLock(p);
}

p: INIT -> LOCKED, LOCKED -> ERROR

  • Copy summaries to callers


Back to our example …

void Initialize(Object o1, Object o2) {
    i: LockWrapper(o1);
    j: LockWrapper(o2);
    KeReleaseSpinLock(o1);
    KeReleaseSpinLock(o2);
}

void LockWrapper(Object p) {
    KeAcquireSpinLock(p);
}

[Figure: value-flow edges from o1 and o2 to p, labelled with call sites i and j]


Consider the abstraction!

  • ESP makes an upfront abstraction

    • interface nodes in the GOLF graph

    • Plus: linear size, controls overall cost

    • Minus: may be too coarse

  • SLAM allows tuning of abstraction

    • but now we are back in the framework game


Path sensitivity

  • PSCS is too expensive (need to track data)

    • function calls

    • loops

    • sequential, unrelated diamonds

  • Function calls

    • use dataflow summaries

    • can only track local correlations

  • Loops and diamonds

    • use abstract simulation


Abstract simulation

  • Split simulator state into concrete state + FSM state

  • At join points, merge simulator states with identical FSM states

  • Extended constant-prop lattice of concrete states per FSM state

  • Polynomial bound, better than dataflow

  • Handles common case efficiently
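The merge rule can be sketched on the sample driver's shape: two tests of the same flag guarding Lock and Unlock. This is an illustrative sketch, not ESP's implementation: the simulator state is modelled as a map from FSM state to one constant-propagation environment, and `join_env`, `merge`, `branch`, and `simulate` are hypothetical names.

```python
TOP = "TOP"   # unknown value in the constant-propagation lattice

TRANSITIONS = {
    ("INIT", "Lock"): "LOCKED", ("LOCKED", "Unlock"): "INIT",
    ("LOCKED", "Lock"): "ERROR", ("LOCKED", "Ret"): "ERROR",    # Ret edge assumed
    ("INIT", "Ret"): "INIT", ("INIT", "Unlock"): "ERROR",       # Unlock edge assumed
}

def step(fsm, action):
    return fsm if fsm == "ERROR" else TRANSITIONS[(fsm, action)]

def join_env(a, b):
    """Pointwise join: keep a value only where both environments agree."""
    return {k: (a[k] if a.get(k) == b.get(k) else TOP) for k in a.keys() | b.keys()}

def merge(pairs):
    """Join point: one environment per FSM state; equal FSM states are joined."""
    out = {}
    for fsm, env in pairs:
        out[fsm] = join_env(out[fsm], env) if fsm in out else dict(env)
    return out

def branch(states, var, action):
    """Simulate `if (var) action;`, following only the feasible branches."""
    results = []
    for fsm, env in states.items():
        value = env.get(var, TOP)
        if value in (TOP, True):       # then-branch feasible
            results.append((step(fsm, action), {**env, var: True}))
        if value in (TOP, False):      # else-branch feasible
            results.append((fsm, {**env, var: False}))
    return merge(results)

def simulate():
    states = {"INIT": {}}                              # needLock unknown on entry
    states = branch(states, "needLock", "Lock")        # LOCKED/True and INIT/False
    states = branch(states, "needLock", "Unlock")      # both paths end in INIT
    return merge((step(fsm, "Ret"), env) for fsm, env in states.items())

final = simulate()
print(final)
```

After the first test the two paths carry different FSM states, so they are kept apart and each remembers its value of needLock. After the second test both are in INIT and collapse into one state, and no ERROR is reported.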


ESP Analysis

  • Three increasingly precise phases of static analysis

    • Phase I : global FICS value flow analysis

      • use GOLF to build a call graph and answer value flow questions

    • Phase II : global FSCS protocol tracking

      • use RHS95 combined with polymorphic value flow to track protocols attached to data

    • Phase III : local PS feasibility analysis

      • use local abstract simulation + summaries from Phase II to find infeasible paths


Step 5: Answer two questions

  • Will this be precise enough?

    • manual inspection of drivers

    • some level of false positives is OK

  • Will this scale?

    • PI : Yes. DLFR SAS 01 on 1.5MLOC

    • PII : Yes! Use RHS95, control vs data

    • PIII : Yes. Local, abstract simulation


Step 6: Commit to the full process

  • Analysis is only 10% of error detection

  • Collaborate with others

    • PREFix team

    • SQL Server team

  • Be willing to work on deployment

  • Do the dirty work



Final word on ESP

  • Related work

    • PREFix/PREFast

    • SLAM

    • ESC

    • Metal

  • Problem driven analysis research

  • All about scaling and real code


My view of program analysis

  • Static analysis is all about a tradeoff

    • efficiency vs precision

  • Tradeoff along several dimensions

    • path feasibility : FI, FS, PS

    • path validity : CI, CS

    • level of abstraction : AI

  • We’ve studied the theory …


… you need to be engineers

  • Engineering is not menial labour

  • Engineers can write papers

  • Engineers produce real tools

  • Engineers understand that the space of programs is limited in practice


… you need to solve problems

  • Make connections between research areas

  • Don’t be intimidated by the literature

  • Static analysis is a means to an end

  • Focus on software reliability

  • Correctness is wide open

    • plenty of opportunity

    • critical problem

    • needs good people

