Probability & Statistics: Opportunities for Formal Methods (?) Research

Sriram K. Rajamani

Microsoft Research India

3 example projects

I. Statistical Debugging

Statistical Analysis
  • What we measure
    • Path coverage for each test case (intra-procedural, acyclic)
    • Outcome (success/failure)
  • Compute four statistics for every path
    • Context: How much is the context of a path correlated with failure?
    • Increase: How much more is the path correlated with failure than its context?
    • Recall: What fraction of all failures occur when this path is covered?
    • Confidence: Overall measure that combines increase and recall
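The four statistics can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formulas from the HOLMES paper: the names `PathCounts` and `path_statistics` are hypothetical, a run is assumed to "observe" a path when it reaches the code containing the path, and the harmonic mean is assumed as the way to combine increase and recall.

```python
from dataclasses import dataclass


@dataclass
class PathCounts:
    """Per-path counts gathered from a test suite (hypothetical layout)."""
    fail_covered: int   # failing runs that covered the path
    pass_covered: int   # passing runs that covered the path
    fail_observed: int  # failing runs that reached the path's context
    pass_observed: int  # passing runs that reached the path's context


def path_statistics(c: PathCounts, total_failures: int) -> dict:
    covered = c.fail_covered + c.pass_covered
    observed = c.fail_observed + c.pass_observed
    # Fraction of runs covering the path that failed.
    failure = c.fail_covered / covered if covered else 0.0
    # Context: fraction of runs reaching the path's context that failed.
    context = c.fail_observed / observed if observed else 0.0
    # Increase: extra failure correlation attributable to the path itself.
    increase = failure - context
    # Recall: fraction of all failures that covered this path.
    recall = c.fail_covered / total_failures if total_failures else 0.0
    # Confidence: harmonic mean of increase and recall (an assumption here).
    if increase <= 0 or recall <= 0:
        confidence = 0.0
    else:
        confidence = 2 * increase * recall / (increase + recall)
    return {"context": context, "increase": increase,
            "recall": recall, "confidence": confidence}
```

Paths would then be ranked by `confidence`, so that a path scores highly only if it both predicts failure better than its context and accounts for a large share of the failures.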

[Figure: Holmes workflow. Diagram labels: Test suite (automated/manual; Visual Studio unit tests, Visual Studio Test Elements); Code coverage (Holmes path coverage); Test results (pass/fail); Holmes statistical analysis; Potential root causes; Historical debugging.]

Trishul M. Chilimbi, Ben Liblit, Krishna K. Mehra, Aditya V. Nori, Kapil Vaswani: HOLMES: Effective statistical debugging via efficient path profiling. ICSE 2009: 34-44

Sample Bug Report

Textual description of bug:

The customer experiences some deadlocks on a server. The problem is random and may occur from several times a week to once a month. The system looks hung because the global resource 'ObpInitKillMutant' is held by a thread which tries to close a file forever. So all the processes having a thread waiting on 'ObpInitKillMutant' stop working fine. Drivers such as TCP/IP continue to respond normally but it's impossible to connect to any share.

Stack trace:

0: kd> !thread 82807020
ChildEBP RetAddr  Args to Child
80c7a028 00000000 00000000 ntkrnlmp!IopAcquireFileObjectLock+0x58
82a6d7a0 80c7a028 00120089 ntkrnlmp!IopCloseFile+0x79
82a6d7a0 80c7a010 80f6da40 ntkrnlmp!ObpDecrementHandleCount+0x112
00000324 7ffdef01 00000000 ntkrnlmp!NtClose+0x170
00000324 7ffdef01 00000000 ntkrnlmp!KiSystemService+0xc9
00000324 80159796 000000c9 ntkrnlmp!ZwClose+0xb
000000c9 e185f648 00000000 ntkrnlmp!ObDestroyHandleProcedure+0xd
809e3008 801388e4 82a6d926 ntkrnlmp!ExDestroyHandleTable+0x48
00000001 82a6d7a0 7ffde000 ntkrnlmp!ObKillProcess+0x44
00000001 82a6d7a0 82a6d7f0 ntkrnlmp!PspExitProcess+0x54
00000000 f0941f04 0012fa70 ntkrnlmp!PspExitThread+0x447
ffffffff 00000000 00002a60 ntkrnlmp!NtTerminateProcess+0x13c
ffffffff 00000000 00002a60 ntkrnlmp!KiSystemService+0xc9
00000000 00000000 00000000 NTDLL!NtTerminateProcess+0xb

Processor state:

REGISTERS:
eax=00000005 ebx=e3185488 ecx=0000083c edx=e2dddc68

Desire: Use existing search engines


Features as typed documents

  • Trees with 4 constructors:
  • (1) unordered bag of terms,
  • (2) ordered list of terms,
  • (3) weighted terms, and
  • (4) key-value pairs

Type BaseType = String | Int

Type textDoc = Bag(BaseType set)

Type NamedDoc =
    Value(BaseType)
  | KeyValuePair(BaseType * Doc)

Type Doc =
    Null
  | Base(BaseType)
  | Bag(NamedDoc set)
  | Ordered(NamedDoc list)

  • We use typed documents to represent features
  • Domain specific feature modeling done by a domain expert
  • Indexing and searching possible generically from the type structure.
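
As a rough illustration, the typed-document structure can be mirrored with Python dataclasses. The class names follow the type definitions on this slide; the helper `bug_report_doc` and the keys "text", "stack", and "registers" are hypothetical and only show how a domain expert might model a bug report.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

BaseType = Union[str, int]


@dataclass(frozen=True)
class Value:
    value: BaseType


@dataclass(frozen=True)
class KeyValuePair:
    key: BaseType
    doc: "Doc"


@dataclass(frozen=True)
class Null:
    pass


@dataclass(frozen=True)
class Base:
    value: BaseType


@dataclass(frozen=True)
class Bag:
    items: frozenset  # unordered collection of NamedDoc


@dataclass(frozen=True)
class Ordered:
    items: tuple  # ordered sequence of NamedDoc


NamedDoc = Union[Value, KeyValuePair]
Doc = Union[Null, Base, Bag, Ordered]


def bug_report_doc(words, frames, registers) -> Bag:
    """Model a bug report as a bag of three named features (hypothetical)."""
    text = Bag(frozenset(Value(w) for w in words))        # word order ignored
    stack = Ordered(tuple(Value(f) for f in frames))      # frame order matters
    regs = Bag(frozenset(KeyValuePair(r, Base(v)) for r, v in registers.items()))
    return Bag(frozenset({
        KeyValuePair("text", text),
        KeyValuePair("stack", stack),
        KeyValuePair("registers", regs),
    }))
```

The point of the sketch is that the stack trace is an ordered list while the description is an unordered bag, so a generic indexer can treat them differently using only the type structure.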
Indexing and searching features

We define a transformer T such that:

1. T maps typed documents to bags of words.

2. FeatureSimilarityScore(doc1, doc2) = FullTextScore(T(doc1), T(doc2))

This enables us to use existing full-text search engines to index and search features.
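
A minimal sketch of such a transformer follows. It is not the DebugAdvisor implementation: documents are written as tagged tuples for brevity, the choices of key prefixes for key-value pairs and bigrams for ordered lists are assumptions, and cosine similarity over word counts stands in for a real engine's FullTextScore.

```python
import math
from collections import Counter


def T(doc, ngram=2):
    """Flatten a typed document (tagged tuples) into a list of words."""
    tag = doc[0]
    if tag == "null":
        return []
    if tag == "base":
        return [str(doc[1])]
    if tag == "bag":  # unordered: just collect the children's words
        return [w for child in doc[1] for w in T(child, ngram)]
    if tag == "ordered":  # ordered: also emit n-grams so order is searchable
        words = [w for child in doc[1] for w in T(child, ngram)]
        grams = ["_".join(words[i:i + ngram])
                 for i in range(len(words) - ngram + 1)]
        return words + grams
    if tag == "kv":  # key-value: prefix every word with its key
        return [f"{doc[1]}:{w}" for w in T(doc[2], ngram)]
    raise ValueError(f"unknown document tag: {tag!r}")


def full_text_score(words_a, words_b):
    """Cosine similarity of word counts (stand-in for a search engine)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def feature_similarity_score(doc1, doc2):
    return full_text_score(T(doc1), T(doc2))
```

With this encoding, two stack traces containing the same frames in a different order still match on individual frames but lose the bigram matches, so they score lower than an exact match.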

[Figure: DebugAdvisor architecture. Search side: a query (stack trace, code snippets, emails) and the bug repository each pass through feature parsers (stack trace parser, register information parser) and the transformer T; the repository is indexed, and searching the index returns similar bugs. Link analysis side: a relationship builder constructs a relationship graph (of millions of nodes) from repositories (version control, bug repository, debug logs); link analysis over the similar bugs and the relationship graph produces the DebugAdvisor report.]


Precision/Recall

B. Ashok, Joseph M. Joy, Hongkang Liang, Sriram K. Rajamani, Gopal Srinivasa, Vipindeep Vangala: DebugAdvisor: a recommender system for debugging. ESEC/SIGSOFT FSE 2009: 373-382

The Specification Challenge

  • Before you perform validation you need a specification
  • It is very hard to get people to write specifications
    • detailed, laborious, error-prone, hard to maintain

Question: Can we have programmers write high-level guidelines and infer low-level specifications automatically?

  • Probabilistic inference can be used to calculate low-level specifications from high-level guidelines (such as programmer intuitions)
Web Application Vulnerabilities

void ProcessRequest()
{
    string s = GetUserInput("name");       // Source
    s = Validate(s);                       // Sanitizer
    ExecuteQuery("select …" + s + "…");    // Sink (critical database)
}

Every (dataflow) path from Source to Sink should go through a Sanitizer
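
The rule above can be checked mechanically once the dataflow is available as a graph. The sketch below is one way to do it in Python, assuming the graph is given as a list of edges between function names; the function name `unsanitized_paths` is hypothetical.

```python
def unsanitized_paths(edges, sources, sanitizers, sinks):
    """Return every source-to-sink path that avoids all sanitizers."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)

    violations = []

    def walk(node, path):
        if node in sanitizers:
            return  # data is cleansed here; anything downstream is fine
        if node in sinks:
            violations.append(path)  # reached a sink without a sanitizer
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # do not loop around cycles
                walk(nxt, path + [nxt])

    for src in sorted(sources):
        walk(src, [src])
    return violations
```

For the ProcessRequest example, the only path is GetUserInput, Validate, ExecuteQuery, so the check reports nothing; removing the call to Validate would create a direct edge from the source to the sink and a reported violation.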

Propagation Graph

void ProcessRequest()
{
    string s1 = ReadData1("name");
    string s2 = ReadData2("encoding");
    string s11 = Prop1(s1);
    string s22 = Prop2(s2);
    string s111 = Cleanse(s11);
    string s222 = Cleanse(s22);
    WriteData("Parameter " + s111);
    WriteData("Header " + s222);
}

Propagation graph: m1 → m2 iff information flows from m1 to m2
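
For straight-line code like this example, the propagation graph can be sketched by tracking which call produced each variable. The Python below is a simplification under stated assumptions: statements are given as hypothetical `(target, function, argument variables)` triples, and only explicit flow through local variables is modeled.

```python
def propagation_graph(statements):
    """Build edges m1 -> m2 where a value returned by m1 is passed to m2."""
    defined_by = {}  # variable name -> function whose result it holds
    edges = set()
    for target, func, args in statements:
        for arg in args:
            if arg in defined_by:
                edges.add((defined_by[arg], func))
        if target is not None:
            defined_by[target] = func
    return edges


# The ProcessRequest example, one triple per statement.
PROCESS_REQUEST = [
    ("s1", "ReadData1", []),
    ("s2", "ReadData2", []),
    ("s11", "Prop1", ["s1"]),
    ("s22", "Prop2", ["s2"]),
    ("s111", "Cleanse", ["s11"]),
    ("s222", "Cleanse", ["s22"]),
    (None, "WriteData", ["s111"]),
    (None, "WriteData", ["s222"]),
]
```

Note that the two calls to Cleanse collapse into a single node, as do the two calls to WriteData, because nodes are functions rather than call sites.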

Information flow vulnerabilities

void ProcessRequest()
{
    string s1 = ReadData1("name");        // source
    string s2 = ReadData2("encoding");    // source
    string s11 = Prop1(s1);
    string s22 = Prop2(s2);
    string s111 = Cleanse(s11);           // sanitizer
    string s222 = Cleanse(s22);           // sanitizer
    WriteData("Parameter " + s111);       // sink
    WriteData("Header " + s222);          // sink
}

Goal

Given a propagation graph, can we infer a specification or ‘complete’ a partial specification?

Idea

Use Bayesian Reasoning!

The Essential Idea
  • Assumptions (beliefs) that serve as basis
    • Most (information flow) paths in propagation graphs are secure. That is, very few paths go from a source to a sink with no intervening sanitizer.
    • It is unlikely that a path between a source and a sink contains two or more sanitizers.
    • If information flows from function1 to function2, function1 is more likely to be a source than function2.
    • If information flows from function1 to function2, function2 is more likely to be a sink than function1.
  • All such beliefs, with numeric weights, are encoded in a factor graph
  • Solution computed using the belief propagation algorithm (we use Infer.NET from MSR Cambridge)
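
To make the encoding concrete, here is a toy version of the first two beliefs as weighted factors over the ProcessRequest example. It is a sketch only: it computes exact marginals by brute-force enumeration instead of belief propagation, it does not use Infer.NET, the sources and sink are assumed to be already known, and every weight (including the prior that sanitizers are rare, which is not one of the listed beliefs) is made up for illustration.

```python
from itertools import product

# Source-to-sink paths in the propagation graph (endpoints assumed known).
PATHS = [
    ["ReadData1", "Prop1", "Cleanse", "WriteData"],
    ["ReadData2", "Prop2", "Cleanse", "WriteData"],
]
UNKNOWN = ["Prop1", "Prop2", "Cleanse"]  # is each one a sanitizer?

# Path factor: exactly one sanitizer is likely; zero (insecure path)
# or two and more (redundant sanitizers) are unlikely.
PATH_WEIGHT = {0: 0.05, 1: 0.9}
MANY_SANITIZERS_WEIGHT = 0.05
# Assumed prior: most functions are not sanitizers.
PRIOR = {True: 0.3, False: 0.7}


def joint_weight(assign):
    """Unnormalized probability: product of all factors."""
    w = 1.0
    for is_sanitizer in assign.values():
        w *= PRIOR[is_sanitizer]
    for path in PATHS:
        count = sum(assign.get(node, False) for node in path[1:-1])
        w *= PATH_WEIGHT.get(count, MANY_SANITIZERS_WEIGHT)
    return w


def sanitizer_marginals():
    """P(node is a sanitizer) for each unknown node, by enumeration."""
    totals = dict.fromkeys(UNKNOWN, 0.0)
    z = 0.0
    for values in product([False, True], repeat=len(UNKNOWN)):
        assign = dict(zip(UNKNOWN, values))
        w = joint_weight(assign)
        z += w
        for node, is_sanitizer in assign.items():
            if is_sanitizer:
                totals[node] += w
    return {node: total / z for node, total in totals.items()}
```

Under these weights Cleanse comes out as the most probable sanitizer, because a single sanitizer there secures both paths. Enumeration is exponential in the number of unknowns, which is why an approximate algorithm such as belief propagation is needed on real propagation graphs.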
[Figures: Discovered Specifications; Discovered Vulnerabilities; False positives eliminated. Series: Original, With Merlin.]

Current status:

Final false-positive rate for CAT.NET with Merlin on 10 web applications: < 1%

V. Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani, Anindya Banerjee: Merlin: specification inference for explicit information flow problems. PLDI 2009: 75-86

Other problems where we have tried this idea
  • User-mode/kernel-mode pointer inference
  • Type inference for ownership/aliasing type systems (Beckman & Nori)
  • Label inference in Kripke structures

Hypothesis:

  • General framework for a wide variety of annotation inference problems
  • Allows mixing logical and heuristic inference

Reflection...

  • Software is more than code
    • Software engineering is more than verification/bug-finding
  • Are the techniques I presented “formal methods”?
  • Analyzing programs together with data can solve several new problems in software productivity
    • Logic + Probability is essential if you want to analyze Code + Data