Data Modeling for Program Analysis

Scott McPeak, OSQ Retreat.


Presentation Transcript


  1. Data Modeling for Program Analysis • Scott McPeak • OSQ Retreat

  2. A Program Verifier • Verification assures that a program meets some specification, e.g. "no segfaults" • Full correctness vs. partial specs • This is undecidable, so the verifier relies on annotations • [Diagram labels: Program, Specification, Annotations, "new obligations", "useful facts"]

  3. Verifier Architecture • [Diagram: the program, the annotations, and a hardcoded specification feed verification condition generation (semantics); the resulting predicates, which collectively imply that the program meets the spec, go to a theorem prover, which answers "proved" or "not proved"]

  4. Verification Benefits • Potential for reducing costs of testing and debugging is enormous • Memory safety • Concurrency safety • Adherence to domain-specific protocols • Annotation appeal: capture "why" info • Could prove absence of certain security violations

  5. Run Time is Too Late • Doesn't reduce testing cost • Run-time cost may be significant • Cumulative across different analyses • Recovery after run-time failure? • Delay between introduction of a bug and the discovery of its effect

  6. Will Anyone Annotate? • Of course, if cost/benefit ratio is right • Benefits can be high (previous slide) • Abstraction is key to controlling cost • Can re-use "why" knowledge; libraries, etc. • Common tasks must be easy (e.g. array of non-null elements) • Module-wide defaults under user control

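  7. Development Model • [Diagram: code passes through compile, verifier, testing, ...; the failures are a type error, a failed proof, and wrong behavior; the remedies are a fix, a diagnosis assistant that produces an explanation before the fix, and debugging]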

  8. Data Modeling • Program analyzer must abstract application data (otherwise it's just executing!) • Model: family of mathematical objects, and axioms which relate them • Enormous design space, little guidance • Direct impact on success of analysis

  9. Example: Strings • Initial model: two function symbols • size(addr): # of allocated bytes • strlen(addr): least index of a 0 byte • strcpy(d, s): pre: size(d) > strlen(s); post: strlen(d) = strlen(s) • strcat(d, s): pre: size(d) - strlen(d) > strlen(s); post: strlen(d) = pre(strlen(d) + strlen(s))
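The two-symbol string model can be made executable as a rough sketch. The `AbsString` class and the Python functions below are hypothetical stand-ins, not part of the talk; the preconditions are read as "the destination has room for the source plus its terminating 0 byte", i.e. size(d) > strlen(s).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbsString:
    """Abstract view of a buffer: only the two modeled quantities survive."""
    size: int    # size(addr): number of allocated bytes
    strlen: int  # strlen(addr): least index of a 0 byte


def strcpy(d: AbsString, s: AbsString) -> AbsString:
    # Precondition: d must hold strlen(s) characters plus the 0 terminator.
    if not d.size > s.strlen:
        raise ValueError("strcpy precondition failed")
    # Postcondition: strlen(d) = strlen(s); size(d) is unchanged.
    return AbsString(size=d.size, strlen=s.strlen)


def strcat(d: AbsString, s: AbsString) -> AbsString:
    # Precondition: the unused tail of d must hold s plus the terminator.
    if not d.size - d.strlen > s.strlen:
        raise ValueError("strcat precondition failed")
    # Postcondition: strlen(d) = pre(strlen(d) + strlen(s)).
    return AbsString(size=d.size, strlen=d.strlen + s.strlen)


dst = AbsString(size=8, strlen=0)
src = AbsString(size=4, strlen=3)
copied = strcpy(dst, src)      # fits: 8 > 3
joined = strcat(copied, src)   # fits: 8 - 3 > 3
```

A third `strcat(joined, src)` would violate the precondition (8 - 6 is not greater than 3), which is the kind of obligation a verifier would have to discharge statically rather than at run time.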

  10. String as a Set • Add the predicate contains(addr, ch) → {T, F} • strcpy(d, s): post: ∀ ch. contains(s, ch) ⇔ contains(d, ch) • strchr(s, ch) → r: post: (contains(s, ch) ⇒ ∃ i. r = s+i) && (¬contains(s, ch) ⇒ r = NULL)

  11. String as a Sequence • Add another symbol "[]": addr[i] → ch • strcpy(d, s): post: ∀ i. d[i] = s[i] • strchr(s, ch) → r: post: ((∃ i. s[i]=ch) ⇒ *r=ch) && (¬(∃ i. s[i]=ch) ⇒ r=NULL)
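To see how the set and sequence models differ in strength, the strchr postconditions can be checked against a concrete buffer. This is only an illustrative sketch: the helper names are hypothetical, an offset `r` stands in for the pointer s+r, `None` stands in for NULL, and the special case of searching for the 0 byte itself is ignored.

```python
from typing import Optional


def body(s: bytes) -> bytes:
    """Bytes of s before the first 0 byte (s must contain a terminator)."""
    return s[:s.index(b"\0")]


def contains(s: bytes, ch: int) -> bool:
    """Set model: does ch occur in the string at all?"""
    return ch in body(s)


def strchr(s: bytes, ch: int) -> Optional[int]:
    """Offset of the first occurrence of ch, or None in place of NULL."""
    i = body(s).find(bytes([ch]))
    return None if i < 0 else i


def set_post(s: bytes, ch: int, r: Optional[int]) -> bool:
    # contains(s, ch) => some offset is returned; otherwise r = NULL.
    return (r is not None) if contains(s, ch) else (r is None)


def seq_post(s: bytes, ch: int, r: Optional[int]) -> bool:
    # (exists i. s[i] = ch) => *r = ch; otherwise r = NULL.
    found = any(c == ch for c in body(s))
    return (r is not None and s[r] == ch) if found else (r is None)


s = b"hello\0"
r = strchr(s, ord("l"))
```

Under these assumptions the set model accepts any non-NULL result when the character is present, while the sequence model additionally requires that the result actually points at that character.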

  12. Example: Integers • "int" is easy to model, right? Well... • Mathematical integers • Finite partition: { <0, =0, =1, >1 } • 32-bit 2's complement with wraparound
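The three integer models give different answers for the same computation. A small sketch, with hypothetical helper names `wrap32` and `partition`, and Python's unbounded `int` playing the role of the mathematical integers:

```python
def wrap32(n: int) -> int:
    """Interpret n as a 32-bit two's complement value with wraparound."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def partition(n: int) -> str:
    """Finite partition model: map n to one of { <0, =0, =1, >1 }."""
    if n < 0:
        return "<0"
    if n == 0:
        return "=0"
    if n == 1:
        return "=1"
    return ">1"


INT_MAX = 2**31 - 1
math_sum = INT_MAX + 1             # mathematical integers: no overflow
wrapped_sum = wrap32(INT_MAX + 1)  # 32-bit model: wraps to the minimum value
```

Here `partition(math_sum)` and `partition(wrapped_sum)` land in different cells, so an analysis that proves "the result is positive" in one model may be unsound in another.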

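  13. Example: Memory • [Diagram: mem maps toplevel object addresses (e.g. &x, malloc(..)) to objects; a struct maps field offsets to fields; an array maps int indexes to elements] • "x" = sel(mem0, addr_x) • "a.g[3]" = sel(sel(sel(mem0, addr_a), g), 3), where the inner terms denote "a" and "a.g"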
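The nested sel terms can be tried out with plain dictionaries. This is a minimal sketch, assuming that memory, structs, and arrays are all modeled as immutable-style maps; `upd` and the string keys `"addr_x"` and `"addr_a"` are illustrative names, not taken from the slides.

```python
def sel(m: dict, key):
    """Read one level of the hierarchy: m[key]."""
    return m[key]


def upd(m: dict, key, value) -> dict:
    """Return a copy of m with key rebound; the original map is untouched."""
    new = dict(m)
    new[key] = value
    return new


# mem: toplevel object address -> object; struct: field -> value;
# array: int index -> element.
mem0 = {
    "addr_x": 5,
    "addr_a": {"g": {3: 42}},
}

x_val = sel(mem0, "addr_x")                     # "x"
a_g_3 = sel(sel(sel(mem0, "addr_a"), "g"), 3)   # "a.g[3]"

# Writing a.g[3] = 7 rebuilds each level on the way back up.
a = sel(mem0, "addr_a")
g = sel(a, "g")
mem1 = upd(mem0, "addr_a", upd(a, "g", upd(g, 3, 7)))
```

Because every write produces a new memory value, `mem0` still describes the state before the assignment, which is what lets a verifier talk about "old" and "new" memory in the same formula.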

  14. Pointers • Pointers are access paths • "&(a.g[3])" = sub(sub(sub(whole, a), g), 3) • Rules to read via pointers: selPtr(obj, whole) = obj; if sel(selPtr(obj, rest), index) = v then selPtr(obj, sub(rest, index)) = v • Can also write, do pointer arithmetic, deeper indexing, e.g. "&(p->x)"
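The two read rules translate directly into a recursive function over access paths. The sketch below assumes paths are encoded as nested tuples and memory as nested dictionaries; `WHOLE`, `sub`, and `sel_ptr` are hypothetical Python spellings of whole, sub, and selPtr.

```python
WHOLE = ("whole",)


def sub(rest: tuple, index) -> tuple:
    """Extend an access path by one more selection step."""
    return ("sub", rest, index)


def sel_ptr(obj, path: tuple):
    """Read through a pointer by following its access path."""
    if path == WHOLE:
        # selPtr(obj, whole) = obj
        return obj
    # selPtr(obj, sub(rest, index)) = sel(selPtr(obj, rest), index)
    _, rest, index = path
    return sel_ptr(obj, rest)[index]


mem0 = {"addr_a": {"g": {3: 42}}}

# "&(a.g[3])" = sub(sub(sub(whole, a), g), 3)
ptr = sub(sub(sub(WHOLE, "addr_a"), "g"), 3)
value = sel_ptr(mem0, ptr)
```

Reading through `ptr` yields the same value as the direct term sel(sel(sel(mem0, addr_a), g), 3), which is the property the rules are meant to guarantee.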

  15. Data Structure Invariants • Classic approach: universal quantifier • ∀ a. type(a)=Foo ⇒ a->x = a->y + 1 • Field admission predicate • Bar *p; admission: p!=NULL; • Object state field: "ok" vs. "not ok" • Change a field → state:="not ok" • Manually certify "ok", precondition=invariant • ∀ a. type(a)=Foo ⇒ a->state="ok"
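The object state field idea can be mimicked at run time to show the intended discipline. This is only an executable illustration under assumed names: in the verifier the "certify" step would be a proof obligation checked statically, not a dynamic check, and the class `Foo` with invariant x = y + 1 is borrowed from the slide's example.

```python
class Foo:
    """Object with invariant x = y + 1, tracked by a state field."""

    def __init__(self, y: int):
        object.__setattr__(self, "state", "not ok")
        self.y = y
        self.x = y + 1
        self.certify()

    def __setattr__(self, name, value):
        # Changing any field invalidates the certified state.
        object.__setattr__(self, name, value)
        if name != "state":
            object.__setattr__(self, "state", "not ok")

    def certify(self):
        # Manual certification: the invariant is the precondition for "ok".
        if self.x != self.y + 1:
            raise AssertionError("invariant x = y + 1 does not hold")
        object.__setattr__(self, "state", "ok")


f = Foo(1)
state_after_init = f.state   # certified in the constructor
f.y = 5
state_after_write = f.state  # any field write resets the state
f.x = 6
f.certify()
state_after_fix = f.state
```

The design choice is that the global quantified fact becomes "every Foo has state ok", and code that temporarily breaks the invariant only has to re-certify before handing the object to anyone who relies on it.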

  16. Example: Change Sets • Globals: list of changed / list of unchanged • Not ideal... name sets of globals? • Hierarchical mem: changed object is easy • new = update(old, obj_addr, some_value) • But changed field (of many objects) is hard • Possible alternative: staged & weakened invariants; state what is still true, rather than naming what has changed
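The contrast between "one changed object" and "one changed field of many objects" can be sketched with dictionaries standing in for hierarchical memory. The function `only_changed` and the field names `mark` and `v` are hypothetical, chosen only for illustration.

```python
def update(old: dict, obj_addr, value) -> dict:
    """new = update(old, obj_addr, some_value): rebind one toplevel object."""
    new = dict(old)
    new[obj_addr] = value
    return new


def only_changed(old: dict, new: dict, obj_addr) -> bool:
    """Frame condition: every object other than obj_addr is unchanged."""
    return old.keys() == new.keys() and all(
        new[a] == old[a] for a in old if a != obj_addr
    )


old = {"a": {"mark": 0, "v": 1}, "b": {"mark": 0, "v": 2}}

# Changing one object is easy to describe.
new = update(old, "a", {"mark": 0, "v": 9})

# Changing one field in every object touches every toplevel object.
marked = {addr: {**obj, "mark": 1} for addr, obj in old.items()}

# Alternative: state what is still true instead of naming what changed.
v_preserved = all(marked[a]["v"] == old[a]["v"] for a in old)
```

In this sketch the single-object frame condition holds for `new` but fails for `marked`, while the weaker statement "every v field is preserved" still holds for `marked`.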

  17. Conclusions • Try to capture invariants implicitly, via representation choices • Be explicit about related entities: inDegree(n)=d vs. inDegree1(n, referrer) • Let user select among possible models, even to choose not to model certain fields • Try to think like a programmer
