
Heap Decomposition for Concurrent Shape Analysis

R. Manevich, T. Lev-Ami, M. Sagiv (Tel Aviv University); G. Ramalingam (MSR India); J. Berdine (MSR Cambridge). Dagstuhl Seminar 08061, February 7, 2008.


Presentation Transcript


1. Heap Decomposition for Concurrent Shape Analysis. R. Manevich, T. Lev-Ami, M. Sagiv (Tel Aviv University); G. Ramalingam (MSR India); J. Berdine (MSR Cambridge). Dagstuhl 08061, February 7, 2008

2. Thread-modular analysis for coarse-grained concurrency
• E.g., [Qadeer & Flanagan, SPIN'03], [Gotsman et al., PLDI'07], ...
• With each lock lk associate:
  • a subheap h(lk); the heap is partitioned: H = h(lk1) * ... * h(lkn)
  • a local invariant I(lk), inferred or specified
• When thread t acquires lk it assumes I(lk); when it releases lk it ensures I(lk)
• Can analyze each thread "separately", avoiding explicit enumeration of all thread interleavings
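The assume-on-acquire / ensure-on-release discipline on the slide can be sketched in C. This is a toy model of ours, not the paper's code: the subheap h(lk) is reduced to a single counter and the local invariant I(lk) to a non-negativity check.

```c
#include <assert.h>

/* Toy model: the subheap h(lk) is a counter; I(lk) is "count >= 0". */
typedef struct {
    int locked;   /* stands in for the lock lk */
    int count;    /* stands in for the subheap h(lk) */
} Region;

static int inv(const Region *r) { return r->count >= 0; }   /* I(lk) */

/* acquire may assume I(lk); release must ensure I(lk). */
static void acquire(Region *r) { r->locked = 1; assert(inv(r)); }
static void release(Region *r) { assert(inv(r)); r->locked = 0; }

/* Each critical section can then be analyzed separately against I(lk). */
static void deposit(Region *r, int n) {
    acquire(r);
    r->count += n;
    if (r->count < 0) r->count = 0;  /* re-establish I(lk) before releasing */
    release(r);
}
```

The point of the sketch is the shape of the reasoning, not the locking itself: verifying `deposit` needs only I(lk) at the acquire, never the state of other threads.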

3. Thread-modular analysis for fine-grained concurrency?
• CAS (Compare And Swap)
• No locks means more interference between threads
• No nice heap partitioning
• Still, the idea of reasoning about threads separately is appealing
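The CAS primitive the following algorithms rely on can be approximated with C11 atomics; a minimal sketch (the wrapper name `cas` is ours):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* CAS(loc, expected, new_val): atomically set *loc to new_val iff *loc
 * still equals expected; returns true on success, false if another
 * thread changed *loc first. */
static bool cas(_Atomic(void *) *loc, void *expected, void *new_val) {
    /* atomic_compare_exchange_strong writes the current value back
     * into 'expected' on failure; the retry loops below re-read anyway. */
    return atomic_compare_exchange_strong(loc, &expected, new_val);
}
```

A failed CAS is how one thread observes interference from another, which is exactly what the heap partitioning of the previous slide cannot capture.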

4. Overview
• State space is too large for two reasons:
  • Unbounded number of objects, hence infinite: apply finitary abstractions to data structures (e.g., abstract away the length of a list)
  • Exponential in the number of threads
• Observation: threads operate on part of the state, and correlations between different substates are often irrelevant for proving safety properties
• Our approach: develop an abstraction for substates
  • Abstract away correlations between substates of different threads
  • Reduce the exponential state space
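The first kind of abstraction mentioned above can be sketched concretely; the three-valued {EMPTY, ONE, MANY} partition is our illustration, not the paper's abstraction:

```c
#include <stddef.h>

typedef struct node { struct node *n; } Node;

/* Finitary abstraction sketch: forget the exact length of a list, so
 * unboundedly many concrete lists map to three abstract values. */
typedef enum { LEN_EMPTY, LEN_ONE, LEN_MANY } AbsLen;

static AbsLen abs_len(const Node *head) {
    if (head == NULL)    return LEN_EMPTY;
    if (head->n == NULL) return LEN_ONE;
    return LEN_MANY;  /* lists of length 2, 3, 4, ... all collapse here */
}
```

This handles the "unbounded objects" axis; the rest of the talk is about the other axis, the exponential blow-up in the number of threads.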

5. Non-blocking stack [Treiber 1986]

#define EMPTY -1
typedef int data_type;
typedef struct node_t { data_type d; struct node_t *n; } Node;
typedef struct stack_t { struct node_t *Top; } Stack;

[1] void push(Stack *S, data_type v) {
[2]   Node *x = alloc(sizeof(Node));
[3]   x->d = v;
[4]   do {
[5]     Node *t = S->Top;
[6]     x->n = t;
[7]   } while (!CAS(&S->Top,t,x));
[8] }

[9] data_type pop(Stack *S) {
[10]   do {
[11]     Node *t = S->Top;
[12]     if (t == NULL)
[13]       return EMPTY;
[14]     Node *s = t->n;
[15]     data_type r = t->d;
[16]   } while (!CAS(&S->Top,t,s));
[17]   return r;
[18] }
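The slide's C-like code can be made compilable with C11 atomics; a single-threaded sketch of ours (`malloc` replaces `alloc`, the bracketed line numbers are dropped, and safe memory reclamation / the ABA problem are out of scope):

```c
#include <stdatomic.h>
#include <stdlib.h>

#define EMPTY -1
typedef int data_type;
typedef struct node_t { data_type d; struct node_t *n; } Node;
typedef struct stack_t { _Atomic(Node *) Top; } Stack;

void push(Stack *S, data_type v) {
    Node *x = malloc(sizeof(Node));
    x->d = v;
    Node *t;
    do {
        t = atomic_load(&S->Top);
        x->n = t;
        /* CAS(&S->Top, t, x): install x only if Top is still t */
    } while (!atomic_compare_exchange_weak(&S->Top, &t, x));
}

data_type pop(Stack *S) {
    Node *t, *s;
    data_type r;
    do {
        t = atomic_load(&S->Top);
        if (t == NULL) return EMPTY;
        s = t->n;
        r = t->d;   /* read the popped value before the CAS retires t */
        /* CAS(&S->Top, t, s): unlink t only if Top is still t */
    } while (!atomic_compare_exchange_weak(&S->Top, &t, s));
    return r;       /* t is leaked here: reclamation is out of scope */
}
```

The loop structure is what the rest of the slides refer to: the program locations (pc=6, pc=7, pc=14, pc=16) name points inside these retry loops.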

6. Example: successful push
(figure: after lines [5]-[6], the new node x has x->n = t, where t points to the current Top of the list)

7. Example: successful push
(figure: the CAS at line [7] succeeds; Top now points to x)

8. Example: unsuccessful push
(figure: the CAS at line [7] fails because another thread changed Top after t was read at line [5])

9. Concrete states with storable threads
(figure: a concrete state with thread objects prod1 at pc=7, cons1 at pc=14, prod2 at pc=6, cons2 at pc=16; a thread object records a name and a program location; edges show local variables such as t, x, s and the n (next) field of the list)

10. Full state S1
(figure: the full state S1 containing all four thread objects, prod1, cons1, prod2, cons2, and the shared list reachable from Top)

11. Decomposition(S1)
(figure: S1 decomposed into substates M1, M2, M3, M4, one per thread, each keeping Top and that thread's local view of the list)
• A substate represents all full states that contain it
• Decomposition is state-sensitive (depends on values of pointers and heap connectivity)
• Decomposition(S1) = M1 × M2 × M3 × M4
• Note that S1 ∈ Decomposition(S1)

12. Full states S1 ∪ S2
(figure: two full states; S2 is like S1 with the thread roles exchanged: prod2 at pc=7, cons2 at pc=14, prod1 at pc=6, cons1 at pc=16)

13. Decomposition(S1 ∪ S2)
(figure: S1 decomposes into substates M1..M4 and S2 into substates K1..K4, one pair per subdomain)
• Decomposition(S1 ∪ S2) = (M1 ∪ K1) × (M2 ∪ K2) × (M3 ∪ K3) × (M4 ∪ K4)
• (S1 ∪ S2) ⊆ Decomposition(S1 ∪ S2): the Cartesian abstraction ignores correlations between substates
• The state space is exponentially more compact
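The loss of correlations on this slide can be sketched on a toy domain of ours (opaque ints stand in for substates, and a full state has just two components):

```c
#include <stdbool.h>

/* Toy full state: two substate components (think: one per subdomain). */
typedef struct { int m; int k; } Full;

/* Cartesian abstraction of {s1, s2}: remember the set of values seen in
 * each component, forgetting which combinations occurred together. */
typedef struct { int ms[2]; int ks[2]; } Cart;

static Cart abstract2(Full s1, Full s2) {
    Cart c = { { s1.m, s2.m }, { s1.k, s2.k } };
    return c;
}

/* A full state is represented iff each of its components was seen. */
static bool member(const Cart *c, Full s) {
    return (s.m == c->ms[0] || s.m == c->ms[1])
        && (s.k == c->ks[0] || s.k == c->ks[1]);
}
```

The abstraction admits mixed states that never occurred, which is exactly the over-approximation (S1 ∪ S2) ⊆ Decomposition(S1 ∪ S2): soundness is kept, correlations are not.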

14. Abstraction properties
• Substates in each subdomain correspond to a single thread
• Abstracts away correlations between threads: exponential reduction of the state space
• Substates preserve information on part of the heap (the part relevant to one thread)
• Substates may overlap: useful for reasoning about programs with fine-grained concurrency, since interference between threads is better approximated

15. Main results
• New parametric abstraction for heaps: heap decomposition + Cartesian abstraction, parametric in the underlying abstraction and the decomposition
• Parametric sound transformers: allow balancing efficiency and precision
• Implementation in HeDec (Heap Decomposition + Canonical Abstraction)
• Used to prove interesting properties of heap-manipulating programs with fine-grained concurrency, including linearizability
• Analysis scales linearly in the number of threads

16. Sound transformers
(figure: a sound transformer # maps input substate sets {X_j1}, {X_j2}, {X_j3}, {X_j4}, one per subdomain, to output sets {Y_j1'}, {Y_j2'}, {Y_j3'}, {Y_j4'})

17. Pointwise transformers
(figure: # is applied to each subdomain's substate set independently: {X_j1} to {Y_j1'}, ..., {X_j4} to {Y_j4'})
• Efficient, but often too imprecise

18. Imprecision example
(push code as on slide 5; figure: the pointwise transformer # schedules prod1 and executes x->n=t on substate M2, which records only Top and prod2's view at pc=6)
• But where do x and t of prod1 point to?

19. Imprecision example
(push code as on slide 5; figure: without prod1's view, the transformer must consider arbitrary targets for prod1's x and t, leading to a false alarm: a possible cyclic list)

20. Full composition transformers
(figure: compose all subdomains, #({X_j1} × {X_j2} × {X_j3} × {X_j4}), apply the transformer to the composed full states, then decompose the results into {Y_j1'}, ..., {Y_j4'})
• Precise, but exponential space blow-up

21. Partial composition
(figure: compose one chosen subdomain with each of the others separately: {X_j1} × {X_j2}, {X_j1} × {X_j3}, {X_j1} × {X_j4})

  22. {XHj1}{XHj2} {XHj1}{XHj3} {XHj1}{XHj4} # # # #({XHj1}{XHj2}) #({XHj1}{XHj3}) #({XHj1}{XHj4}) {YHj2’}j2’ {YHj4’}j4’ {YHj1’}j1’ {YHj3’}j3’ Partial composition efficient and precise

  23. {XHj1}{XHj2} pc=7 pc=6 pc=6 pc=7 Partial composition example prod1 Top M2 Top t prod2 n x n t n n n x n  M1 prod2 Top Top t K1 prod1 n x t n n n n x n K2

24. Partial composition example
(figure: applying the transformer to the composed substates, K2 composed with M1 and K2 composed with K1, yields precise views of prod1's x and t; the false alarm is avoided)
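The compose-transform-project pattern of the last few slides can be sketched on a toy domain of ours, where a substate keeps a thread's pc together with its copy of the shared Top (modeled as an int):

```c
/* Toy overlapping substates: pc plus the thread's copy of Top. */
typedef struct { int pc; int top; } Sub;
typedef struct { int pc_a; int pc_b; int top; } FullState;

/* The scheduled step: thread A's successful CAS installs new_top and
 * advances A past the retry loop (to line [8] in the push code). */
static FullState step_a(FullState f, int new_top) {
    f.top = new_top;
    f.pc_a = 8;
    return f;
}

/* Partial composition: combine A's substate with a compatible B
 * substate (both must agree on Top), apply the full-state step, then
 * project back onto B's subdomain, so B observes the new Top.  A
 * pointwise transformer would leave B's copy of Top stale. */
static Sub transform_b(Sub a, Sub b, int new_top) {
    FullState f = { a.pc, b.pc, a.top };   /* compose: a.top == b.top */
    FullState g = step_a(f, new_top);
    Sub b2 = { g.pc_b, g.top };            /* project onto B */
    return b2;
}
```

Because only compatible pairs are composed, the cost stays quadratic in the number of substates per subdomain rather than exponential in the number of subdomains.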

25. Experimental results
• List-based fine-grained algorithms:
  • Non-blocking stack [Treiber 1986]
  • Non-blocking queue [Doherty and Groves FORTE'04]
  • Two-lock queue [Michael and Scott PODC'96]
  • Benign data races
• Verified absence of null dereferences and memory leaks
• Verified linearizability
• Analysis built on top of the existing full-heap analysis of [Amit et al. CAV'07]
• Scaled the analysis from 2-3 threads to 20 threads
• Extended to unbounded threads (different work)

  26. Experimental results • Exponential time/space reduction • Non-blocking stack + linearizability

27. Related work
• Disjoint regions decomposition [TACAS'07]: fixed decomposition scheme; the most precise transformer is FNP-complete
• Partial join [Manevich et al. SAS'04]: orthogonal to decomposition; in HeDec we combine decomposition + partial join
• [Yang et al.]: handling concurrency for an unbounded number of threads
• Thread-modular analysis [Gotsman et al. PLDI'07]
• Rely-guarantee [Vafeiadis et al. CAV'07]
• Thread quantification (submitted)

28. More related work
• Local transformers: works by Reynolds, O'Hearn, Berdine, Yang, Gotsman, Calcagno
• Heap analysis by separation [Yahav & Ramalingam PLDI'04] [Hackett & Rugina POPL'05]: decompose the verification problem itself and conservatively approximate contexts
• Heap decomposition for interprocedural analysis [Rinetzky et al. POPL'05] [Rinetzky et al. SAS'05] [Gotsman et al. SAS'06] [Gotsman et al. PLDI'07]: decompose/compose at procedure boundaries
• Predicate/variable clustering [Clarke et al. CAV'00]: statically-determined decomposition

29. Conclusion
• Parametric framework for shape analysis
• Scales analyses of programs with fine-grained concurrency
• Generalizes thread-modular analysis
• Key idea: state decomposition (also useful for sequential programs)
• Used to prove intricate properties like linearizability
• HeDec tool: http://www.cs.tau.ac.il/~tvla#HEDEC

30. Future/ongoing work
• Extended analysis for an unbounded number of threads via thread quantification: an orthogonal technique, and the two compose very well
• Can we automatically infer good decompositions?
• Can we automatically tune transformers?
• Can we reuse these ideas in non-shape analyses?

31. Invited questions
• How do you choose a decomposition?
• How do you choose transformers?
• How does it compare to separation logic?
• What is a general principle and what is specific to shape analysis?
• Caveats / limitations?

32. How do you choose a decomposition?
• In general this is an open problem; perhaps counterexample refinement can help
• Depends on the property you want to prove
• Aim at the causes of combinatorial explosion: threads, iterators
• For linearizability we used:
  • For each thread t: the thread node, objects referenced by its local variables, and objects referenced by global variables
  • Objects referenced by global variables and objects correlated with the sequential execution
  • Locks component: for each lock, the thread that acquires it

33. How do you choose transformers?
• In general a challenging problem: one has to balance efficiency and precision
• We have some heuristics
• Core subdomains

34. How does it compare to separation logic?
• Relevant separating conjunction *r: like * but without the disjointness requirement
• Do you have an analog of the frame rule?
  • For disjoint regions decomposition [TACAS'07]
  • In general no, but instead we can use transformers of different levels of precision: #(I1 *r I2) = #precise(I1) *r #less-precise(I2), where #less-precise is cheap to compute
  • Perhaps we can find conditions under which #(I1 *r I2) = #precise(I1) *r I2
• Relativized formulae

35. What is a general principle and what is specific to shape analysis?
• Decomposing abstract domains is general: substate abstraction + Cartesian product
• Parametric transformers for Cartesian abstractions are general
• Chopping down heaps by heterogeneous abstractions is shape-analysis specific

36. Caveats / limitations?
• Decomposition + transformers are defined by the user, not specialized per program/property
• Too much overlap between substates can lead to more expensive analyses
• Too fine a decomposition requires lots of composition; partial composition is a bottleneck
• We have the theory for finer-grained compositions + incremental transformers, but no implementation
• Framework instantiated for just one abstraction (Canonical Abstraction)
• Can this be useful for separation logic-based analyzers?
