Carnegie Mellon University

SAT-Based Decision Procedures for Subsets of First-Order Logic
Part II: Separation Logic
Randal E. Bryant
Carnegie Mellon University
http://www.cs.cmu.edu/~bryant

Outline
• Background
  • SAT-based Decision Procedures
• Equality with Uninterpreted Functions
  • Translating to propositional formula
  • Exploiting positive equality and sparse transitivity
• Separation Logic
  • Translating to propositional formula
  • Hybrid encoding techniques

Separation Logic with Uninterpreted Functions (SUF)
Suitable for verifying a wider class of systems
Terms (T): Integer expressions
  ITE(F, T1, T2)          If-then-else
  Fun(T1, …, Tk)          Function application
  T + 1                   Increment
  T − 1                   Decrement
Formulas (F): Boolean expressions
  ¬F, F1 ∧ F2, F1 ∨ F2    Boolean connectives
  T1 = T2                 Equation
  T1 < T2                 Inequality
  Pred(T1, …, Tk)         Predicate application

SUF → Separation Logic
Eliminate function and predicate applications using fresh variables and ITE expressions [Bryant, German, Velev, CAV'99]
• f(x) becomes v1, and f(y) becomes ITE(x = y, v1, v2)
Terms (T): Integer expressions
  v                       Integer variable
  ITE(F, T1, T2)          If-then-else
  T + 1                   Increment
  T − 1                   Decrement
Formulas (F): Boolean expressions
  b                       Boolean variable
  ¬F, F1 ∧ F2, F1 ∨ F2    Boolean connectives
  T1 = T2                 Equation (separation predicate)
  T1 < T2                 Inequality (separation predicate)
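Here is a minimal Python sketch of the ITE elimination above. It assumes terms are plain strings and that fresh variables are named v1, v2, … in order of appearance. Both choices are for illustration only and are not UCLID's actual representation. The k-th application f(ak) becomes a chain of equality tests against all earlier arguments, falling through to the fresh variable vk.

```python
def eliminate_applications(args, prefix="v"):
    """Replace f(args[0]), f(args[1]), ... by ITE terms over fresh variables.

    Terms are plain strings; `prefix` names the fresh variables (hypothetical).
    """
    results = []
    for i, arg in enumerate(args):
        # Innermost case: f(arg) differs from every earlier application.
        term = f"{prefix}{i + 1}"
        # Wrap from the latest earlier argument outward, so args[0] is tested first.
        for j in reversed(range(i)):
            term = f"ITE({args[j]} = {arg}, {prefix}{j + 1}, {term})"
        results.append(term)
    return results


# f(x), f(y), f(z)
for app, term in zip(["x", "y", "z"], eliminate_applications(["x", "y", "z"])):
    print(f"f({app}) -> {term}")
```

Predicate applications are handled the same way, with fresh Boolean variables in place of v1, v2, ….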

Eager Boolean Encoding Methods for Separation Logic
Separation Logic Formula
  → Small Domain Encoding (SD) or Per-Constraint Encoding (EIJ)
  → Boolean Formula
  → SAT Solver
  → satisfiable/unsatisfiable

Small Domain Encoding (SD) [Bryant, Lahiri, Seshia, CAV'02]
Example: x ≥ y ∧ y ≥ z ∧ z ≥ x+1
Observation: To check satisfiability, we need to consider all possible relative orderings of finitely many expressions
• Can use a Boolean encoding of a finite range of values
• 4 values in this case, so a 2-bit encoding
[Figure: predicates compared at the bit level as 0x1x0 ≥ 0y1y0, 0y1y0 ≥ 0z1z0, 0z1z0 ≥ 0x1x0 + 1; values of x, y, z, x+1 increase along the axis]
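To make the small-domain idea concrete, the following Python sketch gives each variable a 2-bit vector and checks the example formula over all 2^6 Boolean assignments. Brute-force enumeration stands in for the SAT solver, and the formula is a Python predicate over decoded integers rather than a compiled comparator circuit. The helper names are hypothetical. Because values are decoded to Python integers, x + 1 cannot overflow; the leading 0 bit achieves the same thing in the bit-level encoding.

```python
from itertools import product


def decode(bits):
    """Interpret a tuple of Booleans (most significant bit first) as an integer."""
    value = 0
    for b in bits:
        value = 2 * value + int(b)
    return value


def sd_satisfiable(formula, variables, width=2):
    """Return a satisfying assignment over `width`-bit values, or None.

    Enumerates every assignment to the len(variables) * width Boolean
    variables of the encoding; a real SD flow hands these bits to a SAT solver.
    """
    for bits in product([False, True], repeat=len(variables) * width):
        values = {
            var: decode(bits[k * width:(k + 1) * width])
            for k, var in enumerate(variables)
        }
        if formula(values):
            return values
    return None


def example(s):
    # x >= y  and  y >= z  and  z >= x + 1
    return s["x"] >= s["y"] and s["y"] >= s["z"] and s["z"] >= s["x"] + 1


print(sd_satisfiable(example, ["x", "y", "z"]))  # None: unsatisfiable
```

The example is unsatisfiable over all integers, so it is also unsatisfiable over any finite range. The SD result supplies the converse: a sufficiently large finite range is enough to find a model whenever one exists.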

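Per-Constraint Encoding (EIJ) [Strichman, Seshia, Bryant, CAV'02]
Example: x ≥ y ∧ y ≥ z ∧ z ≥ x+1
• Encode each separation predicate with a Boolean variable: e1 ≡ x ≥ y, e2 ≡ y ≥ z, e3 ≡ z ≥ x+1
• New separation predicate: e4 ≡ x ≥ z
• Transitivity constraints: e1 ∧ e2 ⇒ e4, and e4 ∧ e3 ⇒ false
• Overall Boolean encoding: e1 ∧ e2 ∧ e3 ∧ (e1 ∧ e2 ⇒ e4) ∧ ¬(e4 ∧ e3), which is unsatisfiable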
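The EIJ encoding of the same example can be checked the same way. In this Python sketch, brute force again stands in for the SAT solver and the function names are hypothetical. It shows that dropping the two transitivity constraints makes the Boolean formula satisfiable, which is why EIJ must add them.

```python
from itertools import product


def satisfiable(f, nvars):
    """True if some assignment to nvars Boolean inputs makes f true."""
    return any(f(*bits) for bits in product([False, True], repeat=nvars))


def skeleton(e1, e2, e3, e4):
    # e1 ~ x >= y, e2 ~ y >= z, e3 ~ z >= x+1; e4 (x >= z) is unconstrained here
    return e1 and e2 and e3


def eij_encoding(e1, e2, e3, e4):
    # x >= y and y >= z give x >= z
    trans1 = (not (e1 and e2)) or e4
    # x >= z and z >= x+1 would give x >= x+1
    trans2 = not (e4 and e3)
    return skeleton(e1, e2, e3, e4) and trans1 and trans2


print(satisfiable(skeleton, 4))      # True: predicates treated as independent
print(satisfiable(eij_encoding, 4))  # False: transitivity constraints rule it out
```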

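Enforcing Transitivity Constraints
• Graph representation of separation constraints
  • Directed multigraph whose edges are labeled by constants
  • Predicate x ≥ y + c1 becomes an edge from x to y labeled c1
• Fourier-Motzkin elimination
  • Eliminate nodes in succession
  • Possibly exponential growth in edges
[Figure: nodes x, y, z with edges x → y labeled c1, c3 and y → z labeled c2, c4; eliminating y adds edges x → z labeled c1 + c2, c1 + c4, c3 + c2, c3 + c4]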
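Here is a Python sketch of one Fourier-Motzkin elimination step on the constraint multigraph. It assumes an edge (u, v, c) stands for u ≥ v + c. Chaining u ≥ y + c with y ≥ w + d gives u ≥ w + (c + d), so eliminating y creates one edge for each pair of incoming and outgoing edges. This product is where the growth comes from. The edge representation and function name are illustrative.

```python
def eliminate_node(edges, node):
    """Eliminate `node` from a set of edges (u, v, c), each meaning u >= v + c."""
    incoming = [(u, c) for (u, v, c) in edges if v == node and u != node]
    outgoing = [(w, d) for (v, w, d) in edges if v == node and w != node]
    # Edges not touching the node survive unchanged.
    kept = {(u, v, c) for (u, v, c) in edges if node not in (u, v)}
    # u >= node + c and node >= w + d imply u >= w + (c + d).
    derived = {(u, w, c + d) for (u, c) in incoming for (w, d) in outgoing}
    return kept | derived


# x -> y labeled c1 = 1, c3 = 2; y -> z labeled c2 = 10, c4 = 20
edges = {("x", "y", 1), ("x", "y", 2), ("y", "z", 10), ("y", "z", 20)}
print(sorted(eliminate_node(edges, "y")))
# [('x', 'z', 11), ('x', 'z', 12), ('x', 'z', 21), ('x', 'z', 22)]
```

A derived self-loop (u, u, c) with c > 0 states u ≥ u + c and therefore signals a contradiction.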

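Introducing New Predicates
[Figure: edges x → y labeled c1, c3; y → z labeled c2, c4; derived edges x → z labeled c1 + c2, c1 + c4, c3 + c2, c3 + c4]
• Sample predicates: x ≥ y + c1, y ≥ z + c2, and the derived x ≥ z + (c1 + c2), each encoded by its own Boolean variable
• Sample transitivity constraint: (x ≥ y + c1) ∧ (y ≥ z + c2) ⇒ (x ≥ z + (c1 + c2))
• Sample ordering constraint (for c1 < c2, predicates over the same pair): (x ≥ y + c2) ⇒ (x ≥ y + c1)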
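Both kinds of constraints can be generated as clauses. This Python sketch is illustrative only. It assumes a predicate is a tuple (x, y, c) meaning x ≥ y + c and a literal is a (sign, predicate) pair; neither representation comes from the original tool. For ordering constraints it links only adjacent constants on each pair, since longer implications follow by chaining.

```python
def transitivity_clause(p, q):
    """Clause for p and q => r: p = x >= y + c1, q = y >= z + c2, r = x >= z + (c1 + c2)."""
    (x, y, c1), (y2, z, c2) = p, q
    if y != y2:
        raise ValueError("predicates must share the middle variable")
    return [("-", p), ("-", q), ("+", (x, z, c1 + c2))]


def ordering_clauses(preds):
    """For c1 < c2 on the same pair: (x >= y + c2) => (x >= y + c1)."""
    by_pair = {}
    for x, y, c in preds:
        by_pair.setdefault((x, y), set()).add(c)
    clauses = []
    for (x, y), consts in by_pair.items():
        ordered = sorted(consts)
        # Adjacent pairs suffice; longer implications follow by chaining.
        for lo, hi in zip(ordered, ordered[1:]):
            clauses.append([("-", (x, y, hi)), ("+", (x, y, lo))])
    return clauses


print(transitivity_clause(("x", "y", 1), ("y", "z", 2)))
print(ordering_clauses([("x", "y", 3), ("x", "y", 1), ("x", "y", 2)]))
```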

Comparing Eager Encoding Methods
• Of the SD and EIJ encoding methods, which is better?
• Compare them with respect to
  • Size of the resulting Boolean formula
  • Performance of the SAT solver

Size of Boolean Encoding: SD Better than EIJ
• Let N be the size of the original separation logic formula
  • Size of a directed acyclic graph representation
• SD encoding size is worst-case O(N^2)
• EIJ encoding size is worst-case O(2^N)
  • Can generate O(2^N) transitivity constraints
Example: N = 6813
  Method   Boolean Encoding Size
  EIJ      > 1000000
  SD       54465
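This small Python sketch shows how the O(2^N) blow-up can arise. It assumes a chain x0 → x1 → … → xk in which link i carries two parallel edges with constants 0 and 2^i; this is a constructed example, not one of the benchmarks. The formula has only 2k predicates. Eliminating x1, …, x(k−1) in turn leaves one edge x0 → xk for every distinct path sum, and each such edge is a new predicate with its own transitivity constraints.

```python
def endpoint_edges_after_elimination(k):
    """Labels of edges x0 -> xk left after eliminating x1, ..., x(k-1).

    Link i has edges labeled 0 and 2**i; eliminating a node adds
    every label sum of an incoming and an outgoing edge (Fourier-Motzkin).
    """
    labels = {0, 1}  # edges x0 -> x1
    for i in range(1, k):
        labels = {c + d for c in labels for d in (0, 2 ** i)}
    return labels


for k in range(1, 6):
    # k links, 2k predicates, number of derived x0 -> xk edges
    print(k, 2 * k, len(endpoint_edges_after_elimination(k)))
```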

Impact on SAT Problem: SD vs. EIJ
• Experimentally compared zChaff performance on SD and EIJ encodings of several unsatisfiable formulas
• Sample result: EIJ better than SD for zChaff

Impact on SAT: Why Is EIJ Better than SD?
• Conjecture: for SD, the SAT solver has to "discover" transitivity constraints as conflict clauses
  • A violation of a transitivity constraint might be discovered only after assigning bits of several bit-vectors
• EIJ adds all such constraints a priori
  • Less learning and backtracking required by the SAT solver

Eager Encoding Tradeoffs
• SD encoding
  • Polynomial-size encoding
  • Worse for SAT solvers
• EIJ encoding
  • Worst-case exponential-size encoding
  • Better for SAT solvers
• Can we automatically select between SD and EIJ based on the input formula?

Selection Strategy [Seshia, Lahiri, Bryant, DAC '03]
Estimate the number of transitivity constraints, C; if C > T, use the SD encoding, otherwise use the EIJ encoding
• Problem:
  • Computationally hard to estimate the number of transitivity constraints
  • Can we use a different metric?
• Idea: Identify a feature of the input formula that varies monotonically with the run time of EIJ (but not with the run time of SD)

A Good Formula Feature: Number of Separation Predicates

Revised Selection Strategy
Count the number of separation predicates, m; if m > T, use the SD encoding, otherwise use the EIJ encoding
• Easy to count the number of separation predicates
• Very approximate measure of the number of transitivity constraints
  • Constraints only relate predicates that share variables
• Also need to automate the setting of threshold T
  • Statistically estimate from a "training" set of benchmarks

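Identifying Variable Classes
[Figure: parse tree of a formula combining, with ∧ and ∨, the predicates x ≥ y, y ≥ z, z ≥ x+1, u ≥ v, and u = v − 2]
• Variables x, y, z occur together in shared predicates, forming the class {x, y, z}
• Likewise, u and v form the class {u, v}
• Assignments to {u, v} are independent of those to {x, y, z}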
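Variable classes can be computed with union-find by merging any two variables that appear in the same predicate. In this Python sketch a predicate is assumed to be a tuple (lhs, rhs, op, c) meaning lhs op rhs + c; the representation and names are hypothetical. The example encodes the predicates from the figure.

```python
def variable_classes(predicates):
    """Map each variable class (frozenset) to the predicates over it."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    # Two variables in the same predicate belong to the same class.
    for lhs, rhs, _op, _c in predicates:
        union(lhs, rhs)

    preds_by_root = {}
    for p in predicates:
        preds_by_root.setdefault(find(p[0]), []).append(p)
    members = {}
    for v in list(parent):
        members.setdefault(find(v), set()).add(v)
    return {frozenset(members[r]): ps for r, ps in preds_by_root.items()}


example = [
    ("x", "y", ">=", 0),  # x >= y
    ("y", "z", ">=", 0),  # y >= z
    ("z", "x", ">=", 1),  # z >= x + 1
    ("u", "v", ">=", 0),  # u >= v
    ("u", "v", "=", -2),  # u = v - 2
]
for cls, preds in variable_classes(example).items():
    print(sorted(cls), len(preds))
```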

Hybrid Encoding Technique
Separation Logic Formula
  → Compute
    1. Variable classes based on predicates
    2. Number of separation predicates for each class
  → Decide per class: {x, y, z} with m1 predicates (m1 > T? YES: SD, NO: EIJ), …, {u, v} with mk predicates (mk > T? YES: SD, NO: EIJ)
  → Encoded Boolean Formula
Encode each class using SD or EIJ based on a local decision
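The per-class decision then reduces to a threshold test on each class. Here is a minimal Python sketch. It assumes classes arrive as a mapping from a frozenset of variables to that class's predicates, written as tuples (lhs, rhs, op, c), and that a threshold T has already been chosen; the value 2 below is picked only for illustration. Classes with more than T predicates get SD, since EIJ risks too many transitivity constraints there. The rest get EIJ.

```python
def hybrid_plan(classes, threshold):
    """Choose SD or EIJ independently for each variable class.

    classes: dict mapping frozenset of variable names -> list of predicates.
    A class with more than `threshold` distinct predicates is encoded with SD.
    """
    return {
        cls: "SD" if len(set(preds)) > threshold else "EIJ"
        for cls, preds in classes.items()
    }


classes = {
    frozenset({"x", "y", "z"}): [("x", "y", ">=", 0), ("y", "z", ">=", 0), ("z", "x", ">=", 1)],
    frozenset({"u", "v"}): [("u", "v", ">=", 0), ("u", "v", "=", -2)],
}
for cls, method in hybrid_plan(classes, threshold=2).items():
    print(sorted(cls), method)
```

The classes share no variables. As a result, the SD bit-vectors of one class and the EIJ predicate variables of another can simply be conjoined in the final Boolean formula.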

Automatically Selecting a Threshold Value: Intuition
EIJ run time increases drastically beyond a certain number of separation predicates

Automatically Selecting a Threshold Value Using Clustering
Cluster the total time (Y-axis) values, minimizing the variance within each cluster
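One simple reading of the clustering step can be sketched in Python as follows. Order the training benchmarks by number of separation predicates. Split them into a contiguous "fast" group and "slow" group so that the summed within-cluster squared deviation of total time is smallest. Then take T as the largest predicate count in the fast group. This two-cluster split is an assumption for illustration, and the procedure used in the paper may differ in detail. The sample data are invented.

```python
def threshold_by_clustering(samples):
    """Return threshold T from (num_separation_predicates, total_time) pairs."""
    pts = sorted(samples)
    if len(pts) < 2:
        raise ValueError("need at least two samples")

    def sse(times):
        # Sum of squared deviations from the cluster mean (variance * size).
        mean = sum(times) / len(times)
        return sum((t - mean) ** 2 for t in times)

    times = [t for _, t in pts]
    # Try every cut into a non-empty fast prefix and a non-empty slow suffix.
    best_cut = min(range(1, len(pts)), key=lambda k: sse(times[:k]) + sse(times[k:]))
    return pts[best_cut - 1][0]


# Invented training data: (separation predicates, total seconds)
training = [(10, 1.0), (20, 1.2), (30, 1.1), (40, 50.0), (50, 60.0)]
print(threshold_by_clustering(training))  # 30
```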

Experimental Evaluation Setup
• Compared Hybrid against
  • SD and EIJ encodings
  • Cooperating Validity Checker (CVC), based on a lazy encoding method [Stump et al. '02]
  • Stanford Validity Checker (SVC), a non-SAT-based procedure [Barrett et al. '96]
  • CVC and SVC can handle more expressive logics than SUF
• Benchmarks
  • 49 unsatisfiable SUF formulas
  • Drawn from a load-store unit, an out-of-order unit, device driver code, compiler validation, and a DLX pipeline
  • Threshold value calculated from a subset of 16 benchmarks
  • Worked well for 39 of the 49 benchmarks
• Setup
  • Used the zChaff SAT solver
  • Imposed a timeout of 1800 sec. on total time (encoding + SAT)

Hybrid vs. SD (39/49 benchmarks)
[Figure: per-benchmark comparison; regions labeled "Hybrid better" and "SD better"]

Hybrid vs. EIJ (39/49 benchmarks)
[Figure: per-benchmark comparison; regions labeled "Hybrid better" and "EIJ better"]

Hybrid vs. Lazy Encoding (CVC) (39/49 benchmarks)
[Figure: per-benchmark comparison; regions labeled "Hybrid better" and "CVC better"]

Hybrid vs. Non-SAT-based Procedure (SVC) (39/49 benchmarks)
[Figure: per-benchmark comparison; regions labeled "Hybrid better" and "SVC better"]

SD outperforms Hybrid on 10/49 benchmarks
[Figure: per-benchmark comparison; regions labeled "Hybrid better" and "SD better"]

Conclusions & Ongoing Work
• Hybrid combination of EIJ and SD encodings
  • is robust to formula variations
  • outperforms lazy encoding methods (CVC)
  • outperforms non-SAT-based methods (SVC)
• Ongoing & future work
  • Alternate estimators for the number of transitivity constraints
  • Threshold-setting technique based on clustering applies to other CAD problems too
  • Might a combination of lazy and eager encoding techniques perform well on satisfiable formulas?
• More on the UCLID project webpage: http://www.cs.cmu.edu/~uclid
