Caching in Backtracking Search

Fahiem Bacchus, University of Toronto

Introduction
• Backtracking search needs only space linear in the number of variables (modulo the size of the problem representation).
• However, its efficiency can benefit greatly from using more space to cache information computed during search.
• Caching can provably yield exponential improvements in the efficiency of backtracking search.
• Caching is an any-space feature: we can use as much or as little space for caching as we want without affecting soundness or completeness.
• Unfortunately, caching can also be time-consuming. How do we exploit the theoretical potential of caching in practice?

Introduction
• We will examine this question for
  • the problem of finding a single solution, and
  • problems that require considering all solutions:
    • counting the number of solutions/computing probabilities;
    • finding optimal solutions.
• We will look at
  • the theoretical advantages offered by caching;
  • some of the practical issues involved in realizing these theoretical advantages;
  • some of the practical benefits obtained so far.

Outline
• Caching when searching for a single solution.
  • Clause learning in SAT.
    • Theoretical results.
    • Its practical application and impact.
  • Clause learning in CSPs.
• Caching when considering all solutions.
  • Formula caching for sum-of-products problems.
    • Theoretical results.
    • Practical application.

1. Caching when searching for a single solution

1.1 Clause Learning in SAT
Fahiem Bacchus, University of Toronto, 3/10/2014

Clause Learning in SAT (DPLL)
• Clause learning is the most successful form of caching when searching for a single solution [Marques-Silva and Sakallah, 1996; Zhang et al., 2001].
• It has revolutionized DPLL SAT solvers (i.e., backtracking SAT solvers).

Clause Learning in SAT
1. Branch on a variable.
2. Perform propagation (Unit Propagation).

Clause Learning in SAT
• Every inferred literal is labeled with a clausal reason.
• The clausal reason for a literal is a subset of the previous literals on the path whose setting implies the literal.
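The labeling of inferred literals can be sketched in Python. This is a minimal illustration, not how production solvers work: it assumes clauses are tuples of DIMACS-style integer literals (3 means variable x3 is true, -3 that it is false) and it rescans every clause instead of using watched literals. The names `unit_propagate`, `trail` and `reason` are hypothetical. Each forced literal is recorded with the clause that forced it; that clause contains the literal plus the negations of earlier literals on the path.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a clause is a tuple of DIMACS-style integer literals.
Clause = Tuple[int, ...]


def unit_propagate(clauses: List[Clause], trail: List[int],
                   reason: Dict[int, Optional[Clause]]) -> Optional[Clause]:
    """Extend `trail` by unit propagation.

    Each forced literal is appended to `trail` and labeled in `reason` with the
    clause that forced it (its clausal reason). Returns a falsified (conflict)
    clause if one is found, otherwise None.
    """
    assigned = set(trail)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause is already satisfied
            # Literals that are not yet false; none is true, so these are unassigned.
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return clause  # every literal is false: a conflict clause
            if len(open_lits) == 1:
                lit = open_lits[0]
                trail.append(lit)
                assigned.add(lit)
                reason[lit] = clause  # label the inferred literal with its clausal reason
                changed = True
    return None


# Usage: branch on x1 = True, then propagate.
clauses = [(-1, 2), (-2, 3), (-1, -3, 4)]
trail: List[int] = [1]
reason: Dict[int, Optional[Clause]] = {1: None}  # decisions have no clausal reason
conflict = unit_propagate(clauses, trail, reason)
print(trail, conflict)  # prints [1, 2, 3, 4] None
```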

Clause Learning in SAT
Contradiction:
1. D is forced to be both True and False.
2. The clause (Q, P, D) has been falsified.
Falsified clauses are called conflict clauses.

Clause Learning in SAT
• Clause learning occurs when a contradiction is reached.
• This involves a sequence of resolution steps.
• Any implied literal in a clausal reason can be resolved away by resolving the clause with the clausal reason for the implied literal.
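A single resolution step of this kind might look as follows, assuming the same DIMACS-style integer literals; `resolve` is a hypothetical helper, not a solver API. The clause contains the negation of the implied literal, the reason contains the literal itself, and the resolvent keeps everything else from both.

```python
from typing import Tuple

Clause = Tuple[int, ...]  # DIMACS-style integer literals (an illustrative assumption)


def resolve(clause: Clause, reason: Clause, lit: int) -> Clause:
    """Resolve away the implied literal `lit` from `clause`.

    `clause` contains -lit (lit is true on the path, so -lit is false there) and
    `reason` is the clausal reason for lit, so it contains lit itself.
    """
    if -lit not in clause or lit not in reason:
        raise ValueError("clauses do not clash on the given literal")
    merged = {x for x in clause if x != -lit} | {x for x in reason if x != lit}
    return tuple(sorted(merged, key=abs))


# Conflict clause (-1, -2), clausal reason (-1, 2) for the implied literal 2.
print(resolve((-1, -2), (-1, 2), 2))  # prints (-1,)
```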

Clause Learning in SAT
• SAT solvers apply a particular sequence of resolutions to the conflict clause.
• 1-UIP learning [Zhang et al., 2001]: iteratively resolve away the deepest implied literal in the clause until the clause contains only one literal from the level at which the contradiction was generated.
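1-UIP learning can be sketched as a walk backwards along the trail. This is an illustrative version only: it assumes the trail, the decision level of each variable and the clausal reasons are kept in plain Python lists and dictionaries (all names hypothetical), resolves on the deepest literal first, and stops as soon as a single literal from the conflict level remains.

```python
from typing import Dict, List, Optional, Tuple

Clause = Tuple[int, ...]  # DIMACS-style integer literals (assumption)


def learn_1uip(conflict: Clause, trail: List[int], level: Dict[int, int],
               reason: Dict[int, Optional[Clause]]) -> Clause:
    """Derive the 1-UIP clause from a conflict clause.

    trail  : true literals in the order they were set (decisions and implied)
    level  : decision level of each variable
    reason : clausal reason of each implied literal (None for decisions)
    """
    conflict_level = max(level[abs(x)] for x in conflict)
    clause = set(conflict)

    def at_conflict_level() -> int:
        return sum(1 for x in clause if level[abs(x)] == conflict_level)

    # Walk the trail from the deepest literal upwards.
    for lit in reversed(trail):
        if at_conflict_level() == 1:
            break  # only one literal from the conflict level remains: 1-UIP reached
        if -lit in clause:
            # lit is the deepest literal whose negation is in the clause. With two or
            # more conflict-level literals left it cannot be the decision, so it is
            # implied: resolve it away using its clausal reason.
            clause.discard(-lit)
            clause |= {x for x in reason[lit] if x != lit}
    return tuple(sorted(clause, key=abs))


# Level 1: decide x1, which implies x2 via (-1, 2).
# Level 2: decide x3, which implies x4 via (-3, 4) and x5 via (-2, -4, 5);
# the clause (-4, -5) is then falsified.
trail = [1, 2, 3, 4, 5]
level = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
reason = {1: None, 2: (-1, 2), 3: None, 4: (-3, 4), 5: (-2, -4, 5)}
print(learn_1uip((-4, -5), trail, level, reason))  # prints (-2, -4)
```

In the usage example, x4 is the unique implication point: -4 is the only literal from level 2 left in the learnt clause.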

Far Backtracking in SAT
[Figure: 1-UIP clause]
• Once the 1-UIP clause is learnt, the SAT solver backtracks to the level at which this clause became unit.
• It then uses the clause to force a new literal.
• It performs Unit Propagation.
• It continues its search.
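The backtrack level and the newly forced literal can be read directly off the 1-UIP clause: the target is the highest decision level among the clause's other literals, since after undoing everything above it those literals are still false and the clause is unit. The sketch assumes decision level 0 is the root and uses a hypothetical `backjump_target` helper.

```python
from typing import Dict, Tuple

Clause = Tuple[int, ...]  # DIMACS-style integer literals (assumption)


def backjump_target(learnt: Clause, level: Dict[int, int]) -> Tuple[int, int]:
    """Return (level to backtrack to, literal the clause then forces)."""
    top = max(level[abs(x)] for x in learnt)
    uip_lits = [x for x in learnt if level[abs(x)] == top]
    assert len(uip_lits) == 1, "not a 1-UIP clause"
    # Highest level among the remaining literals; 0 (the root) for a unit clause.
    others = [level[abs(x)] for x in learnt if level[abs(x)] != top]
    return (max(others) if others else 0), uip_lits[0]


# The 1-UIP clause (-2, -4): x2 was set at level 1, x4 at level 2.
print(backjump_target((-2, -4), {2: 1, 4: 2}))  # prints (1, -4)
```

Note that the target can lie many levels above the conflict level (for example, a clause with literals at levels 1, 3 and 5 sends the solver back to level 3), which is what makes this "far" backtracking.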

Theoretical Power of Clause Learning
• The power of clause learning has been examined from the point of view of the theory of proof complexity [Cook & Reckhow 1977].
• This area studies how large proofs can become and their relative sizes in different propositional proof systems.
• DPLL with clause learning performs resolution (a particular type of resolution).
• Various restricted versions of resolution have been well studied.
• [Buresh-Oppenheim, Pitassi 2003] contains a nice review of previous results and a number of new results in this area.

Theoretical Power of Clause Learning
• Every DPLL search tree refuting an UNSAT instance contains a TREE-Resolution proof.
• TREE-Resolution proofs can be exponentially larger than REGULAR-Resolution proofs.
• REGULAR-Resolution proofs can be exponentially larger than general (unrestricted) resolution proofs.
For UNSAT formulas (>> meaning "can be exponentially larger than, on some formulas"):
min_size(DPLL Search Tree) ≥ min_size(TREE-Resolution) >> min_size(REGULAR-Resolution) >> min_size(general resolution)

Theoretical Power of Clause Learning
• Furthermore, every TREE-Resolution proof is a REGULAR-Resolution proof, and every REGULAR-Resolution proof is a general resolution proof.
For UNSAT formulas:
min_size(DPLL Search Tree) ≥ min_size(TREE-Resolution) ≥ min_size(REGULAR-Resolution) ≥ min_size(general resolution)

Theoretical Power of Clause Learning
• [Beame, Kautz, and Sabharwal 2003] showed that clause learning can SOMETIMES yield exponentially smaller proofs than REGULAR resolution.
• It is unknown whether general resolution proofs are sometimes smaller than Clause Learning DPLL search trees.
For UNSAT formulas:
min_size(DPLL Search Tree) ≥ min_size(TREE-Resolution) >> min_size(REGULAR-Resolution) >> min_size(Clause Learning DPLL Search Tree) ≥ min_size(general resolution)

Theoretical Power of Clause Learning
• It is still unknown whether REGULAR or even TREE resolution proofs can sometimes be smaller than the smallest Clause Learning DPLL search tree.

Theoretical Power of Clause Learning
• It is also easily observed [Beame, Kautz, and Sabharwal 2003] that with restarts, clause learning can make the DPLL search tree as small as the smallest general resolution proof on any formula.
For UNSAT formulas:
min_size(Clause Learning + Restarts DPLL Search Tree) = min_size(general resolution)

Theoretical Power of Clause Learning
• In sum, clause learning, especially with restarts, has the potential to yield exponential reductions in the size of the DPLL search tree.
• With clause learning, DPLL can potentially solve problems exponentially faster.
• That this can happen in practice has been irrefutably demonstrated by modern SAT solvers.
• Modern SAT solvers have been able to exploit the theoretical potential of clause learning.

Theoretical Power of Clause Learning
• The theoretical advantages of clause learning also hold for CSP backtracking search.
• So the question that arises is: can the theoretical potential of clause learning also be exploited in CSP solvers?

1.2 Clause Learning in CSPs

Clause Learning in CSPs
• Joint work with George Katsirelos, who just completed his PhD with me (“NoGood Processing in CSPs”).
• Learning has been used in CSPs, but it has not had the kind of impact that clause learning has had in SAT [Dechter 1990; T. Schiex & G. Verfaillie 1993; Frost & Dechter 1994; Jussien & Barichard 2000].
• This previous work has investigated NoGood learning. A NoGood is a set of variable assignments that cannot be extended to a solution.

NoGood Learning
• NoGood learning is NOT clause learning.
• It is strictly less powerful.
• To illustrate this, let us consider encoding a CSP as a SAT problem and compare what clause learning does on the SAT encoding with what NoGood learning would do.

Propositional Encoding of a CSP: the propositions
• A CSP consists of
  • a set of variables Vᵢ and constraints Cⱼ;
  • a domain of values for each variable, Dom[Vᵢ] = {d₁, …, dₘ}.
• Consider the set of propositions Vᵢ=dⱼ, one for each value of each variable.
• Vᵢ=dⱼ means that Vᵢ has been assigned the value dⱼ.
  • It is true when the assignment has been made.
• ¬(Vᵢ=dⱼ) means that Vᵢ has not been assigned the value dⱼ.
  • It is true when dⱼ has been pruned from Vᵢ's domain.
  • If Vᵢ has been assigned a different value, all other values (including dⱼ) are pruned from its domain.
• We usually write Vᵢ≠dⱼ instead of ¬(Vᵢ=dⱼ).
• We encode the CSP using clauses over these assignment propositions.

Propositional Encoding of a CSP: the clauses
• For each variable V with Dom[V] = {d₁, …, dₘ} we have the following clauses:
  • (V=d₁, V=d₂, …, V=dₘ) (V must have a value);
  • for every pair of distinct values (dᵢ, dⱼ), the clause (V≠dᵢ, V≠dⱼ) (V has a unique value).
• For each constraint C(X₁, …, Xₖ) over some set of variables we have the following clauses:
  • for each assignment to its variables that falsifies the constraint, a clause blocking that assignment;
  • i.e., if C(a₁, …, aₖ) = FALSE, then we have the clause (X₁≠a₁, X₂≠a₂, …, Xₖ≠aₖ).
• This is the direct encoding of [Walsh 2000].
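The direct encoding can be generated mechanically. In this sketch (an assumption for illustration, not an existing tool) a literal is a tuple `(var, value, is_eq)`, with `is_eq=True` standing for V=d and `False` for V≠d, and each constraint is given as a scope plus a Python predicate. It is applied to the example CSP of the following slides, restricted to the four variables and three constraints listed there.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical literal: (var, value, is_eq); is_eq=True is "var = value", False is "var ≠ value".
Lit = Tuple[str, int, bool]
Constraint = Tuple[Sequence[str], Callable[..., bool]]  # (scope, predicate over the scope)


def direct_encoding(domains: Dict[str, List[int]],
                    constraints: List[Constraint]) -> List[List[Lit]]:
    clauses: List[List[Lit]] = []
    for var, dom in domains.items():
        clauses.append([(var, d, True) for d in dom])             # must have a value
        for d1, d2 in combinations(dom, 2):
            clauses.append([(var, d1, False), (var, d2, False)])  # has a unique value
    for scope, pred in constraints:
        for values in product(*(domains[x] for x in scope)):
            if not pred(*values):                                 # block each falsifying tuple
                clauses.append([(x, a, False) for x, a in zip(scope, values)])
    return clauses


# The example CSP, restricted to Q, X, Y, Z and the three constraints listed.
domains = {"Q": [0, 1], "X": [1, 2, 3], "Y": [1, 2, 3], "Z": [1, 2, 3]}
constraints = [
    (("Q", "X", "Y"), lambda q, x, y: q + x + y >= 3),
    (("Q", "X", "Z"), lambda q, x, z: q + x + z >= 3),
    (("Q", "Y", "Z"), lambda q, y, z: q + y + z <= 3),
]
clauses = direct_encoding(domains, constraints)
print(len(clauses))  # prints 30
```

Of the 30 clauses, 16 block falsifying tuples, and 14 of those come from Q + Y + Z ≤ 3 alone: the blocking clauses are enumerated over the product of the domains in a constraint's scope, so their number can grow exponentially with arity.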

DPLL on this Encoded CSP
• Unit Propagation on this encoding is essentially equivalent to Forward Checking on the original CSP.

DPLL on the encoded CSP
• Variables Q, X, Y, Z, ...
• Dom[Q] = {0,1}
• Dom[X] = Dom[Y] = Dom[Z] = {1,2,3}
• Constraints
  • Q + X + Y ≥ 3
  • Q + X + Z ≥ 3
  • Q + Y + Z ≤ 3

DPLL on the encoded CSP
• Clause learning

DPLL on the encoded CSP
• Clause learning
[Figure: A 1-UIP clause]

DPLL on the encoded CSP
• This clause, (Q≠0, Y≠2, Z=1), is not a NoGood!
• It asserts that we cannot have Q = 0, Y = 2, and Z ≠ 1 simultaneously.
• This is a set of assignments and domain prunings that cannot lead to a solution.
• A NoGood is only a set of assignments.
• To obtain a NoGood, we have to further resolve away Z = 1 from the clause.

DPLL on the encoded CSP
• NoGood learning
• This clause, (Q≠0, X≠1, Y≠2), is a NoGood. It says that we cannot have the set of assignments Q = 0, X = 1, Y = 2.
• NoGood learning requires resolving the conflict back to the decision literals.
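Since the search trace itself is in a figure that is not reproduced here, the two learned constraints can at least be checked by brute force. The sketch assumes the CSP consists only of Q, X, Y, Z and the three constraints listed (any further constraints hidden by the slide's "..." would keep an implied clause implied). It confirms that both the 1-UIP clause (Q≠0, Y≠2, Z=1) and the NoGood (Q≠0, X≠1, Y≠2) hold in every solution, and counts how many complete assignments each rules out. All function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

Lit = Tuple[str, int, bool]  # (var, value, is_eq): True for "var = value", False for "var ≠ value"

# Assumption: only the four variables and three constraints listed on the slide.
DOMAINS = {"Q": [0, 1], "X": [1, 2, 3], "Y": [1, 2, 3], "Z": [1, 2, 3]}


def satisfies(a: Dict[str, int]) -> bool:
    return (a["Q"] + a["X"] + a["Y"] >= 3
            and a["Q"] + a["X"] + a["Z"] >= 3
            and a["Q"] + a["Y"] + a["Z"] <= 3)


def assignments() -> Iterator[Dict[str, int]]:
    names = list(DOMAINS)
    for values in product(*(DOMAINS[n] for n in names)):
        yield dict(zip(names, values))


def lit_true(a: Dict[str, int], lit: Lit) -> bool:
    var, val, is_eq = lit
    return (a[var] == val) == is_eq


def implied(clause: List[Lit]) -> bool:
    """True if every solution of the CSP satisfies the clause."""
    return all(any(lit_true(a, lit) for lit in clause)
               for a in assignments() if satisfies(a))


def excluded(clause: List[Lit]) -> int:
    """Number of complete assignments (solutions or not) that falsify the clause."""
    return sum(1 for a in assignments() if not any(lit_true(a, lit) for lit in clause))


uip_clause = [("Q", 0, False), ("Y", 2, False), ("Z", 1, True)]  # (Q≠0, Y≠2, Z=1)
nogood = [("Q", 0, False), ("X", 1, False), ("Y", 2, False)]     # (Q≠0, X≠1, Y≠2)
print(implied(uip_clause), implied(nogood))    # prints True True
print(excluded(uip_clause), excluded(nogood))  # prints 6 3
```

On this small example the 1-UIP clause rules out twice as many complete assignments as the NoGood, because it applies whatever value X takes.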

NoGoods vs. Clauses (Generalized NoGoods)
• Unit propagation over a collection of learnt NoGoods is ineffective.
  • NoGoods are clauses containing negated literals only, e.g., (Z ≠ 1, Y ≠ 0, X ≠ 3). If one of these clauses becomes unit, e.g., (X ≠ 3), the forced literal can only satisfy other NoGood clauses; it can never reduce the length of those clauses.
• A single clause can represent an exponential number of NoGoods.
  • With domain {1, 2, 3}, (Q ≠ 1, Z = 1, Y = 1) is equivalent to the four NoGoods
    (Q ≠ 1, Z ≠ 2, Y ≠ 2)
    (Q ≠ 1, Z ≠ 3, Y ≠ 2)
    (Q ≠ 1, Z ≠ 2, Y ≠ 3)
    (Q ≠ 1, Z ≠ 3, Y ≠ 3)
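The expansion of a generalized NoGood into plain NoGoods can be enumerated directly. This sketch uses hypothetical `(var, value, is_eq)` literals and assumes each variable occurs at most once in the clause. With domains of size d, every positive literal V=d contributes d-1 alternatives, so a clause with k positive literals stands for (d-1)^k NoGoods.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Lit = Tuple[str, int, bool]  # (var, value, is_eq): True for "var = value", False for "var ≠ value"


def expand_to_nogoods(clause: List[Lit],
                      domains: Dict[str, List[int]]) -> List[FrozenSet[Tuple[str, int]]]:
    """List the NoGoods (sets of assignments) that together are equivalent to `clause`.

    The clause is falsified exactly when every V≠d literal is false (V = d) and
    every V=d literal is false (V takes one of its other values).
    """
    choices = []
    for var, val, is_eq in clause:
        if is_eq:
            choices.append([(var, d) for d in domains[var] if d != val])
        else:
            choices.append([(var, val)])
    return [frozenset(combo) for combo in product(*choices)]


domains = {"Q": [0, 1], "X": [1, 2, 3], "Y": [1, 2, 3], "Z": [1, 2, 3]}
nogoods = expand_to_nogoods([("Q", 1, False), ("Z", 1, True), ("Y", 1, True)], domains)
print(len(nogoods))  # prints 4
```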

NoGoods vs. Clauses (Generalized NoGoods)
• The 1-UIP clause can prune more branches during the future search than the NoGood clause [Katsirelos 2007].
• Clause learning can yield super-polynomially smaller search trees than NoGood learning [Katsirelos 2007].

Encoding to SAT
• Given all of these benefits of clause learning over NoGood learning, the natural question is: why not encode CSPs to SAT and immediately obtain the benefits of the clause learning already implemented in modern SAT solvers?

Encoding to SAT
• The SAT theory produced by the direct encoding is not very effective.
  • Unit Propagation on this encoding only achieves Forward Checking (a weak form of propagation).
  • Under the direct encoding, constraints of arity k yield 2^O(k) clauses. Hence the resulting SAT theory is too large.
  • There is no direct way of exploiting propagators.
    • Propagators are specialized polynomial-time algorithms for doing propagation on constraints of large arity.

Encoding to SAT
• Some of these issues can be addressed by better encodings, e.g., [Bacchus 2007, Katsirelos & Walsh 2007, Quimper & Walsh 2007]. But overall, complete conversion to SAT is currently impractical.

Clause Learning in CSPs without encoding
• We can perform clause learning in a CSP solver by the following steps:
  • The CSP solver must keep track of the chronological sequence of variable assignments and value prunings made as we descend each path in the search tree.

Clause Learning in CSPs without encoding
• Each item must be labeled with a clausal reason consisting of items previously falsified along the path.

Clause Learning in CSPs without encoding
• Contradictions are labeled by falsified clauses, e.g., domain wipe-outs can be labeled by the “must have a value” clause.
• From this information, clause learning can be performed whenever a contradiction is reached.
• These clauses can be stored in a clausal database.
• Unit Propagation can be run on this database as new value assignments or value prunings are performed.
• The inferences of Unit Propagation augment the other constraint propagation done by the CSP solver.
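Labeling prunings with clausal reasons can be sketched for the simplest case, forward checking on one binary constraint. This is an assumed design for illustration only: `forward_check`, the `(var, value, is_eq)` literals and the trail of (item, reason) pairs are hypothetical, and reasons for GAC propagators need the more elaborate techniques of [Katsirelos 2007], which are not shown.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical literal representation: (var, value, is_eq), where is_eq=True
# stands for "var = value" and is_eq=False for "var ≠ value".
Lit = Tuple[str, int, bool]
TrailItem = Tuple[Lit, List[Lit]]  # an inferred item paired with its clausal reason


def forward_check(x: str, a: int, y: str, allowed: Callable[[int, int], bool],
                  domains: Dict[str, Set[int]], full_domains: Dict[str, List[int]],
                  trail: List[TrailItem]) -> Optional[List[Lit]]:
    """After the assignment x = a, prune the values of y that the constraint forbids.

    Each pruning y ≠ b goes on the trail with the clausal reason (x ≠ a, y ≠ b):
    given x = a, that clause forces y ≠ b. On a domain wipe-out, the falsified
    "must have a value" clause of y is returned as the conflict clause.
    """
    for b in sorted(domains[y]):
        if not allowed(a, b):
            domains[y].discard(b)
            trail.append(((y, b, False), [(x, a, False), (y, b, False)]))
    if not domains[y]:
        return [(y, d, True) for d in full_domains[y]]
    return None


def less_than(a: int, b: int) -> bool:
    return a < b  # hypothetical binary constraint X < Y


full = {"X": [1, 2, 3], "Y": [1, 2, 3]}
trail: List[TrailItem] = []
doms = {v: set(vals) for v, vals in full.items()}
conflict = forward_check("X", 1, "Y", less_than, doms, full, trail)
print(conflict, sorted(doms["Y"]), len(trail))  # prints None [2, 3] 1

wiped = {v: set(vals) for v, vals in full.items()}
print(forward_check("X", 3, "Y", less_than, wiped, full, []))
# prints [('Y', 1, True), ('Y', 2, True), ('Y', 3, True)]
```

The conflict clause and the reasons on the trail are exactly the inputs a 1-UIP learning procedure would resolve over.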

Higher Levels of Local Consistency
• Note that this technique works irrespective of the kinds of inference performed during search.
• That is, we can use any kind of inference we want to infer a new value pruning or new variable assignment, as long as we can label the inference with a clausal reason.
• This raises the question of how to generate clausal reasons for other forms of inference.
• [Katsirelos 2007] answers this question for the most commonly used form of inference: Generalized Arc Consistency (GAC),
  • including ways of obtaining clausal reasons from various types of GAC propagators, e.g., ALL-DIFF and GCC.

Some Empirical Data [Katsirelos 2007]
• GAC with NoGood learning helps a bit.
• GAC with clause learning, but where GAC labels its inferences with NoGoods, offers only minor improvements.
• To get significant improvements, we must do clause learning and also have proper clausal reasons from GAC.

Observations
• Caching techniques have great potential, but making them effective in practice can require resolving a number of different issues.
• This work goes a long way towards achieving the goal of exploiting the theoretical potential of clause learning.
Prediction: Clause learning will play a fundamental role in the next generation of CSP solvers, and these solvers will often be orders of magnitude more effective than current solvers.

Open Issues
• Many issues remain open. Here we mention only one: restarts.
• As previously pointed out, clause learning gains a great deal more power with restarts. With restarts it can be as powerful as unrestricted resolution.
• Restarts play an essential role in the performance of SAT solvers, both full restarts and partial restarts.

Search vs. Inference
• With restarts and clause learning, the distinction between search and inference is turned on its head.
  • Now search is performing inference.
• Instead, the distinction becomes systematic vs. opportunistic inference.
  • Enforcing a high level of consistency during search is performing systematic inference.
  • Searching until we learn a good clause is opportunistic.
• SAT solvers perform very little systematic inference (only Unit Propagation), but they perform lots of opportunistic inference.
• CSP solvers essentially do the opposite.

One Open Question
• In SAT solvers, opportunistic inference is feasible: if a learnt clause turns out not to be useful, it doesn’t matter much, as the search to learn that clause did not take much time. Search (nodes/second rate) is very fast.
• In CSP solvers, the enforcement of higher levels of local consistency makes restarts and opportunistic inference very expensive. Search (nodes/second rate) is very slow.
Is enforcing high levels of consistency really the most effective approach for solving CSPs once clause learning is available?

2. Formula Caching when considering all solutions.

Considering All Solutions?
• One such class of problems consists of those that can be expressed as Sum-Of-Product problems [Dechter 1999]:
  • a finite set of variables, V₁, V₂, …, Vₙ;
  • a finite domain of values for each variable, Dom[Vᵢ];
  • a finite set of real-valued local functions f₁, f₂, …, fₘ.
• Each function is local in the sense that it only depends on a subset of the variables, e.g., f₁(V₁, V₂), f₂(V₂, V₄, V₆), …
• The locality of the functions can be exploited algorithmically.

Sum of Products
• The sum-of-products problem is to compute, from this representation,
  $\sum_{V_1}\sum_{V_2}\cdots\sum_{V_n}\prod_{i=1}^{m} f_i$
• The local functions assign a value to every complete instantiation of the variables (the product), and we want to compute some amalgamation of these values.
• A number of different problems can be cast as instances of sum-of-products [Dechter 1999].

Sum of Products: Examples
• #CSPs: counting the number of solutions.
• Inference in Bayes nets.
• Optimization: the functions are sub-objective functions returning real values, and the global objective is to maximize the sum of the sub-objectives (cf. soft constraints, generalized additive utility).
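The definition can be made concrete with a brute-force evaluator. It enumerates every complete instantiation, so it is exponential in the number of variables and exploits neither locality nor caching; it only pins down the quantity being computed. Parameterizing the two operations is a design choice made here (not taken from the talk) so that the same loop covers counting and maximization; the function name is hypothetical, and the optimization instance is a made-up two-variable example.

```python
import operator
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Factor = Tuple[Sequence[str], Callable[..., float]]  # (scope, local function over the scope)


def sum_of_products(domains: Dict[str, List[int]], factors: List[Factor],
                    plus=operator.add, times=operator.mul, zero=0, one=1):
    """Brute force: combine the local functions on every complete instantiation
    with `times`, then amalgamate the results with `plus`."""
    names = list(domains)
    total = zero
    for values in product(*(domains[v] for v in names)):
        a = dict(zip(names, values))
        term = one
        for scope, f in factors:
            term = times(term, f(*(a[v] for v in scope)))
        total = plus(total, term)
    return total


# #CSP: 0/1 indicator functions, so the sum of products counts the solutions.
domains = {"Q": [0, 1], "X": [1, 2, 3], "Y": [1, 2, 3], "Z": [1, 2, 3]}
indicators = [
    (("Q", "X", "Y"), lambda q, x, y: int(q + x + y >= 3)),
    (("Q", "X", "Z"), lambda q, x, z: int(q + x + z >= 3)),
    (("Q", "Y", "Z"), lambda q, y, z: int(q + y + z <= 3)),
]
print(sum_of_products(domains, indicators))  # prints 9

# Optimization: max plays the role of "sum" and + the role of "product".
subobjectives = [
    (("A",), lambda a: 2 * a),
    (("A", "B"), lambda a, b: -3 if a == b == 1 else b),
]
print(sum_of_products({"A": [0, 1], "B": [0, 1]}, subobjectives,
                      plus=max, times=operator.add, zero=float("-inf"), one=0))  # prints 2
```

For Bayes net inference the local functions would be conditional probability tables, with the same sum and product.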
