
Book: Constraint Processing Author: Rina Dechter






Presentation Transcript


  1. Chapter 10: Hybrids of Search and Inference Time-Space Tradeoffs Book: Constraint Processing Author: Rina Dechter Zheying Jane Yang Instructor: Dr. Berthe Choueiry

  2. Outline
  • Combining Search and Inference -- the Hybrid Idea
    - The Cycle-Cutset Scheme
  • Hybrids: Conditioning-first
    - A Hybrid Algorithm for Propositional Theories
  • Hybrids: Inference-first
    - Super Cluster Tree Elimination
    - Decomposition into Non-separable Components
    - Hybrids of Hybrids
  • A Case Study of Combinational Circuits
    - Parameters of Primary Join Trees
    - Parameters Controlling Hybrids
  • Summary

  3. Part I Hybrids of Search and Inference

  4. Two primary constraint processing schemes:
  1. Conditioning, or search: based on depth-first backtracking (BT) search. Its time may be exponential, but it needs very little memory.
  2. Inference, or derivation: variable elimination. Both time and space are exponential, but only in the induced width. It works well when the problem is sparse (w* is low).

  5. The hybrid idea
  (Figure: search maps a CSP to subproblems and subsolutions; inference maps a CSP to an equivalent CSP'.)
  • Two ways to combine the two schemes:
  • Use inference procedures as preprocessing for search: inference yields a restricted search space, then search finds the solutions.
  • Alternate between both methods: apply search to a subset of the variables, then perform inference on the rest.

  6. Comparison of BT and VE

                     Backtracking (search)   Elimination (inference)
  Worst-case time    O(exp(n))               O(n·exp(w*)), w* <= n
  Space              O(n)                    O(n·exp(w*)), w* <= n
  Task               Find one solution       Knowledge compilation

  When the induced width is small (w* <= b for a small bound b), variable elimination runs in polynomial time, and is thus far more efficient than backtracking search.

  7. 10.1 The Cycle-Cutset Scheme
  Definition 5.3.5 (cycle-cutset): Given an undirected graph, a subset of its nodes is a cycle-cutset if removing them leaves a graph with no cycles.
  Example: Figure 10.3 -- an instantiated variable cuts its own cycle (instantiating a variable, e.g. x2, breaks the cycles passing through it).

  8. The Cycle-Cutset Scheme (cont'd)
  • Once the cycle-cutset variables are instantiated, the resulting network is cycle-free and can be solved by an inference-based tree-solving algorithm (a complicated problem becomes an easy one).
  • If a solution consistent with this cutset instantiation is found, then a solution to the entire problem has been obtained.
  • Otherwise, another instantiation of the cycle-cutset variables is considered, until a solution is found.

  9. Tradeoff between finding a small cycle-cutset and using search + tree-solving
  A small cycle-cutset is desirable; however, finding a minimal-size cycle-cutset is NP-hard. A compromise between BT search and the tree-solving algorithm:
  Step 1: Use BT, keeping track of the connectivity status of the constraint graph.
  Step 2: As soon as the set of instantiated variables constitutes a cycle-cutset, switch to the tree-solving algorithm.
  Step 3: Either a consistent extension of the remaining variables is found -- a solution.
  Step 4: Or no such extension exists; in that case BT resumes and tries another instantiation of the cutset variables.
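The scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the book's algorithm: the cutset assignments are simply enumerated (rather than produced incrementally by BT), and `solve_tree` is a stand-in for the tree-solving algorithm -- here it is plain backtracking, which is efficient when the residual constraint graph is a tree. All helper names (`solve_with_cutset`, `solve_tree`, `consistent`) are invented for this sketch.

```python
from itertools import product

def solve_with_cutset(variables, domains, constraints, cutset):
    """Cycle-cutset scheme (sketch): enumerate assignments to the cutset
    variables; each assignment leaves a cycle-free subproblem that a
    tree-solving routine can handle. `constraints` maps a pair of
    variable names to a binary predicate."""
    rest = [v for v in variables if v not in cutset]
    for values in product(*(domains[v] for v in cutset)):
        assignment = dict(zip(cutset, values))
        # skip cutset assignments that already violate a constraint
        if not consistent(assignment, constraints):
            continue
        solution = solve_tree(rest, domains, constraints, assignment)
        if solution is not None:
            return solution
    return None

def consistent(assignment, constraints):
    """Check all binary constraints whose scope is fully assigned."""
    for (u, v), pred in constraints.items():
        if u in assignment and v in assignment:
            if not pred(assignment[u], assignment[v]):
                return False
    return True

def solve_tree(rest, domains, constraints, assignment):
    """Stand-in tree solver: plain backtracking over the remaining
    variables (cheap when the residual graph is a tree)."""
    if not rest:
        return dict(assignment)
    var, *others = rest
    for val in domains[var]:
        assignment[var] = val
        if consistent(assignment, constraints):
            result = solve_tree(others, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None
```

For example, on a triangle A-B-C with all-different constraints over {0,1,2}, instantiating the 1-cutset {A} leaves the tree B-C, and the sketch finds a consistent coloring.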

  10. Example
  (Figure: a constraint graph over A, B, C, D, E, and the constraint tree generated by the cutset {C, D}.)

  11. Example (Figure 10.4)
  (Figure: (a) a constraint graph; (b) its ordered graph; (c) the constraint graph broken into a tree-variables part and a cycle-cutset-variables part.)
  * Presenter's note: I think Figure 10.4(c) in the textbook is wrong!

  12. Two extreme cases:
  • When the original problem has a tree constraint graph, the cycle-cutset scheme coincides with the tree algorithm.
  • When the constraint graph is complete, the algorithm reverts to regular backtracking. (Why?)

  13. Time and space complexity of the cycle-cutset algorithm
  • In the worst case, all possible assignments to the cutset variables must be tried, giving time complexity O(n·k^(c+2)), where
  • n is the number of variables,
  • c is the cycle-cutset size,
  • k is the domain size,
  • k^c is the number of tree-structured subproblems (one per cutset assignment),
  • and each tree-structured subproblem requires O((n-c)·k^2) steps of the tree algorithm.
  • Thus the time complexity is O((n-c)·k^(c+2)), i.e. O(n·k^(c+2)).
  • The space complexity is linear.

  14. 10.1.1 Structure-based recursive search
  • An algorithm that performs search only, but consults a tree-decomposition to reduce its complexity.
  • Consider a binary constraint network whose constraint graph is a tree. Given such a network:
  • remove a node x1,
  • generating two subtrees of size (approximately) n/2.
  • Let Tn be the time needed to solve this binary tree starting at x1. If x1 has at most k values, Tn obeys the recurrence
  • Tn <= 2k·T(n/2), T1 = k.
  • Then we have Tn = n·k^(log n + 1).
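Unrolling the recurrence above over its log n halving levels (a quick check; logs are base 2) recovers the closed form quoted on the slide:

```latex
T_n \le 2k\,T_{n/2}
    \le (2k)^{\log_2 n}\,T_1
    = 2^{\log_2 n}\,k^{\log_2 n}\cdot k
    = n\,k^{\log_2 n + 1}.
```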

  15. Theorem 10.1.3
  A constraint problem with n variables and k values, having a tree-decomposition of tree-width w*, can be solved by recursive search in linear space and in O(n·k^(2w*(log n + 1))) time.

  16. 10.2 Hybrids: conditioning first
  • The cycle-cutset scheme conditions on the cutset variables.
  • This suggests a framework of hybrid algorithms parameterized by a bound b on the induced width of the subproblems solved by inference:
  • conditioning on a b-cutset.

  17. Conditioning set, or b-cutset
  The algorithm removes a set of cutset variables so that the remaining constraint graph has induced width bounded by b; we call such a set a conditioning set, or b-cutset.
  Definition 10.2.1 (b-cutset): Given a graph G, a subset of nodes is called a b-cutset iff, when the subset is removed, the resulting graph has induced width less than or equal to b. A minimal b-cutset of a graph has the smallest size among all b-cutsets of the graph. A cycle-cutset is a 1-cutset of a graph.

  18. How to find a b-cutset
  Definition 10.2.3 (finding a b-cutset): Given a graph G = (V,E) and a constant b, find a minimal b-cutset; namely, find a smallest subset of nodes U such that G' = (V-U, E'), where E' includes all the edges in E that are not incident to nodes in U, has induced width less than or equal to b.
  A greedy procedure:
  Step 1: Given an ordering d = {x1, …, xn} of G, a b-cutset relative to d is obtained by processing the nodes from last to first.
  Step 2: When node x is processed, if its number of earlier neighbors is greater than b, it is added to the b-cutset.
  Step 3: Else, its earlier neighbors are connected.
  Step 4: The adjusted induced width relative to such a b-cutset is b.
  Step 5: A minimal b-cutset is a smallest one among all b-cutsets.

  19. The purpose of algorithm elim-cond(b)
  • The original problem is divided into two smaller parts:
  • the cutset variables, and
  • the remaining variables (the subproblem).
  • We run BT search on the cutset variables,
  • and bucket elimination on the remaining variables.
  • This yields the elim-cond(b) algorithm.
  • The constant b can be used to control the balance between search and variable elimination, and thus governs the tradeoff between time and space.

  20. Algorithm elim-cond(b) (textbook page 280, Figure 10.5)
  Input: A constraint network R = (X,D,C); Y ⊆ X, which is a b-cutset; d, an ordering that puts Y first, such that the adjusted induced width relative to Y along d is bounded by b; Z = X - Y.
  Output: A consistent assignment, if there is one.
  1. While ȳ = next partial solution of Y found by BT, do
     (a) z̄ ← adaptive-consistency(R_{Y=ȳ}).
     (b) if z̄ is not false, return solution = (ȳ, z̄).
  2. Endwhile.
  3. Return: the problem has no solution.

  21. The complexity of elim-cond(b)
  Theorem 10.2.2: Given R = (X,D,C), if elim-cond(b) is applied along ordering d, where Y is a b-cutset of size c_b, then the space complexity of elim-cond(b) is bounded by O(n·exp(b)), and its time complexity is bounded by O(n·exp(c_b + b)).
  Proof: Once a b-cutset assignment is fixed, the time and space complexity of the inference portion (variable elimination) is bounded by O(n·k^b). The BT portion checks all possible value combinations of the b-cutset, which takes O(k^(c_b)) time and linear space. Thus the total time complexity is O(n·k^b·k^(c_b)) = O(n·k^(b + c_b)).

  22. Part II Trade-off between search and inference

  23. The parameter b can be used to control the trade-off between search and inference
  • If b >= w*_d, where d is the ordering used by elim-cond(b), the algorithm coincides with adaptive-consistency.
  • As b decreases, the algorithm puts more nodes into the cutset, so c_b increases: the algorithm requires less space and more time. (Why?)

  24. Trade-off between search and inference
  • Let c_1 be the size of the smallest 1-cutset (cycle-cutset), and w* the smallest induced width.
  • Then we have the inequality c_1 >= w* - 1, i.e. 1 + c_1 >= w*.
  • The sum b + c_b is the exponent that determines the time complexity of the elim-cond(b) algorithm,
  • while w* dominates the complexity of bucket elimination.
  • Each time we increase b by 1, the cutset size c_b decreases:
  • 1 + c_1 >= 2 + c_2 >= … >= b + c_b >= … >= w* + c_{w*} = w*.
  • When c_{w*} = 0, the whole problem has induced width w* and there are no vertices in the cutset.
  • Search and variable elimination can also be interleaved dynamically.
  • Thus we get a hybrid scheme whose time complexity decreases as its space increases, until it reaches the induced width.

  25. Algorithm DP-DR(b)
  • A variant of elim-cond(b) for processing propositional CNF theories: a hybrid of DPLL and DR.
  • For backtracking, the DPLL algorithm applies look-ahead using unit propagation at each node.
  • For bucket elimination, the Directional Resolution (DR) algorithm is applied (see Chapter 8, page 232: resolution as the variable-elimination operator).
  • It is a special version of elim-cond(b) that incorporates dynamic variable ordering.
  Figure 10.9 (page 284): Algorithm DP-DR(b).

  26. Ed(y). y DP-DR ( , b) Input: A CNF theory  over variables X; a bound b. Output: A decision of whether  is satisfiable. If it is, an assignment to its conditioning variables, and the conditional directional extension • If unit-propagate() = false, return (false) • ElseXX- {variables in unit clauses} • If no more variables to process, return true • Else while Q  X s.t. degree(Q) <=b in the current conditioned graph • resolve over Q • if no empty clause is generated, • add all resolvents to the theory • else return false • XX – {Q} • EndWhile • Select a variable Q  X; X  X –{Q} • Y  Y{Q} • Return(DP-DR( Q, b)  ( Q, b)).

  27. y y y • The theory  conditioned on the assignment Y = is called a • conditional theory of  relative to , and is denoted by . • Conditional graph of , denoted G (y). (which is obtained • by deleting the nodes in Y and all their incident edges from G(). • The conditional induced width of a theory , denoted , • is the induced width of the graph G (y). y y Wy* Represent a Propositional Theory as an Interaction Graph • The interaction graph of a propositional theory , • denoted G(). • Each propositional variable denotes one node in the graph • Each pair of nodes in the same clause denotes an edge in • the graph,which yields a complete graph.

  28. Example
  • φ1 = {(C), (A∨B∨C), (¬A∨B∨E), (B∨C∨D)}
  • If we apply resolution over variable A, we get the new clause (B∨C∨E).
  • The resulting interaction graph therefore gains an edge between nodes E and C (Figure 10.6 in the textbook).
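The resolution step over a single variable (the operation DR performs inside a bucket) can be sketched as follows. Literal encoding and the function name are assumptions of this sketch: clauses are frozensets of integers, with negative integers for negated variables.

```python
def resolve_over(clauses, var):
    """All resolvents over `var` (sketch of one bucket's resolution
    step): pair each clause containing var with each clause containing
    -var, drop the clashing pair, and union the remaining literals.
    Tautological resolvents are filtered out."""
    pos = [c for c in clauses if var in c]
    neg = [c for c in clauses if -var in c]
    resolvents = set()
    for p in pos:
        for n in neg:
            r = (p - {var}) | (n - {-var})
            if not any(-lit in r for lit in r):   # skip tautologies
                resolvents.add(frozenset(r))
    return resolvents
```

Encoding the slide's example with A,B,C,D,E as 1..5, resolving (A∨B∨C) with (¬A∨B∨E) over A yields exactly {B,C,E}, the clause that adds the E-C edge to the interaction graph.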

  29. Example
  • Given a theory φ = {(C∨E), (A∨B∨C∨D), (¬A∨B∨E∨D), (B∨C∨D)}:
  • Condition on A.
  • When A=0: {(B∨C∨D), (B∨C∨D), (C∨E)}.
  • When A=1: {(B∨E∨D), (B∨C∨D), (C∨E)}.
  • Delete A and its incident edges from the interaction graph.
  • We can also delete some other edges, e.g. between B and E: when A=0 the clause (¬A∨B∨E∨D) is always satisfied, so the edges it contributed can be removed.
  (Figure: the interaction graphs of φ, φ_{A=0}, and φ_{A=1}, labeled with induced widths W*=2, W*=4, W*=3.)

  30. Example: a trace of DCDR(b=2)
  (Figure: variables whose degree in the current conditioned graph exceeds b=2 (W* > 2) are conditioned on -- A, then B, with branches A=0 and A=1; the remaining variables satisfy W* <= 2 and are eliminated by resolution through buckets C, D, and E.)

  31. Complexity of DP-DR(b)
  Theorem 10.2.5: The time complexity of algorithm DP-DR(b) is O(n·exp(c_b + b)), where c_b is the size of the largest cutset conditioned upon. The space complexity is O(n·exp(b)).

  32. Empirical evaluation of DP-DR(b)
  • Evaluation of DP-DR(b) as a conditioning-first hybrid.
  • Empirical results from experiments with differently structured CNFs:
  • random uniform 3-CNFs with 100 variables and 400 clauses;
  • (2,5)-trees with 40 cliques and 15 clauses per clique;
  • (4,8)-trees with 50 cliques and 20 clauses per clique.
  • In general, (k,m)-trees are trees of cliques, each clique having m+k nodes and separators of size k.
  • The randomly generated 3-CNFs were designed to have an interaction graph that corresponds to (k,m)-trees.
  • The performance of DP-DR(b) depends on the induced width of the theories; the overall performance is best at b=5. See Figure 10.10 on page 285.

  33. Part III Non-separable components and tree-decomposition

  34. 10.3 Hybrids: inference-first
  • Another approach to combining conditioning and inference is based on structured constraint networks, using tree-decomposition.
  • The algorithm CTE (Cluster-Tree Elimination, Chapter 9, p. 261) performs
  • variable elimination over the separators (which are small), and
  • search within the tree clusters (which are relatively large).
  • Thus we can trade even more space for time by allowing larger cliques but smaller separators.
  • This is achieved by combining adjacent nodes of a tree-decomposition that are connected by "fat" separators.
  • Rule: keep apart only those nodes that are linked by separators of bounded size.

  35. Tree decomposition (definitions from Chapter 9, page 260)
  Definition 9.2.5 (tree-decomposition): Let R = (X,D,C) be a CSP. A tree-decomposition for R is a triple <T, χ, ψ>, where T = (V,E) is a tree, and χ and ψ are labeling functions associating each vertex v ∈ V with two sets, χ(v) ⊆ X and ψ(v) ⊆ C, that satisfy the following conditions:
  1. For each constraint Ri ∈ C, there is at least one vertex v ∈ V such that Ri ∈ ψ(v) and scope(Ri) ⊆ χ(v).
  2. For each variable x ∈ X, the set {v ∈ V | x ∈ χ(v)} induces a connected subtree of T. (This is the connectedness property.)
  Definition 9.2.6 (tree-width, hyper-width, separator): The tree-width of a tree-decomposition <T, χ, ψ> is tw = max_{v∈V} |χ(v)|, and its hyper-width is hw = max_{v∈V} |ψ(v)|. Given two adjacent vertices u and v of a tree-decomposition, the separator of u and v is defined as sep(u,v) = χ(u) ∩ χ(v).
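The two conditions of Definition 9.2.5 are mechanical to check, which makes a small validator a useful companion to the definition. This sketch (invented helper name and interface) takes the tree edges, the χ labeling as a dict from vertex to variable set, and the constraint scopes; the ψ labeling is left implicit, since condition 1 only needs some cluster to contain each scope.

```python
from collections import defaultdict

def is_tree_decomposition(tree_edges, chi, scopes):
    """Check the two conditions of Definition 9.2.5 (sketch): every
    constraint scope fits inside some cluster, and for every variable
    the clusters containing it induce a connected subtree."""
    # Condition 1: each scope is contained in at least one cluster.
    for scope in scopes:
        if not any(set(scope) <= chi[v] for v in chi):
            return False
    # Condition 2 (connectedness): BFS/DFS restricted to the clusters
    # holding variable x must reach all of them.
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    variables = set().union(*chi.values())
    for x in variables:
        holders = {v for v in chi if x in chi[v]}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u] & holders:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != holders:
            return False
    return True
```

A two-cluster chain {A,B}-{B,C} passes; a labeling where B appears in two clusters separated by a B-free cluster violates connectedness and is rejected.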

  36. Tree-decomposition
  • Assume a CSP has a tree-decomposition with tree-width r and separator size s,
  • and assume the space restrictions do not allow memory up to O(exp(s)).
  • One way to overcome this is to collapse those nodes of the tree that are connected by large separators;
  • each combined node includes the variables and constraints of the two previous nodes.
  • The resulting tree-decomposition has larger subproblems but smaller separators:
  • as s decreases, both r and hw increase.

  37. Theorem 10.3.1
  Given a tree-decomposition T over n variables with separator sizes s0, s1, …, st, and secondary tree-decompositions having a corresponding maximal number of variables in any cluster r0, r1, …, rt, the complexity of CTE applied to each secondary tree-decomposition Ti is O(n·exp(ri)) time and O(n·exp(si)) space (i ranges over all the secondary tree-decompositions).
  • The secondary tree-decomposition Ti is generated by combining adjacent nodes whose separator sizes are strictly greater than si.

  38. Example I
  (Figure: (a) a primal constraint graph; (b) a tree-decomposition with clusters AB, BCD, BDG, GDEF, GEFH, whose separators have size 1 (B), size 2 (BD, GD), and a fat separator of size 3; (c) secondary tree-decompositions obtained by collapsing across the larger separators, producing clusters such as GDEFH and BCDGEFH.)

  39. Example II
  (Figure: (a) the primal constraint graph; (b) its induced, triangulated graph; (c) the super tree clusters, e.g. {A,B,C,D}, {B,C,D,F}, {B,E,F}, {D,F,G}, {F,G,I}, {B,F,H}, joined by separators such as {F}, {B,F}, and {F,G}.)

  40. 10.3.1 Super Cluster Tree Elimination(b)
  • Each clique is processed by search;
  • each solution created by BT search is projected onto the separator,
  • and the projected solutions are accumulated.
  • We call the resulting algorithm SUPER CLUSTER TREE ELIMINATION(b), or SCTE(b).
  • It takes a primary tree-decomposition and generates a tree-decomposition whose separator sizes are bounded by b, which is subsequently processed by CTE.

  41. Superbuckets
  (Figure 10.13: a bucket-tree, a super-bucket-tree, and a join-tree.)
  • The bucket-elimination algorithm can be extended to bucket-trees (Section 9.3).
  • A bucket-tree is a tree-decomposition; merging adjacent buckets generates a super-bucket-tree (SBT), in the same way super clusters are generated.
  • In the top-down phase of bucket elimination, several variables are then eliminated at once.
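The super-clustering step itself -- collapse every pair of adjacent clusters whose separator exceeds the bound b -- can be sketched with a union-find over the tree vertices. One pass suffices for a valid tree-decomposition: by the connectedness property, merging two clusters does not enlarge their separators with other neighbours. Function name and representation (χ as a dict of variable sets) are assumptions of this sketch.

```python
def super_cluster(tree_edges, chi, b):
    """Collapse adjacent clusters joined by separators larger than b
    (sketch of the super-clustering step used by SCTE(b)). Returns the
    merged clusters keyed by a representative vertex, plus the edges
    of the new tree; surviving separators all have size <= b."""
    parent = {v: v for v in chi}

    def find(v):
        # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # merge across every fat separator
    for u, v in tree_edges:
        if len(chi[u] & chi[v]) > b:
            parent[find(u)] = find(v)

    merged = {}
    for v in chi:
        merged.setdefault(find(v), set()).update(chi[v])
    new_edges = {tuple(sorted((find(u), find(v))))
                 for u, v in tree_edges if find(u) != find(v)}
    return merged, new_edges
```

On a chain of clusters AB - BCD - BDG - GDEF - GEFH (as in Example I) with b = 2, only the size-3 separator {G,E,F} is fat, so GDEF and GEFH collapse into {G,D,E,F,H} and all remaining separators have size at most 2.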

  42. Definition 10.3.3 (non-separable components)
  A connected graph G = (V,E) is said to have a separation node v if there exist nodes a and b such that all paths connecting a and b pass through v.
  • A graph that has a separation node is called separable; one that has none is called non-separable.
  • A subgraph with no separation nodes is called a non-separable component (or a biconnected component).

  43. 10.3.2 Decomposition into non-separable components
  • Generally we cannot find the best decomposition with bounded separator size in polynomial time.
  • The decomposition into non-separable components, however, yields a tree-decomposition in which all separators are singleton variables, so it requires only linear space (O(n·exp(sep)) with |sep| = 1).
  • Each node of the tree corresponds to a component;
  • the variables of a node are those appearing in its component,
  • and each constraint is placed in a component that contains its scope.
  • Applying CTE to such a tree requires linear space (CTE's space complexity is O(n·exp(sep)), Chapter 9, page 263),
  • but its time is exponential in the components' sizes.
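Non-separable components can be found in linear time with the classical Hopcroft-Tarjan depth-first search; this is a standard textbook version, not the book's code. Separation nodes are exactly the vertices that end up in more than one component.

```python
def biconnected_components(edges):
    """Hopcroft-Tarjan decomposition into non-separable (biconnected)
    components (sketch). Returns a list of components, each a set of
    vertices; separation nodes appear in several components."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    disc, low, stack, comps = {}, {}, [], []
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:        # u separates v's subtree
                    comp = set()
                    while True:
                        e = stack.pop()
                        comp.update(e)
                        if e == (u, v):
                            break
                    comps.append(comp)
            elif disc[v] < disc[u]:          # back edge
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return comps
```

For example, two triangles sharing the vertex C decompose into the components {A,B,C} and {C,D,E}, with C as the separation node, mirroring how the C1, …, C4 components arise in the textbook's Figure on page 289.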

  44. Example: decomposition into non-separable components (textbook, page 289)
  (Figure: (a) a constraint graph; (b) its non-separable components C1 = {A,B,C,D}, C2 = {C,F,E}, C3 = {E,H,I}, C4 = {G,F,J}.)

  45. Executing message passing along a super-bucket tree
  • In a super-bucket tree, each node Ci dictates a super-cluster; in Figure 10.14 on page 289, C1 includes variables {A,B,C,D}, C2 includes variables {F,C,E}, and so on.
  • When C1 sends a message to C2, it places its message inside the receiving super-bucket C2.
  • The message (the new constraints computed by bucket C1) is sent to and placed inside bucket C2.
  • See the example on page 290.

  46. Part IV Hybrids of hybrids

  47. 10.3.3 Hybrids of hybrids
  Advantage: the space complexity of this algorithm is linear, but its time complexity can be much better than that of the cycle-cutset scheme or the non-separable components scheme alone.
  Example: case studies for circuits c432, c499, …; see Figure 10.19.

  48. Algorithm HYBRID(b1, b2)
  • Combines the two approaches, conditioning and inference.
  • Given a space parameter b1:
  • first, find a tree-decomposition with separators bounded by b1, using the super-clustering approach;
  • then, instead of pure search in each cluster, apply elim-cond(b2), with b2 <= b1.
  • The time complexity is thereby significantly reduced.
  • If c*_{b2} is the size of the maximum b2-cutset in any clique of the b1-tree-decomposition, then
  • the resulting algorithm is space exponential in b1 (separators are restricted to size b1),
  • but time exponential in c*_{b2} (the cutset size is governed by the bound b2).

  49. Two special cases
  1. Apply the cycle-cutset scheme in each clique (hybrid(b1, 1)). Experiments on real circuits, for circuit-diagnosis tasks, show that the reduction in complexity bounds for complex circuits is tremendous.
  2. When b1 = b2: for b = 1, hybrid(1,1) corresponds to taking the non-separable components as the tree-decomposition and applying the cycle-cutset scheme in each component.

  50. 10.4 A case study of combinatorial circuits (textbook pages 291-294)
  Method:
  • Use a triangulation approach to decompose each circuit graph:
  • select an ordering for the nodes,
  • triangulate the graph,
  • generate the induced graph,
  • identify its maximal cliques (by maximal cardinality).
  • Among the orderings tried, the min-degree heuristic yields the smallest clique and separator sizes.
  • See Figure 10.18.
