
Combinatorial Optimization for Graphical Models

Rina Dechter (Donald Bren School of Computer Science, University of California, Irvine, USA), Radu Marinescu (Cork Constraint Computation Centre, University College Cork, Ireland), Simon de Givry & Thomas Schiex (Dépt. de Mathématiques et Informatique Appliquées, INRA, Toulouse, France)


Presentation Transcript


  1. Combinatorial Optimization for Graphical Models. Rina Dechter, Donald Bren School of Computer Science, University of California, Irvine, USA; Radu Marinescu, Cork Constraint Computation Centre, University College Cork, Ireland; Simon de Givry & Thomas Schiex, Dépt. de Mathématiques et Informatique Appliquées, INRA, Toulouse, France

  2. Outline • Introduction • Graphical models • Optimization tasks for graphical models • Fundamental operations on cost functions • Inference • Bucket Elimination, Bucket-Tree and Cluster-Tree Elimination • Polynomial classes based on structure • Search (OR) • Branch-and-Bound and Best-First Search • Local search and Partial search • Lower-bounds and relaxations • Bounded inference, local consistency and continuous relaxations • Exploiting problem structure in search • AND/OR search spaces (trees, graphs) • Other hybrids of search and inference • Cutset decomposition, boosting search with VE, super-bucket scheme • Software & Applications

  3. Outline • Introduction • Graphical models • Optimization tasks for graphical models • Solving optimization problems by inference and search • Fundamental operations on cost functions • Inference • Search (OR) • Lower-bounds and relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications

  4. Constraint Optimization Problems for Graphical Models F(a,b,c,d,f,g) = f1(a,b,d) + f2(d,f,g) + f3(b,c,f); f1(A,B,D) has scope {A,B,D} • Primal graph: variables --> nodes; functions/constraints --> arcs (here over variables A, B, C, D, F, G)

  5. Constraint Networks Example: map coloring. Variables: countries (A, B, C, …). Values: colors (red, green, blue). Constraints: adjacent countries must receive different colors, e.g. the relation for the arc between A and E contains exactly the pairs of distinct colors. [Figure: constraint graph over variables A, B, C, D, E, F, G.]

  6. Constrained Optimization Example: power plant scheduling

  7. Probabilistic Networks BN = (X,D,G,P). Nodes: Smoking with P(S); Cancer with P(C|S); Bronchitis with P(B|S); X-Ray with P(X|C,S); Dyspnoea with P(D|C,B). P(S,C,B,X,D) = P(S)·P(C|S)·P(B|S)·P(X|C,S)·P(D|C,B). MPE = find a maximum probability assignment, given evidence: argmax P(S)·P(C|S)·P(B|S)·P(X|C,S)·P(D|C,B)
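The MPE task on this small network can be sketched by brute-force enumeration. Only the factorization P(S,C,B,X,D) = P(S)·P(C|S)·P(B|S)·P(X|C,S)·P(D|C,B) comes from the slide; all CPT numbers below are invented for illustration:

```python
from itertools import product

# Invented CPTs over binary variables (0 = false, 1 = true).
P_S  = {0: 0.7, 1: 0.3}                                       # P(S)
P_CS = {(0,0): 0.95, (1,0): 0.05, (0,1): 0.80, (1,1): 0.20}   # P(C|S), key (c, s)
P_BS = {(0,0): 0.90, (1,0): 0.10, (0,1): 0.60, (1,1): 0.40}   # P(B|S), key (b, s)
P_XCS = {}                                                    # P(X|C,S), key (x, c, s)
for c, s in product((0, 1), repeat=2):
    pos = 0.98 if c else 0.05              # made-up positive X-ray rate
    P_XCS[(1, c, s)], P_XCS[(0, c, s)] = pos, 1 - pos
P_DCB = {}                                                    # P(D|C,B), key (d, c, b)
for c, b in product((0, 1), repeat=2):
    pos = min(0.95, 0.10 + 0.50 * c + 0.40 * b)               # made-up
    P_DCB[(1, c, b)], P_DCB[(0, c, b)] = pos, 1 - pos

def joint(s, c, b, x, d):
    return P_S[s] * P_CS[c, s] * P_BS[b, s] * P_XCS[x, c, s] * P_DCB[d, c, b]

def mpe(evidence=None):
    """Brute-force MPE: maximize the joint over assignments matching evidence."""
    evidence = evidence or {}
    names = ('S', 'C', 'B', 'X', 'D')
    best, best_p = None, -1.0
    for vals in product((0, 1), repeat=5):
        a = dict(zip(names, vals))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(*vals)
        if p > best_p:
            best, best_p = a, p
    return best, best_p

assignment, prob = mpe({'X': 1})   # most probable state given a positive X-ray
```

Enumeration is exponential in the number of variables; the inference and search methods in the rest of the tutorial exist precisely to avoid it.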

  8. Monitoring Intensive-Care Patients The “alarm” network: 37 variables, 509 parameters (instead of 2^37). [Figure: the network over variables such as MINVOLSET, KINKEDTUBE, PULMEMBOLUS, INTUBATION, VENTMACH, …, HRBP, BP.]

  9. Influence Diagrams ID = (X,D,P,R). Chance variables over domains (Seismic structure, Oil underground, Test result, Oil produced, Market information) with CPTs; decision variables (Test, Drill, Oil sale policy); reward components (Test cost, Drill cost, Sales cost, Oil sales); utility function = sum of the reward components. Task: find an optimal policy.

  10. Graphical Models A graphical model (X,D,F): • X = {X1,…,Xn} variables • D = {D1,…,Dn} domains • F = {f1,…,fr} functions (constraints, CPTs, CNFs, …) • Operators: combination and elimination (projection) • Tasks: • Belief updating: Σ_{X−y} Π_j P_j • MPE: max_X Π_j P_j • CSP: ⋈_j C_j • Max-CSP: min_X Σ_j F_j • All these tasks are NP-hard • exploit problem structure • identify special cases • approximate. [Figure: conditional probability table (CPT), relation, and primal (interaction) graph over A, B, C, D, E, F.]

  11. Sample Domains for GM • Web Pages and Link Analysis • Communication Networks (Cell phone Fraud Detection) • Natural Language Processing (e.g. Information Extraction and Semantic Parsing) • Battle-space Awareness • Epidemiological Studies • Citation Networks • Intelligence Analysis (Terrorist Networks) • Financial Transactions (Money Laundering) • Computational Biology • Object Recognition and Scene Analysis …

  12. Types of Constraint Optimization • Valued CSPs, Weighted CSPs, Max-CSPs, Max-SAT • Most Probable Explanation (MPE) • Linear Integer Programs • Examples: • Problems translated from planning • Unit scheduling maintenance • Combinatorial auctions • Maximum-likelihood haplotypes in linkage

  13. Outline • Introduction • Graphical models • Optimization tasks for graphical models • Solving optimization problems by inference and search • Fundamental operations on cost functions • Inference • Search (OR) • Lower-bounds and relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications

  14. Graphical Models Reasoning • Search (Conditioning): time exp(n), space linear. Complete: Depth-first search, Branch-and-Bound, A* search. Incomplete: Simulated Annealing, Gradient Descent. • Inference (Elimination): time exp(w*), space exp(w*). Complete: Adaptive Consistency, Tree Clustering, Dynamic Programming, Resolution. Incomplete: Local Consistency, Unit Resolution, mini-bucket(i). • Hybrids of search and inference.

  15. Bucket Elimination (Variable Elimination) Bucket E: E ≠ D, E ≠ C • Bucket D: D ≠ A • Bucket C: C ≠ B • Bucket B: B ≠ A • Bucket A: processing the buckets in order records induced constraints in lower buckets; in this example a contradiction is derived.

  16. The Idea of Conditioning

  17. Conditioning vs. Elimination • Conditioning (search): instantiating A (A=1, …, A=k) yields k “sparser” subproblems. • Elimination (inference): eliminating A yields 1 “denser” problem.

  18. Outline • Introduction • Graphical models • Optimization tasks for graphical models • Solving optimization problems by inference and search • Fundamental operations on cost functions • Inference • Search (OR) • Lower-bounds and relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications

  19. Fundamental Operations on Cost Functions • To be completed …

  20. Outline • Introduction • Inference • Bucket Elimination • Bucket-Tree Elimination and Cluster-Tree Elimination • Polynomial classes based on structure • Search (OR) • Lower-bounds and relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications

  21. Computing the Optimal Cost Solution OPT = min over a,b,c,d,e of F(a,b)+F(a,c)+F(a,d)+F(b,c)+F(b,d)+F(b,e)+F(c,e). Variable Elimination groups the sum by buckets, [F(a,b)+F(b,c)+F(b,d)+F(b,e)] + [F(a,c)+F(c,e)] + F(a,d), eliminating one variable at a time. [Figure: constraint graph over A, B, C, D, E.]

  22. Finding OPT Algorithm elim-opt (Dechter, 1996); Non-serial Dynamic Programming (Bertelè & Brioschi, 1973). Elimination operator: min-sum. bucket B: F(a,b) F(b,c) F(b,d) F(b,e) • bucket C: F(c,a) F(c,e) • bucket D: F(a,d) • bucket E: e=0 • bucket A:
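The min-sum elimination that elim-opt performs can be sketched generically. This is a sketch, not the tutorial's implementation; the cost values are invented, and only the graph (functions on AB, AC, AD, BC, BD, BE, CE) follows the slides:

```python
from itertools import product
import random

D = (0, 1)                       # binary domains, for simplicity
random.seed(0)

# Pairwise cost tables on the slides' graph; cost values are invented.
scopes = [('A','B'), ('A','C'), ('A','D'), ('B','C'), ('B','D'), ('B','E'), ('C','E')]
funcs = {s: {k: random.randint(0, 5) for k in product(D, D)} for s in scopes}

def elim_opt(funcs, order):
    """Min-sum variable elimination: process each variable's bucket,
    replacing it by a message over the bucket's remaining variables."""
    pool = list(funcs.items())                    # (scope, table) pairs
    for x in order:
        bucket = [f for f in pool if x in f[0]]
        pool = [f for f in pool if x not in f[0]]
        scope = tuple(sorted({v for s, _ in bucket for v in s if v != x}))
        msg = {}
        for vals in product(D, repeat=len(scope)):
            env = dict(zip(scope, vals))
            # Minimize over x the sum of all functions in the bucket.
            msg[vals] = min(
                sum(t[tuple({**env, x: xv}[v] for v in s)] for s, t in bucket)
                for xv in D
            )
        pool.append((scope, msg))
    return sum(t[()] for _, t in pool)            # only constants remain

def brute_force(funcs):
    vs = sorted({v for s in funcs for v in s})
    return min(
        sum(t[tuple(env[v] for v in s)] for s, t in funcs.items())
        for vals in product(D, repeat=len(vs))
        for env in [dict(zip(vs, vals))]
    )

opt = elim_opt(funcs, order=['E', 'D', 'C', 'B', 'A'])
```

Any elimination ordering yields the same optimum; what changes with the ordering is the size of the intermediate messages, which is exponential in the induced width.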

  23. Generating the Optimal Assignment B: F(a,b) F(b,c) F(b,d) F(b,e) • C: F(c,a) F(c,e) • D: F(a,d) • E: e=0 • A: after elimination, the variables are assigned in reverse order (A first), each variable taking a value that minimizes the sum of the functions in its bucket.

  24. OPT Complexity Algorithm elim-opt (Dechter, 1996); Non-serial Dynamic Programming (Bertelè & Brioschi, 1973). Elimination operator: min-sum. bucket B: F(a,b) F(b,c) F(b,d) F(b,e) • bucket C: F(c,a) F(c,e) • bucket D: F(a,d) • bucket E: e=0 • bucket A: complexity exp(w*=4), where w* is the ”induced width” (max clique size of the induced graph).

  25. Induced-width • Width along ordering d, w(d): max # of earlier neighbors (“parents”) over all nodes • Induced width along ordering d, w*(d): the width of the ordered induced graph, obtained by connecting the parents of each node X, recursively from top to bottom (example ordering: E, D, C, B, A)
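The definition above translates directly into a small routine. A sketch; the two example graphs (a chain and a 4-cycle) are illustrative, not taken from the slides:

```python
def induced_width(adj, order):
    """w*(d): scan nodes from last to first along the ordering; connect each
    node's earlier neighbors ('parents') pairwise; return the max parent count."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    pos = {v: i for i, v in enumerate(order)}
    w = 0
    for v in reversed(order):
        parents = {u for u in adj[v] if pos[u] < pos[v]}
        w = max(w, len(parents))
        for u in parents:
            adj[u] |= parents - {u}               # induce edges among parents
    return w

# Illustrative graphs: a chain A-B-C-D and a 4-cycle A-B-C-D-A.
chain = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'C'}}
cycle = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'C', 'A'}}
```

A chain has induced width 1 along its natural ordering, while a cycle has induced width 2: eliminating any node of the cycle connects its two neighbors.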

  26. Complexity of Bucket Elimination Bucket-Elimination is time O(r·exp(w*(d))) and space O(n·exp(w*(d))), where r = number of functions. The effect of the ordering: the same constraint graph over A, B, C, D, E can have different induced widths under different orderings. Finding the smallest induced-width ordering is NP-hard!

  27. Bucket-Tree Elimination • Describe BTE

  28. Cluster-Tree Elimination • Describe CTE

  29. Outline • Introduction • Inference • Search (OR) • Branch-and-Bound and Best-First search • Branching schemes, variable and value orderings • Local search and Partial search • Lower-bounds and relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications

  30. The Search Space Variables A, B, C, D, E, F with pairwise cost functions; the objective function is the sum of the cost components. [Figure: the full OR search tree along the ordering A, B, C, D, E, F, enumerating all 0/1 assignments.]

  31. The Search Space [Figure: the same search tree with a cost attached to each arc.] Arc-cost is calculated based on cost components.

  32. The Value Function [Figure: the search tree with node values propagated bottom-up.] Value of node = minimal cost solution below it.

  33. An Optimal Solution [Figure: the search tree with the optimal solution path highlighted.] Value of node = minimal cost solution below it.

  34. Basic Heuristic Search Schemes Heuristic function f(xp) computes a lower bound on the best extension of xp and can be used to guide a heuristic search algorithm. We focus on: 1. Branch-and-Bound: use the heuristic function f(xp) to prune the depth-first search tree; linear space. 2. Best-First Search: always expand the node with the best heuristic value f(xp); needs lots of memory.

  35. Best-First vs. Depth-first Branch-and-Bound • Best-First (A*) (optimal): expands the least number of nodes given h, but requires storing the entire search tree • Depth-first Branch-and-Bound: uses only linear space; if it finds an optimal solution early, it expands the same space as Best-First (when the search space is a tree); B&B can improve its heuristic function dynamically
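A minimal depth-first Branch-and-Bound sketch for a min-sum problem. The instance is invented; the heuristic, summing the minimum entry of every not-yet-fully-assigned cost table, is one simple admissible lower bound (the mini-bucket heuristics discussed later are typically much stronger):

```python
from itertools import product
import random

D = (0, 1)
random.seed(3)
variables = ['A', 'B', 'C', 'D', 'E']
scopes = [('A','B'), ('B','C'), ('C','D'), ('B','E'), ('A','E')]
funcs = {s: {k: random.randint(0, 5) for k in product(D, D)} for s in scopes}

def dfbnb(variables, funcs):
    """Depth-first Branch-and-Bound for min-sum: prune a node whenever
    g (cost so far) + h (lower bound on the rest) cannot beat the incumbent."""
    best = [float('inf')]
    def rec(i, env):
        done = lambda s: all(v in env for v in s)
        g = sum(t[tuple(env[v] for v in s)] for s, t in funcs.items() if done(s))
        h = sum(min(t.values()) for s, t in funcs.items() if not done(s))
        if g + h >= best[0]:        # prune: subtree cannot improve the incumbent
            return
        if i == len(variables):     # full assignment reached (h == 0 here)
            best[0] = g
            return
        for val in D:
            rec(i + 1, {**env, variables[i]: val})
    rec(0, {})
    return best[0]

def brute_force(funcs):
    vs = sorted({v for s in funcs for v in s})
    return min(
        sum(t[tuple(env[v] for v in s)] for s, t in funcs.items())
        for vals in product(D, repeat=len(vs))
        for env in [dict(zip(vs, vals))]
    )
```

Because h never overestimates the cost of the best extension, pruning is safe and the returned value equals the true optimum.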

  36. How to Generate Heuristics • The principle of relaxed models • Mini-Bucket Elimination • Bounded directional consistency ideas • Linear relaxation for integer programs

  37. Outline • Introduction • Inference • Search (OR) • Lower-bounds and relaxations • Bounded inference • Mini-Bucket Elimination • Static and dynamic mini-bucket heuristics • Local consistency • Continuous relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications

  38. Mini-Bucket Approximation Split a bucket into mini-buckets => bound complexity: bucket (X) = {h1, …, hr, hr+1, …, hn} is split into {h1, …, hr} and {hr+1, …, hn}, each processed separately.
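Why splitting bounds the result: for minimization, min_X of a sum is at least the sum of the per-mini-bucket minima, so processing mini-buckets separately yields a lower bound at a fraction of the cost. A toy numeric check with invented tables:

```python
import random

random.seed(1)
# One bucket over X holding two cost functions, split into two mini-buckets
# (tables invented for illustration).
f = {x: random.randint(0, 9) for x in range(4)}   # f(X)
g = {x: random.randint(0, 9) for x in range(4)}   # g(X)

exact = min(f[x] + g[x] for x in f)               # eliminate X from the full bucket
mini  = min(f.values()) + min(g.values())         # eliminate X per mini-bucket
# mini <= exact always: moving min inside a sum can only decrease the value,
# so mini-bucket processing yields a lower bound L for minimization.
```

The saving is in table sizes: the full bucket requires a table over the union of all scopes, while each mini-bucket only touches its own (bounded) set of variables.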

  39. Mini-Bucket Elimination (min-sum) bucket B is split into mini-buckets [F(b,e)] and [F(a,b), F(b,d)], generating hB(e) and hB(a,d) • bucket C: F(c,e) F(a,c) • bucket D: F(a,d) hB(a,d) • bucket E: hC(e,a) hB(e) • bucket A: hE(a) hD(a) • L = lower bound

  40. Semantics of Mini-Bucket: Splitting a Node Variables in different mini-buckets are renamed and duplicated (Kask et al., 2001), (Geffner et al., 2007), (Choi, Chavira & Darwiche, 2007). Before splitting: network N; after splitting: network N' with the split variable U duplicated.

  41. MBE-MPE(i) Algorithm approx-mpe (Dechter & Rish, 1997) • Input: i = max number of variables allowed in a mini-bucket • Output: [lower bound (the probability of a sub-optimal solution), upper bound] • Example: approx-mpe(3) versus elim-mpe

  42. Properties of MBE(i) • Complexity: O(r exp(i)) time and O(exp(i)) space • Yields an upper-bound and a lower-bound • Accuracy: determined by the upper/lower (U/L) bound • As i increases, both accuracy and complexity increase • Possible uses of mini-bucket approximations: as anytime algorithms, as heuristics in search • Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter & Rish, 1997)

  43. Anytime Approximation

  44. Empirical Evaluation(Rish & Dechter, 1999) • Benchmarks • Randomly generated networks • CPCS networks • Probabilistic decoding • Task • Comparing approx-mpe and anytime-mpe versus bucket-elimination (elim-mpe)

  45. CPCS networks – medical diagnosis (noisy-OR model) Test case: no evidence. Time (sec):
  Algorithm         cpcs360   cpcs422
  elim-mpe          115.8     1697.6
  anytime-mpe( )    70.3      505.2
  anytime-mpe( )    70.3      110.5

  46. Outline • Introduction • Inference • Search (OR) • Lower-bounds and relaxations • Bounded inference • Mini-Bucket Elimination • Static and dynamic mini-bucket heuristics • Local consistency • Continuous relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications

  47. Generating Heuristics for Graphical Models (Kask & Dechter, AIJ’01) Given a cost function C(a,b,c,d,e) = F(a) + F(b,a) + F(c,a) + F(e,b,c) + F(d,a,b), define an evaluation function over a partial assignment as the cost of its best extension: f*(a,e,d) = min_{b,c} C(a,b,c,d,e) = F(a) + min_{b,c} [F(b,a) + F(c,a) + F(e,b,c) + F(d,a,b)] = g(a,e,d) + H*(a,e,d)

  48. Generating Heuristics (cont.) H*(a,e,d) = min_{b,c} [F(b,a) + F(c,a) + F(e,b,c) + F(d,a,b)] = min_c [F(c,a) + min_b [F(e,b,c) + F(b,a) + F(d,a,b)]] >= min_c [F(c,a) + min_b F(e,b,c)] + min_b [F(b,a) + F(d,a,b)] = hC(e,a) + hB(d,a) = H(a,e,d). Hence f(a,e,d) = g(a,e,d) + H(a,e,d) <= f*(a,e,d). The heuristic function H is what is compiled during the preprocessing stage of the Mini-Bucket algorithm.
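The inequality above can be checked numerically on the slide's functions with random (invented) cost tables:

```python
from itertools import product
import random

random.seed(2)
D = (0, 1)
# Random cost tables for the functions appearing in H* (all values invented).
Fba  = {k: random.randint(0, 9) for k in product(D, D)}       # F(b,a)
Fca  = {k: random.randint(0, 9) for k in product(D, D)}       # F(c,a)
Febc = {k: random.randint(0, 9) for k in product(D, D, D)}    # F(e,b,c)
Fdab = {k: random.randint(0, 9) for k in product(D, D, D)}    # F(d,a,b)

def H_star(a, e, d):
    """Exact cost of the best extension over b and c."""
    return min(Fba[b, a] + Fca[c, a] + Febc[e, b, c] + Fdab[d, a, b]
               for b, c in product(D, D))

def h_B(d, a):
    """Mini-bucket message: min_b [F(b,a) + F(d,a,b)]."""
    return min(Fba[b, a] + Fdab[d, a, b] for b in D)

def h_C(e, a):
    """Mini-bucket message: min_c [F(c,a) + min_b F(e,b,c)]."""
    return min(Fca[c, a] + min(Febc[e, b, c] for b in D) for c in D)

def H(a, e, d):
    return h_B(d, a) + h_C(e, a)
```

For every (a,e,d), H(a,e,d) <= H*(a,e,d): pushing min inside the sum only decreases the value, which is exactly why the compiled mini-bucket heuristic is admissible.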


  50. Static MBE Heuristics • Given a partial assignment xp, estimate the cost of the best extension to a full solution • The evaluation function f(xp) can be computed using the functions recorded by the Mini-Bucket scheme: B: [F(E,B,C)] [F(D,A,B), F(B,A)] • C: F(C,A) hB(E,C) • D: hB(D,A) • E: hC(E,A) • A: F(A) hE(A) hD(A) • f(a,e,D) = g(a,e) + H(a,e,D) = F(a) + hB(D,a) + hC(e,a); h is admissible. [Figure: cost network over A, B, C, D, E.]
