Combinatorial Problems I: Finding Solutions

Ashish Sabharwal, Cornell University
March 3, 2008
2nd Asian-Pacific School on Statistical Physics and Interdisciplinary Applications, KITPC/ITP-CAS, Beijing, China

Cross-fertilization of ideas for the study and design of Intelligent Systems: Computer Science, Engineering, Mathematics, Operations Research, Economics, Physics (phase transitions), Cognitive Science
Research part of Cornell’s Intelligent Information Systems Institute (IISI). Director: Carla Gomes

Combinatorial Problems: Examples
• Routing: Given a partially connected network on N nodes, find the shortest path between X and Y
• Traveling Salesperson Problem (TSP): Given a partially connected network on N nodes, find a path that visits every node of the network exactly once [much harder!!]
• Scheduling: Given N tasks with earliest start times, completion deadlines, and a set of M machines on which they can execute, schedule them so that they all finish by their deadlines

Problem Instance, Algorithm
• Problem instance: a specific instantiation of the problem
• E.g. three instances for the routing problem with N=8 nodes
• Algorithm: a sequence of steps, a “recipe”
• Objective: a single, generic algorithm for the problem that can solve any instance of that problem

Measuring the Effectiveness of Algorithms
• Capture scaling with input size N, rather than runtime on specific instances
• The most common notion in Computer Science is worst-case complexity: What is the longest time (or number of steps) the algorithm might take on any input of size N?
Perhaps only N steps, or 100N + 5N: linear time, O(N)
Maybe N^2 steps, or N^2 + 4N + 6: quadratic, O(N^2)
Maybe N^3 + 1000 log N: cubic, O(N^3)
…
Maybe 2^N, or 2^N + N^1000: exponential, O(2^N)
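
To make the scaling concrete, the sketch below tabulates step counts for the growth rates named on this slide at a few input sizes. The function name `step_counts` and the chosen sizes are illustrative assumptions, not part of the lecture.

```python
def step_counts(n):
    """Step counts for input size n under the growth rates named on the slide."""
    return {
        "linear N": n,
        "quadratic N^2": n ** 2,
        "cubic N^3": n ** 3,
        "exponential 2^N": 2 ** n,
    }

for n in (10, 20, 50, 100):
    counts = step_counts(n)
    # Show only the order of magnitude so the exponential column stays readable.
    row = ", ".join(f"{name}: ~10^{len(str(c)) - 1}" for name, c in counts.items())
    print(f"N={n}: {row}")
```

Even at N=100 the polynomial columns stay within a handful of digits, while the exponential column does not.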

Polynomial vs. Exponential Complexity
[Figure: polynomial and exponential growth curves]
Polynomial time: “tractable”, can hope to solve very large problems with enough computing power. E.g. known routing / shortest path algorithms [O(N^3)]
Exponential time: quickly run into scalability issues as N increases. E.g. best known algorithms for TSP

Are some problems inherently harder than others?
A large amount of work on answering this question: computational complexity theory

Computational Complexity Hierarchy (from hard to easy)
• EXP: EXP-complete problems include games like Go, …
• PSPACE: PSPACE-complete problems include QBF, adversarial planning, chess (bounded), …
• P^#P: #P-complete/hard problems include #SAT, sampling, probabilistic inference, …
• PH
• NP: NP-complete problems include SAT, scheduling, graph coloring, puzzles, …
• P: P-complete problems include circuit-value, …; also in P: sorting, shortest path, …
Note: widely believed hierarchy; know P≠EXP for sure

NP-Completeness
• P: class of problems for which a solution can be found in poly time (“P”: polynomial time)
e.g. can find a shortest path in poly time
• NP: class of problems for which a solution can be verified in poly time (“N”: non-deterministic, with the power of “guessing”)
e.g. can’t find a TSP solution in poly time (as far as we know) but, given a candidate solution (a “witness”), can verify the correctness of the witness in poly time
• NP-complete: the “hardest” problems within NP
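
A minimal sketch of what “verify the witness in poly time” can look like, using the slide deck's description of TSP as finding a path that visits every node exactly once. The function `verify_path`, the node numbering 0..N-1, and the tiny example network are assumptions made for illustration.

```python
def verify_path(edges, num_nodes, path):
    """Check in polynomial time that `path` visits every node exactly once
    and only travels along edges of the (undirected) network."""
    edge_set = {frozenset(e) for e in edges}
    if sorted(path) != list(range(num_nodes)):
        return False  # some node is missing or repeated
    return all(frozenset((u, v)) in edge_set for u, v in zip(path, path[1:]))

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
print(verify_path(edges, 4, [0, 1, 2, 3]))  # a valid witness
print(verify_path(edges, 4, [1, 0, 3, 2]))  # invalid: no edge between 0 and 3
```

Checking a candidate takes one pass over the path; finding one may require trying many candidates.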

NP-Completeness
One of the biggest discoveries in Computer Science: All NP-complete problems are equally hard! [worst-case complexity]
• An algorithm for any one NP-complete problem can be used to solve any other NP-complete problem with only a polynomial overhead!
• There are catalogues of 10,000’s of such problems, e.g. “Boolean satisfiability” or SAT, TSP, scheduling, (bounded) planning, chip verification, 0-1 integer programming, graph coloring, logical inference, …
[Similarly for PSPACE-complete, #P-complete, etc.]

Can one design a single algorithm that can efficiently solve thousands of different problems of interest?

The Quest for Machine Reasoning
A cornerstone of Artificial Intelligence
Objective: Develop foundations and technology to enable effective, practical, large-scale automated reasoning.
Machine Reasoning (1960-90s): Computational complexity of reasoning appears to severely limit real-world applications
Current reasoning technology: Revisiting the challenge. Significant progress with new ideas / tools for dealing with complexity (scale-up), uncertainty, and multi-agent reasoning

General Automated Reasoning
[Diagram: Problem instance → Model Generator (Encoder) → General Inference Engine → Solution]
• Model generator: domain-specific, e.g. logistics, chess, planning, scheduling, ...
• Inference engine: generic, applicable to all domains within range of modeling language
Research objective: Better reasoning and modeling technology
Impact: Faster solutions in several domains

Simple Example: Knowledge Base
Variables (binary): X1 = email_received; X2 = in_meeting; X3 = urgent; X4 = respond_to_email; X5 = near_deadline; X6 = postpone; X7 = air_ticket_info_request; X8 = travel_request; X9 = info_request
Rules:
1. X1 & (not X2) & X3 ⇒ X4
2. X2 ⇒ not X4
3. X5 ⇒ X3 or X6
4. X7 ⇒ X8
5. X8 ⇒ X9
6. X8 ⇒ X5
7. X6 ⇒ not X9
Question: Given X1 = true, X2 = false, X7 = true, what is X4?
Answer development (inference chain):
Step 1: X7 ⇒ X8 (rule 4)
Step 2: X8 ⇒ X5 (rule 6)
Step 3: X5 ⇒ X3 or X6 (rule 3) [branch point M]
Case A: X6 = true
Step 4: X6 ⇒ not X9 (rule 7)
Step 5: not X9 ⇒ not X8 (contrapositive of rule 5)
Step 6: Contradiction (X8 is already true); backtrack to M
Case B: X3 = true
Step 7: X1 & (not X2) & X3 ⇒ X4, so X4 = true (rule 1)
Reasoning Complexity
• Exponential complexity is inherent: A^N in the worst case (N = no. of variables/objects, A = no. of object states)
• Time/space
• Granularity: object states
• Current implementations trade time with soundness
• Search for rules to apply; check contradictions
• For N variables: 2^N cases drive complexity!
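
The question on this slide can also be answered by brute force, which makes the “2^N cases” remark tangible: with 9 variables there are 512 cases to check. The sketch below assumes the seven rules are implications (the arrow symbols were lost in the transcript); the helper names `implies` and `satisfies_rules` are illustrative.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def satisfies_rules(x):
    """x maps variable index 1..9 to a bool; True if rules 1-7 all hold."""
    return all([
        implies(x[1] and not x[2] and x[3], x[4]),  # rule 1
        implies(x[2], not x[4]),                    # rule 2
        implies(x[5], x[3] or x[6]),                # rule 3
        implies(x[7], x[8]),                        # rule 4
        implies(x[8], x[9]),                        # rule 5
        implies(x[8], x[5]),                        # rule 6
        implies(x[6], not x[9]),                    # rule 7
    ])

given = {1: True, 2: False, 7: True}
models = []
for bits in product([False, True], repeat=9):       # all 2^9 = 512 cases
    x = dict(zip(range(1, 10), bits))
    if all(x[i] == v for i, v in given.items()) and satisfies_rules(x):
        models.append(x)

x4_values = {m[4] for m in models}
print(len(models), x4_values)
```

Every assignment consistent with the rules and the given facts sets X4 to true, which agrees with the inference chain.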

Exponential Complexity Growth: The Challenge of Complex Domains
Note: rough estimates, for propositional reasoning
[Figure: case complexity (exponential scale) plotted against the number of variables (100 to 1M) and rules/constraints. Labeled points, as variables / rules: Car repair diagnosis (100 / 200); Deep space mission control (10K / 50K); Chess, 20 steps deep (20K / 100K); Military Logistics (100K / 450K); VLSI Verification (0.5M / 1M); War Gaming (1M / 5M). Case-complexity axis marks: 10^30, 10^47, 10^3010, 10^6020, 10^150,500, 10^301,020. Reference levels: no. of atoms on the earth; seconds until heat death of sun; protein folding calculation (petaflop-year).]
[Credit: Kumar, DARPA; cited in Computer World magazine]

Progress in Last 15 Years
Focus: Combinatorial Search Spaces. Specifically, the Boolean satisfiability problem, SAT
Significant progress since the 1990’s. How much?
• Problem size: We went from 100 variables, 200 constraints (early 90’s) to 1,000,000 variables and 5,000,000 constraints in 15 years. Search space: from 10^30 to 10^300,000. [Aside: “one can encode quite a bit in 1M variables.”]
• Tools: 50+ competitive SAT solvers available
Overview of the state of the art: Plenary talk at IJCAI-05 (Selman); Discrete App. Math. article (Kautz-Selman ’06)

How Large are the Problems? A bounded model checking problem:

SAT Encoding (automatically generated from problem specification) i.e., ((not x1) or x7) ((not x1) or x6) etc. x1, x2, x3, etc. are our Boolean variables (to be set to True or False) Should x1 be set to False??

10 Pages Later: … i.e., (x177 or x169 or x161 or x153 … x33 or x25 or x17 or x9 or x1 or (not x185)) clauses / constraints are getting more interesting… Note x1 …

4,000 Pages Later: …

Finally, 15,000 Pages Later: Search space of truth assignments: Current SAT solvers solve this instance in under 30 seconds!

SAT Solver Progress Solvers have continually improved over time Source: Marques-Silva 2002

How do SAT Solvers Keep Improving?
From academically interesting to practically relevant.
We now have regular SAT solver competitions. (Germany ’89, Dimacs ’93, China ’96, SAT-02, SAT-03, …, SAT-07)
E.g. at SAT-2006 (Seattle, Aug ’06):
• 35+ solvers submitted, most of them open source
• 500+ industrial benchmarks
• 50,000+ benchmark instances available on the www
This constant improvement in SAT solvers is the key to making, e.g., SAT-based planning very successful.

Current Automated Reasoning Tools
Most-successful fully automated methods: based on Boolean Satisfiability (SAT) / Propositional Reasoning
• Problems modeled as rules / constraints over Boolean variables
• “SAT solver” used as the inference engine
Applications: single-agent search
• AI planning: SATPLAN-06, fastest optimal planner; ICAPS-06 competition (Kautz & Selman ’06)
• Verification (hardware and software): Major groups at Intel, IBM, Microsoft, and universities such as CMU, Cornell, and Princeton. SAT has become the dominant technology.
• Many other domains: Test pattern generation, Scheduling, Optimal Control, Protocol Design, Routers, Multi-agent systems, E-Commerce (E-auctions and electronic trading agents), etc.

Recall: General Automated Reasoning
[Diagram: Problem instance → Model Generator (Encoder) → General Inference Engine → Solution]
• Model generator: domain-specific, e.g. logistics, chess, planning, scheduling, ...
• Inference engine: generic, applicable to all domains within range of modeling language
Research objective: Better reasoning and modeling technology
Impact: Faster solutions in several domains

Automated Reasoning with SAT • A simple but useful modeling language: Boolean formulas • Corresponding inference engine: Satisfiability or SAT algorithm (e.g. complete search, local search, message passing) • Numerous applications: hardware and software verification, planning, scheduling, e-commerce, circuit design, open problems in algebra, …

Boolean Logic
Defined over Boolean (binary) variables a, b, c, …
Each of these can be True (1, T) or False (0, F)
Variables connected together with logic operators: and, or, not (denoted ∧, ∨, ¬)
E.g. ((c ∧ ¬d) ∨ f) is True iff either c is True and d is False, or f is True
Fact: All other Boolean logic operators can be expressed with and, or, not
E.g. (a ⇒ b) is the same as (¬a or b)
Boolean formula, e.g. F = (a or b) and ¬(a and (b or c))
(Truth) Assignment: any setting of the variables to True or False
Satisfying assignment: assignment where the formula evaluates to True
E.g. F has 3 satisfying assignments: (0,1,0), (0,1,1), (1,0,0)

Boolean Logic: Example
F = (a or b) and ¬(a and (b or c))
Note: True often written as 1, False as 0
• There are 2^3 = 8 possible truth assignments to a, b, c: (a=0,b=1,c=0) representing (a=False, b=True, c=False), (a=0,b=0,c=1), …
• Exactly 3 truth assignments satisfy F: (a=0,b=1,c=0), (a=0,b=1,c=1), (a=1,b=0,c=0)
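
The count of satisfying assignments can be checked by enumerating the truth table. The sketch assumes the example formula is F = (a or b) and not (a and (b or c)); the negation is a reconstruction, chosen because it is the reading that yields exactly the three satisfying assignments listed on the slide.

```python
from itertools import product

def F(a, b, c):
    # Assumed reading of the slide's formula: (a or b) and not (a and (b or c))
    return (a or b) and not (a and (b or c))

# Enumerate all 2^3 = 8 assignments in the order (a, b, c).
satisfying = [bits for bits in product([0, 1], repeat=3) if F(*bits)]
print(satisfying)
```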

Boolean Logic: Expressivity
All discrete single-agent search problems can be cast as a Boolean formula
Variables a, b, c, … often represent “states” of the system, “events”, “actions”, etc. (more on this later, using Planning as an example)
Very general encoding language. E.g. can handle
• Numbers (k-bit binary representation)
• Floating-point numbers
• Arithmetic operators like +, x, exp(), log()
• …
SAT encodings (generated automatically from high level languages) routinely used in domains like planning, scheduling, verification, e-commerce, network design, …
Recall Example (slide annotations mark variables as “event”, “state”, or “action”, and rules as “constraint”):
Variables: X1 = email_received; X2 = in_meeting; X3 = urgent; X4 = respond_to_email; X5 = near_deadline; X6 = postpone; X7 = air_ticket_info_request; X8 = travel_request; X9 = info_request
Rules:
1. X1 & (not X2) & X3 ⇒ X4
2. X2 ⇒ not X4
3. X5 ⇒ X3 or X6
4. X7 ⇒ X8
5. X8 ⇒ X9
6. X8 ⇒ X5
7. X6 ⇒ not X9

Boolean Logic: Standard Representations
Each problem constraint typically specified as (a set of) clauses (only “or”, “not”): E.g. (a or b), (c or d or f), (a or c or d), …
Formula in conjunctive normal form, or CNF: a conjunction of clauses
E.g. F = (a or b) and ¬(a and (b or c)) changes to F_CNF = (a or b) and (¬a or ¬b) and (b or ¬c)
Alternative [useful for QBF]: specify each constraint as a term (only “and”, “not”): E.g. (a and d), (b and a and f), (b and d and e), …
Formula in disjunctive normal form, or DNF: a disjunction of terms
E.g. F_DNF = (¬a and b) or (a and ¬b and ¬c)
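
A quick way to sanity-check a normal-form conversion is to compare truth tables. The sketch below assumes one placement of the negation signs (they were lost in the transcript): F = (a or b) and not (a and (b or c)), a CNF of (a or b) and (not a or not b) and (b or not c), and a DNF of (not a and b) or (a and not b and not c). These readings are reconstructions, not confirmed slide content.

```python
from itertools import product

def F(a, b, c):
    return (a or b) and not (a and (b or c))

def F_cnf(a, b, c):
    # Conjunction of clauses; each clause uses only "or" and "not".
    return (a or b) and (not a or not b) and (b or not c)

def F_dnf(a, b, c):
    # Disjunction of terms; each term uses only "and" and "not".
    return (not a and b) or (a and not b and not c)

assignments = list(product([False, True], repeat=3))
equivalent = all(F(*x) == F_cnf(*x) == F_dnf(*x) for x in assignments)
print(equivalent)
```

The three forms agree on all eight assignments under this reading.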

Boolean Satisfiability Testing
• The Boolean Satisfiability Problem, or SAT: Given a Boolean formula F, find a satisfying assignment for F or prove that no such assignment exists.
• A wide range of applications
• Relatively easy to test for small formulas (e.g. with a Truth Table)
• However, very quickly becomes hard to solve: search space grows exponentially with formula size (more on this next)
SAT technology has been very successful in taming this exponential blow up!

SAT Search Space
[Figure: binary search tree. Root: all vars free. Each level fixes one more variable to True or False (fix one variable, fix another var, fix a 3rd var, fix a 4th var); leaves are labeled True or False.]
SAT Problem: Find a path to a True leaf node.
For N Boolean variables, the raw search space is of size 2^N
• Grows very quickly with N
• Brute-force exhaustive search unrealistic without efficient heuristics, etc.

k-CNF, 3-CNF
k-CNF: all clauses have k literals
• 1-CNF SAT: trivial
• 2-CNF SAT: solvable in O(N^2) time [N = num. of variables]
• 3-CNF SAT: NP-complete
• 4-CNF SAT: NP-complete
• …
Note: Any Boolean formula can be converted into CNF, with or without extra variables (without extra variables, the formula size can increase, exponentially in the worst case)

Worst-Case Complexity
[Figure: polynomial and exponential growth curves]
SAT is an NP-complete problem
• Worst-case believed to be exponential (roughly 2^N for N variables)
• 10,000+ problems in CS are NP-complete (e.g. planning, scheduling, protein folding, reasoning)
• P vs. NP: $1M Clay Prize
However, real-world instances are usually not pathological and can often be solved very quickly with the latest technology!
Typical-case complexity provides a more detailed understanding and a more positive picture.

Exponential Complexity Growth
[Figure: plan tree with branches labeled 1 out of 10, 2 out of 9, 4 out of 8, …; polynomial and exponential growth curves]
Planning (single-agent): find the right sequence of actions
HARD: 10 actions, 10! = 3 x 10^6 possible plans
Contingency planning (multi-agent): actions may or may not produce the desired effect!
REALLY HARD: 10 x 9^2 x 8^4 x 7^8 x … x 2^256 = 10^224 possible contingency plans!
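
The two counts on this slide can be reproduced with exact integer arithmetic. The sketch assumes the contingency product continues the visible pattern, with the exponent doubling each time the base drops by one, down to 2^256.

```python
import math

sequential_plans = math.factorial(10)

# 10 x 9^2 x 8^4 x 7^8 x ... x 2^256: the exponent doubles as the base drops by one.
contingency_plans = 1
for step, base in enumerate(range(10, 1, -1)):
    contingency_plans *= base ** (2 ** step)

print(sequential_plans)
print(len(str(contingency_plans)) - 1)  # order of magnitude (power of ten)
```

The first value is about 3.6 million; the second has an order of magnitude of 224, matching the slide's rough figures.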

Typical-Case Complexity
A key hardness parameter for k-SAT: the ratio of clauses to variables
[Figure labels: Delete Constraints, Add Constraints]
Problems that are not critically constrained tend to be much easier in practice than the relatively few critically constrained ones
[Mitchell, Selman, and Levesque ’92; Kirkpatrick and Selman – Science ’94]
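
To experiment with the clause-to-variable ratio, one can generate random k-SAT instances at a chosen ratio. The generator below is a sketch under common conventions (distinct variables per clause, each negated with probability 1/2, signed integers as literals); the function name `random_ksat` and its parameters are assumptions for illustration.

```python
import random

def random_ksat(num_vars, ratio, k=3, seed=0):
    """Random k-CNF with round(ratio * num_vars) clauses.
    A literal is a nonzero int: +v means variable v, -v means its negation."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(round(ratio * num_vars)):
        variables = rng.sample(range(1, num_vars + 1), k)  # k distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

under = random_ksat(50, ratio=2.0)  # few constraints per variable
over = random_ksat(50, ratio=6.0)   # many constraints per variable
print(len(under), len(over))
```

Feeding such instances to a solver at varying ratios is the usual way to observe the easy and hard regions empirically.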

Typical-Case Complexity
[Figure: random 3-SAT as of 2004, with the phase transition region marked. Methods shown: linear time algs., Random Walk, DP, DP’, GSAT, Walksat, SP.]
SAT solvers are continually getting closer to tackling problems in the hardest region!
SP (survey propagation) now handles 1,000,000 variables very near the phase transition region

Tractable Sub-Structure Can Dominate and Drastically Reduce Solution Cost!
2+p-SAT model: mix 2-SAT (tractable) and 3-SAT (intractable) clauses
[Figure: median runtime versus number of variables]
• > 40% 3-SAT: exponential scaling
• ≤ 40% 3-SAT: linear scaling!
(Monasson, Selman et al. – Nature ’99; Achlioptas ’00)

How are other NP-complete problems translated into SAT instances? “SAT encoding”

SAT Encoding Example: Planning Domain
Planning Problem ⇒ Propositional CNF formula, by axiom schemas
Logistics planning: think of a number of trucks and planes that need to transport a bunch of packages from their origin to their destination
Discrete time, modeled by integers
• state predicates: indexed by time at which they hold. E.g. at_location(x,loc,i), free(x,i+1), route(cityA,cityB,i)
• action predicates: indexed by time at which action begins. E.g. fly(cityA,cityB,i), pickup(x,loc,i), drive_truck(loc1,loc2,i)
• each action takes 1 time step
• many actions may occur at the same step

Encoding Rules
• Actions imply preconditions and effects: fly(x,y,i) ⇒ at(x,i) and route(x,y,i) and at(y,i+1)
• Conflicting actions cannot occur at same time (A deletes a precondition of B): fly(x,y,i) and y≠z ⇒ not fly(x,z,i)
• If something changes, an action must have caused it (Explanatory Frame Axioms): at(x,i) and not at(x,i+1) ⇒ ∃y . route(x,y) and fly(x,y,i)
• Initial and final states hold: at(NY,0) and ... and at(LA,9) and ...
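
As an illustration of how one axiom schema turns into propositional clauses, the sketch below instantiates the first rule (actions imply preconditions and effects) for every ordered city pair and time step. It assumes the rule is an implication, represents literals as plain strings, and uses a hypothetical two-city domain; real planners use compact integer encodings instead.

```python
from itertools import permutations

def action_clauses(cities, horizon):
    """Instantiate fly(x,y,i) => at(x,i) and route(x,y,i) and at(y,i+1).
    An implication A => (B and C and D) becomes the three clauses
    (not A or B), (not A or C), (not A or D)."""
    clauses = []
    for i in range(horizon):
        for x, y in permutations(cities, 2):
            fly = f"fly({x},{y},{i})"
            consequences = (f"at({x},{i})", f"route({x},{y},{i})", f"at({y},{i + 1})")
            for consequence in consequences:
                clauses.append([f"not {fly}", consequence])
    return clauses

clauses = action_clauses(["NY", "LA"], horizon=2)
for clause in clauses[:3]:
    print(" or ".join(clause))
```

The number of clauses grows with the number of city pairs times the plan length, which is why instantiated encodings become large.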

Using SAT Solvers for Planning
Modeling and Solving a Planning Problem
[Diagram: problem description in high level language, together with axiom schemas (manual) and a plan length → instantiate → instantiated propositional clauses → SAT engine(s) → satisfying model → interpret (via the mapping) → plan (fully automatic)]

Planning Benchmark Complexity
Logistics domain: a complex, highly-parallel transportation domain
E.g. logistics.d problem:
• 2,165 possible actions per time slot
• optimal solution contains 74 distinct actions over 14 time slots (out of 5 x 10^46 possible sequential plans of length 14)
The Satplan [Selman et al.] approach is currently the fastest optimal planning approach. Winner of the ICAPS-05 & ICAPS-06 international planning competitions.

Solution Approaches to SAT

Solving SAT: Systematic Search
One possibility: enumerate all truth assignments one-by-one, test whether any satisfies F
• Note: testing is easy!
• But too many truth assignments (e.g. for N=1000 variables, there are 2^1000 ≈ 10^300 truth assignments)
[Figure: the 2^N assignments listed as bit strings 00000000, 00000001, 00000010, 00000011, …, 11111111]
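
The enumeration approach is only a few lines of code, which also shows why testing a single assignment is easy. The sketch assumes clauses are lists of signed integers (+v for variable v, -v for its negation); the function name `brute_force_sat` and the small example formula are illustrative.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Try all 2^N assignments; return the first satisfying one, or None.
    Clauses use nonzero ints: +v is variable v, -v is its negation."""
    for bits in product([False, True], repeat=num_vars):
        # A clause holds if at least one of its literals is true under `bits`.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            return bits
    return None

# (a or b) and (not a or not b) and (b or not c), with a=1, b=2, c=3
clauses = [[1, 2], [-1, -2], [2, -3]]
print(brute_force_sat(clauses, 3))
print(brute_force_sat([[1], [-1]], 1))  # unsatisfiable: a and not a
```

Each test is linear in the formula size; the cost comes entirely from the 2^N loop.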

Solving SAT: Systematic Search
Smarter approach: the “DPLL” procedure [1960’s] (Davis, Putnam, Logemann, Loveland)
• Assign values to variables one at a time (“partial” assignments)
• Simplify F
• If contradiction (i.e. some clause becomes False), “backtrack”, flip last unflipped variable’s value, and continue search
• Extended with many new techniques (100’s of research papers, yearly conference on SAT), e.g., extremely efficient data-structures (representation), randomization, restarts, learning “reasons” of failure
• Provides proof of unsatisfiability if F is unsat. [“complete method”]
• Forms the basis of dozens of very effective SAT solvers! e.g. minisat, zchaff, relsat, rsat, … (open source, available on the www)
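
The three core steps listed on this slide (assign, simplify, backtrack on contradiction) can be sketched as a short recursive procedure. This is a bare-bones backtracking core only: it omits unit propagation, clause learning, restarts, and the data structures that real solvers depend on. The signed-integer clause format and the function names are assumptions for illustration.

```python
def simplify(clauses, literal):
    """Assign `literal` True: drop satisfied clauses, remove the opposite literal."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        result.append([l for l in clause if l != -literal])
    return result

def dpll(clauses, assignment=()):
    """Return a tuple of true literals satisfying `clauses`, or None if unsatisfiable."""
    if not clauses:
        return assignment              # every clause satisfied
    if any(len(c) == 0 for c in clauses):
        return None                    # contradiction: some clause became False
    literal = clauses[0][0]            # naive branching choice
    for choice in (literal, -literal):  # try one value, flip it on backtrack
        result = dpll(simplify(clauses, choice), assignment + (choice,))
        if result is not None:
            return result
    return None

print(dpll([[1, 2], [-1, -2], [2, -3]]))
print(dpll([[1, 2], [-1, 2], [1, -2], [-1, -2]]))  # unsatisfiable
```

Returning None after both branches fail is what makes the method complete: exhausting the tree amounts to a proof of unsatisfiability.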

Solving SAT: Local Search
• Search space: all 2^N truth assignments for F
• Goal: starting from an initial truth assignment A0, compute assignments A1, A2, …, As such that As is a satisfying assignment for F
• Ai+1 is computed by a “local transformation” to Ai, e.g. flipping one bit at a time:
A1 = 000110111
A2 = 001110111
A3 = 001110101
A4 = 101110101
…
As = 111010000 (solution found!)
• No proof of unsatisfiability if F is unsat. [“incomplete method”]
• Several SAT solvers based on this approach, e.g. Walksat
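
A minimal sketch of the local-search idea: start from a random assignment and repeatedly flip one variable taken from an unsatisfied clause. This is a plain random walk, not Walksat itself (which adds a greedy choice among the candidate flips); the function name, the flip limit, and the fixed seed are assumptions for illustration.

```python
import random

def random_walk_sat(clauses, num_vars, max_flips=10_000, seed=0):
    """Flip a variable from a random unsatisfied clause until all clauses hold.
    Returns a dict var -> bool, or None if the flip budget runs out."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}

    def is_true(literal):
        return assign[abs(literal)] == (literal > 0)

    for _ in range(max_flips):
        unsatisfied = [c for c in clauses if not any(is_true(l) for l in c)]
        if not unsatisfied:
            return assign
        var = abs(rng.choice(rng.choice(unsatisfied)))
        assign[var] = not assign[var]  # the "local transformation": one bit flip
    return None

clauses = [[1, 2], [-1, -2], [2, -3]]
model = random_walk_sat(clauses, 3)
print(model)
```

A None result here only means the budget ran out; it says nothing about unsatisfiability, which is why the method is incomplete.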

Solving SAT: Decimation
• “Search” space: all 2^N truth assignments for F
• Goal: attempt to construct a solution in “one-shot” by very carefully setting one variable at a time
• Survey Inspired Decimation:
Estimate certain “marginal probabilities” of each variable being True, False, or ‘undecided’ in each solution cluster using Survey Propagation
Fix the variable that is the most biased to its preferred value
Simplify F and repeat
• A method rarely used by computer scientists
• But it has had tremendous success in the physics community on random k-SAT; can easily solve random instances with 1M+ variables!
• No searching for solution
• No proof of unsatisfiability [“incomplete method”]
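
The decimation loop itself (estimate marginals, fix the most biased variable, repeat) can be illustrated on a toy formula. Note the substitution: the sketch below computes exact marginals over individual solutions by brute-force enumeration, which is exponential and only feasible for tiny formulas. It is a stand-in for Survey Propagation, which estimates cluster-level marginals by message passing and is not implemented here. All function names are illustrative.

```python
from itertools import product

def solutions(clauses, num_vars, fixed):
    """All satisfying assignments (tuples of bools) consistent with `fixed`."""
    found = []
    for bits in product([False, True], repeat=num_vars):
        if any(bits[v - 1] != val for v, val in fixed.items()):
            continue
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            found.append(bits)
    return found

def decimate(clauses, num_vars):
    """Repeatedly fix the most biased free variable to its preferred value."""
    fixed = {}
    while len(fixed) < num_vars:
        sols = solutions(clauses, num_vars, fixed)
        if not sols:
            return None
        best = None
        for v in range(1, num_vars + 1):
            if v in fixed:
                continue
            p_true = sum(s[v - 1] for s in sols) / len(sols)  # exact marginal
            bias = abs(p_true - 0.5)
            if best is None or bias > best[0]:
                best = (bias, v, p_true >= 0.5)
        fixed[best[1]] = best[2]
    return fixed

clauses = [[1, 2], [-1, -2], [2, -3]]
result = decimate(clauses, 3)
print(result)
```

With exact marginals the preferred value always appears in at least one remaining solution, so no backtracking is needed; with estimated marginals, as in the real method, a wrong fix cannot be undone.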

The Next Two Lectures • Problems beyond SAT / searching for a single solution • #P-complete: count the number of solutions of a SAT instance • #P-hard: sample a solution uniformly at random for a SAT instance • PSPACE-complete: quantified Boolean formula (QBF)

Thank you for attending! Slides: http://www.cs.cornell.edu/~sabhar/tutorials/kitpc08-combinatorial-problems-I.ppt Ashish Sabharwal : http://www.cs.cornell.edu/~sabhar Bart Selman : http://www.cs.cornell.edu/selman
