
CSCE 580 Artificial Intelligence Ch.4: Features and Constraints


Presentation Transcript


  1. CSCE 580Artificial IntelligenceCh.4: Features and Constraints Fall 2009 Marco Valtorta mgv@cse.sc.edu Every task involves constraint, Solve the thing without complaint; There are magic links and chains Forged to loose our rigid brains. Structures, strictures, though they bind, Strangely liberate the mind. —James Falen

  2. Iterative-deepening-A* (IDA*) works as follows: At each iteration, perform a depth-first search, cutting off a branch when its total cost (g + h) exceeds a given threshold. This threshold starts at the estimate of the cost of the initial state, and increases for each iteration of the algorithm. At each iteration, the threshold used for the next iteration is the minimum cost of all values that exceeded the current threshold. Richard Korf. “Depth-First Iterative-Deepening: An Optimal Admissible Tree Search.” Artificial Intelligence, 27 (1985), 97-109.
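
A minimal Python sketch of this scheme for an explicit graph; the `neighbors`/`h` interfaces and the names are illustrative assumptions, not from the slides:

```python
import math

def ida_star(start, goal, neighbors, h):
    """IDA*: iterated depth-first search with an f = g + h cutoff.
    neighbors(n) yields (successor, edge_cost) pairs; h is admissible."""
    bound = h(start)              # initial threshold: estimate at the start
    path = [start]

    def dfs(g, bound):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f              # cut off; report by how much we overshot
        if node == goal:
            return "FOUND"
        smallest_overflow = math.inf
        for succ, cost in neighbors(node):
            if succ in path:      # avoid cycles along the current path
                continue
            path.append(succ)
            t = dfs(g + cost, bound)
            if t == "FOUND":
                return "FOUND"
            smallest_overflow = min(smallest_overflow, t)
            path.pop()
        return smallest_overflow

    while True:
        t = dfs(0, bound)
        if t == "FOUND":
            return path           # path from start to goal
        if t == math.inf:
            return None           # search space exhausted: no solution
        bound = t                 # next threshold: the minimum cost that
                                  # exceeded the current threshold
```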

  3. Acknowledgment • The slides are based on the textbook [P] and other sources, including other fine textbooks • [AIMA-2] • David Poole, Alan Mackworth, and Randy Goebel. Computational Intelligence: A Logical Approach. Oxford, 1998 • A second edition (by Poole and Mackworth) is under development. Dr. Poole allowed us to use a draft of it in this course • Ivan Bratko. Prolog Programming for Artificial Intelligence, Third Edition. Addison-Wesley, 2001 • The fourth edition is under development • George F. Luger. Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Sixth Edition. Addison-Wesley, 2009

  4. Constraint Satisfaction Problems • Given a set of variables, each with a set of possible values (a domain), assign a value to each variable that either • satisfies some set of constraints • satisfiability problems • hard constraints • minimizes some cost function, where each assignment of values to variables has some cost • optimization problems • soft constraints • Many problems are a mix of hard and soft constraints.

  5. Relationship to Search • The path to a goal isn't important; only the solution is • Many algorithms exploit the multi-dimensional nature of the problems • There are no predefined starting nodes • Often these problems are huge, with thousands of variables, so systematically searching the space is infeasible • For optimization problems, there are no well-defined goal nodes

  6. Constraint Satisfaction Problems A CSP is characterized by • A set of variables V1, V2, …,Vn • Each variable Vi has an associated domain DVi of possible values • For satisfiability problems, there are constraint relations on various subsets of the variables which give legal combinations of values for these variables • A solution to the CSP is an n-tuple of values for the variables that satisfies all the constraint relations
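
As a concrete illustration, the small CSP used in the examples below (variables A, B, C with domain {1, 2, 3, 4} and constraints A < B and B < C) can be encoded as follows; the representation is an illustrative choice, not from the slides:

```python
# Variables and their domains.
domains = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 4}, "C": {1, 2, 3, 4}}

# Each constraint is (scope, predicate); the predicate takes values
# in the same order as the variables in its scope.
constraints = [
    (("A", "B"), lambda a, b: a < b),
    (("B", "C"), lambda b, c: b < c),
]

def solution(assignment):
    """True if the total assignment satisfies all constraint relations."""
    return all(pred(*(assignment[v] for v in scope))
               for scope, pred in constraints)

# e.g. solution({"A": 1, "B": 2, "C": 3}) -> True
```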

  7. Examples 4.4 and 4.9 (Crossword Puzzle) • Example 4.4 A classic example of a constraint satisfaction problem is a crossword puzzle. There are two different representations of crossword puzzles in terms of variables: • In one representation, the variables are the numbered squares with the direction of the word (down or across), and the domains are the set of possible words that can be put in. A possible world corresponds to an assignment of a word for each of the variables. • In another representation of a crossword, the variables are the individual squares and the domain of each variable is the set of letters in the alphabet. A possible world corresponds to an assignment of a letter to each square. • Consider the constraints for the two representations of crossword puzzles of Example 4.4 (page 115). • For the case where the domains are words, the constraint is that the letters where a pair of words intersect must be the same. • For the representation where the domains are the letters, the constraint is that contiguous sequences of letters have to form legal words.

  8. Example 4.8 [P]: Scheduling Activities

  9. CSP as Graph Searching Solving a CSP can be cast as graph searching: • A node is an assignment of values to some of the variables • Suppose node N is the assignment X1 = v1, …, Xk = vk • Select a variable Y that isn't assigned in N • For each value yi in dom(Y) there is a neighbor X1 = v1, …, Xk = vk, Y = yi if this assignment is consistent with the constraints on these variables • The start node is the empty assignment • A goal node is a total assignment that satisfies the constraints

  10. Backtracking Algorithms • Systematically explore the space of total assignments by instantiating the variables one at a time • Evaluate each constraint predicate as soon as all its variables are bound • Any partial assignment that doesn't satisfy the constraint can be pruned • Example: the assignment A = 1 & B = 1 is inconsistent with the constraint A != B regardless of the values of the other variables
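
A hedged sketch of this backtracking scheme, reusing the `domains`/`constraints` encoding from the sketch after slide 6; the variable ordering (dictionary order) is a simplistic assumption:

```python
def consistent(assignment, constraints):
    """Evaluate each constraint as soon as all its variables are bound;
    constraints with unbound variables are skipped."""
    for scope, pred in constraints:
        if all(v in assignment for v in scope):
            if not pred(*(assignment[v] for v in scope)):
                return False
    return True

def backtrack(assignment, domains, constraints):
    if len(assignment) == len(domains):
        return assignment                 # total, consistent assignment
    var = next(v for v in domains if v not in assignment)
    for value in sorted(domains[var]):
        assignment[var] = value
        if consistent(assignment, constraints):   # prune otherwise
            result = backtrack(assignment, domains, constraints)
            if result is not None:
                return result
        del assignment[var]               # undo and try the next value
    return None

# e.g. backtrack({}, domains, constraints) -> {'A': 1, 'B': 2, 'C': 3}
```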

  11. Backtracking Search Example (4.13) Suppose you have a CSP with the variables A, B, C, each with domain {1, 2, 3, 4}. Suppose the constraints are A < B and B < C. The size of the search tree, and thus the efficiency of the algorithm, depends on which variable is selected each time. In this example, generate-and-test would test 4³ = 64 assignments. For the search method, only 22 assignments are generated. Generate-and-test always reaches the leaves of the search tree.

  12. Consistency Algorithms • Idea: prune the domains as much as possible before selecting values from them • A variable is domain consistent if no value of its domain is ruled impossible by any of the constraints

  13. Constraint Network • There is an oval-shaped node for each variable • There is a rectangular node for each constraint relation • There is a domain of values associated with each variable node • There is an arc from variable X to each relation that involves X

  14. Constraint Network for Example 4.15 • There are three variables A, B, C, each with domain {1, 2, 3, 4}. The constraints are A < B and B < C. In the constraint network (Fig. 4.2 in [P]), there are 4 arcs: <A, A < B>, <B, A < B>, <B, B < C>, <C, B < C> • None of the arcs is arc consistent. The first arc is not arc consistent because for A = 4 there is no corresponding value of B for which A < B.

  15. Example Constraint Network: Fig 4.4 For this example (delivery robot Example 4.8): D_B = {1, 2, 3, 4} is not domain consistent, as B = 3 violates the constraint B != 3

  16. Arc Consistency • An arc <X, r(X, Y)> is arc consistent if, for each value x in dom(X), there is some value y in dom(Y) such that r(x, y) is satisfied • A network is arc consistent if all its arcs are arc consistent • If an arc <X, r(X, Y)> is not arc consistent, all values x in dom(X) for which there is no corresponding value in dom(Y) may be deleted from dom(X) to make the arc <X, r(X, Y)> consistent

  17. Arc Consistency Algorithm • The arcs can be considered in turn, making each arc consistent • An arc <X, r(X, Y)> needs to be revisited if the domain of one of the Y's is reduced • Three possible outcomes (when all arcs are arc consistent): • One domain is empty => no solution • Each domain has a single value => unique solution • Some domains have more than one value => there may or may not be a solution • If each variable domain is of size d and there are e constraints to be tested, then the algorithm GAC does O(ed³) consistency checks. For some CSPs, for example, if the constraint graph is a tree, GAC alone solves the CSP and does it in time linear in the number of variables.

  18. Generalized Arc Consistency Algorithm
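
The pseudocode figure for this slide is not reproduced in the transcript. As a stand-in, here is a hedged AC-3-style sketch for binary constraints, in the spirit of the algorithm the neighboring slides describe; the arc representation is an assumption:

```python
from collections import deque

def ac3(domains, constraints):
    """domains: {var: set of values}, mutated in place.
    constraints: list of ((X, Y), pred) with pred(x, y) -> bool.
    Returns False iff some domain becomes empty."""
    # Directed arcs <X, r(X, Y)>: one per constraint per direction.
    arcs = []
    for (x, y), pred in constraints:
        arcs.append((x, y, pred))
        arcs.append((y, x, lambda vy, vx, p=pred: p(vx, vy)))
    tda = deque(arcs)                     # the to-do arcs
    while tda:
        x, y, pred = tda.popleft()
        # Delete values of X with no supporting value in dom(Y).
        pruned = {vx for vx in domains[x]
                  if not any(pred(vx, vy) for vy in domains[y])}
        if pruned:
            domains[x] -= pruned
            if not domains[x]:
                return False              # empty domain => no solution
            # dom(X) shrank: revisit arcs <Z, r(Z, X)> for Z != Y.
            tda.extend(a for a in arcs if a[1] == x and a[0] != y)
    return True
```

On the running example, this prunes the domains to A in {1, 2}, B in {2, 3}, and C in {3, 4}, matching the trace on slide 22.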

  19. Arc consistency algorithm AC-3 • Time complexity: O(n²d³), where n is the number of variables and d is the maximum variable domain size, because: • There are at most O(n²) arcs • Each arc can be inserted into the agenda (TDA set) at most d times • Checking consistency of each arc can be done in O(d²) time

  20. Generalized Arc Consistency Algorithm • Three possible outcomes: • One domain is empty => no solution • Each domain has a single value => unique solution • Some domains have more than one value => there may or may not be a solution • Even if the problem has a unique solution, GAC may end in state (3) rather than (2); if it always ended in state (2) for uniquely satisfiable problems, we would have a polynomial-time algorithm to solve UNIQUE-SAT • UNIQUE-SAT or USAT is the problem of determining whether a formula known to have either zero or one satisfying assignments has zero or has one. Although this problem seems easier than general SAT, if there is a practical algorithm to solve it, then all problems in NP can be solved just as easily [Wikipedia; L. G. Valiant and V. V. Vazirani. NP Is as Easy as Detecting Unique Solutions. Theoretical Computer Science, 47 (1986), 85-94.] • Thanks to Amber McKenzie for asking a question about this!

  21. Finding Solutions when AC Finishes • If some domains have more than one element => search • Split a domain, then recursively solve each half • We only need to revisit arcs affected by the split • It is often best to split a domain in half
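
A hedged sketch of this recursive solve-by-splitting loop, assuming the `ac3` function from the sketch above; splitting the domain in half and picking the first multi-valued variable are simplistic illustrative choices (and, unlike the slide's refinement, this version reruns full arc consistency after each split rather than revisiting only the affected arcs):

```python
import copy

def solve(domains, constraints):
    domains = copy.deepcopy(domains)      # don't clobber the caller's domains
    if not ac3(domains, constraints):
        return None                       # a domain emptied: dead end
    if all(len(d) == 1 for d in domains.values()):
        return {v: next(iter(d)) for v, d in domains.items()}
    # Some domain has more than one element => search: split it in half.
    var = next(v for v, d in domains.items() if len(d) > 1)
    values = sorted(domains[var])
    mid = len(values) // 2
    for half in (values[:mid], values[mid:]):
        candidate = dict(domains)
        candidate[var] = set(half)
        result = solve(candidate, constraints)
        if result is not None:
            return result
    return None

# e.g. solve(domains, constraints) -> {'A': 1, 'B': 2, 'C': 3}
```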

  22. Domain Splitting: Examples 4.15, 4.19, 4.22 • Suppose it first selects the arc (A, A < B). For A = 4, there is no value of B that satisfies the constraint. Thus 4 is pruned from the domain of A. Nothing is added to TDA, as there is no other arc currently outside TDA. • Suppose that (B, A < B) is selected next. The value 1 can be pruned from the domain of B. Again no element is added to TDA. • Suppose that (B, B < C) is selected next. The value 4 can be removed from the domain of B. As the domain of B has been reduced, the arc (A, A < B) must be added back into the TDA set because the domain of A could potentially be reduced further now that the domain of B is smaller. • If the arc (A, A < B) is selected next, the value A = 3 can be pruned from the domain of A. • The remaining arc on TDA is (C, B < C). The values 1 and 2 can be removed from the domain of C. No arcs are added to TDA, and TDA becomes empty. • The algorithm then terminates with D_A = {1, 2}, D_B = {2, 3}, D_C = {3, 4}. While this has not fully solved the problem, it has greatly simplified it.

  23. Domain Splitting: Examples 4.15, 4.19, 4.22 After arc consistency has completed, there are multiple elements in the domains. Suppose B is split. There are two cases: • B = 2. In this case A = 2 is pruned. Splitting on C produces two of the answers. • B = 3. In this case C = 3 is pruned. Splitting on A produces the other two answers. This search tree should be contrasted with the search tree of Figure 4.1 (page 120). The search space with arc consistency is much smaller and not as sensitive to the selection of variable orderings. (Figure 4.1 would be much bigger with different variable orderings.)

  24. Variable Elimination: Preliminaries The enrolled relation

  25. Variable Elimination: Join

  26. Variable Elimination: Example

  27. Variable Elimination Algorithm
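
The figures for these variable-elimination slides are not reproduced in the transcript. As a hedged illustration: variable elimination removes a variable X by joining all the relations whose scope contains X and then projecting X out of the result. Below is a sketch of those two core operations, with relations stored extensionally as (scope, set-of-tuples) pairs; all names are illustrative:

```python
def join(r1, r2):
    """Natural join: combine tuples that agree on the shared variables."""
    (s1, t1), (s2, t2) = r1, r2
    shared = [v for v in s1 if v in s2]
    scope = list(s1) + [v for v in s2 if v not in s1]
    rows = set()
    for row1 in t1:
        a1 = dict(zip(s1, row1))
        for row2 in t2:
            a2 = dict(zip(s2, row2))
            if all(a1[v] == a2[v] for v in shared):
                merged = {**a2, **a1}
                rows.add(tuple(merged[v] for v in scope))
    return scope, rows

def project_out(rel, var):
    """Drop var's column, keeping one copy of each remaining tuple."""
    scope, rows = rel
    keep = [i for i, v in enumerate(scope) if v != var]
    return ([scope[i] for i in keep],
            {tuple(row[i] for i in keep) for row in rows})

# Eliminating B from A < B and B < C over {1, 2, 3, 4}:
lt = {(i, j) for i in range(1, 5) for j in range(1, 5) if i < j}
r = project_out(join((["A", "B"], lt), (["B", "C"], lt)), "B")
# r relates A and C: all (a, c) such that a < b < c for some b.
```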

  28. Local Search Local Search: • Maintain an assignment of a value to each variable • At each step, select a neighbor of the current assignment (usually, that improves some heuristic value) • Stop when a satisfying assignment is found, or return the best assignment found Requires: • What is a neighbor? • Which neighbor should be selected? (Some methods maintain multiple assignments.)

  29. Local Search for CSPs • The outer (for) loop performs random initialization and random restarts • The inner (while) loop performs the local search itself (the walk) • Two special cases of the algorithm: • Random sampling (restart every step, no walk) • Random walk (walk only, no restart)

  30. Local Search for CSPs • Aim is to find an assignment with zero unsatisfied relations • Given an assignment of a value to each variable, a conflict is an unsatisfied constraint • The goal is an assignment with zero conflicts • Heuristic function to be minimized: the number of conflicts

  31. Iterative Best Improvement [4.8.1 P]

  32. Greedy Descent Variants • Find the variable-value pair that minimizes the number of conflicts at every step • Select a variable that participates in the most number of conflicts. Select a value that minimizes the number of conflicts • Select a variable that appears in any conflict. Select a value that minimizes the number of conflicts • Select a variable at random. Select a value that minimizes the number of conflicts • Select a variable and value at random; accept this change if it does not increase the number of conflicts.
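
A hedged sketch of one of these variants (choose a variable appearing in some conflict at random, then the value minimizing the number of conflicts), reusing the `domains`/`constraints` encoding from the earlier sketches; the step bound is an assumption:

```python
import random

def conflicts(assignment, constraints):
    """The unsatisfied constraints under a total assignment."""
    return [(scope, pred) for scope, pred in constraints
            if not pred(*(assignment[v] for v in scope))]

def greedy_descent(domains, constraints, max_steps=1000):
    # Random initialization: a complete assignment.
    a = {v: random.choice(sorted(d)) for v, d in domains.items()}
    for _ in range(max_steps):
        bad = conflicts(a, constraints)
        if not bad:
            return a                      # zero conflicts: a solution
        # A variable that appears in some conflict...
        var = random.choice([v for scope, _ in bad for v in scope])
        # ...gets the value that minimizes the number of conflicts.
        a[var] = min(sorted(domains[var]),
                     key=lambda val: len(conflicts({**a, var: val},
                                                   constraints)))
    return None                           # give up after max_steps
```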

  33. Selecting Neighbors in Local Search • When the domains are small or unordered, the neighbors of an assignment can correspond to choosing another value for one of the variables • When the domains are large and ordered, the neighbors of an assignment are the adjacent values for one of the variables • If the domains are continuous, gradient descent changes each variable proportionally to the gradient of the heuristic function in that direction: the value of variable X_i goes from v_i to v_i − η ∂h/∂X_i, where η is the step size • Gradient ascent goes uphill: v_i becomes v_i + η ∂h/∂X_i
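
A tiny numeric illustration of the update rule, with an assumed step size η = 0.1 and a toy heuristic h(x, y) = (x − 3)² + (y + 1)²:

```python
def grad_h(x, y):
    """Partial derivatives of h(x, y) = (x - 3)**2 + (y + 1)**2."""
    return 2 * (x - 3), 2 * (y + 1)

x, y, eta = 0.0, 0.0, 0.1
for _ in range(100):
    gx, gy = grad_h(x, y)
    x, y = x - eta * gx, y - eta * gy   # v_i <- v_i - eta * dh/dX_i
print(round(x, 3), round(y, 3))         # converges toward the minimum (3, -1)
```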

  34. Problems with Hill Climbing

  35. Randomized Algorithms • Consider two methods to find a maximum value: • Hill climbing, starting from some position, keep moving uphill and report maximum value found • Pick values at random and report maximum value found • Which do you expect to work better to find a maximum? • Can a mix work better?

  36. Randomized Hill Climbing As well as uphill steps we can allow for: • Random steps: move to a random neighbor • Random restart: reassign random values to all variables Which is more expensive computationally?

  37. 1-Dimensional Ordered Examples Two one-dimensional search spaces; step right or left: • Which method would most easily find the maximum? • What happens in hundreds or thousands of dimensions? • What if different parts of the search space have different structure?

  38. Random Walk Variants of random walk: • When choosing the best variable-value pair, sometimes choose a random variable-value pair instead • When selecting a variable then a value: • Sometimes choose any variable that participates in the most conflicts • Sometimes choose any variable that participates in any conflict (a red node) • Sometimes choose any variable • Sometimes choose the best value and sometimes choose a random value

  39. Comparing Stochastic Algorithms • How can you compare three algorithms when • one solves the problem 30% of the time very quickly but doesn't halt for the other 70% of the cases • one solves 60% of the cases reasonably quickly but doesn't solve the rest • one solves the problem in 100% of the cases, but slowly? • Summary statistics, such as mean run time, median run time, and mode run time, don't make much sense here (for instance, the mean is undefined when some runs never halt)

  40. Runtime Distribution Plots the runtime (or number of steps) on the x-axis against the proportion (or number) of runs solved within that runtime on the y-axis
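
A hedged sketch of producing such a plot from repeated runs of a stochastic solver; the `run_once` interface (returning a step count, or None on failure) and the log scale are assumptions:

```python
import matplotlib.pyplot as plt

def plot_runtime_distribution(run_once, n_runs=100):
    """run_once() -> number of steps to solve, or None on failure."""
    times = sorted(t for t in (run_once() for _ in range(n_runs))
                   if t is not None)
    # y: proportion of runs solved within each runtime x.
    proportions = [(i + 1) / n_runs for i in range(len(times))]
    plt.step(times, proportions, where="post")
    plt.xscale("log")                # runtimes typically span magnitudes
    plt.xlabel("runtime (steps)")
    plt.ylabel("proportion of runs solved")
    plt.show()
```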

  41. Runtime Distribution [Fig.4.9P]

  42. Variant: Simulated Annealing • Pick a variable at random and a new value at random • If it is an improvement, adopt it • If it isn't an improvement, adopt it probabilistically depending on a temperature parameter T: with current assignment n and proposed assignment n′, we move to n′ with probability e^(−(h(n′) − h(n))/T) • The temperature can be reduced over time • Probability of accepting a change that worsens h by Δ: e^(−Δ/T), so the higher T, the more likely a worsening change is accepted
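
A hedged sketch of the loop and acceptance rule; the geometric cooling schedule and parameter values are assumptions, not from the slides:

```python
import math
import random

def simulated_annealing(initial, random_neighbor, h,
                        T=10.0, cooling=0.99, steps=10000):
    """h is minimized (e.g., the number of conflicts); random_neighbor
    picks a random variable and a random new value."""
    n = initial
    for _ in range(steps):
        if h(n) == 0:
            return n                  # no conflicts left
        n2 = random_neighbor(n)
        delta = h(n2) - h(n)
        # Improvements are always adopted; worsening moves are adopted
        # with probability e^(-delta / T).
        if delta <= 0 or random.random() < math.exp(-delta / T):
            n = n2
        T *= cooling                  # reduce the temperature over time
    return n                          # best-effort assignment
```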

  43. Tabu Lists • To prevent cycling we can maintain a tabu list of the k last assignments • Don't allow an assignment that is already on the tabu list • If k = 1, we don't allow an assignment of the same value to the variable chosen • We can implement it more efficiently than as a list of complete assignments • It can be expensive if k is large
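
A minimal sketch of a tabu list holding the k most recent assignments; note the linear-scan membership test, which is one reason large k can be expensive, as the slide notes:

```python
from collections import deque

class TabuList:
    """Remembers the k most recent assignments and forbids revisiting them."""
    def __init__(self, k):
        self.recent = deque(maxlen=k)   # oldest entries age out automatically

    def allowed(self, assignment):
        return assignment not in self.recent   # O(k) scan

    def add(self, assignment):
        self.recent.append(assignment)
```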

  44. Parallel Search A total assignment is called an individual • Idea: maintain a population of k individuals instead of one • At every stage, update each individual in the population • Whenever an individual is a solution, it can be reported • Like k restarts, but uses k times the minimum number of steps

  45. Beam Search • Like parallel search, with k individuals, but choose the k best out of all of the neighbors • When k = 1, it is hill climbing • When k = infinity, it is breadth-first search • The value of k lets us limit space and parallelism

  46. Stochastic Beam Search • Like beam search, but it probabilistically chooses the k individuals at the next generation • The probability that a neighbor is chosen is proportional to its heuristic value • This maintains diversity amongst the individuals • The heuristic value reflects the fitness of the individual • Like asexual reproduction: each individual mutates and the fittest ones survive

  47. Genetic Algorithms • Like stochastic beam search, but pairs of individuals are combined to create the offspring • For each generation: • Randomly choose pairs of individuals where the fittest individuals are more likely to be chosen • For each pair, perform a cross-over: form two offspring each taking different parts of their parents • Mutate some values • Stop when a solution is found
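
A hedged sketch of one generation (fitness-weighted parent selection, one-point cross-over, pointwise mutation) over individuals represented as tuples; the fitness interface, rates, and selection scheme are illustrative assumptions:

```python
import random

def crossover(p1, p2):
    """One-point cross-over: the two offspring take complementary parts."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def next_generation(population, fitness, values, mutation_rate=0.05):
    """fitness(individual) -> non-negative number; values: candidate genes."""
    weights = [fitness(ind) for ind in population]
    children = []
    while len(children) < len(population):
        # Fitter individuals are more likely to be chosen as parents.
        p1, p2 = random.choices(population, weights=weights, k=2)
        for child in crossover(p1, p2):
            mutated = tuple(random.choice(values)
                            if random.random() < mutation_rate else gene
                            for gene in child)   # mutate some values
            children.append(mutated)
    return children[:len(population)]
```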
