
Penalties for Distributed Local Search



Presentation Transcript


  1. Penalties for Distributed Local Search Muhammed B. Basharu

  2. Outline • Distributed Constraint Satisfaction Problems • Landscape modification in local search • Algorithms • DisPeL • Stoch-DisPeL • Multi-DisPeL • Summary

  3. The Distributed Constraint Satisfaction Problem • A formal description, within the CSP framework, of distributed problems that involve several participants. • The key feature is the physical distribution of information about a problem, such that the problem cannot be solved in a single location. • Other key assumptions: • Each participant has partial knowledge of the problem. • Privacy – restrictions on how much information participants are willing to reveal. • For example, in meeting scheduling each participant has a separate private calendar.

  4. The Distributed Constraint Satisfaction Problem • Formally, a distributed constraint satisfaction problem is defined as: DisCSP = <A, X, D, C> • Where: • A is a set of autonomous agents, representing participants in the problem. • X is a set of decision variables. • D is the set of domains, one for each variable. • C is the set of constraints between variables. • Each variable in a DisCSP is owned/represented by exactly one agent, but each agent may own multiple variables.
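
To make the tuple concrete, here is a minimal sketch of how a DisCSP could be represented in code. It is an illustration only; the class and field names are our own, not part of the original formulation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A constraint is modelled as a predicate over an assignment
# {variable: value}; its scope lists the variables it mentions.
Assignment = Dict[str, int]

@dataclass
class Constraint:
    scope: List[str]
    check: Callable[[Assignment], bool]

@dataclass
class DisCSP:
    agents: List[str]                # A: autonomous agents (participants)
    owner: Dict[str, str]            # X: each variable -> the one agent owning it
    domains: Dict[str, List[int]]    # D: a domain per variable
    constraints: List[Constraint]    # C: constraints between variables
```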

  5. Solving DisCSPs • DisCSPs are solved by collaborative search. • Agents exchange possible assignments for the variables they represent until they find values that satisfy all constraints. • The most popular algorithms are distributed backtracking with no-good learning, e.g. Asynchronous Backtracking. • Local search has received much less attention; the main examples are Distributed Breakout and Distributed Stochastic Search.

  6. Landscape modification in local search • Local search has the advantage of quick convergence, but the weakness of attraction to local minima. • Schemes that deal with local minima by modifying cost landscapes with weights on constraints have attracted a lot of attention. • The cost of an assignment s is the weighted sum of its violations: h(s) = Σc wc · vc(s) (1) where wc is the weight on constraint c and vc(s) = 1 if c is violated under s, 0 otherwise. • Weights on violated constraints are increased whenever the search is stuck, which modifies the cost landscape so that the search is naturally driven away from unprofitable regions. • We argue that weights might not induce the necessary exploration, especially in problems where the landscapes are dominated by plateaus.
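
As a sketch of how such weight-based schemes operate (our illustration, in the style of the Breakout algorithm rather than a transcription of any one method; it reuses the Constraint class sketched earlier):

```python
def weighted_cost(assignment, constraints, weights):
    """Equation (1): the sum of the weights of all violated constraints."""
    return sum(w for c, w in zip(constraints, weights)
               if not c.check(assignment))

def escape_local_minimum(assignment, constraints, weights):
    """When no single change improves the cost, increase the weight on
    every violated constraint; this lifts the landscape under the
    current point so that ordinary descent can move away from it."""
    for i, c in enumerate(constraints):
        if not c.check(assignment):
            weights[i] += 1
```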

  7. Modifying landscapes with constraint weights • Example problem: X = {a, b}, Da = [1..20], Db = [1..20], with constraints c1: a - b >= 9 and c2: a + 3b < 40. • [Figure: the cost landscape over a and b (height h), with three plateaus: P1 where only c1 is violated, P2 where only c2 is violated, and P3 where both c1 and c2 are violated.]

  8. Modifying landscapes with constraint weights (2) • Assuming the search starts off at (a=8, b=4). • [Figure: the landscape before and after the weight increase (axes a, b, height h).]
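
For concreteness, a quick check (ours) of where that starting point sits in the example landscape:

```python
# At (a=8, b=4): c1 requires a - b >= 9, but 8 - 4 = 4, so c1 is violated;
# c2 requires a + 3b < 40, and 8 + 3*4 = 20 < 40, so c2 is satisfied.
# The starting point therefore lies on plateau P1.
start = {"a": 8, "b": 4}
c1 = lambda s: s["a"] - s["b"] >= 9
c2 = lambda s: s["a"] + 3 * s["b"] < 40
print([name for name, c in (("c1", c1), ("c2", c2)) if not c(start)])  # ['c1']
```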

  9. Modifying landscapes with penalties • We propose a finer-grained approach focusing on the assignments associated with local optima. • A separate penalty is attached to each value in each variable's domain, and a value's evaluation combines violations and penalties: h(ai) = V(ai) + p(ai) (2) where ai is the value assigned to the ith variable, V(ai) is the number of constraints violated with ai, and p(ai) is the penalty attached to ai. • When the search is stuck at a local minimum, the penalties on the values assigned to variables in violated constraints are increased.
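
A minimal sketch of the penalty increase at a local minimum (our reading of the rule above; the data layout is illustrative):

```python
def increase_penalties(assignment, constraints, penalties):
    """At a local minimum, raise the penalty attached to the value
    currently assigned to every variable that appears in a violated
    constraint.  `penalties` maps (variable, value) -> accumulated
    penalty, so each value in each domain carries its own penalty."""
    for c in constraints:
        if not c.check(assignment):
            for var in c.scope:
                key = (var, assignment[var])
                penalties[key] = penalties.get(key, 0) + 1
```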

  10. Modifying landscapes with penalties (2) • [Figure: the landscape over a and b (height h) before and after penalties are imposed.]

  11. Penalties in Distributed Local Search • Distributed Penalty Driven Search (DisPeL) • DisPeL is a greedy iterative-improvement algorithm for solving DisCSPs. • To minimise communication costs, sequential improvements are accepted rather than just the best improvement in each iteration. • A total ordering is imposed on agents using their lexicographic IDs. • In each iteration, agents take turns to improve the solution, but unconnected agents at the same level can make changes simultaneously.
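
One simple way to realise that ordering (our sketch, not a detail given in the slides): an agent takes its turn once every neighbour with a lexicographically smaller ID has acted, so agents that share no constraint are never ordered relative to each other and can move in parallel.

```python
def may_act(agent_id, neighbour_ids, done_this_round):
    """True once every higher-priority (lexicographically smaller)
    neighbour has taken its turn this round.  Agents with no shared
    constraints never wait on one another, so they act simultaneously."""
    return all(n in done_this_round for n in neighbour_ids if n < agent_id)
```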

  12. DisPeL (2) • Quasi-local-minima (deadlocks) are resolved with a two-phase strategy: • Local perturbations to induce exploration. • "Learning" about, and avoiding, the assignments associated with deadlocks if perturbation fails to resolve a deadlock. • Penalties are used to implement both parts of the strategy. • Each agent keeps track of a number of the recent deadlocks it has attempted to resolve.

  13. DisPeL (3) • Modified evaluation function for each variable/agent: h(di) = V(di) + p(di) + { t if the temporary penalty is imposed; 0 otherwise } (3) • In each iteration, agents select the values in their domains with the least sum of constraint violations and penalties. • where: • di is the ith value in the domain • V(di) is the number of constraints violated by selecting di • p(di) is the fixed (incremental) penalty on di • t is the temporary penalty (t = 3)
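
Equation (3) as a runnable sketch (ours; `violations` and `temp_imposed` stand in for the agent's constraint checks):

```python
T = 3  # size of the temporary penalty, as given in the slides

def h(value, violations, penalties, temp_imposed):
    """Equation (3): constraint violations, plus the fixed (incremental)
    penalty on the value, plus t when a temporary penalty is in force."""
    return (violations(value) + penalties.get(value, 0)
            + (T if temp_imposed(value) else 0))

def choose_value(domain, violations, penalties, temp_imposed):
    """Each iteration an agent picks the value minimising h."""
    return min(domain, key=lambda v: h(v, violations, penalties, temp_imposed))
```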

  14. Phase 1: perturbing the search • Aim: to induce exploration by forcing agents to try values other than their current assignments. • The earliest deadlocked agent imposes the temporary penalty on its value and asks its "lower" neighbours violating constraints with it to do the same. • Each agent selects the best value in its domain with eqn (3) and discards the temporary penalty after it is used.

  15. Phase 2: Learning with penalties • Penalties are used to remember "bad" assignments and also to induce exploration in the search. • The earliest deadlocked agent increases the incremental penalty on its current assignment and asks all its "lower" neighbours to do the same. • But learning is short-term, since penalties are regularly discarded.

  16. [Flowchart: one DisPeL agent cycle] • Receive messages from neighbours and update AgentView. • If not deadlocked: select the value minimising equation (3) and send it to the neighbours. • If deadlocked, check whether the conflict is in the no-good store: • No: add the conflict state to the no-good store, impose the temporary penalty on the current value, and ask deadlocked lower-priority neighbours to do the same. • Yes: increase the incremental penalty on the current value and ask all lower-priority neighbours to do the same. • In either case, select the value minimising equation (3) and send it to the neighbours.
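
The same cycle as executable-style pseudocode (a sketch; every method name here is hypothetical):

```python
def agent_cycle(agent, messages):
    """One DisPeL agent cycle, following the flowchart above."""
    agent.update_view(messages)                      # receive neighbours' values
    if agent.deadlocked():
        conflict = agent.conflict_state()
        if conflict in agent.no_good_store:
            # Phase 2: perturbation already failed on this conflict.
            agent.increase_incremental_penalty(agent.current_value)
            agent.ask_lower_neighbours("increase_penalty", all_of_them=True)
        else:
            # Phase 1: remember the conflict and try a perturbation first.
            agent.no_good_store.add(conflict)
            agent.impose_temporary_penalty(agent.current_value)
            agent.ask_lower_neighbours("temporary_penalty", deadlocked_only=True)
    agent.current_value = agent.choose_value()       # minimise equation (3)
    agent.send_value_to_neighbours()
```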

  17. Penalty resets • Unlike constraint weights, penalties are "independent" of the underlying landscape and can dominate cost functions. • As a result, penalties can cause distortions which divert the search away from promising regions of the search space. • Therefore, all penalties on a variable's entire domain are reset to zero: • When the variable has a consistent value. • When its cost function is distorted by penalties, i.e. (i) the current assignment has the best evaluation, yet (ii) there is another value in the domain with fewer constraint violations.
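
A sketch of the reset test as we read those two conditions (method names are illustrative):

```python
def should_reset_penalties(agent):
    """Reset the penalties on the whole domain when (a) the current value
    is consistent, or (b) penalties distort the cost function: the current
    value minimises h, yet another value violates fewer constraints."""
    cur = agent.current_value
    if agent.violations(cur) == 0:                        # condition (a)
        return True
    best_h = min(agent.h(v) for v in agent.domain)
    return (agent.h(cur) == best_h and                    # condition (b)
            any(agent.violations(v) < agent.violations(cur)
                for v in agent.domain))
```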

  18. Penalty resets (2) • [Results figure: 100 random DisCSPs (n=60, d=10, p1=0.1, p2=0.5).]

  19. Empirical evaluations • Compared DisPeL with the Distributed Breakout Algorithm (DBA). • DBA is the earliest distributed local search algorithm; it deals with local minima by modifying cost landscapes with constraint weights. • The performance of both algorithms was compared on DisCSPs where each agent owns just a single variable. • Problems used include: • Distributed graph colouring • Random DisCSPs • Car sequencing problems • On each instance, each algorithm had a single attempt with a maximum of 100n iterations before the attempt was deemed unsuccessful.

  20. Results – distributed graph colouring • [Figure: percentage of instances solved; 100 instances for each degree (n=100, k=3).]

  21. Results – distributed graph colouring (2) • [Figure: median search costs (iterations) on successful attempts on distributed graph colouring problems; 100 instances per point (n=100, k=3).]

  22. Other results • On problems with non-binary constraints, DisPeL solved considerably more problems. • There was a similar difference in search costs between the algorithms. • DisPeL was also tested under unreliable communications: even with a message loss rate of 40%, it still solved a high percentage of problems.

  23. Stoch-DisPeL: exploiting randomisation in DisPeL • The deterministic use of penalties makes DisPeL vulnerable to the effects of bad random initialisations. • Stoch-DisPeL introduces randomisation when agents attempt deadlock resolution: with probability p an agent uses the temporary penalty, and with probability 1-p the incremental penalty. • Empirical results suggest that performance is optimal at p=0.3. • An additional benefit is speed, since agents no longer check whether a deadlock has previously been encountered.
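
The randomised deadlock step, as a sketch (method names are illustrative):

```python
import random

P = 0.3  # the probability the slides report as performing best

def resolve_deadlock(agent):
    """Stoch-DisPeL drops the no-good check: at a deadlock, use the
    temporary penalty with probability p, else the incremental penalty."""
    if random.random() < P:
        agent.impose_temporary_penalty(agent.current_value)
    else:
        agent.increase_incremental_penalty(agent.current_value)
```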

  24. Dealing with bad initialisations • Bad initialisation: problem unsolved with DisPeL after 10,000 iterations. • Good initialisation: solved in 41 iterations with DisPeL.

  25. Results

  26. Results (2)

  27. Dealing with multiple local variables

  28. Dealing with multiple local variables (2) • Stoch-DisPeL is extended for agents with multiple local variables – Multi-DisPeL. • Agents still take turns to improve the solution. • Each agent performs a steepest-descent search on its local sub-problem. • Penalties are applied when the internal search is stuck, and agents ask their neighbours to impose penalties on their variables' values too.
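
A sketch of the per-agent step in Multi-DisPeL as we understand it (all names illustrative):

```python
def improve_local_subproblem(agent):
    """Steepest descent over the agent's own variables, scoring each value
    with the penalised evaluation h from equation (3).  When the inner
    search is stuck, penalties are applied and neighbours are asked to
    penalise their variables' current values too."""
    improved = True
    while improved:
        improved = False
        for var in agent.variables:
            best = min(agent.domain[var], key=lambda v: agent.h(var, v))
            if agent.h(var, best) < agent.h(var, agent.value[var]):
                agent.value[var] = best
                improved = True
    if agent.deadlocked():
        agent.apply_penalties()
        agent.ask_neighbours_to_apply_penalties()
```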

  29. Multi-DisPeL: evaluations • Compared its performance against similar versions of DBA and Asynchronous Weak Commitment (AWC). • Evaluations with: • DisSAT (using solvable 3-SAT instances from SATLib) • Distributed Graph Colouring • Random DisCSPs. • Variables in each problem were distributed evenly amongst agents. • Evaluation also looked at the effect the number of variables per agent had on performance.

  30. Results – DisSAT Percentage of Distributed SAT Problems Solved for Different Number of Literals per Agent • Multi-DB run for a maximum of 500n iterations with periodic restarts. • Multi-DisPeL run for a maximum of 100n iterations.

  31. Results – DisSAT (2) Average Search Costs from Attempts on Distributed SAT Problems

  32. Other results • Strong performance against the other algorithms on distributed graph colouring and random binary DisCSPs. • Compared to AWC for agents with multiple local variables, Multi-DisPeL had higher average costs but lower median costs in nearly all tests.

  33. Summary • Penalties attached to domain values are used to modify cost landscapes to deal with local minima. • Penalties are more effective at inducing search exploration than constraint weights. • But, as penalties are independent of the underlying landscape, they have to be discarded regularly. • Three new local search algorithms for solving DisCSPs were introduced based on these ideas. • The algorithms showed competitive performance against other distributed local search algorithms.
