Learning to Search - PowerPoint PPT Presentation
Presentation Transcript

Learning to Search

Henry Kautz

University of Washington

joint work with

Dimitri Achlioptas, Carla Gomes, Eric Horvitz, Don Patterson, Yongshao Ruan, Bart Selman

CORE – MSR, Cornell, UW


Speedup Learning

  • Machine learning historically considered

    • Learning to classify objects

    • Learning to search or reason more efficiently

      • Speedup Learning

  • Speedup learning disappeared in mid-90’s

    • Last workshop in 1993

    • Last thesis 1998

  • What happened?

    • It failed.

    • It succeeded.

    • Everyone got busy doing something else.


It failed.

  • Explanation based learning

    • Examine structure of proof trees

    • Explain why choices were good or bad (wasteful)

    • Generalize to create new control rules

  • At best, mild speedup (50%)

    • Could even degrade performance

  • Underlying search engines very weak

    • Etzioni (1993) – simple static analysis of next-state operators yielded performance as good as EBL


It succeeded.

  • EBL without generalization

    • Memoization

    • No-good learning

    • SAT: clause learning

      • Integrates clausal resolution with DPLL

  • Huge win in practice!

    • Clause-learning proofs can be exponentially smaller than the best DPLL (tree-shaped) proof

    • Chaff (Malik et al 2001)

      • 1,000,000 variable VLSI verification problems


Everyone got busy.

  • The something else: reinforcement learning.

    • Learn about the world while acting in the world

    • Don’t reason or classify, just make decisions

    • What isn’t RL?


Another path

  • Predictive control of search

    • Learn statistical model of behavior of a problem solver on a problem distribution

    • Use the model as part of a control strategy to improve the future performance of the solver

  • Synthesis of ideas from

    • Phase transition phenomena in problem distributions

    • Decision-theoretic control of reasoning

    • Bayesian modeling


Big Picture

[Diagram: Problem Instances → Solver → runtime. Runtime plus static and dynamic features feed Learning / Analysis, which produces a Predictive Model; the model drives control / policy and resource allocation / reformulation back to the Solver.]


Case Study 1: Beyond 4.25

[Big Picture diagram, restricted to the static-features path: Problem Instances → Solver → runtime → Learning / Analysis → Predictive Model.]


Phase transitions & problem hardness

  • Large and growing literature on random problem distributions

    • Peak in problem hardness associated with critical value of some underlying parameter

      • 3-SAT: clause/variable ratio = 4.25

  • Using measured parameter to predict hardness of a particular instance problematic!

    • Random distribution must be a good model of actual domain of concern

      • Recent progress on more realistic random distributions...


Quasigroup Completion Problem (QCP)

  • NP-Complete

    • Structure is similar to that of real-world problems - tournament scheduling, classroom assignment, fiber-optic routing, experiment design, ...

    • Can generate hard guaranteed-satisfiable instances (2000)
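The guaranteed-satisfiable generation idea — start from a complete Latin square and punch holes — can be sketched as follows. The cyclic construction and the function names here are illustrative only, not the generator used in the cited work:

```python
import random

def cyclic_latin_square(n):
    """Order-n Latin square via the cyclic construction L[i][j] = (i + j) mod n."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def punch_holes(square, frac_holes, rng=random):
    """Erase a fraction of cells (None = hole). Completing the result is a QCP
    instance; punching holes from a complete square keeps it satisfiable."""
    n = len(square)
    cells = [(i, j) for i in range(n) for j in range(n)]
    holes = rng.sample(cells, int(frac_holes * n * n))
    out = [row[:] for row in square]
    for i, j in holes:
        out[i][j] = None
    return out
```

The fraction of pre-assignment (the x-axis of the complexity graph below) is simply one minus the hole fraction.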


Complexity Graph

[Plot: fraction of unsolvable cases vs. fraction of pre-assignment, with 20%, 42%, and 50% marked. Underconstrained area: almost all solvable. Critically constrained area (phase transition): hardness peaks. Overconstrained area: almost all unsolvable.]

Easy-Hard-Easy pattern in local search

[Plot: Walksat computational cost vs. % holes for orders 30, 33, 36, peaking between the underconstrained and "over"-constrained areas.]

Are we ready to predict run times?

  • Problem: high variance

[Run-time histogram, log scale.]

Deep structural features

Hardness is also controlled by the structure of the constraints, not just the fraction of holes

[Hole patterns: rectangular and aligned patterns are tractable; balanced patterns are very hard.]


Random versus balanced

[Two slides of run-time plots comparing Random vs. Balanced hole patterns.]

Effect of balance on hardness

  • Balanced patterns yield (on average) problems that are 2 orders of magnitude harder than random patterns

  • Expected run time decreases exponentially with the variance s in # holes per row or column:

    E(T) = C·e^(−ks)

  • Same pattern (with different constants) for DPLL!

  • At the extreme of high variance (the aligned model), one can prove that no hard problems exist
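A minimal sketch of this relationship: compute the variance in holes per row of a pattern, then evaluate the exponential fit. The constants C and k below are illustrative placeholders — the actual fitted values are not given in the slides:

```python
import math

def hole_variance(pattern):
    """Variance in the number of holes (None cells) per row of a partial square.
    (The slides use the variance per row or column; rows shown for brevity.)"""
    counts = [sum(cell is None for cell in row) for row in pattern]
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)

def expected_runtime(s, C=1e4, k=0.5):
    """Hypothetical fit E(T) = C * exp(-k * s): expected run time falls
    exponentially with the hole-count variance s (constants illustrative)."""
    return C * math.exp(-k * s)
```

A perfectly balanced pattern has variance 0 and thus the largest predicted run time; aligned patterns maximize the variance and are predicted easy.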


Intuitions

  • In unbalanced problems it is easier to identify the most critically constrained variables and set them correctly

    • Backbone variables


Are we done?

  • Unfortunately, not quite.

  • While few unbalanced problems are hard, “easy” balanced problems are not uncommon

  • To do: find additional structural features that signify hardness

    • Introspection

    • Machine learning (later this talk)

    • Ultimate goal: accurate, inexpensive prediction of hardness of real-world problems


Case study 2: AutoWalksat

[Big Picture diagram with the dynamic-features and control / policy paths active: Problem Instances → Solver → runtime → Learning / Analysis → Predictive Model → control / policy.]


Walksat

    Choose a truth assignment randomly
    While the assignment evaluates to false
        Choose an unsatisfied clause at random
        If possible, flip an unconstrained variable in that clause
        Else, with probability P (noise), flip a variable in the clause randomly
        Else, flip the variable in the clause that causes the smallest number of satisfied clauses to become unsatisfied

Performance of Walksat is highly sensitive to the setting of P.
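The pseudocode above can be sketched in Python. This is a simplified Walksat; the clause representation (DIMACS-style signed integers) and helper names are my own:

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=100_000, rng=random):
    """Simplified Walksat. Clauses are lists of nonzero ints: literal v is
    satisfied when assignment[v] is True, -v when it is False."""
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def sat(lit):
        return assign[abs(lit)] == (lit > 0)

    def breakcount(v):
        # number of currently-satisfied clauses that flipping v would break
        return sum(1 for c in clauses
                   if any(sat(l) for l in c)
                   and not any(sat(l) for l in c if abs(l) != v))

    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                      # model found
        clause = rng.choice(unsat)
        free = [abs(l) for l in clause if breakcount(abs(l)) == 0]
        if free:                               # "unconstrained": breaks nothing
            v = rng.choice(free)
        elif rng.random() < p:                 # noise step: random variable
            v = rng.choice([abs(l) for l in clause])
        else:                                  # greedy: minimal break count
            v = min((abs(l) for l in clause), key=breakcount)
        assign[v] = not assign[v]
    return None                                # gave up
```

The noise parameter p here plays the role of P in the slide, and its setting is exactly what AutoWalksat tunes.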


The Invariant Ratio

  • Invariant ratio = (mean of the objective function) / (std deviation of the objective function)

  • Shortest expected run time when P is set to minimize this ratio + 10%

    • McAllester, Selman and Kautz (1997)

[Plot: the ratio (y-axis 0–7) across noise settings.]

Automatic Noise Setting

  • Probe for the optimal noise level

  • Bracketed Search with Parabolic Interpolation

    • No derivatives required

    • Robust to stochastic variations

    • Efficient
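The core parabolic-interpolation step of such a bracketed search can be sketched as follows — a generic textbook formula fit through three (noise, cost) probes, not the deck's actual implementation:

```python
def parabolic_step(xs, ys):
    """One step of parabolic interpolation: fit a parabola through three
    (noise, cost) probes and return the abscissa of its vertex, i.e. the
    next candidate noise setting. No derivatives required."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    return x1 - 0.5 * num / den
```

In a full tuner this step alternates with re-bracketing, and each "cost" is a (stochastic) estimate such as the invariant ratio measured over a batch of flips, which is why robustness to noise in the objective matters.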













Other features still lurking

[Noise-probe plot; annotations: clockwise – add 10%, counter-clockwise – subtract 10%.]

  • More complex function of objective function?

  • Mobility? (Schuurmans 2000)


Case Study 3: Restart Policies

[Big Picture diagram with all paths active: static and dynamic features feed Learning / Analysis; the Predictive Model drives control / policy and resource allocation / reformulation.]


Background

Backtracking search methods often exhibit a remarkable variability in performance between:

  • different heuristics

  • same heuristic on different instances

  • different runs of randomized heuristics


Cost Distributions

[Run-time distribution plot: most runs very short, a few very long.]

Observation (Gomes 1997): distributions often have heavy tails

  • infinite variance

  • mean increases without limit

  • probability of long runs decays by power law (Pareto-Levy), rather than exponentially (Normal)
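A quick way to see heavy-tail behavior: sample run times from a Pareto distribution with tail index α ≤ 1 and watch the running sample mean keep growing instead of converging. This is an illustrative sketch, not the empirical data from the talk:

```python
import random

def sample_runtime_pareto(alpha=0.9, xm=1.0, rng=random):
    """Pareto(alpha) run time via inverse-CDF sampling. For alpha <= 1 the
    mean is infinite: P(T > t) decays as a power law t^(-alpha), so ever
    longer runs keep appearing and inflate the sample mean."""
    return xm / rng.random() ** (1.0 / alpha)

def running_means(n, alpha, seed=0):
    """Running sample mean of n heavy-tailed run times."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for i in range(1, n + 1):
        total += sample_runtime_pareto(alpha, rng=rng)
        means.append(total / i)
    return means
```

With a Normal (exponentially decaying) tail the running mean settles quickly; with the Pareto tail it jumps upward each time an extreme run is drawn, which is exactly why restarts help.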


Randomized Restarts

  • Solution: randomize the systematic solver

    • Add noise to the heuristic branching (variable choice) function

    • Cutoff and restart search after some number of steps

  • Provably eliminates heavy tails

  • Very useful in practice

    • Adopted by state-of-the art search engines for SAT, verification, scheduling, …



How to determine restart policy

  • Complete knowledge of run-time distribution (only): a fixed cutoff policy is optimal (Luby et al. 1993)

    • argmin_t E(R_t), where

      E(R_t) = expected solution time restarting every t steps

  • No knowledge of distribution: a universal series of cutoffs is within a factor O(log t) of optimal

    • 1, 1, 2, 1, 1, 2, 4, …

  • Open cases addressed by our research

    • Additional evidence about progress of solver

    • Partial knowledge of run-time distribution
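The universal cutoff series 1, 1, 2, 1, 1, 2, 4, … can be generated with the standard recursive rule (a common implementation sketch, not code from the talk):

```python
def luby(i):
    """i-th term (1-indexed) of the Luby universal restart sequence:
    1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ..."""
    k = 1
    while (1 << k) - 1 < i:          # smallest k with i <= 2^k - 1
        k += 1
    if (1 << k) - 1 == i:            # i ends a complete block: emit 2^(k-1)
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # otherwise recurse into the block
```

In a solver, run i is cut off after `scale * luby(i)` steps for some base scale, then restarted with a fresh random seed.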


Backtracking Problem Solvers

  • Randomized SAT solver

    • Satz-Rand, a randomized version of Satz (Li & Anbulagan 1997)

    • DPLL with 1-step lookahead

    • Randomization of variable choices controlled by a noise parameter

  • Randomized CSP solver

    • Specialized CSP solver for QCP

    • ILOG constraint programming library

    • Variable choice: a variant of the Brélaz heuristic


Formulation of Learning Problem

  • Different formulations of evidential problem

    • Consider a burst of evidence over initial observation horizon

    • Observation horizon + time expended so far

    • General observation policies

[Run-time histogram: runs observed over a horizon of 1000 choice points, then classified as short or long relative to the median run time.]


Formulation of Learning Problem

  • Different formulations of evidential problem

    • Consider a burst of evidence over initial observation horizon

    • Observation horizon + time expended so far

    • General observation policies

[Same histogram, annotated with the observation horizon (1000 choice points) plus checkpoints t1, t2, t3 for time expended so far.]


Formulation of Dynamic Features

  • No simple measurement found sufficient for predicting time of individual runs

  • Approach:

    • Formulate a large set of base-level and derived features

      • Base features capture progress or lack thereof

      • Derived features capture dynamics

        • 1st and 2nd derivatives

        • Min, Max, Final values

    • Use Bayesian modeling tool to select and combine relevant features


Dynamic Features

  • CSP: 18 basic features, summarized by 135 variables

    • # backtracks

    • depth of search tree

    • avg. domain size of unbound CSP variables

    • variance in distribution of unbound CSP variables

  • Satz: 25 basic features, summarized by 127 variables

    • # unbound variables

    • # variables set positively

    • Size of search tree

    • Effectiveness of unit propagation and lookahead

    • Total # of truth assignments ruled out

    • Degree of interaction between binary clauses
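The base-to-derived feature summarization described above can be sketched as follows (the summary-feature names are illustrative, not the actual 135/127-variable encodings):

```python
def derived_features(series):
    """Summarize one base-level feature trace (sampled over the observation
    horizon) into derived features: 1st and 2nd finite differences, plus the
    min, max, and final value of the raw trace and each difference."""
    d1 = [b - a for a, b in zip(series, series[1:])]   # 1st derivative
    d2 = [b - a for a, b in zip(d1, d1[1:])]           # 2nd derivative
    feats = {}
    for name, xs in (("raw", series), ("d1", d1), ("d2", d2)):
        feats[f"{name}_min"] = min(xs)
        feats[f"{name}_max"] = max(xs)
        feats[f"{name}_final"] = xs[-1]
    return feats
```

Applying this to each of the 18 (CSP) or 25 (Satz) base features yields the fixed-length vectors fed to the Bayesian modeling tool.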


Different formulations of task

  • Single instance

    • Solve a specific instance as quickly as possible

    • Learn model from one instance

  • Every instance

    • Solve an instance drawn from a distribution of instances

    • Learn model from ensemble of instances

  • Any instance

    • Solve some instance drawn from a distribution of instances, may give up and try another

    • Learn model from ensemble of instances


Sample Results: CSP-QWH-Single

  • QWH order 34, 380 unassigned

  • Observation horizon without time

  • Training: solve 4000 times with random seeds; Test: solve 1000 times

  • Learning: Bayesian network model

    • MS Research tool

    • Structure search with Bayesian information criterion (Chickering et al.)

  • Model evaluation:

    • Average 81% accurate at classifying run time vs. 50% with just background statistics (range of 78%–98%)


Learned Decision Tree

  • Min of 1st derivative of variance in number of uncolored cells across columns and rows

  • Min depth of all search leaves of the search tree

  • Change of sign of the change of avg depth of node in search tree

  • Min number of uncolored cells averaged across columns


Restart Policies

  • Model can be used to create policies that are better than any policy that only uses run-time distribution

  • Example:

    Observe for 1,000 steps

    If “run time > median” predicted, restart immediately;

    else run until median reached or solution found;

    If no solution, restart.

    • E(Rfixed) = 38,000 but E(Rpredict) = 27,000

    • Can sometimes beat the fixed policy even if the observation horizon exceeds the optimal fixed cutoff!
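The fixed-cutoff baseline E(R_t) in this comparison can be estimated from empirical run-time samples as below. This is a sketch; the 38,000 vs. 27,000 figures come from the talk's own data, not from this code:

```python
def expected_fixed_restart(samples, t):
    """Estimate E(R_t), the expected solve time when restarting every t steps,
    from empirical run-time samples of the unrestarted solver. Failed attempts
    cost exactly t steps each; their count is geometric with success prob p."""
    succ = [s for s in samples if s <= t]
    p = len(succ) / len(samples)
    if p == 0:
        return float("inf")
    return t * (1 - p) / p + sum(succ) / len(succ)

def best_fixed_cutoff(samples, cutoffs):
    """argmin_t E(R_t) over a set of candidate cutoffs."""
    return min(cutoffs, key=lambda t: expected_fixed_restart(samples, t))
```

With a heavy-tailed sample set, a small cutoff wins by a wide margin, which is the effect the predictive policy then improves on further.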


Ongoing work

  • Optimal predictive policies

    • Dynamic features

    • + Run time

    • + Static features

    • Partial information about run time distribution

      • E.g.: mixture of two or more subclasses of problems

  • Cheap approximations to optimal policies

    • Myopic Bayes


Conclusions

  • Exciting new direction for improving power of search and reasoning algorithms

  • Many knobs to learn how to twist

    • Noise level, restart policies just a start

  • Lots of opportunities for cross-disciplinary work

    • Theory

    • Machine learning

    • Experimental AI and OR

    • Reasoning under uncertainty

    • Statistical physics

