
Hierarchical Knowledge Gradient for Sequential Sampling

Optimize learning and decision-making by sequencing measurements to produce the best answer from a set of alternatives. Explore the knowledge gradient policy and learn how to maximize expected value.



Presentation Transcript


  1. HIERARCHICAL KNOWLEDGE GRADIENT FOR SEQUENTIAL SAMPLING Martijn Mes, Department of Operational Methods for Production and Logistics, University of Twente, The Netherlands; Warren Powell, Department of Operations Research and Financial Engineering, Princeton University, USA; Peter Frazier, Department of Operations Research and Information Engineering, Cornell University, USA. Sunday, October 11, 2009, INFORMS Annual Meeting San Diego

  2. OPTIMAL LEARNING • Problem • Find the best alternative from a set of alternatives • Before choosing, you have the option to measure the alternatives • But measurements are noisy • How should you sequence your measurements to produce the best answer in the end? • For problems with a finite number of alternatives: • On-line learning (learn as you earn): multi-armed bandit problem • Off-line learning: ranking and selection problem Let’s illustrate the problem…

  3. WHAT WOULD BE THE BEST PLACE TO GO FISHING?

  4. WHAT WOULD BE THE BEST PLACE TO BUILD A WIND FARM?

  5. WHAT WOULD BE THE BEST CHEMICAL COMPOUND IN A DRUG TO FIGHT A PARTICULAR DISEASE?

  6. WHAT PARAMETER SETTINGS WOULD PRODUCE THE BEST MANUFACTURING CONFIGURATION IN A SIMULATED SYSTEM? Simulation Optimization

  7. WHERE IS THE MAX OF SOME MULTI-DIMENSIONAL FUNCTION WHEN THE SURFACE IS MEASURED WITH NOISE? Stochastic Search

  8. BASIC MODEL • We have a set X of distinct alternatives. Each alternative x∈X is characterized by an independent normal distribution with unknown mean θx and known variance λx. • We have a sequence of N measurement decisions, x0, x1,…, xN-1. The decision xn selects an alternative to sample at time n, resulting in an observation yxn+1. • After the N measurements, we make an implementation decision xN, which is given by the alternative with the highest expected reward.

  9. OBJECTIVE • Our goal is to choose a sampling policy that maximizes the expected value of the implementation decision xN. • Let π∈Π be a policy that produces a sequence of measurement decisions xn, n=0,…,N-1. • Objective: supπ∈Π Eπ[maxx∈X μxN], where the conditional expectation is taken with respect to the policy π and with respect to the measurement outcomes.

  10. MEASUREMENT POLICIES [1/2] • Optimal policies: • Dynamic programming (computational challenge) • Special case: the multi-armed bandit problem, which can be solved using the Gittins index (Gittins and Jones, 1974). • Heuristic measurement policies: • Pure exploitation: always make the choice that appears to be the best. • Pure exploration: make choices at random so that you are always learning more, but without regard to the cost of the decision. • Hybrid: explore with probability ρ and exploit with probability 1-ρ. • Epsilon-greedy exploration: explore with probability pn=c/n, which goes to zero as n→∞, but not too quickly.
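The epsilon-greedy rule above can be sketched in a few lines; the posterior means, the constant c, and the random seed below are illustrative, not part of the original slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def epsilon_greedy(mu, n, c=1.0):
    """Epsilon-greedy measurement decision with decaying exploration.

    Explores with probability p_n = min(1, c/n), which goes to zero as n
    grows (but not too quickly); otherwise exploits the alternative with
    the highest posterior mean.
    """
    p_n = min(1.0, c / max(n, 1))
    if rng.random() < p_n:
        return int(rng.integers(len(mu)))   # explore: random alternative
    return int(np.argmax(mu))               # exploit: best-looking alternative

# Hypothetical posterior means after a few measurements
mu = [0.2, 0.5, 0.1]
x_early = epsilon_greedy(mu, n=1)      # p_1 = 1: always explores
x_late = epsilon_greedy(mu, n=10**6)   # p_n tiny: almost surely exploits
```

With a large n the rule behaves like pure exploitation, while small n forces exploration, which is exactly the hybrid behavior the slide describes.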

  11. MEASUREMENT POLICIES [2/2] • Heuristic measurement policies, continued: • Boltzmann exploration • Interval estimation • Approximate policies for off-line learning: • Optimal computing budget allocation (Chen et al., 1996) • LL(s) – batch linear loss (Chick et al., 2009) • Maximizing the expected value of a single measurement: • (R1, R1, …, R1) policy (Gupta and Miescke, 1996) • EVI (Chick et al., 2009) • “Knowledge gradient” (Frazier and Powell, 2008)

  12. THE KNOWLEDGE-GRADIENT POLICY [1/2] • Updating beliefs • We assume we start with a distribution of belief about the true mean θx (a Bayesian prior), with mean μxn and precision βxn (precision is the inverse of the variance). • Next, we observe yxn+1. • Using Bayes’ theorem, we can show that our new distribution (posterior belief) about the true mean is normal with βxn+1 = βxn + βε and μxn+1 = (βxn μxn + βε yxn+1) / βxn+1, where βε = 1/λx is the measurement precision. • We perform these updates with each observation.
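The precision-weighted update on this slide can be sketched directly; the prior, noise variance, and observation below are made-up example values:

```python
def update_belief(mu_x, beta_x, y, beta_eps):
    """Bayesian update for a normal belief with known measurement noise.

    beta denotes precision (1/variance); beta_eps = 1/lambda_x is the
    measurement precision. Returns the posterior mean and precision:
        beta_new = beta_x + beta_eps
        mu_new   = (beta_x * mu_x + beta_eps * y) / beta_new
    """
    beta_new = beta_x + beta_eps
    mu_new = (beta_x * mu_x + beta_eps * y) / beta_new
    return mu_new, beta_new

# Example: prior N(0, 1), noise variance 0.25 (beta_eps = 4), observation y = 2.0
mu1, beta1 = update_belief(0.0, 1.0, 2.0, 4.0)   # posterior mean 1.6, precision 5
```

Note how the posterior mean is pulled toward the observation in proportion to the measurement precision, and how precision only ever accumulates.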

  13. THE KNOWLEDGE-GRADIENT POLICY [2/2] • Measurement decisions • The knowledge gradient is the expected value of a single measurement: νxKG,n = E[maxx' μx'n+1 − maxx' μx'n | xn = x]. • Knowledge-gradient policy: measure xn = arg maxx νxKG,n.
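A minimal sketch of this computation for independent normal beliefs, using the closed form νx = σ̃x f(ζx) with f(z) = zΦ(z) + φ(z) from Frazier and Powell (2008); the means, variances, and noise level below are illustrative:

```python
import math

def kg_factor(mu, sigma2, lam):
    """Knowledge-gradient value nu_x for each alternative.

    mu: posterior means; sigma2: posterior variances; lam: known
    measurement variance. Uses nu_x = sigma_tilde_x * f(zeta_x),
    where f(z) = z*Phi(z) + phi(z) (standard normal CDF and pdf).
    """
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
    nu = []
    for x in range(len(mu)):
        # std. dev. of the change in belief about x from one more measurement
        sigma_next = 1.0 / (1.0 / sigma2[x] + 1.0 / lam)
        sigma_tilde = math.sqrt(sigma2[x] - sigma_next)
        best_other = max(mu[i] for i in range(len(mu)) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        nu.append(sigma_tilde * (zeta * Phi(zeta) + phi(zeta)))
    return nu

# Measure the alternative with the largest knowledge gradient
mu = [0.0, 0.1, 0.05]
sigma2 = [1.0, 1.0, 1.0]
nu = kg_factor(mu, sigma2, lam=1.0)
x_kg = nu.index(max(nu))
```

With equal variances, the policy favors alternatives whose means are close to the best competitor, which is where one measurement is most likely to change the final decision.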

  14. PROPERTIES OF THE KNOWLEDGE-GRADIENT POLICY • Effectively a myopic policy, but also similar to steepest ascent for nonlinear programming. • Myopically optimal: the best single measurement you can make (by construction). • Asymptotically optimal: as the measurement budget grows, we get the optimal solution. • The knowledge gradient is the only stationary policy with this behavior. Many policies are asymptotically optimal (e.g., pure exploration, epsilon-greedy) but are not myopically optimal. • But what if the number of alternatives is large relative to the measurement budget?

  15. CORRELATIONS • There are many problems where making one measurement tells us something about what we might observe from other measurements. • Fishing: nearby locations have similar properties (depth, bottom structure, plants, current, etc.). • Wind farm: nearby locations often share similar wind patterns. • Chemical compounds: structurally similar chemicals often behave similarly. • Simulation optimization: a small adjustment in parameter settings might result in a relatively small performance change. • Correlations are particularly important when the number of possible measurements is extremely large relative to the measurement budget (or for continuous functions).

  16. KNOWLEDGE GRADIENT FOR CORRELATED BELIEFS • The knowledge-gradient policy for correlated normal beliefs (Frazier, Powell, and Dayanik, 2009): • Belief is multivariate normal. • Significantly outperforms methods that ignore correlations. • Computing the expectation is more challenging. • Assumption: the covariance matrix is known (or we first have to learn it).

  17. STATISTICAL AGGREGATION [1/2] • Instead of using a given covariance matrix, we might work with statistical aggregation to allow generalization across alternatives. • Examples: geographical aggregation, and binary tree aggregation for continuous functions.

  18. STATISTICAL AGGREGATION [2/2] • Examples continued: aggregation of vector-valued data (multi-attribute vectors) by ignoring dimensions. Here V is the value of a driver with attributes a1 = location, a2 = domicile, a3 = capacity type, a4 = scheduled time at home, a5 = days away from home, a6 = available time, a7 = geographical constraints, a8 = DOT road hours, a9 = DOT duty hours, a10 = eight-day duty hours. Aggregation levels: g=0: V(a1,…,a10); g=1: V(a1,…,a5); g=2: V(a1,…,a4); g=3: V(a1,…,a3); g=4: V(a1,a2); g=5: V(a1,f(a2)); g=6: V(a1); g=7: V(f(a1)).

  19. AGGREGATION FUNCTIONS • Aggregation is performed using a set of aggregation functions Gg: X → Xg, where Xg represents the gth level of aggregation of the original set X. • We use μxg,n as the estimate of the aggregated alternative Gg(x) on the gth aggregation level after n measurements. • Using aggregation, we express μxn (our estimate of θx) as a weighted combination: μxn = Σg wxg,n μxg,n. • We use a Bayesian adaptation of the weights proposed in George, Powell, and Kulkarni (2008). Intuition: give the highest weight to levels with the lowest sum of variance and bias.
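The intuition on this slide can be sketched as weights inversely proportional to each level's total squared error (variance plus squared bias), following the form of the George, Powell, and Kulkarni (2008) weights; the variances, biases, and level estimates below are made-up example values:

```python
import numpy as np

# Hypothetical belief about one alternative at 3 aggregation levels:
# lower levels are less biased but noisier, higher levels the reverse.
var_g = np.array([0.5, 0.2, 0.1])    # variance of the estimate at level g
bias_g = np.array([0.0, 0.1, 0.4])   # estimated aggregation bias at level g

# Weight each level inversely to variance + bias^2, then normalize.
inv_mse = 1.0 / (var_g + bias_g ** 2)
w = inv_mse / inv_mse.sum()

mu_g = np.array([1.0, 1.1, 1.6])     # level estimates (assumed values)
mu_hat = float(w @ mu_g)             # weighted-combination estimate of theta_x
```

Here level g=1 gets the largest weight: it has moderate variance and only a small bias, so its total squared error is lowest.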

  20. HIERARCHICAL KNOWLEDGE GRADIENT (HKG) • Idea: combine the knowledge gradient with statistical aggregation. • The weighting equation can be seen as a form of linear regression, so we might use Bayesian regression here. • However, this approach requires an informative prior. • Instead, we choose to maintain separate beliefs on the values at each aggregation level. • So, instead of working with a multivariate normal, we have a series of independent normal distributions for each value at each aggregation level. These beliefs are combined using the weighting equation. • In the paper we provide a Bayesian justification of this combination of beliefs.

  21. HKG IN A NUTSHELL • Compute the knowledge gradients νxHKG,n for all x∈X, splitting each into two terms, one of which depends on the unknown measurement value; the required expectation of the maximum of a set of lines is computed as in (Frazier et al., 2009), using the expected weights after observing yxn+1. • Measurement decision: xn = arg maxx νxHKG,n. • After observing yxn+1, compute μxg,n+1, βxg,n+1, δxg,n+1, wxg,n+1, and σxg,n+1,ε (the variance of μxg,n+1) for all x∈X and g∈G.

  22. ILLUSTRATION OF HKG • The knowledge gradient policies prefer to measure alternatives with a high mean and/or low precision: • Equal means → measure the lowest precision. • Equal precisions → measure the highest mean. • Some MS Excel demos… • Statistical aggregation • Sampling decisions

  23. NUMERICAL EXPERIMENTS • One-dimensional continuous functions generated by a Gaussian process with zero mean and power exponential covariance function. We vary the measurement variance and the length scale parameter ρ. • Multi-dimensional functions: transportation application where the value of a driver depends on his location, domicile, and fleet.

  24. ONE-DIMENSIONAL FUNCTIONS

  25. MULTI-DIMENSIONAL FUNCTIONS • HKG finds the best out of 2725 aggregated alternatives in less than 1200 measurements in all 25 replications.

  26. CONCLUSIONS. HKG… • is an extension of the knowledge-gradient policy to problems where an alternative is described by a multi-dimensional vector, in a computationally feasible way. • estimates functions using an appropriately weighted sum of estimates at different levels of aggregation. • exploits aggregation structure and similarity between alternatives, without requiring the specification of an explicit covariance matrix for our belief (which also avoids the computational challenge of working with large matrices). • is optimal in the limit, i.e., eventually it always discovers the best alternative. • efficiently maximizes various functions (continuous and discrete). Besides the aggregation structure, it does not make any specific assumptions about the structure of the function or the set of alternatives, and it does not require tuning.

  27. FURTHER RESEARCH [1/2] • Hierarchical sampling: • HKG requires us to scan all possible measurements before making a decision. • As an alternative, we can use HKG to choose regions to measure at successively finer levels of aggregation. • Because aggregated sets have fewer elements than the disaggregated set, we might gain some computational advantage. • Challenge: what measures to use in an aggregated sampling decision? • Knowledge gradient for approximate dynamic programming: • To cope with the exploration-versus-exploitation problem. • Challenge…

  28. FURTHER RESEARCH [2/2] • The challenge is to cope with bias in downstream values • Decision has impact on the downstream path • Decision has impact on the value of states in the upstream path (off-policy Monte Carlo learning)

  29. QUESTIONS? Martijn Mes Assistant Professor University of Twente School of Management and Governance Operational Methods for Production and Logistics The Netherlands Contact Phone: +31-534894062 Email: m.r.k.mes@utwente.nl Web: http://mb.utwente.nl/ompl/staff/Mes/
