
Presentation Transcript


  1. Belief Propagation for Structured Decision Making
  Qiang Liu, Alexander Ihler
  Department of Computer Science, University of California, Irvine

  Abstract
  • Variational inference methods such as loopy BP have revolutionized inference on graphical models.
  • Influence diagrams (or decision networks) extend graphical models to represent structured decision-making problems.
  • Our contributions:
  • A general variational framework for solving influence diagrams
  • A junction graph belief propagation algorithm for influence diagrams with an intuitive interpretation and strong theoretical guarantees
  • A convergent double-loop algorithm
  • Significant empirical improvement over the baseline algorithm

  Graphical Models and Variational Methods
  • Graphical models: factors and exponential-family form; graphical representations include Bayes nets, Markov random fields, and others.
  • Inference: answering queries about a graphical model, e.g., calculating the (log) partition function.
  • Variational methods: the log-partition function duality over the locally consistent polytope; Bethe-Kikuchi approximations; junction graph BP approximating the entropy and the polytope; loopy junction graphs. (The poster's equations were images and are not reproduced in this transcript.)

  Influence Diagram
  • Chance nodes (C): each carries a conditional probability.
  • Decision nodes (D): each carries a decision rule (policy).
  • Utility nodes (U): local utility functions combine into a global utility, either multiplicatively or additively.
  • Maximum expected utility (MEU): choose the decision rules that maximize the expected global utility.
  • Augmented distribution: the product of the chance-node conditional probabilities and the utility, treated as an unnormalized distribution.
  • Perfect recall: every decision node observes all earlier decision nodes and their parents (along a "temporal" order); the MEU can then be computed by the sum-max-sum rule (dynamic programming).
  • Perfect recall is often unrealistic: memory limits, decentralized systems.
  • Imperfect recall: no closed-form solution; the dominant algorithm is single policy updating (SPU), which guarantees only policy-by-policy optimality.
  (Poster figure: a toy "weather forecast / activity / happiness" influence diagram with its conditional probabilities, decision rule, and optimal policies.)

  Variational Framework for Structured Decision Making
  • Main result: a duality that expresses the log-MEU as a maximization of the exponential-family objective plus an entropy term, minus conditional-entropy terms for the decision nodes (the equation was an image on the poster; a reconstruction appears after this transcript).
  • Intuition: the last term encourages policies to be deterministic; at the maximum, the optimal strategy can be read off the maximizing marginals.
  • Perfect recall gives a convex optimization (easier); imperfect recall gives a non-convex optimization (harder).
  • Significance: enables converting arbitrary variational methods into MEU algorithms, and "integrates" the policy evaluation and policy improvement steps (avoiding expensive inner loops).

  Our Algorithms
  • Junction graph belief propagation for MEU: construct a junction graph over the augmented distribution (influence diagram → augmented distribution as a factor graph → junction graph).
  • For each decision node, identify a unique cluster (called a decision cluster) that includes it; the remaining clusters are normal clusters.
  • Message-passing algorithm: sum-messages are sent from normal clusters; MEU-messages are sent from decision clusters.
  • Strong local optimality: fixed points are provably no worse than SPU's policy-by-policy optima.
  • Convergent algorithm by the proximal point method: iteratively optimize a smoothed objective.
  (Poster figure: an example influence diagram, its augmented factor graph, and the corresponding junction graph with normal clusters such as {a,b,c}, {b,c,d}, {a,b,e} and decision clusters for d1, d2, d3 over the c1–c4 chance variables.)

  Experiments
  • Diagnostic networks (UAI'08 inference challenge)
  • Decentralized sensor network
  • Toy example comparing imperfect recall and perfect recall
  (Result plots are not reproduced in this transcript; the poster reports significant improvement over the SPU baseline. Illustrative numerical sketches follow below.)
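The key equations on the poster were rendered as images and are missing from the transcript. The LaTeX block below is a reconstruction of the main quantities from the surrounding text, following the standard presentation of this framework for the perfect-recall case; the exact notation and statement on the poster may differ.

  % Reconstruction of the poster's key equations (notation is assumed, not copied).
  \begin{align*}
    % Maximum expected utility over policies \delta = \{\delta_d\}_{d \in D}:
    \mathrm{MEU}
      &= \max_{\delta}\; \mathbb{E}_{p(x\mid\delta)}\!\left[u(x)\right],
      \qquad
      p(x\mid\delta) = \prod_{i \in C} p(x_i \mid x_{\mathrm{pa}(i)})
                       \prod_{d \in D} \delta_d(x_d \mid x_{\mathrm{pa}(d)}). \\[4pt]
    % Augmented distribution: chance-node probabilities times the global utility,
    % written in exponential-family form.
    q(x)
      &\propto \prod_{i \in C} p(x_i \mid x_{\mathrm{pa}(i)})\, u(x)
       \;=\; \exp\bigl(\theta(x)\bigr). \\[4pt]
    % Variational duality (the poster's "main result"): log-MEU as a maximization
    % over the marginal polytope \mathbb{M}, with conditional-entropy penalties on
    % the decision nodes that push the optimal policies toward determinism.
    \log \mathrm{MEU}
      &= \max_{\tau \in \mathbb{M}}
         \Bigl\{ \langle \theta, \tau \rangle + H(x;\tau)
                 - \sum_{d \in D} H\bigl(x_d \mid x_{\mathrm{pa}(d)};\tau\bigr) \Bigr\}.
  \end{align*}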
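As a concrete illustration of the MEU computation under perfect recall, here is a minimal Python sketch of a toy diagram in the spirit of the poster's weather/activity example: one hidden chance variable, one observed forecast, and one decision. All numbers and names (p_w, p_f_given_w, u) are made up for illustration; with a single decision, brute-force enumeration of deterministic policies coincides with the sum-max-sum rule.

  import itertools
  import numpy as np

  p_w = np.array([0.7, 0.3])              # p(weather): 0 = sunny, 1 = rainy
  p_f_given_w = np.array([[0.8, 0.2],     # p(forecast | weather)
                          [0.3, 0.7]])
  u = np.array([[10., 0.],                # u(weather, activity): 0 = picnic, 1 = cinema
                [2.,  6.]])

  def expected_utility(policy):
      """Expected utility of the deterministic policy a = policy[f]."""
      eu = 0.0
      for w in range(2):
          for f in range(2):
              a = policy[f]
              eu += p_w[w] * p_f_given_w[w, f] * u[w, a]
      return eu

  # Enumerate the four deterministic rules mapping forecast -> activity.
  best = max(itertools.product(range(2), repeat=2), key=expected_utility)
  print("optimal policy (forecast -> activity):", best,
        "MEU =", expected_utility(best))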
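The poster's baseline, single policy updating (SPU), is easiest to see on an imperfect-recall problem. The sketch below is a hypothetical two-agent decentralized example (each decision sees only its own local observation); SPU sweeps over the decisions, re-optimizing each rule with the other held fixed, which yields only policy-by-policy optimality. All numbers and names are assumptions for illustration, not taken from the poster's experiments.

  import itertools
  import numpy as np

  p_s = np.array([0.6, 0.4])                    # prior over the hidden state s
  p_o_given_s = np.array([[0.8, 0.2],           # p(observation | s); the same
                          [0.25, 0.75]])        # noisy channel for both agents

  def utility(s, d1, d2):
      # Both agents are rewarded only when they jointly guess the hidden state.
      return 1.0 if d1 == s and d2 == s else 0.0

  def expected_utility(pol1, pol2):
      # Exact expected utility of deterministic policies d_i = pol_i[o_i].
      eu = 0.0
      for s in range(2):
          for o1 in range(2):
              for o2 in range(2):
                  eu += (p_s[s] * p_o_given_s[s, o1] * p_o_given_s[s, o2]
                         * utility(s, pol1[o1], pol2[o2]))
      return eu

  # SPU sweeps: re-optimize each decision rule with the other policy fixed.
  pol1, pol2 = (0, 0), (0, 0)
  for _ in range(5):
      pol1 = max(itertools.product(range(2), repeat=2),
                 key=lambda p: expected_utility(p, pol2))
      pol2 = max(itertools.product(range(2), repeat=2),
                 key=lambda p: expected_utility(pol1, p))

  print("SPU policies:", pol1, pol2,
        "expected utility:", expected_utility(pol1, pol2))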
