This paper presents a novel variational framework for solving influence diagrams (IDs), enhancing decision making in structured environments. We introduce a junction graph belief propagation algorithm tailored to IDs, with an intuitive interpretation and strong theoretical guarantees. Our framework also yields a convergent double-loop algorithm that significantly outperforms the established baseline. The work integrates the policy evaluation and policy improvement steps, enabling efficient maximum expected utility (MEU) algorithms to be derived from a variety of variational inference methods.
Belief Propagation for Structured Decision Making
Qiang Liu, Alexander Ihler
Department of Computer Science, University of California, Irvine

Abstract
• Variational inference methods such as loopy BP have revolutionized inference in graphical models.
• Influence diagrams (or decision networks) are extensions of graphical models for representing structured decision-making problems.
• Our contributions:
  • A general variational framework for solving influence diagrams
  • A junction graph belief propagation algorithm for IDs, with an intuitive interpretation and strong theoretical guarantees
  • A convergent double-loop algorithm
  • Significant empirical improvement over the baseline algorithm

Graphical Models and Variational Methods
• Graphical models: factors and exponential family form, p(x) = exp(θ(x)) / Z; graphical representations include Bayes nets, Markov random fields, etc.
• Inference: answering queries about graphical models, e.g. calculating the (log) partition function, log Z = log Σ_x exp(θ(x)).
• Variational methods: log-partition function duality,
    log Z = max_{τ ∈ M} { ⟨θ, τ⟩ + H(x; τ) },
  where M is the marginal polytope and H(x; τ) is the entropy.
• Junction graph BP: approximates M by L, the locally consistent polytope, and H by a Bethe/Kikuchi entropy approximation; the resulting fixed-point updates are loopy (junction graph) BP.
[Figure: example graphical model over variables a, b, c, d, e with its junction graph; clusters such as abc, bcd, ab, bc, abe, ed.]

Influence Diagram
• Chance nodes (C): conditional probabilities p(x_i | x_pa(i)).
• Decision nodes (D): decision rules (policies) δ_i(x_i | x_pa(i)).
• Utility nodes (U): local utility functions u_j(x); the global utility function u(x) is either additive, u(x) = Σ_j u_j(x), or multiplicative, u(x) = Π_j u_j(x).
• Maximum expected utility (MEU): MEU = max_δ E_{p(x|δ)}[u(x)].
• Augmented distribution: q(x) ∝ p(x) u(x), which attaches the utilities to the probability as additional factors.
[Figure: toy influence diagram with chance nodes Weather and Forecast, decision node Activity observing Forecast, and utility node Happiness; optimal policies shown.]
• Perfect recall: every decision node observes all earlier decision nodes and their parents (along a "temporal" order). MEU is then solvable by the sum-max-sum rule (dynamic programming), alternating sums over chance nodes and maximizations over decision nodes.
• Perfect recall is often unrealistic: memory limits, decentralized systems.
• Imperfect recall: no closed-form solution; the dominant algorithm is single policy updating (SPU), which guarantees only policy-by-policy optimality.

Variational Framework for Structured Decision Making
• Main result:
    log MEU = max_{τ ∈ M} { ⟨θ, τ⟩ + H(x; τ) − Σ_{i∈D} H(x_i | x_pa(i); τ) },
  where θ = log q is defined by the augmented distribution. If τ* attains the maximum, the optimal strategy is given by the corresponding conditionals δ*_i(x_i | x_pa(i)).
• Intuition: the last term penalizes the conditional entropy of the decision nodes, which causes the optimal policies to be deterministic.
• Perfect recall → convex optimization (easier); imperfect recall → non-convex optimization (harder).
• Significance:
  • Enables converting arbitrary variational methods into MEU algorithms
  • "Integrates" the policy evaluation and policy improvement steps (avoiding expensive inner loops)

Our Algorithms
• Junction graph belief propagation for MEU:
  • Construct a junction graph over the augmented distribution.
  • For each decision node d, identify a unique cluster (called a decision cluster) that includes d; the remaining clusters are normal clusters.
  • Message passing: normal clusters send standard sum-messages; decision clusters send MEU-messages, which also update the local decision rules.
[Figure: influence diagram over chance nodes c1–c4 and decision nodes d1–d3, its augmented distribution (factor graph), and its junction graph with clusters c1c4d1, c1c2d2, c2c3d3, c3d1, c4d2d3; decision clusters marked.]
• Strong local optimality: fixed points are strong local optima, provably no worse than the policy-by-policy optima returned by SPU.
• Convergent algorithm by the proximal point method: iteratively optimize a smoothed objective in a double loop.

Experiments
• Benchmarks: diagnostic networks (UAI'08 inference challenge), a decentralized sensor network, and a toy example, under both perfect and imperfect recall.
[Figure: plots comparing the expected utility achieved by our BP algorithms and by SPU on each benchmark.]
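The log partition function that anchors the variational section can be computed by brute-force enumeration on small models, which is useful for checking any approximate method. A minimal sketch, with an illustrative pairwise model whose factor values are my own assumptions:

```python
import itertools
import math

# Tiny pairwise model over binary variables a, b, c with factors f(a,b), f(b,c).
# The factor tables below are illustrative assumptions, not from the poster.
factors = {
    ("a", "b"): {(0, 0): 1.2, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.2},
    ("b", "c"): {(0, 0): 2.0, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 2.0},
}
variables = ["a", "b", "c"]

# Z = sum_x prod_factors f(x); log Z is the query the variational duality targets.
z = 0.0
for assignment in itertools.product((0, 1), repeat=len(variables)):
    x = dict(zip(variables, assignment))
    weight = 1.0
    for (i, j), table in factors.items():
        weight *= table[(x[i], x[j])]
    z += weight
log_z = math.log(z)
```

This exhaustive sum is exponential in the number of variables; the duality log Z = max_τ {⟨θ, τ⟩ + H(τ)} is what makes tractable approximations like junction graph BP possible.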
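Under perfect recall, the sum-max-sum rule reduces MEU to alternating sums (chance nodes) and maxes (decision nodes). A minimal sketch on the poster's Weather/Forecast/Activity/Happiness toy diagram; all probability and utility numbers are illustrative assumptions:

```python
import itertools

# Toy influence diagram: chance nodes Weather (W) and Forecast (F | W),
# decision node Activity (A) observing F, utility node Happiness u(W, A).
p_w = {"sun": 0.7, "rain": 0.3}                      # p(W), assumed values
p_f_given_w = {                                      # p(F | W), assumed values
    "sun": {"good": 0.8, "bad": 0.2},
    "rain": {"good": 0.3, "bad": 0.7},
}
utility = {                                          # u(W, A), assumed values
    ("sun", "outside"): 10, ("sun", "inside"): 4,
    ("rain", "outside"): 0, ("rain", "inside"): 5,
}
activities = ["outside", "inside"]
forecasts = ["good", "bad"]

# Sum-max-sum rule: MEU = sum_F max_A sum_W p(W) p(F|W) u(W, A).
meu = 0.0
policy = {}
for f in forecasts:
    best_a, best_val = None, float("-inf")
    for a in activities:
        val = sum(p_w[w] * p_f_given_w[w][f] * utility[(w, a)] for w in p_w)
        if val > best_val:
            best_a, best_val = a, val
    policy[f] = best_a
    meu += best_val

# Cross-check against brute-force enumeration of all deterministic policies F -> A.
best_enum = max(
    sum(p_w[w] * p_f_given_w[w][f] * utility[(w, dict(zip(forecasts, d))[f])]
        for w in p_w for f in forecasts)
    for d in itertools.product(activities, repeat=len(forecasts))
)
assert abs(meu - best_enum) < 1e-9
```

The dynamic-programming pass and the exhaustive policy search agree here, which is exactly the guarantee perfect recall provides; with imperfect recall the max cannot be pushed inside the sums this way.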
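The baseline for imperfect recall, single policy updating (SPU), optimizes one decision rule at a time while holding the others fixed, stopping at a policy-by-policy optimum. A minimal sketch on a toy decentralized problem of my own construction (chance node C; D1 observes C, D2 observes nothing; all numbers assumed):

```python
# Toy imperfect-recall problem: chance node C, decision D1 (observes C),
# decision D2 (observes nothing), utility u(C, D1, D2). Payoffs are illustrative.
p_c = {0: 0.5, 1: 0.5}
payoff = {
    (0, 0, 0): 4, (0, 0, 1): 1, (0, 1, 0): 0, (0, 1, 1): 2,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 2, (1, 1, 1): 4,
}

def expected_utility(delta1, delta2):
    # delta1: dict mapping C -> D1's action; delta2: D2's fixed action.
    return sum(p_c[c] * payoff[(c, delta1[c], delta2)] for c in p_c)

# SPU: sweep over decisions, replacing each rule with a best response to the
# others, until no single-rule change improves the expected utility.
delta1, delta2 = {0: 0, 1: 0}, 0
improved = True
while improved:
    improved = False
    for cand in ({0: a, 1: b} for a in (0, 1) for b in (0, 1)):
        if expected_utility(cand, delta2) > expected_utility(delta1, delta2) + 1e-12:
            delta1, improved = cand, True
    for cand in (0, 1):
        if expected_utility(delta1, cand) > expected_utility(delta1, delta2) + 1e-12:
            delta2, improved = cand, True
```

SPU's fixed point is only stable against changing one rule at a time; the poster's BP algorithm is provably stable against a stronger class of joint policy changes.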