Approximate Solutions of Interactive Dynamic Influence Diagrams Using Model Clustering

Twenty-Second Conference on Artificial Intelligence (AAAI’07). Yifeng Zeng, Aalborg University, Denmark. Prashant Doshi, Univ. of Georgia, USA. Qiongyu Chen, National University of Singapore.

Presentation Transcript


  1. Twenty Second Conference on Artificial Intelligence (AAAI’07) Approximate Solutions of Interactive Dynamic Influence Diagrams Using Model Clustering Yifeng Zeng Aalborg University Denmark Prashant Doshi Univ. of Georgia USA Qiongyu Chen National University of Singapore

  2. Outline • Interactive Dynamic Influence Diagrams (I-DIDs) • Curses of History and Dimensionality • Model Clustering • Computational Savings and Error Bound • Experimental Results

  3. Interactive Dynamic Influence Diagrams (I-DIDs) (Doshi et al. AAMAS’07) • Graphical models for decision-making in multiagent settings • Sequential decision-making over multiple time steps in multiagent settings • Generalize dynamic IDs to multiagent domains • Differ from MAIDs (Koller&Milch01) and NIDs (Gal&Pfeffer04) • Online solutions to I-POMDPs (Gmytrasiewicz&Doshi, JAIR’05) • Allow nested modeling of agents

  4. Overview of I-ID • A generic level l Interactive ID (I-ID) for agent i situated with one other agent j • Model node: M_{j,l-1} • Models of agent j at level l-1 • Policy link: dashed line • Distribution over the other agent’s actions given its models • Beliefs on M_{j,l-1} • P(M_{j,l-1}|s) • Update? [Figure: level l I-ID with nodes S, O_i, A_i, R_i, A_j, and M_{j,l-1}]

  5. Details of the Model Node • Members of the model node • Different chance nodes are solutions of models m_{j,l-1} • Mod[M_j] represents the different models of agent j • CPT of the chance node A_j is a multiplexer • Assumes the distribution of each of the action nodes (A_j^1, A_j^2) depending on the value of Mod[M_j] • The models m_{j,l-1}^1 and m_{j,l-1}^2 could be I-IDs or IDs [Figure: model node M_{j,l-1} expanded into S, Mod[M_j], the models m_{j,l-1}^1 and m_{j,l-1}^2, their action nodes A_j^1 and A_j^2, and the chance node A_j]
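The multiplexer behaviour of the chance node A_j can be illustrated with a minimal Python sketch. The model names (`m1`, `m2`), the action labels, and the belief values are hypothetical and chosen only for illustration; the sketch assumes each model has already been solved to a distribution over agent j's actions.

```python
from typing import Dict

# Hypothetical solved models of agent j: each maps to a distribution over j's actions.
# Action labels follow the tiger problem (L = listen, OL / OR = open left / right).
model_solutions: Dict[str, Dict[str, float]] = {
    "m1": {"L": 1.0, "OL": 0.0, "OR": 0.0},
    "m2": {"L": 0.0, "OL": 1.0, "OR": 0.0},
}


def multiplexer_cpt(mod_value: str) -> Dict[str, float]:
    """P(A_j | Mod[M_j] = mod_value): pass through the action node picked by Mod[M_j]."""
    return model_solutions[mod_value]


def marginal_action_distribution(belief_over_models: Dict[str, float]) -> Dict[str, float]:
    """P(A_j) = sum over models m of P(Mod[M_j] = m) * P(A_j | m)."""
    actions = sorted({a for dist in model_solutions.values() for a in dist})
    return {
        a: sum(p_m * multiplexer_cpt(m)[a] for m, p_m in belief_over_models.items())
        for a in actions
    }


# Agent i believes model m1 with probability 0.7 and m2 with probability 0.3.
print(marginal_action_distribution({"m1": 0.7, "m2": 0.3}))
```

The multiplexer itself adds no new probabilities: it only selects which model's action distribution applies, and agent i's belief over the models does the weighting.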

  6. Interactive Dynamic Influence Diagrams (I-DIDs) [Figure: a level l I-DID over two time slices, t and t+1, each with nodes S, O_i, A_i, R_i, A_j, and M_{j,l-1}; a model update link connects M_{j,l-1}^t to M_{j,l-1}^{t+1}]

  7. Semantics of Model Update Link • These models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations [Figure: the model update link expanded: the models m_{j,l-1}^{t,1} and m_{j,l-1}^{t,2} in Mod[M_j^t], together with A_j^t and the observations O_j^1 and O_j^2, give the updated models m_{j,l-1}^{t+1,1} to m_{j,l-1}^{t+1,4} in Mod[M_j^{t+1}], with action nodes A_j^1 to A_j^4]

  8. Curses of History and Dimensionality • Primary complexity of solving I-DIDs is due to the large number of models that must be solved over time • Curse of history of agent j • At time step t: [equation not included in the transcript] • Curse of dimensionality • Nested property of modeling • More agents • N+1 agent setting: (NM)^l models (M is the bounded number of models at each level)

  9. Model Clustering • Idea: Prune the model space to K representative models from M candidate models, K << M, at each time step • Approach • Cluster models • k-means clustering method (MacQueen67) • Note: k is not equal to K • Clusters contain models that are likely behaviorally equivalent • Select K representative models from the clusters

  10. Selection of Initial Means • Facilitate clustering of behaviorally equivalent models • Behaviorally equivalent regions • Prescribe the same optimal behavior for j • [0,0.1], [0.1,0.9], [0.9,1] • Select region boundary points as initial means • 0, 0.1, 0.9, 1 [Figure: value plotted against P(TR) for the actions L, OL, and OR, with the sensitivity points marked]

  11. Selection of Initial Means • Sensitivity points • Models that induce policies that are different from those by surrounding models • Vertices of the belief simplex • One dimension: 0, 1 • Two dimensions: [0,0], [0,1],[1,0], and [1,1]

  12. LP for Computing Sensitivity Points • SPs are non-dominated points on intersections between value functions [Figure: intersecting value functions, with a non-dominated intersection marked as an SP]
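For a one-dimensional belief, the idea of a non-dominated intersection can be sketched without a linear program by checking every pairwise intersection directly. This swaps the LP named on the slide for brute-force enumeration and is only an illustration. The payoffs are an assumption (standard single-agent tiger values at horizon 1: listening costs 1, opening the correct door pays 10, opening the tiger's door costs 100), and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Each value function is a line over p = P(TR): value(p) = intercept + slope * p.
Line = Tuple[float, float]  # (intercept, slope)

# Assumed payoffs, not taken from the slides.
value_functions: Dict[str, Line] = {
    "L": (-1.0, 0.0),        # listen
    "OL": (-100.0, 110.0),   # open left: good when the tiger is on the right
    "OR": (10.0, -110.0),    # open right: good when the tiger is on the left
}


def value(line: Line, p: float) -> float:
    intercept, slope = line
    return intercept + slope * p


def sensitivity_points(lines: Dict[str, Line], tol: float = 1e-9) -> List[float]:
    """Return the intersections inside [0, 1] that lie on the upper envelope."""
    points = []
    for (_, f), (_, g) in combinations(lines.items(), 2):
        if abs(f[1] - g[1]) < tol:
            continue  # parallel lines never cross
        p = (g[0] - f[0]) / (f[1] - g[1])
        if not 0.0 <= p <= 1.0:
            continue
        best = max(value(h, p) for h in lines.values())
        if value(f, p) >= best - tol:  # no other value function dominates here
            points.append(p)
    return sorted(points)


print(sensitivity_points(value_functions))
```

Under these assumed payoffs the OL and OR lines also cross at p = 0.5, but listening is worth more there, so that intersection is dominated and dropped.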

  13. Example of Iterative Clustering [Figure: models along the P(TR) axis from 0 to 1, with 0.1 and 0.9 marked, shown for the initial means, iteration 1, ..., iteration n, followed by the selection of K=10 models]

  14. K Model Selection Algorithm • Select initial means • Compute SPs • Clustering • Cluster models • Re-compute means • Selection • Select K nearest models

  15. Approximate Solution of I-DID • Exact algorithm • Expansion phase • Expand all M models over time • Look-ahead phase • Approximation – Modify exact algorithm • Prune model space using KModelSelection • Maintain only K models over time • Look-ahead phase

  16. Computational Savings and Error Bound • (NM)^l vs. (NK)^l • M grows exponentially over time • Retain K models (M_K) and discard M-K models (M_{/K}) • Error bounded by finding the model among the K retained models that is the closest to the discarded one (PBVI; Pineau et al. 03)
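The two quantities on this slide can be made concrete with a small Python sketch. All numbers are hypothetical, and models are identified by a one-dimensional belief P(TR) purely for illustration. The second function computes only the nearest-retained-model distance that the bound is built on, not the bound itself, whose equations are not part of this transcript.

```python
from typing import List


def model_count(num_other_agents: int, models_per_level: int, level: int) -> int:
    """(N * M)^l models for N other agents, M models per level, and nesting level l."""
    return (num_other_agents * models_per_level) ** level


def max_nearest_distance(retained: List[float], discarded: List[float]) -> float:
    """Largest distance from any discarded belief to its closest retained belief."""
    return max(min(abs(d - r) for r in retained) for d in discarded)


# Hypothetical sizes: N = 1 other agent, M = 100 candidates, K = 10 retained, level l = 2.
print(model_count(1, 100, 2), "vs", model_count(1, 10, 2))

# Hypothetical retained and discarded models, each identified by its belief P(TR).
retained = [0.05, 0.5, 0.95]
discarded = [0.0, 0.2, 0.6, 1.0]
print(max_nearest_distance(retained, discarded))
```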

  17. Error Bound • Error bound for agent j • Expected error bound for agent i [Equations not included in the transcript]

  18. Empirical Results • Two Problem Domains • Multiagent tiger • Multiagent machine maintenance • Comparison with • Exact solution of I-DID for different M • Interactive particle filtering on I-DID • Measure • Average rewards solving the level 1 I-DIDs • Variance over 50 runs • Run time

  19. Run Time Comparison • Slower than the I-PF • Reason: convergence step • Solve I-DIDs up to 8 horizons

  20. Future Work • Variants of model clustering • Application domains • Compose our package for I-DIDs

  21. Thank You!

  22. Notes • Updated set of models at time step (t+1) will have at most |M^t_{j,l-1}||A_j||Ω_j| models • |M^t_{j,l-1}|: number of models at time step t • |A_j|: largest space of actions • |Ω_j|: largest space of observations • New distribution over the updated models uses • original distribution over the models • probability of the other agent performing the action, and • receiving the observation that led to the updated model
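The update described in these notes can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the observation probability is assumed to be already marginalized over the physical state, an updated model is identified simply by the triple (previous model, action, observation), and the model names, the observation labels (GL, GR), and all probabilities are hypothetical.

```python
from typing import Callable, Dict, Tuple

Model = str
UpdatedModel = Tuple[Model, str, str]  # (previous model, action, observation)


def update_model_distribution(
    belief: Dict[Model, float],
    action_prob: Callable[[Model], Dict[str, float]],
    obs_prob: Callable[[Model, str], Dict[str, float]],
) -> Dict[UpdatedModel, float]:
    """Weight each updated model by P(m) * P(a | m) * P(o | m, a)."""
    updated: Dict[UpdatedModel, float] = {}
    for m, p_m in belief.items():
        for a, p_a in action_prob(m).items():
            for o, p_o in obs_prob(m, a).items():
                weight = p_m * p_a * p_o
                if weight > 0.0:  # drop updates that can never occur
                    updated[(m, a, o)] = weight
    return updated


def policy(m: Model) -> Dict[str, float]:
    # Hypothetical: every model listens with certainty.
    return {"L": 1.0, "OL": 0.0, "OR": 0.0}


def observations(m: Model, a: str) -> Dict[str, float]:
    # Hypothetical growl-left / growl-right probabilities for each model.
    return {"GL": 0.85, "GR": 0.15} if m == "m1" else {"GL": 0.15, "GR": 0.85}


belief = {"m1": 0.5, "m2": 0.5}
new_belief = update_model_distribution(belief, policy, observations)
bound = len(belief) * 3 * 2  # models * actions * observations
print(len(new_belief), "updated models, at most", bound)
```

Only action and observation combinations with nonzero probability produce an updated model, which is why the count can fall below the product of the three set sizes.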

  23. One Example

  24. K Model Selection • Initial Means • Sensitivity points + Vertices of the belief simplex • Iteration • Re-compute the cluster mean • Assign new models to clusters • Selection • Select K models • Kn: In proportion to the size of cluster n
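The three stages above can be sketched for one-dimensional beliefs as follows. This is a minimal illustration under stated assumptions: models are identified by a single belief value P(TR), the initial means are passed in already computed, K_n is obtained by rounding (so the total may differ slightly from K), and the models nearest to each cluster mean are the ones kept. The function name and the candidate beliefs are hypothetical.

```python
from typing import List


def k_model_selection(models: List[float], initial_means: List[float],
                      K: int, iterations: int = 20) -> List[float]:
    """Cluster 1-D beliefs with k-means, then keep about K representative models."""
    means = list(initial_means)
    clusters: List[List[float]] = []
    for _ in range(iterations):
        # Assignment step: each model joins the cluster with the nearest mean.
        clusters = [[] for _ in means]
        for b in models:
            nearest = min(range(len(means)), key=lambda n: abs(b - means[n]))
            clusters[nearest].append(b)
        # Update step: re-compute each mean; an empty cluster keeps its old mean.
        new_means = [sum(c) / len(c) if c else mu for c, mu in zip(clusters, means)]
        if new_means == means:
            break  # converged
        means = new_means
    selected: List[float] = []
    for cluster, mu in zip(clusters, means):
        # K_n is proportional to the cluster size; rounding is an assumption.
        k_n = round(K * len(cluster) / len(models))
        selected.extend(sorted(cluster, key=lambda b: abs(b - mu))[:k_n])
    return sorted(selected)


# Hypothetical candidates: 21 evenly spaced beliefs; initial means are the
# sensitivity points 0.1 and 0.9 plus the vertices 0 and 1.
candidates = [i / 20 for i in range(21)]
print(k_model_selection(candidates, [0.0, 0.1, 0.9, 1.0], K=10))
```

Here the number of clusters k is fixed by the number of initial means (four), while K controls how many models survive, matching the note that k is not equal to K.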
