
Learning in Multiagent Systems

This seminar explores learning in multiagent systems, including its general characterization, activity coordination, learning about and from other agents, and communication. It also discusses the credit-assignment problem and different approaches to learning. Presented by Michael Weinberg of The Hebrew University of Jerusalem, Israel, in March 2003.


Presentation Transcript


  1. Learning in Multiagent Systems Advanced AI Seminar Michael Weinberg The Hebrew University of Jerusalem, Israel March 2003

  2. Agenda • What is learning in MAS? • General Characterization • Learning and Activity Coordination • Learning about and from Other Agents • Learning and Communication • Conclusions Advanced AI Seminar, March 2003

  3. What is Learning • Learning can be informally defined as: • The acquisition of new knowledge and motor or cognitive skills and the incorporation of the acquired knowledge and skills in future system activities, provided that this acquisition and incorporation is conducted by the system itself and leads to an improvement in its performance Advanced AI Seminar, March 2003

  4. Learning in Multiagent Systems • Intersection of DAI (Distributed Artificial Intelligence) and ML (Machine Learning) • Why bring them together? • There is a strong need to equip multiagent systems with learning abilities • The extended view of ML as multiagent learning is qualitatively different from traditional ML and can lead to novel ML techniques and algorithms Advanced AI Seminar, March 2003

  5. Agenda • What is learning in MAS? • General Characterization • Learning and Activity Coordination • Learning about and from Other Agents • Learning and Communication • Conclusions Advanced AI Seminar, March 2003

  6. General Characterization • Principal categories of learning • The features in which learning approaches may differ • The fundamental learning problem known as the credit-assignment problem Advanced AI Seminar, March 2003

  7. Principal Categories • Centralized Learning (isolated learning) • Learning executed by a single agent, no interaction with other agents • Several centralized learners may try to obtain different or identical goals at the same time Advanced AI Seminar, March 2003

  8. Principal Categories • Decentralized Learning (interactive learning) • Several agents are engaged in the same learning process • Several groups of agents may try to obtain different or identical learning goals at the same time • Single agent may be involved in several centralized/decentralized learning processes at the same time Advanced AI Seminar, March 2003

  9. Differencing Features: The degree of decentralization • The degree of decentralization has two dimensions: • Distributedness • Parallelism Advanced AI Seminar, March 2003

  10. Differencing Features: Interaction-specific features • Classification of the interactions required for realizing a decentralized learning process: • The level of interaction • The persistence of interaction • The frequency of interaction • The variability of interaction Advanced AI Seminar, March 2003

  11. Differencing Features: Involvement-specific features • Features that characterize the involvement of an agent in a learning process: • The relevance of involvement • The role played during involvement Advanced AI Seminar, March 2003

  12. Differencing Features: Goal-specific features • Features that characterize the learning goal: • The type of improvement that learning is intended to achieve • The compatibility of the learning goals pursued by the agents Advanced AI Seminar, March 2003

  13. Differencing Features: The learning method • The following learning methods are distinguished: • Rote learning • Learning from instruction and by advice taking • Learning from examples and by practice • Learning by analogy • Learning by discovery • The main difference between them is the amount of learning effort required Advanced AI Seminar, March 2003

  14. Differencing Features: The learning feedback • The learning feedback indicates the performance level achieved so far • The following types of learning feedback are distinguished: • Supervised learning (teacher) • Reinforcement learning (critic) • Unsupervised learning (observer) Advanced AI Seminar, March 2003

  15. The Credit-Assignment Problem • The problem of properly assigning feedback for an overall performance change to each of the system activities that contributed to that change • Can be usefully decomposed into two sub-problems: • The inter-agent CAP • The intra-agent CAP Advanced AI Seminar, March 2003

  16. The inter-agent CAP • Assignment of credit or blame for an overall performance change to the external actions of the agents Advanced AI Seminar, March 2003

  17. The intra-agent CAP • Assignment of credit or blame for a particular external action of an agent to its underlying internal inferences and decisions Advanced AI Seminar, March 2003

  18. Agenda • What is learning in MAS? • General Characterization • Learning and Activity Coordination • Learning about and from Other Agents • Learning and Communication • Conclusions Advanced AI Seminar, March 2003

  19. Learning and Activity Coordination • Previous research on coordination focused on off-line design of behavioral rules, negotiation protocols, etc… • Agents operating in open, dynamic environments must be able to adapt to changing demands and opportunities • How can agents learn to appropriately coordinate their activities? Advanced AI Seminar, March 2003

  20. Reinforcement Learning • Agents choose the next action so as to maximize a scalar reinforcement or feedback received after each action • The learner’s environment can be modeled by a discrete time, finite state, Markov Decision Process (MDP) Advanced AI Seminar, March 2003

  21. Markov Decision Process (MDP) • MDP – a reinforcement learning task that satisfies the Markov state property • A Markov state satisfies P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, …, s_0, a_0), i.e., the next state depends only on the current state and action, not on the earlier history Advanced AI Seminar, March 2003

  22. Reinforcement Learning (cont) • In an MDP the environment is represented by a 4-tuple <S, A, P, r> • S is a set of states • A is a set of actions • P gives the state-transition probabilities P_xy(a) • r gives the expected immediate reward r(x, a) • Each agent maintains a policy π that maps states into desirable actions Advanced AI Seminar, March 2003
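
A minimal sketch of how such a 4-tuple could be encoded (Python; the names MDP, State, Action and the tiny two-state example are illustrative assumptions, not part of the talk):

from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str

@dataclass
class MDP:
    states: List[State]                                # S
    actions: List[Action]                              # A
    P: Dict[Tuple[State, Action], Dict[State, float]]  # P_xy(a): transition probabilities
    r: Callable[[State, Action], float]                # r(x, a): expected immediate reward

# Tiny two-state example: the action "go" moves between the two states,
# and staying in x1 yields a reward of 1.
example = MDP(
    states=["x0", "x1"],
    actions=["stay", "go"],
    P={
        ("x0", "stay"): {"x0": 1.0},
        ("x0", "go"):   {"x1": 1.0},
        ("x1", "stay"): {"x1": 1.0},
        ("x1", "go"):   {"x0": 1.0},
    },
    r=lambda x, a: 1.0 if (x, a) == ("x1", "stay") else 0.0,
)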

  23. Q-Learning Algorithm • Reinforcement Learning algorithm • Maintains a table of Q-values • Q(x,a) – “how good is action a in state x?” • Converges to the optimal Q-values with probability 1 Advanced AI Seminar, March 2003

  24. Q-Learning Algorithm (cont) • At step n the agent performs the following steps: • Observe its current state x_n • Select and perform an action a_n • Observe the subsequent state y_n • Receive an immediate payoff r_n • Adjust its Q_{n-1} values Advanced AI Seminar, March 2003

  25. Discounted Sum of Future Rewards • Q-Learning finds an optimal policy that maximizes the total discounted expected reward • Discounted reward – a reward received s steps hence is worth less than a reward received now by a factor of γ^s Advanced AI Seminar, March 2003
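
Written out as a formula (standard notation, assumed rather than copied from the slides), the quantity Q-Learning maximizes is the expected discounted return

R_t = E\left[ \sum_{s=0}^{\infty} \gamma^{s} \, r_{t+s} \right], \qquad 0 \le \gamma < 1,

so a reward arriving s steps in the future is weighted by γ^s, matching the factor above.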

  26. Evaluating the Policy • Under a policy π the value of state x is: V^π(x) = r(x, π(x)) + γ Σ_y P_xy(π(x)) V^π(y) • The optimal policy π* satisfies: V^π*(x) = max_a [ r(x, a) + γ Σ_y P_xy(a) V^π*(y) ] Advanced AI Seminar, March 2003

  27. Q-Values • Under a policy π define the Q-values as: Q^π(x, a) = r(x, a) + γ Σ_y P_xy(a) V^π(y) • That is, the expected return of executing action a in state x and following policy π thereafter Advanced AI Seminar, March 2003

  28. Adjusting Q-Values • Update the Q-values as follows: • Q_n(x, a) = (1 - α_n) Q_{n-1}(x, a) + α_n [ r_n + γ V_{n-1}(y_n) ] if x = x_n and a = a_n • Q_n(x, a) = Q_{n-1}(x, a) otherwise • Where V_{n-1}(y) = max_b Q_{n-1}(y, b) and α_n is the learning rate Advanced AI Seminar, March 2003
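
A compact sketch of this update loop in Python (the environment interface env.reset()/env.step() and the epsilon-greedy action choice are assumptions for illustration, not part of the original algorithm description):

import random
from collections import defaultdict

# Q-table: Q[(x, a)] is the current estimate of how good action a is in state x.
Q = defaultdict(float)

def q_update(x, a, r, y, actions, alpha=0.1, gamma=0.9):
    # Q(x,a) <- (1 - alpha) * Q(x,a) + alpha * (r + gamma * max_b Q(y,b));
    # all other table entries are left unchanged, as on the slide above.
    v_y = max(Q[(y, b)] for b in actions)
    Q[(x, a)] = (1 - alpha) * Q[(x, a)] + alpha * (r + gamma * v_y)

def choose_action(x, actions, epsilon=0.1):
    # Epsilon-greedy exploration: one common choice, the slides do not fix one.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(x, a)])

def run_episode(env, actions):
    # One learning episode against a hypothetical environment with
    # env.reset() -> state and env.step(x, a) -> (next_state, payoff, done).
    x = env.reset()
    done = False
    while not done:
        a = choose_action(x, actions)   # select and perform an action
        y, r, done = env.step(x, a)     # observe next state, receive payoff
        q_update(x, a, r, y, actions)   # adjust the Q-values
        x = y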

  29. Isolated, Concurrent Reinforcement Learners • Reinforcement learners develop action selection policies that optimize environmental feedback • Can be used in domains • With no pre-existing domain expertise • With no information about other agents • RL can provide new coordination techniques where currently available coordination schemes are ineffective Advanced AI Seminar, March 2003

  30. Isolated, Concurrent Reinforcement Learners • Each agent learns to optimize its reinforcement from the environment • Other agents are not explicitly modeled • An interesting research question is whether it is feasible for such an agent to use the same learning mechanism in both cooperative and non-cooperative environments Advanced AI Seminar, March 2003

  31. Isolated, Concurrent Reinforcement Learners • An assumption of most RL techniques is that the dynamics of the environment are not affected by other agents • This assumption is invalid in domains with multiple, concurrent learners • Standard RL is probably not adequate for concurrent, isolated learning of coordination Advanced AI Seminar, March 2003

  32. Isolated, Concurrent Reinforcement Learners • The following dimensions were identified to characterize domains amenable to CIRL (concurrent, isolated reinforcement learning): • Agent coupling (tightly/loosely) • Agent relationships (cooperative/adversarial) • Feedback timing (immediate/delayed) • Optimal behavior combinations Advanced AI Seminar, March 2003

  33. Experiments with CIRL • Conclusions: • Through CIRL both friends and foes can concurrently acquire useful coordination information • No prior knowledge of the domain is needed • No explicit model of the capabilities of other agents is required • Limitations: • Inability to develop effective coordination when agents are strongly coupled, feedback is delayed, and there are only a few optimal behavior combinations Advanced AI Seminar, March 2003

  34. Experiments with CIRL • A possible fix to the last limitation is “lock-step learning”: • Two agents synchronize their behavior so that one is learning while the other is following a fixed policy and vice versa Advanced AI Seminar, March 2003
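
A rough sketch of the lock-step idea in Python (the agent and environment interfaces here are assumed for illustration, not taken from the experiments): the two agents alternate fixed-length phases in which exactly one of them updates its policy while the other keeps its policy frozen.

def lock_step_training(agent_a, agent_b, env, phases=10, episodes_per_phase=100):
    # Alternate learning phases: while one agent learns, the other follows a
    # fixed policy; the roles are swapped at the end of each phase.
    # `agent.learning` is a hypothetical flag read by the agent's update code.
    learner, follower = agent_a, agent_b
    for _ in range(phases):
        learner.learning, follower.learning = True, False
        for _ in range(episodes_per_phase):
            env.run_episode(learner, follower)   # hypothetical joint episode
        learner, follower = follower, learner    # swap roles for the next phase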

  35. Interactive Reinforcement Learning of Coordination • Agents can explicitly communicate to decide on individual and group actions • A few algorithms for interactive RL: • Action Estimation Algorithm • Action Group Estimation Algorithm Advanced AI Seminar, March 2003

  36. Agenda • What is learning in MAS? • General Characterization • Learning and Activity Coordination • Learning about and from Other Agents • Learning and Communication • Conclusions Advanced AI Seminar, March 2003

  37. Learning about and from Other Agents • Agents learn to improve their individual performance • Better capitalize on available opportunities by predicting the behavior of other agents (preferences, strategies, intentions, etc…) Advanced AI Seminar, March 2003

  38. Learning Organizational Roles • Assume agents have the capability of playing one of several roles in a situation • Agents need to learn role assignments to effectively complement each other Advanced AI Seminar, March 2003

  39. Learning Organizational Roles • The framework includes Utility, Probability and Cost (UPC) estimates, plus a Potential measure, of a role adopted in a particular situation • Utility – the worth of the desired final state if the agent adopted the given role in the current situation • Probability – the likelihood of reaching a successful final state (given role/situation) • Cost – the associated computational cost incurred • Potential – the usefulness of a role in discovering pertinent global information Advanced AI Seminar, March 2003

  40. Learning Organizational Roles: Theoretical Framework • S_k and R_k – the sets of situations and roles available to agent k • An agent maintains vectors of UPC estimates, one for every situation-role pair • During the learning phase: • the agent rates a role by combining the component measures Advanced AI Seminar, March 2003

  41. Learning Organizational Roles: Theoretical Framework • After the learning phase is over, the role to be played in situation s is the one in R_k with the highest combined UPC rating • UPC values are learned using reinforcement learning • The UPC estimates after n updates are denoted with superscript n (e.g., U^n_{s,r}) Advanced AI Seminar, March 2003

  42. Learning Organizational Roles: Updating the Utility • S – the situations encountered between the time of adopting role r in situation s and reaching a final state F with utility U_F • The utility values for all roles chosen in each of the situations in S are updated toward U_F: U^{n+1}_{s,r} = (1 - α) U^n_{s,r} + α U_F Advanced AI Seminar, March 2003

  43. Learning Organizational Roles: Updating the Probability • Let O(F) be 1 if the final state F is successful and 0 otherwise • The update rule for probability: P^{n+1}_{s,r} = (1 - α) P^n_{s,r} + α O(F) Advanced AI Seminar, March 2003

  44. Learning Organizational Roles: Updating the Potential • Let I(F) be 1 if, on the path to the final state, conflicts are detected and resolved by information exchange, and 0 otherwise • The update rule for potential: Potential^{n+1}_{s,r} = (1 - α) Potential^n_{s,r} + α I(F) Advanced AI Seminar, March 2003
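
To make the UPC bookkeeping concrete, here is a hedged Python sketch (the particular rating function and the simple weighted updates are assumptions chosen to mirror the update rules above, not the paper's exact formulas): each agent keeps per-(situation, role) estimates, rates roles by combining the components, and nudges each estimate toward the observed outcome after every episode.

from collections import defaultdict

# Per-(situation, role) estimates: Utility, Probability, Cost, Potential.
upc = defaultdict(lambda: {"U": 0.0, "P": 0.5, "C": 0.0, "Pot": 0.0})

def rate(s, r):
    # Combine the component measures into one score; this particular
    # combination (U*P - C + Pot) is only an illustrative choice.
    e = upc[(s, r)]
    return e["U"] * e["P"] - e["C"] + e["Pot"]

def choose_role(s, roles):
    # After the learning phase, play the highest-rated role for situation s.
    return max(roles, key=lambda r: rate(s, r))

def update(path, final_utility, success, resolved_conflict, alpha=0.1):
    # Reinforcement-style updates for every (situation, role) pair on the
    # path to the final state, moving each estimate toward the outcome.
    # (Cost updates are domain specific and omitted here.)
    for (s, r) in path:
        e = upc[(s, r)]
        e["U"] = (1 - alpha) * e["U"] + alpha * final_utility
        e["P"] = (1 - alpha) * e["P"] + alpha * (1.0 if success else 0.0)
        e["Pot"] = (1 - alpha) * e["Pot"] + alpha * (1.0 if resolved_conflict else 0.0)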

  45. Learning Organizational Roles: Robotic Soccer Game • Most implementations of robotic soccer teams use the approach of learning organizational roles • Use a layered learning methodology: • Low-level skills (e.g. shoot the ball) • High-level decision making (e.g. whom to pass to) Advanced AI Seminar, March 2003

  46. Learning in Market Environments • Buyers and sellers trade in electronic marketplaces • Three types of agents: • 0-level agents: don't model the behavior of others • 1-level agents: model others as 0-level agents • 2-level agents: model others as 1-level agents Advanced AI Seminar, March 2003

  47. Learning to Exploit an Opponent: Model-Based Approach • The most prominent approach in AI for developing playing strategies is the minimax algorithm • It assumes that the opponent will always choose the move that is worst for the player • An accurate model of the opponent can be used to develop better strategies Advanced AI Seminar, March 2003

  48. Learning to Exploit an Opponent: Model-Based Approach • The main problem of RL is its slow convergence • The model-based approach tries to reduce the number of interaction examples needed for learning • It performs a deeper analysis of past interaction experience Advanced AI Seminar, March 2003

  49. Model-Based Approach • The learning process is split into two separate stages: • Infer a model of the other agent based on past experience • Utilize the learned model to design an effective interaction strategy for the future Advanced AI Seminar, March 2003

  50. Inferring a Best-Response Strategy • Represent the opponent’s model as a DFA (deterministic finite automaton) • Example: the TFT (Tit-for-Tat) strategy for the IPD (Iterated Prisoner’s Dilemma) game • Theorem: given a DFA opponent model, there exists a best-response DFA that can be computed in time polynomial in the size (number of states) of the model Advanced AI Seminar, March 2003
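
As a small illustration of a DFA opponent model (an assumed encoding for this page, not code from the talk), Tit-for-Tat in the Iterated Prisoner's Dilemma is a two-state automaton whose next move simply echoes the last move it observed from us:

# Tit-for-Tat as a DFA over our moves {"C", "D"}: the state records the last
# move it observed, and in a given state TFT plays that same move.
TFT = {
    "start": "C",                   # TFT cooperates on the first round
    "transition": {                 # (state, our move) -> next state
        ("C", "C"): "C",
        ("C", "D"): "D",
        ("D", "C"): "C",
        ("D", "D"): "D",
    },
    "output": {"C": "C", "D": "D"}, # in state s, TFT plays s
}

def tft_move(state):
    # The move TFT makes in its current state.
    return TFT["output"][state]

def tft_next(state, our_move):
    # The DFA's next state after observing our move.
    return TFT["transition"][(state, our_move)]

Against such a model, finding a best response reduces to planning over the model's finitely many states, which is what the polynomial-time theorem above refers to.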
