Software Multiagent Systems: CS543
Milind Tambe
University of Southern California
[email protected]

Dimensions of Multiagent Learning
Ignore others' learning vs. model others' learning
Cooperative vs. competitive
Cooperative: learn to coordinate with others
Cooperative: learn organizational roles
$$Q(i,a) = R(i) + \sum_{j} P(j \mid i,a)\, \max_{a'} Q(j,a')$$
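To make the equation concrete, the sketch below applies it repeatedly on a tiny, made-up episodic MDP. The states A, B, G (G terminal), the actions left/right, the rewards and the transition probabilities are all hypothetical. Like the equation, it uses no discount factor, which works here because G is terminal and reachable from every state.

```python
# Hypothetical episodic MDP: A and B are nonterminal, G is the terminal goal.
R = {"A": -0.1, "B": -0.1, "G": 1.0}
TERMINAL = {"G"}
ACTIONS = ("left", "right")
# P[(i, a)] maps each successor state j to P(j | i, a).
P = {
    ("A", "left"): {"A": 1.0},
    ("A", "right"): {"B": 0.8, "A": 0.2},
    ("B", "left"): {"A": 1.0},
    ("B", "right"): {"G": 0.8, "B": 0.2},
}


def solve_q(iterations=1000):
    """Iterate Q(i,a) = R(i) + sum_j P(j|i,a) * max_a' Q(j,a') to a fixed point."""
    Q = {(s, a): 0.0 for s in R for a in ACTIONS}
    for _ in range(iterations):
        new_q = {}
        for (i, a) in Q:
            if i in TERMINAL:
                # A terminal state has no successors: its value is just R(i).
                new_q[(i, a)] = R[i]
            else:
                new_q[(i, a)] = R[i] + sum(
                    p * max(Q[(j, b)] for b in ACTIONS)
                    for j, p in P[(i, a)].items()
                )
        Q = new_q
    return Q


Q = solve_q()
```

With these made-up numbers, Q(B, right) settles at 0.875 and Q(A, right) at 0.75. Note that this form needs the full model P(j | i, a).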
$$Q(i,a) \leftarrow Q(i,a) + \alpha \Big( R(i) + \max_{a'} Q(j,a') - Q(i,a) \Big)$$ where α is the learning rate
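Unlike the model-based equation, this update needs no transition model: it adjusts Q(i,a) from one observed transition i → j. A minimal sketch, assuming a dictionary-backed Q table and a hypothetical `alpha` parameter standing in for α; terminal states are not special-cased.

```python
from collections import defaultdict


def q_update(Q, i, a, reward_i, j, actions, alpha=0.1):
    """One TD step: Q(i,a) <- Q(i,a) + alpha * (R(i) + max_a' Q(j,a') - Q(i,a))."""
    best_next = max(Q[(j, b)] for b in actions)  # unseen pairs default to 0.0
    Q[(i, a)] += alpha * (reward_i + best_next - Q[(i, a)])
    return Q[(i, a)]


# Usage with made-up values: one known successor value, everything else 0.
Q = defaultdict(float)
Q[("j", "x")] = 2.0
first = q_update(Q, "i", "a", 1.0, "j", ("x", "y"), alpha=0.5)
second = q_update(Q, "i", "a", 1.0, "j", ("x", "y"), alpha=0.5)
```

Each step moves Q(i,a) a fraction α toward the target R(i) + max Q(j, a'), here 3.0, giving 1.5 and then 2.25.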
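(a' is the next action, chosen using an exploration function f(Q(j, a'), N[j, a']), where N[j, a'] counts how many times action a' has been tried in state j)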
What will happen? Is this a problem?
$$f\big(Q(j,\cdot), N[j,\cdot]\big) = \arg\max_{a'} G\big(Q(j,a'), N[j,a']\big)$$
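The slide leaves G unspecified. One common choice, borrowed here as an assumption from Russell and Norvig's optimistic exploration function, returns an optimistic value R+ for any action tried fewer than N_e times and the learned Q value otherwise. In the sketch, `r_plus` and `n_e` are hypothetical parameters.

```python
def explore_value(q, n, r_plus=2.0, n_e=5):
    """Assumed G: optimistic r_plus until an action has n_e tries, then its Q value."""
    return r_plus if n < n_e else q


def choose_action(Q, N, j, actions, r_plus=2.0, n_e=5):
    """Return argmax over a' of G(Q(j,a'), N[j,a']); unseen entries default to 0."""
    return max(
        actions,
        key=lambda b: explore_value(Q.get((j, b), 0.0), N.get((j, b), 0), r_plus, n_e),
    )


# Made-up state "s": y looks worse, but it has been tried only once.
Q = {("s", "x"): 1.0, ("s", "y"): 0.5}
explore_pick = choose_action(Q, {("s", "x"): 10, ("s", "y"): 1}, "s", ("x", "y"))
exploit_pick = choose_action(Q, {("s", "x"): 10, ("s", "y"): 10}, "s", ("x", "y"))
```

Without such a bonus, a purely greedy learner can lock onto whichever action first looks good and never revisit the others. With it, under-tried actions are sampled until their estimates can be trusted.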
DCOP: exploration + exploitation (paper to be posted on the website) [Jain et al., IJCAI'09]
Stochastic games: multiagent learning to reach a Nash equilibrium (N.E.) (in our readings)
(with Lockheed ATL)
Assigning values to variables = Exploration
Exploration takes time (physical movement)
Limited time; full exploration impossible
What if 20 is max reward?
SE-optimistic: how will it work?
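A minimal sketch of the optimism idea behind the question above, assuming each agent knows the maximum possible reward (20, as on the slide) and treats every value it has not yet tried as if it would earn that maximum. This is a generic illustration, not the SE-Optimistic algorithm of Jain et al. The function and variable names are hypothetical.

```python
MAX_REWARD = 20.0  # assumed known upper bound on reward, as in "What if 20 is max reward?"


def optimistic_choice(observed, candidates, max_reward=MAX_REWARD):
    """Pick the variable value with the best optimistic estimate.

    observed: dict mapping already-explored values to the reward seen there.
    Unexplored values are assumed (optimistically) to yield max_reward.
    """
    return max(candidates, key=lambda v: observed.get(v, max_reward))


# Made-up rewards: "c" is unexplored, so optimism says it might reach 20.
explore = optimistic_choice({"a": 5.0, "b": 12.0}, ["a", "b", "c"])
# Once a value is known to reach the maximum, there is nothing left to gain.
exploit = optimistic_choice({"a": 5.0, "b": 20.0, "c": 7.0}, ["a", "b", "c"])
```

Knowing the bound matters: if an explored value already earns the maximum, no unexplored value can beat it, so the agent can stop spending scarce exploration time (physical movement) on it.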
Generalize distributed POMDPs
Different payoffs for each player, not a common payoff
Focus on two person stochastic game
Learning algorithms for stochastic games
Reward function depends on the state!
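$v^1(s, \pi^1_*, \pi^2_*) \ge v^1(s, \pi^1, \pi^2_*)$ for all states $s$ and all policies $\pi^1$
$v^2(s, \pi^1_*, \pi^2_*) \ge v^2(s, \pi^1_*, \pi^2)$ for all states $s$ and all policies $\pi^2$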
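In the special case of a stochastic game with a single state, v^k reduces to player k's expected payoff in a matrix game. The two conditions then say that neither player can gain by deviating alone. The sketch below checks this for given mixed strategies. It tests only pure-strategy deviations, which suffices because a player's expected payoff is linear in its own mixed strategy. The payoff matrices are illustrative.

```python
def expected(payoff, p1, p2):
    """Expected payoff payoff[a][b] when the players mix with p1 and p2."""
    return sum(p1[a] * p2[b] * payoff[a][b]
               for a in range(len(p1)) for b in range(len(p2)))


def pure(k, n):
    """Mixed-strategy vector that plays action k with probability 1."""
    return [1.0 if i == k else 0.0 for i in range(n)]


def is_nash(A, B, p1, p2, tol=1e-9):
    """A[a][b] is player 1's payoff, B[a][b] is player 2's payoff."""
    v1, v2 = expected(A, p1, p2), expected(B, p1, p2)
    if any(expected(A, pure(a, len(p1)), p2) > v1 + tol for a in range(len(p1))):
        return False  # player 1 has a profitable unilateral deviation
    if any(expected(B, p1, pure(b, len(p2))) > v2 + tol for b in range(len(p2))):
        return False  # player 2 has a profitable unilateral deviation
    return True


# Prisoner's dilemma (action 0 = cooperate, 1 = defect).
A = [[-1, -3], [0, -2]]
B = [[-1, 0], [-3, -2]]
# Matching pennies: zero-sum, only a mixed equilibrium.
MP_A = [[1, -1], [-1, 1]]
MP_B = [[-1, 1], [1, -1]]
```

Mutual defection passes the check, mutual cooperation fails it (each player gains by defecting), and in matching pennies only the 50/50 mix survives.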
Consider two agents:
Agents converging to the Nash equilibrium
Same team against different players
Player 1 (forward) and Player 10 (fullback) against CMUnited