
A Multi-Agent Learning Approach to Online Distributed Resource Allocation



Presentation Transcript


  1. A Multi-Agent Learning Approach to Online Distributed Resource Allocation
  Chongjie Zhang, Victor Lesser, Prashant Shenoy
  Computer Science Department, University of Massachusetts Amherst

  2. Focus
  • This paper presents a multi-agent learning (MAL) approach to address resource sharing in cluster networks.
    • It exploits unknown task arrival patterns.
  • Problem characteristics:
    • Realistic
    • Multiple agents
    • Partial observability
    • No global reward signal
    • Communication delay
  • Two interacting learning problems

  3. Increasing Computing Demands
  • "Software as a service" is becoming a popular IT business model.
  • It is challenging to build large computing infrastructures to host such widespread online services.

  4. A Potentially Cost-Effective Solution
  • Shared clusters
    • Built using commodity PCs or workstations
    • Run a number of applications significantly larger than the number of nodes
  [Figure: a dedicated cluster vs. a shared cluster, each managed by a resource manager]
  [Arpaci-Dusseau and Culler, 1997; Aron et al., 2000; Urgaonkar and Shenoy, 2003]

  5. Building Larger, Scalable Computing Infrastructures
  • Centralized resource management limits the size of shared clusters.
  • Instead, organize shared clusters into a network and share resources across clusters.
  • How can resources be shared efficiently within a cluster network?
  [Figure: several shared clusters connected into a network]

  6. Outline
  • Problem Formulation
  • Fair Action Learning Algorithm
  • Learning Distributed Resource Allocation
    • Local Allocation Decision
    • Task Routing Decision
  • Experimental Results
  • Summary

  7. Problem Formulation
  • A distributed sequential resource allocation problem (DSRAP) is denoted as a tuple <C, A, T, R, B>:
    • C = {C1, ..., Cm} is a set of agents (or clusters)
    • A = {a_ij}_{m x m} is the adjacency matrix of agents, where a_ij is the task transfer time from Ci to Cj
    • T = {t1, ..., tl} is a set of task types
    • R = {R1, ..., Rq} is a set of resource types
    • B = {D_ij}_{l x m} is the task arrival pattern, where D_ij is the arrival distribution of tasks of type ti at Cj
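  To make the tuple concrete, here is a minimal sketch of the DSRAP definition as a data structure. The class and field names are illustrative, not from the paper; arrival distributions are modeled as sampling callables.

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class DSRAP:
        clusters: List[str]                                   # C = {C1, ..., Cm}
        transfer_time: Dict[Tuple[str, str], float]           # A: a_ij, transfer time Ci -> Cj
        task_types: List[str]                                 # T = {t1, ..., tl}
        resource_types: List[str]                             # R = {R1, ..., Rq}
        arrivals: Dict[Tuple[str, str], Callable[[], float]]  # B: D_ij, arrival distribution of ti at Cj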

  8. Problem Description: Cluster Network
  [Figure: an example cluster network of clusters C1-C10 connected by links with transfer times a_ij; each cluster contains computing nodes, and each node holds resources R1, R2, R3]

  9. Problem Description: Task
  • A task is denoted as a tuple <t, u, w, d1, ..., dq>, where
    • t is the task type
    • u is the utility rate of the task
    • w is the maximum waiting time before being allocated
    • d_i is the demand for resource i = 1, ..., q

  10. Problem Description: Task Type
  • A task type characterizes a set of tasks, each of whose feature components follows a common distribution.
  • A task type t is denoted as a tuple <D_t^s, D_t^u, D_t^w, D_t^d1, ..., D_t^dq>, where
    • D_t^s is the task service time distribution
    • D_t^u is the distribution of utility rate
    • D_t^w is the distribution of the maximum waiting time
    • D_t^di is the distribution of the demand for resource i = 1, ..., q
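  The following sketch shows how a task instance could be drawn from a task type's distributions. The specific distribution families (uniform utility, exponential waiting and service times) are assumptions for illustration only; the slides do not specify them.

    import random
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Task:
        task_type: str
        utility_rate: float   # u, drawn from D_t^u
        max_wait: float       # w, drawn from D_t^w
        demands: List[float]  # d_1 ... d_q, drawn from D_t^di
        service_time: float   # drawn from D_t^s

    def sample_task(task_type: str, q: int = 3) -> Task:
        # Each feature is sampled independently from its type-level distribution.
        return Task(
            task_type=task_type,
            utility_rate=random.uniform(1.0, 10.0),
            max_wait=random.expovariate(1.0 / 5.0),
            demands=[random.uniform(0.0, 1.0) for _ in range(q)],
            service_time=random.expovariate(1.0 / 20.0),
        )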

  11. Individual Agent's Decision Making
  [Figure: each agent splits its incoming task set T between two decision-making modules: local task allocation decision-making passes tasks to be allocated locally to local resource scheduling (an existing cluster resource scheduling algorithm), and hands tasks not allocated locally to task routing decision-making]

  12. Problem Goal
  • The main goal is to derive decision policies for each agent that maximize the average utility rate (AUR) of the whole system; a hedged formalization is sketched below.
  • Note that, due to its partial view of the system, each individual cluster can only observe its local utility rate, not the system's utility rate.
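  The transcript does not reproduce the formal AUR definition, so the following LaTeX is a plausible formalization consistent with the slide's wording; the symbols X(\tau) and u_x are my own notation.

    \[
      \mathrm{AUR} \;=\; \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} \sum_{x \in X(\tau)} u_x \,\mathrm{d}\tau
    \]
    % X(\tau): the set of tasks being serviced anywhere in the system at time \tau
    % u_x: the utility rate of task x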

  13. Multi-Agent Reinforcement Learning (MARL)
  • In a multi-agent setting, all agents learn their policies concurrently.
  • The environment therefore becomes non-stationary from the perspective of any individual agent.
  • Single-agent reinforcement learning algorithms may diverge because the agents' learning is not synchronized.
  • Several MARL algorithms have been proposed, e.g., GIGA, GIGA-WoLF, WPL.

  14. Fair Action Learning (FAL) Algorithm
  • In practical problems, the exact policy gradient used by GIGA is usually unknown.
  • FAL is a direct policy search technique.
  • FAL is a variant of GIGA that uses an easily calculated, approximate policy gradient.
  [Equation: FAL update = an approximate policy-gradient step followed by GIGA's normalization function]
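  A minimal sketch of a FAL-style update, assuming the approximate policy gradient for action a is Q(s,a) minus the policy's expected value V(s), followed by a GIGA-style renormalization onto the probability simplex. The step size and the projection here are simplified stand-ins, not the paper's exact operators.

    from typing import Dict

    def fal_update(policy: Dict[str, float], q: Dict[str, float],
                   eta: float = 0.01) -> Dict[str, float]:
        # V(s): expected value of the current stochastic policy.
        v = sum(policy[a] * q[a] for a in policy)
        # Approximate policy-gradient step: favor actions better than average.
        raw = {a: policy[a] + eta * (q[a] - v) for a in policy}
        return project_to_simplex(raw)  # GIGA's normalization (approximated)

    def project_to_simplex(x: Dict[str, float]) -> Dict[str, float]:
        # Clip-and-renormalize stand-in for GIGA's exact projection.
        clipped = {a: max(p, 1e-6) for a, p in x.items()}
        z = sum(clipped.values())
        return {a: p / z for a, p in clipped.items()}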

  15. Individual Agent's Decision Making
  [Figure: repeated from slide 11, here focusing on local task allocation decision-making]

  16. Local Task Allocation Decision-Making
  • Select a subset of received tasks to be allocated locally so as to maximize the local utility rate.
  • This potentially improves the global utility rate.
  • Use an incremental selection algorithm (flowchart reconstructed below; a runnable sketch follows):

    selected := Ø
    loop:
        allocable := getAllocable(tasks)
        t := selectTask(allocable)
        if t = nil: break
        selected := selected ∪ {t}
        tasks := tasks \ {t}
    learn()
    return selected
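  A runnable version of the loop above, reusing the Task sketch from slide 10's example. Here getAllocable is a fit-in-remaining-capacity filter and selectTask is a greedy highest-utility-rate pick; in the paper the selection step is learned, and the learn() call is omitted from this sketch.

    from typing import List

    def allocate_locally(tasks: List[Task], capacity: List[float]) -> List[Task]:
        selected: List[Task] = []
        remaining = list(capacity)
        while True:
            # getAllocable: tasks whose demands fit the remaining resources.
            allocable = [t for t in tasks
                         if all(d <= r for d, r in zip(t.demands, remaining))]
            if not allocable:
                break  # corresponds to selectTask returning nil
            # selectTask: greedy stand-in for the learned stochastic policy.
            t = max(allocable, key=lambda task: task.utility_rate)
            selected.append(t)
            tasks.remove(t)
            remaining = [r - d for r, d in zip(remaining, t.demands)]
        return selected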

  17. Learning Local Task Allocation
  • Learning model:
    • State: features describing both the tasks to be allocated and the availability of local resources
    • Action: selecting a task
    • Reward: received for selecting task a at state s
  • Due to partial observability, each agent uses FAL to learn a stochastic policy.
  • Q-learning is used to update the value function, as sketched below.
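  A hedged sketch of the value-function update: standard Q-learning over (state, selected-task) pairs, feeding the learned Q-values into the FAL policy update shown earlier. The reward definition (for example, the selected task's utility rate) is not spelled out in the transcript, so the caller supplies it here.

    from typing import Dict, Iterable, Tuple

    def q_update(Q: Dict[Tuple[str, str], float], s: str, a: str,
                 reward: float, s_next: str, actions_next: Iterable[str],
                 alpha: float = 0.1, gamma: float = 0.95) -> None:
        # Standard one-step Q-learning backup.
        best_next = max((Q.get((s_next, b), 0.0) for b in actions_next), default=0.0)
        Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * (reward + gamma * best_next)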

  18. Accelerating the Learning Process
  • Reasons:
    • Extremely large policy search space
    • Non-stationary learning environment
    • Need to avoid poor initial policies in practical systems
  • Techniques (a sketch follows this list):
    • Initialize policies with a greedy allocation algorithm
    • Set a utilization threshold for conducting ε-greedy exploration
    • Limit the exploration rate for selecting the nil task
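  One possible reading of the last two techniques in code form. Both the direction of the utilization test (explore only when the cluster is not busy) and the numeric thresholds are assumptions, not taken from the paper.

    import random
    from typing import Dict

    def choose_task(policy: Dict[str, float], utilization: float,
                    threshold: float = 0.9, eps: float = 0.1,
                    eps_nil: float = 0.01) -> str:
        actions = list(policy)
        if utilization < threshold and random.random() < eps:
            # Explore, but give the nil (allocate-nothing) action a much
            # smaller exploration rate than the real tasks.
            if "nil" in actions and random.random() < eps_nil:
                return "nil"
            candidates = [a for a in actions if a != "nil"] or actions
            return random.choice(candidates)
        # Exploit: sample from the learned stochastic policy.
        return random.choices(actions, weights=[policy[a] for a in actions])[0]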

  19. Individual Agent's Decision Making
  [Figure: repeated from slide 11, here focusing on task routing decision-making]

  20. Task Routing Decision-Making
  • To which neighbor should an agent forward an unallocated task so that it reaches an unsaturated cluster before it expires?
  • Each agent learns to route tasks by interacting with its neighbors.
  • The learning objective is to maximize the probability that each task is allocated somewhere in the system.
  [Figure: a task being forwarded through a small network of clusters C1-C6]

  21. Learning Task Routing
  • State s_x is defined by the characteristics of the current task x that an agent is forwarding.
  • An action j corresponds to choosing neighbor j for forwarding a task.
  • Reward is the allocation probability of task x forwarded to neighbor j; per the slide, it combines:
    • the probability that j allocates x locally,
    • the routing policy of j, and
    • the allocation probability of x if forwarded onward by j.

  22. Learning Task Routing (cont.)
  [Equation: the agent's local allocation probability for task x]
  • Q_i(s_x, j) is the expected probability that task x will be allocated if agent i forwards it to its neighbor j.
  • Q-learning is used to update the value function.
  • FAL is used to learn the task routing policy (a hedged sketch of the reward follows).
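  Combining the three labeled components from slide 21, the routing reward plausibly mixes j's local allocation probability with the expected allocation probability if j forwards x onward under its routing policy pi_j. The exact formula in the paper may differ; this is an illustrative reconstruction.

    from typing import Dict

    def routing_reward(p_local_j: float, pi_j: Dict[str, float],
                       Q_j: Dict[str, float]) -> float:
        # Expected allocation probability if j forwards x to one of its
        # neighbors k, weighted by j's routing policy.
        onward = sum(pi_j[k] * Q_j[k] for k in pi_j)
        # Either j allocates x locally, or x is allocated further downstream.
        return p_local_j + (1.0 - p_local_j) * onward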

  23. Dual Exploration
  [Figure: task x travels from source s to destination d through agents i and j; forward exploration uses reward r(s_x, j) to update Q_i(s_x, j) along the forwarding direction, while backward exploration uses r(s_x, i) to update Q_j(s_x, i) in the reverse direction]
  [Kumar and Miikkulainen, 1999]
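  A minimal sketch of the dual update, assuming the scheme of [Kumar and Miikkulainen, 1999]: each forwarding hop yields two learning signals, one in each direction, so Q-values are refreshed twice as often. The tabular update form and learning rate are simplifications.

    from typing import Dict, Tuple

    def dual_update(Q_i: Dict[Tuple[str, str], float],
                    Q_j: Dict[Tuple[str, str], float],
                    s_x: str, i: str, j: str,
                    r_forward: float, r_backward: float,
                    alpha: float = 0.1) -> None:
        # Forward exploration: i refines its estimate for forwarding via j.
        old_ij = Q_i.get((s_x, j), 0.0)
        Q_i[(s_x, j)] = old_ij + alpha * (r_forward - old_ij)
        # Backward exploration: j refines its estimate for the reverse hop via i.
        old_ji = Q_j.get((s_x, i), 0.0)
        Q_j[(s_x, i)] = old_ji + alpha * (r_backward - old_ji)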

  24. Experiments: Compared Approaches
  • Distributed approaches
  • A centralized approach:
    • uses a best-first algorithm with a global view
    • ignores the communication delay
    • sometimes generates optimal allocations

  25. Experimental Setup
  • Cluster network with heterogeneous clusters and heterogeneous computing nodes (1024 nodes in total)
  • Four types of tasks: ordinary, IO-intensive, compute-intensive, and demanding
  • Two task arrival patterns: light load and heavy load

  26. Experimental Result: Light Load

  27. Experimental Result: Heavy Load

  28. Summary
  • This paper presents a multi-agent learning (MAL) approach to resource sharing in cluster networks for building large computing infrastructures.
  • Experimental results are encouraging.
  • This work suggests that MAL is a promising approach to online optimization problems in distributed systems.
