This presentation addresses interactive computing on grids through reinforcement learning, motivating the need for autonomic grids and covering the scheduling model, the reward model, and the experimental setup and results.
Grid Differentiated Services: a Reinforcement Learning Approach
Julien Perez
Laboratoire de Recherche en Informatique, Université Paris-Sud / CNRS
Grids must offer “interactive” computing
Interactive: guaranteed low latency
Grids must offer “interactive” computing
• To be more than a niche
• Traditional demanding applications in physics, genomics, …
• Clinical medical image analysis
• Digital libraries: on-line complex queries
• Disaster management, on-line instruments
• Ubiquitous computing and ambient intelligence
• Industry products for clusters
• MathWorks is porting the DCT on EGEE/gLite
Institutional grids
• Local control
• Implicit scheduling policy as a result of the partially independent local decisions
• Including fair share constraints
• No time-slice
An example challenge
• Dynamically reallocating resources to classes
• Typical classes: VOs × simulation (best effort) and analysis (QoS)
• Current status: independent “pull” strategies (pilot jobs, glide-in, …)
• Manual reconfiguration excluded
• Automatic reconfiguration: they tried hard…
• A case for Autonomic Computing: “Computing systems that manage themselves in accordance with high-level objectives from humans.” (Kephart & Chess 2003)
• Self-*: configuration, optimization, healing, protection
(Roadmap figure, 2003–2011: GÉANT upgraded to GÉANT2; grid infrastructures reinforced and upgraded; user involvement, data infrastructures, supercomputers; spanning FP5, FP6, and FP7.)
“We need autonomic grids now!”
Ulf Dahlsten, European Commission, Information Society and Media Directorate-General, Directorate F - Emerging Technologies and Infrastructures
Reinforcement learning situations
• Components
• A stationary environment, unknown but observable
• The available actions
• Associated rewards
• Goal: maximize the expected long-term benefit
• Method: discover the optimal action-value function through directed trial and error
Agent/environment loop: following policy $\pi(s,a)$, the agent observes state $s_t$, takes action $a_t$, and receives reward $r_{t+1}$ from the environment (transition probabilities $P^a_{ss'}$, expected rewards $R^a_{ss'}$).
Return: $R_t = \sum_k \gamma^k r_{t+k+1}$; action-value function: $Q^\pi(s,a) = E_\pi[R_t \mid s_t = s, a_t = a]$.
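A minimal sketch of this agent/environment loop and of the discounted return it maximizes; the `env` and `policy` interfaces and the value of the discount factor are illustrative assumptions, not taken from the paper.

```python
GAMMA = 0.95  # discount factor gamma (illustrative value, not from the paper)

def discounted_return(rewards, gamma=GAMMA):
    """R_t = sum_k gamma^k * r_{t+k+1}: the quantity the agent maximizes in expectation."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

def run_episode(env, policy, steps=100):
    """Generic agent/environment loop: observe the state, act, collect the reward."""
    state = env.reset()
    rewards = []
    for _ in range(steps):
        action = policy(state)             # a_t ~ pi(s_t, .)
        state, reward = env.step(action)   # environment returns s_{t+1} and r_{t+1}
        rewards.append(reward)
    return discounted_return(rewards)
```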
Scheduling: State and Actions
• State: a set of real-valued (in ℝ) variables measured in the system
• Workload in the queue and in the cluster
• VO distribution of jobs in the queue and the cluster
• Resource status
• No information about the arrival process
• Action: the job to schedule
• Estimation of the action-value function Q
• TD(0) temporal difference, with continuous estimation
• Lookup tables would provide a poor approximation
• Robust off-the-shelf non-linear approximation: a neural network (Rummery & Niranjan 1994, Tesauro et al. 2007)
• Re-training on each new example (vs. active learning)
• Moving target: no guarantee of convergence
• Off-line initial training with an earliest-deadline-first policy
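An illustrative sketch of how such a state vector and an ε-greedy job selection could be assembled; the field names (`job.cpu_time`, `job.vo`, `cluster.free_cpus`, `cluster.running`) and the `q_func` interface are hypothetical, not the paper's actual data structures.

```python
import numpy as np

def state_vector(queue, cluster, n_vos):
    """Illustrative state features: aggregate workload, per-VO queue occupancy, free resources."""
    feats = [sum(j.cpu_time for j in queue),            # workload waiting in the queue
             sum(j.cpu_time for j in cluster.running)]  # workload already on the cluster
    for vo in range(n_vos):                             # VO distribution of queued jobs
        feats.append(sum(1 for j in queue if j.vo == vo))
    feats.append(cluster.free_cpus)                     # resource status
    return np.array(feats, dtype=float)

def select_job(queue, cluster, q_func, n_vos, epsilon=0.05):
    """Action = which queued job to schedule next, chosen epsilon-greedily on the learned Q."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(queue))            # exploration
    s = state_vector(queue, cluster, n_vos)
    scores = [q_func(s, job) for job in queue]          # Q(s, a) for each candidate job
    return int(np.argmax(scores))                       # exploitation
```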
Rewards model
• Time/Utility Functions (TUF)
• Utility is a function of the execution delay
• Service classes are associated with utility functions
• Jensen et al. 1985, Tesauro & Kephart 2004, Vengerov 2007
(Figure: example time/utility functions for hard real-time, soft real-time, and best-effort classes; utility vs. time t relative to the deadline, with fixed and proportional shapes; a: start date, d: relative deadline.)
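A sketch of plausible time/utility functions for the three service classes; the exact shapes and values used in the paper are not given on the slide, so these curves are illustrative assumptions.

```python
def hard_realtime_utility(delay, deadline, value=1.0):
    """Hard real-time: full value up to the deadline, nothing after it."""
    return value if delay <= deadline else 0.0

def soft_realtime_utility(delay, deadline, value=1.0):
    """Soft real-time: full value up to the deadline, then a proportional (linear) decay."""
    if delay <= deadline:
        return value
    return max(0.0, value * (1.0 - (delay - deadline) / deadline))

def best_effort_utility(delay, value=0.1):
    """Best effort: small, delay-independent utility (assumed shape)."""
    return value
```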
Rewards model
• Fairness
• Prescribed share w
• Deficit distance $D = \max_k (w_k - S_k)^+$, where $S_k$ is the achieved share of class k
• Fairness utility $1 - D / \max_k(w_k)$
• Policy if some VOs request less than their share
• Fair excess allocation?
• “Greedy” allocation: use this slackness to favor responsiveness
• The overall reward is the weighted sum of the time utility and the fairness utility
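A direct transcription of the fairness terms above into code; the weight `alpha` combining time utility and fairness utility is an assumed parameter, as the slide does not give the paper's value.

```python
import numpy as np

def fairness_utility(target_shares, achieved_shares):
    """Deficit distance D = max_k (w_k - S_k)^+ ; fairness utility = 1 - D / max_k(w_k)."""
    w = np.asarray(target_shares, dtype=float)
    s = np.asarray(achieved_shares, dtype=float)
    deficit = np.max(np.clip(w - s, 0.0, None))
    return 1.0 - deficit / np.max(w)

def reward(time_utility, target_shares, achieved_shares, alpha=0.5):
    """Overall reward: weighted sum of time utility and fairness utility (alpha is illustrative)."""
    return alpha * time_utility + (1.0 - alpha) * fairness_utility(target_shares, achieved_shares)
```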
Outline
• Motivation for grid Differentiated Services
• The reinforcement learning framework
• Experimental setup and results
Experimental setup
• The simulation platform
• Discrete event simulator
• Plugin schedulers
• Analysis tools
• Matlab implementation
• One RL step takes 1 to 10 ms
• Synthetic and real (EGEE) workloads
Synthetic experiments
More detail in the CCGrid’08 paper
The EGEE workload
• Torque logs of the LAL cluster, 17-26 May 2006
• 100 servers (approximation)
• VO distribution = (.20, .12, .12, .06, .06, .09, .35)
• Heavily dominated by short jobs
• Jobs shorter than 15 min are 62% of the total number of jobs, but less than 3% of the workload
• An unknown proportion are SDJ
• SDJ (Short Deadline Jobs) are executed immediately or rejected
• Native scheduler: Maui/PBS with SDJ
• The SDJ scheme cannot be outperformed
• Challenge: get acceptable results for all interactive jobs
Performance
cdf of the waiting time
Acceptable, but not competitive with the SDJ scheme
Performance
Performance
(Figure: waiting time of the native scheduler minus waiting time of the RL scheduler.)
Slow learning
Conclusion
• Coping with the learning phase in unsteady systems: apprenticeship learning
• Multi-objective, multi-scale reinforcement learning
The classical temporal difference algorithm
• Very naïve
• TD(0)
• On-policy: a* is the actual action
• Exploration-exploitation: ε-greedy
TD(0) update (new estimate = current estimate + step size times error, where the error is the target minus the current estimate):
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$
Temporal difference algorithm
• TD(0)
• Continuous estimation of Q(·)
• Lookup tables would provide a poor approximation
• Robust off-the-shelf non-linear approximation: a neural network (Rummery & Niranjan 1994, Tesauro et al. 2007)
• Re-training on each new example (vs. active learning)
• Moving target: no guarantee of convergence
• Off-line initial training with an earliest-deadline-first policy
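A minimal sketch of an on-policy TD(0) update with a small non-linear approximator, in the spirit of the neural-network estimation cited above; the network size, learning rate, and class names are assumptions, not the paper's implementation.

```python
import numpy as np

class QNetwork:
    """Tiny one-hidden-layer approximator for Q(s, a); illustrative, not the paper's network."""
    def __init__(self, n_inputs, n_hidden=20, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.1, (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0, 0.1, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, x):
        self._h = np.tanh(x @ self.w1 + self.b1)    # hidden activations, cached for the update
        return float(self._h @ self.w2 + self.b2)

    def update(self, x, target):
        """One gradient step toward the TD target (re-training on each new example)."""
        err = self.predict(x) - target              # prediction error
        grad_h = err * self.w2 * (1.0 - self._h ** 2)
        self.w1 -= self.lr * np.outer(x, grad_h)
        self.b1 -= self.lr * grad_h
        self.w2 -= self.lr * err * self._h
        self.b2 -= self.lr * err

def td0_update(qnet, x, r, x_next, gamma=0.95):
    """On-policy TD(0): target = r + gamma * Q(s', a'), with x and x_next encoding the
    (state, action) pairs actually taken by the scheduler."""
    target = r + gamma * qnet.predict(x_next)
    qnet.update(x, target)
```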
Synthetic experiments
• Load parameters
• Poisson arrivals with parameter λ
• Execution time exponential with parameter μ
• Utilization factor ρ = λ/μ
• Maximum duration of interactive jobs w
• Proportion of interactive jobs q
• Number of servers P
• Fair share parameters
• Target fair share configuration
• Actual distribution: the target may be feasible or not
• Policies: FIFO and RL
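A sketch of how a synthetic trace could be generated from these parameters; the function name and the exact sampling scheme are assumptions, and the paper's generator may differ.

```python
import numpy as np

def synthetic_workload(n_jobs, lam, mu, q, w_max, target_shares, seed=0):
    """Illustrative synthetic trace: Poisson arrivals (rate lam), exponential execution
    times (rate mu), a fraction q of interactive jobs capped at w_max, and VO labels
    drawn from the target fair-share configuration."""
    rng = np.random.default_rng(seed)
    arrivals = np.cumsum(rng.exponential(1.0 / lam, n_jobs))   # inter-arrival times ~ Exp(lam)
    durations = rng.exponential(1.0 / mu, n_jobs)              # execution times ~ Exp(mu)
    interactive = rng.random(n_jobs) < q                       # proportion q of interactive jobs
    durations[interactive] = np.minimum(durations[interactive], w_max)
    vos = rng.choice(len(target_shares), size=n_jobs, p=target_shares)
    return arrivals, durations, interactive, vos
```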
Performance: feasible schedule
cdf of the waiting time for interactive jobs
ρ = .99, P = 50, w = (.7, .2, .05, .05), feasible schedule, 5000 jobs
More than 90% do not wait more than 2 minutes
Performance: feasible schedule
Mean and standard deviation of the waiting time
RL does not starve batch jobs
Performance: feasible schedule
Dynamics of the fair share
3% off the optimum; reasonably fast convergence at the grid time scale (fairness-wise)
Performance: unfeasible schedule
Dynamics of the fair share
Target w = (.7, .2, .05, .05); actual w = (.4, .2, .2, .2)
RL and FIFO very close to the optimum