
A Statistical Scheduling Technique for a Computational Market Economy


Presentation Transcript


  1. A Statistical Scheduling Technique for a Computational Market Economy Neal Sample Stanford University

  2. Research Interests • Compositional Computing (GRID) • Reliability and Quality of Service • Value-based and model-based mediation • Languages: “Programming for the non-programmer expert” • Database Research • Semistructured indexing and storage • Massive table/stream compression • Approximate algorithms for streaming data

  3. Why We’re Here [Chart: development effort from 1970 to 2010, shifting away from coding toward integration/composition]

  4. GRID: Commodity Computing

  5. GRID: Commodity Computing

  6. GRID: Commodity Computing • Distributed Supercomputing (chip design, cryptography) • High Throughput (FightAIDSAtHome, Nug30) • Collaborative (data exploration, education) • Data Intensive (Large Hadron Collider) • On Demand (computer-in-the-loop)

  7. Composition of Large Services • Remote, autonomous • Services are not free • Fee ($), execution time • 2nd order dependencies • “Open Service Model” • Principles: GRID, CHAIMS • Protocols: UDDI, IETF SLP • Runtime: Globus, CPAM

  8. Grid Life is Tough • Increased complexity throughout • New tools and applications • Diverse resources such as computers, storage media, networks, sensors • Programming • Control flow & data flow separation • Service mediation • Infrastructure • Resource discovery, brokering, monitoring • Security/authorization • Payment mechanisms

  9. Our GRID Contributions • Programming models and tools • System architecture • Resource management • Instrumentation and performance analysis • Network protocols and infrastructure • Service mediation

  10. Other GRID Research Areas • The nature of applications • Algorithms and problem solving methods • Security, payment/escrow, reputation • End systems • Programming models and tools • System architecture • Resource management • Instrumentation and performance analysis • Network protocols and infrastructure • Service mediation

  11. Roadmap • Brief introduction to CLAM language • Some related scheduling methods • Surety-based scheduling • Sample program • Monitoring • Rescheduling • Results • A few future directions

  12. CLAM Composition Language • Decomposition of CALL-statement • Parallelism by asynchrony in sequential program • Reduction of complexity of invoke statements • Control of new GRID requirements (estimation, trading, brokering, etc.) • Abstract out data flow • Mediation for data flow control and optimization • Extraction model mediation • Purely compositional • No primitives for arithmetic • No primitives for input/output • Targets the “non-programmer expert”

  13. CLAM Primitives • Pre-invocation: SETUP (set up the connection to a service), SETPARAM/GETPARAM (set/get parameters in a service), ESTIMATE (service cost estimation) • Invocation and result gathering: INVOKE, EXAMINE (test progress of an invoked method), EXTRACT (extract results from an invoked method) • Termination: TERMINATE (terminate a method invocation/connection to a service)
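
A minimal Python sketch of the primitive lifecycle above, using a hypothetical MockService stand-in for a leased GRID service; none of these names are the actual CLAM/CPAM runtime API.

```python
import time

class MockService:
    """Illustrative stand-in for a remote, autonomous, fee-charging service."""
    def __init__(self, name):
        self.name, self.params, self.started = name, {}, None
    def setup(self):                   # SETUP: open the connection to the service
        print(f"connected to {self.name}")
    def setparam(self, **kw):          # SETPARAM: set parameters in the service
        self.params.update(kw)
    def getparam(self, key):           # GETPARAM: read a parameter back
        return self.params[key]
    def estimate(self):                # ESTIMATE: cost estimate before INVOKE
        return {"fee": 50, "hours": 7}
    def invoke(self, method):          # INVOKE: start the method asynchronously
        self.started = time.time()
    def examine(self):                 # EXAMINE: poll progress of the invocation
        return min(1.0, (time.time() - self.started) / 0.01)
    def extract(self):                 # EXTRACT: pull results from the invocation
        return {"result": 42}
    def terminate(self):               # TERMINATE: end invocation, close connection
        print(f"closed {self.name}")

svc = MockService("serviceA")
svc.setup()
svc.setparam(tolerance=1e-6)
if svc.estimate()["fee"] <= 250:       # brokering step guided by ESTIMATE
    svc.invoke("run")
    while svc.examine() < 1.0:         # asynchrony: the caller polls, never blocks
        time.sleep(0.005)
    print(svc.extract())
svc.terminate()
```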

  14. Resources + Scheduling (scope: end system) • Computational Model • Multithreading • Automatic parallelization • Resource Management • Process creation • OS signal delivery • OS scheduling

  15. Resources + Scheduling (scope: cluster + end system) • Computational Model • Synchronous communication • Distributed shared memory • Resource Management • Parallel process creation • Gang scheduling • OS-level signal propagation

  16. Resources + Scheduling (scope: intranet + cluster + end system) • Computational Model • Client/server • Loosely synchronous: pipelines • IWIM • Resource Management • Resource discovery • Signal distribution networks

  17. Resources + Scheduling (scope: Internet + intranet + cluster + end system) • Computational Model • Collaborative systems • Remote control • Data mining • Resource Management • Brokers • Trading • Mobile code negotiation

  18. Scheduling Difficulties • Adaptation: Repair and Reschedule • Schedules for t0 are only guesses • Estimates for multiple stages may become invalid • ⇒ Schedules must be revised during runtime [Timeline: work proceeds from t0 toward tfinish; a hazard triggers a reschedule mid-execution]

  19. Scheduling Difficulties • Service Autonomy: No Resource Allocation • The scheduler does not handle resource allocation • Users observe resources without controlling them • Consequently, competing objectives require orthogonal scheduling techniques • Changing goals for tasks or users vastly increases scheduling complexity

  20.–25. Some Related Work (built up incrementally across six slides) • Axes: R = Rescheduling, M = Monitoring Execution, A = Autonomy of Services, Q = QoS / probabilistic execution • Systems positioned on these axes: PERT, CPM, ePERT (AT&T), Condor (Wisconsin), Mariposa (UCB), SBS (Stanford)

  26. Sample Program [Diagram: example program composed of four services A, B, C, D]

  27. Budgeting • Time: maximum allowable execution time • Expense: funding available to lease services • Surety: the schedule’s probability of success (both a goal and an assessment technique)

  28.–31. Program Schedule as a Template • Instantiated at runtime • Service provider selection, etc. [Diagram, animated over four slides: the abstract A–B–C–D program mapped onto alternative candidate providers for each service]

  32. t0 Schedule Selection • Guided by runtime “bids” • Constrained by budget [Diagram: candidate providers bidding per service, e.g. 7±2h for $50, 6±1h for $40, 5±2h for $30, 3±1h for $30]

  33. t0 Schedule Constraints • Budget • Time: upper bound - e.g. 22h • Cost: upper bound - e.g. $250 • Surety: lower bound - e.g. 90% • {Time, Cost, Surety} = {22, 250, 90} • Steered by user preferences/weights • <Time, Cost, Surety> = <10, 1, 5> • Selection • S1est [20, 150, 90] = (22-20)*10 + (250-150)*1 + (90-90)*5 = 120 • S2est [22, 175, 95] = (22-22)*10 + (250-175)*1 + (95-90)*5 = 100 • S3est [18, 190, 96] = (22-18)*10 + (250-190)*1 + (96-90)*5 = 130
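
The selection arithmetic above can be reproduced directly in a small Python sketch. Reading the weighted slack sum as “higher is better” (so S3, at 130, wins) is an inference from the slide’s numbers rather than something the deck states.

```python
# Budget: {time, cost} are upper bounds, surety is a lower bound (slide 33).
budget  = {"time": 22, "cost": 250, "surety": 90}
weights = {"time": 10, "cost": 1, "surety": 5}     # user preference weights

def score(est):
    """Weighted slack of an estimated schedule relative to the budget."""
    return ((budget["time"] - est["time"]) * weights["time"]
          + (budget["cost"] - est["cost"]) * weights["cost"]
          + (est["surety"] - budget["surety"]) * weights["surety"])

candidates = {
    "S1": {"time": 20, "cost": 150, "surety": 90},   # -> 120
    "S2": {"time": 22, "cost": 175, "surety": 95},   # -> 100
    "S3": {"time": 18, "cost": 190, "surety": 96},   # -> 130
}
best = max(candidates, key=lambda name: score(candidates[name]))
print(best, score(candidates[best]))                 # S3 130
```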

  34. Search Space [Chart: candidate plans plotted by expected program cost vs. expected program execution time; the budget bounds a feasible region containing a Pareto frontier of plans, with user preferences steering the final choice]

  35. Program Evaluation and Review Technique • Service times: most likely (m), optimistic (a), and pessimistic (b) • (1) expected duration (service): te = (a + 4m + b) / 6 • (2) standard deviation: σ = (b − a) / 6 • (3) expected duration (program): TE = Σ te over the schedule • (4) test value: Z = (deadline − TE) / √(Σ σ²) • (5) expectation test: surety = P(finish ≤ deadline) = Φ(Z) • (6) Z is treated as standard normal, N(0, 1)
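
Formulas (1)–(6) reduce to a few lines of Python. This sketch assumes a simple serial chain of services and illustrative (a, m, b) triples, since the deck’s exact estimates are not recoverable; the printed surety therefore will not match slide 38’s 99.6%.

```python
from math import sqrt, erf

def pert(a, m, b):
    """(1) expected duration and (2) standard deviation of one service
    from optimistic (a), most likely (m), pessimistic (b) times."""
    return (a + 4 * m + b) / 6, (b - a) / 6

def surety(services, deadline):
    """(3) sum expected durations, (4) form the normalized test value Z,
    (5)-(6) evaluate the standard normal CDF at Z."""
    stats = [pert(a, m, b) for a, m, b in services]
    te  = sum(mu for mu, _ in stats)                 # (3) program duration
    var = sum(sd ** 2 for _, sd in stats)
    z = (deadline - te) / sqrt(var)                  # (4) test value
    return 0.5 * (1 + erf(z / sqrt(2)))              # (5),(6) Phi(Z), Z ~ N(0,1)

chain = [(5, 7, 9), (5, 6, 7), (3, 5, 7)]            # made-up triples for A, B, C
print(surety(chain, deadline=22))                    # ~0.99997 for these inputs
```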

  36. t0 Complete Schedule Properties [Chart: probability density of probable program completion time, marking the deadline and the user-specified surety; Bank = $100]

  37. Individual Service Properties [Chart: per-service finish-time probability densities for A (7±2h), B (6±1h), and C (5±2h)]

  38. t0 Combined Service Properties [Chart: combined finish-time density for the whole schedule; probable finish between 14h and 23h, deadline 22h, required surety 90%, current surety 99.6%]

  39. Tracking Surety [Chart: surety (%) tracked over execution time against the user-specified minimum of 90%]

  40. Runtime Hazards • With control over resource allocation, or without runtime hazards, scheduling becomes much easier • In the Open Service Model, runtime implies t0 schedule invalidation • Sample hazards: delays and slowdowns, stoppages, inaccurate estimations, communication loss, competitive displacement…

  41. Progressive Hazard Definition + Detection [Chart: surety (%) over execution time; after serviceA and serviceB start, a slow serviceB gradually drags surety below the 90% minimum]

  42. Catastrophic Hazard Definition + Detection [Chart: surety (%) over execution time; when serviceB fails outright, surety drops to 0%]

  43. Pseudo-Hazard Definition + Detection [Chart: surety (%) over execution time; a serviceB communication failure makes surety appear to drop to 0% even though the service may still be running]
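
Slides 41–43 distinguish the hazard kinds by how the tracked surety behaves and by what the monitor can still observe. A sketch of that classification follows; the liveness flags are assumed inputs from the monitor, not part of the deck:

```python
def classify_hazard(surety, min_surety, service_alive, link_alive):
    """Map monitored state onto the three hazard kinds of slides 41-43."""
    if not link_alive:
        return "pseudo-hazard"    # communication failure: surety only *appears* to hit 0%
    if not service_alive:
        return "catastrophic"     # the service really failed: surety is 0%
    if surety < min_surety:
        return "progressive"      # service slow: surety degrading below the minimum
    return None                   # schedule still meets its surety bound

print(classify_hazard(0.83, 0.90, service_alive=True, link_alive=True))
# -> progressive  (cf. slide 41: serviceB slow)
```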

  44. Monitoring + Repair • Observe, not control • Complete (sufficient, not minimal) set of repairs • Simple cost model: early termination = linear cost recovery • Greedy selection of a single repair over all service × repair pairs: O(s·r) (see the sketch after the strategy slides) [Diagram: the A–B–C–D sample program]

  45. Schedule Repair [Chart: surety (%) over execution time for services A–D, marking thazard (detection) and trepair (repair applied), against the 90% minimum]

  46. Strategy 0: baseline (no repair) • pro: no additional $ cost • pro: ideal solution for partitioning hazards • con: depends on self-recovery [Chart: surety recovers on its own after thazard]

  47. Strategy 1: service replacement • pro: reduces $ loss • con: forfeits the $ and time already invested in B • con: concedes any chance that B recovers [Chart: B is dropped and a replacement B′ is leased at trepair]

  48. Strategy 2: service duplication • pro: larger surety boost; leverages B’s chance of recovery • con: large $ cost [Chart: B′ runs in parallel with the original B from trepair]

  49. Strategy 3: pushdown repair • pro: cheap, no $ lost • pro: no time lost • con: cannot handle catastrophic hazards • con: requires a chance of recovery [Chart: the repair is pushed downstream, replacing C with C′ while B continues]
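
Combining slide 44’s greedy O(s·r) selection with the four strategies above might look like the sketch below. The numeric surety gains and dollar costs are placeholders, and scoring a repair by gain minus cost is an assumed instance of the deck’s “simple cost model”:

```python
def candidate_repairs(svc):
    """Placeholder (strategy, surety gain, $ cost) triples for one service."""
    return [
        ("baseline (no repair)", 2, 0),    # rely on self-recovery
        ("replacement",          8, 40),   # drop the service, lease a substitute
        ("duplication",         12, 50),   # run original and substitute in parallel
        ("pushdown",             6, 0),    # re-plan a downstream service instead
    ]

def choose_repair(hazarded_services):
    """Greedy single-repair choice: scan all s x r (service, repair) pairs."""
    best = None
    for svc in hazarded_services:
        for strategy, gain, cost in candidate_repairs(svc):
            score = gain - cost            # assumed simple cost model
            if best is None or score > best[2]:
                best = (svc, strategy, score)
    return best

print(choose_repair(["B"]))                # -> ('B', 'pushdown', 6)
```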

  50. Experimental Results • Rescheduling options: baseline (no repairs); single-strategy repairs (limits flexibility and effectiveness); all strategies combined • Setup: 1000 random DAG schedules, 2–10 services • 1–3 hazards per execution • Fixed service availability • All schedules are repairable
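
The setup on slide 50 is straightforward to mock up. This sketch generates 1000 random DAG schedules of 2–10 services with 1–3 hazard sites each; the forward-edge construction and the 0.3 edge density are assumptions, since the deck does not specify the DAG shape:

```python
import random

def random_schedule(rng):
    n = rng.randint(2, 10)                           # 2-10 services
    edges = [(i, j) for i in range(n)                # forward edges only,
             for j in range(i + 1, n)                # so the graph is acyclic
             if rng.random() < 0.3]                  # assumed density
    hazards = rng.sample(range(n), min(n, rng.randint(1, 3)))  # 1-3 hazards
    return n, edges, hazards

rng = random.Random(0)
trials = [random_schedule(rng) for _ in range(1000)]
print(len(trials), trials[0])
```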
