
Why static is bad!




Presentation Transcript


  1. Why static is bad! Today: static partitioning. Want: dynamic sharing. [Figure: Hadoop, Pregel, and MPI each running on a static partition of a shared cluster]

  2. Comparing Sharing Frameworks: Choice • Choice of resources • Can a framework pick between all resources? A predefined subset? Or a randomly chosen subset? • Why is this important? • Policies may need to be global (e.g., locality) • If you can preempt, you can get your preferred resources

  3. Comparing Sharing Frameworks: Interference • Can frameworks try to use the same machines? Do two frameworks end up contending for the same resources? • How to avoid this? • Offer resources to frameworks one at a time • Statically partition • Offer in parallel and arbitrate when a conflict arises

  4. Comparing Sharing Frameworks: Granularity • Allocation granularity • MPI tasks: gang-scheduled; a job can't run until all of its slots are acquired • Hadoop: elastic; a job can start running once it allocates a few slots • Why important? • With gang scheduling, the framework will hoard slots until it gets all the slots it needs, so the cluster may be left underutilized in the meantime (see the sketch below) • Cluster-wide behaviors
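A minimal sketch of the two granularities described on this slide. Class names, the on_offer callback, and the slot/task objects are all illustrative assumptions, not the real MPI, Hadoop, or Mesos interfaces.

```python
# Illustrative sketch: gang-scheduled vs. elastic allocation (hypothetical API).

class GangScheduler:
    """MPI-style: hoards accepted slots until the whole job can start at once."""
    def __init__(self, slots_needed):
        self.slots_needed = slots_needed
        self.held = []                        # slots accepted but sitting idle (hoarded)

    def on_offer(self, slots):
        self.held.extend(slots)               # cluster may be underutilized meanwhile
        if len(self.held) >= self.slots_needed:
            return list(self.held)            # gang-launch: run on all slots together
        return []                             # not enough yet; keep waiting


class ElasticScheduler:
    """Hadoop-style: starts tasks on whatever slots are offered right away."""
    def __init__(self, tasks):
        self.pending = list(tasks)

    def on_offer(self, slots):
        launched = []
        for slot in slots:
            if not self.pending:
                break
            launched.append((self.pending.pop(0), slot))   # task runs immediately
        return launched
```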

  5. Mesos

  6. Other Benefits of Mesos • Run multiple instances of the same framework • Isolate production and experimental jobs • Run multiple versions of the framework concurrently • Build specialized frameworks targeting particular problem domains • Better performance than general-purpose abstractions

  7. Goals • High utilization of resources • Support diverse frameworks (current & future) • Scalability to 10,000s of nodes • Reliability in the face of failures • Resulting design: a small, microkernel-like core that pushes scheduling logic to the frameworks

  8. Design Elements • Fine-grained sharing: • Allocation at the level of tasks within a job • Improves utilization, latency, and data locality • Resource offers: • Simple, scalable application-controlled scheduling mechanism

  9. Element 1: Fine-Grained Sharing [Figure: coarse-grained sharing (HPC) gives each framework its own static block of machines; fine-grained sharing (Mesos) interleaves tasks from Frameworks 1–3 across the same nodes; both sit on a shared storage system (e.g. HDFS)] + Improved utilization, responsiveness, data locality

  10. Element 2: Resource Offers • Option: global scheduler • Frameworks express their needs in a specification language; the global scheduler matches them to resources • + Can make optimal decisions • – Complex: the language must support all framework needs • – Difficult to scale and to make robust • – Future frameworks may have unanticipated needs

  11. Element 2: Resource Offers • Mesos: Resource offers • Offer available resources to frameworks, let them pick which resources to use and which tasks to launch • Keeps Mesos simple, lets it support future frameworks • Decentralized decisions might not be optimal

  12. Mesos Architecture [Figure: an MPI job and a Hadoop job submit to their framework schedulers (MPI scheduler, Hadoop scheduler); the Mesos master's allocation module picks a framework to offer resources to and sends it a resource offer; Mesos slaves run the MPI executors and their tasks]

  13. Mesos Architecture (continued) • Resource offer = list of (node, availableResources) • E.g. { (node1, <2 CPUs, 4 GB>), (node2, <3 CPUs, 2 GB>) } [Same figure: the master's allocation module sends this offer to the framework scheduler it picked]
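As a rough illustration of the offer shape on this slide, here is a small sketch; the class and field names are assumptions for readability, not the actual Mesos message format.

```python
from dataclasses import dataclass

# Offer shape from the slide: a list of (node, availableResources) pairs.
@dataclass
class Resources:
    cpus: float
    mem_gb: float

# The example offer given on the slide:
offer = [
    ("node1", Resources(cpus=2, mem_gb=4)),
    ("node2", Resources(cpus=3, mem_gb=2)),
]
```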

  14. Mesos Architecture (continued) [Figure: the framework scheduler applies framework-specific scheduling to the offer and replies with tasks to launch; the Mesos slaves launch and isolate the executors (MPI executor, Hadoop executor) that run those tasks]
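A toy sketch of the framework-side step described above: the framework looks at an offer and decides which resources to use and which tasks to launch. The function name, the dict-shaped resources, and the locality check are hypothetical, not the real Mesos framework API.

```python
# Hypothetical framework scheduler callback (not the real Mesos interface).
# It keeps only nodes that hold its input data, illustrating framework-specific
# placement; unused resources are implicitly declined back to the Mesos master.
def resource_offered(offer, pending_tasks, preferred_nodes):
    """offer: list of (node, {"cpus": c, "mem_gb": m}) pairs."""
    launches = []
    for node, res in offer:
        if not pending_tasks:
            break
        if node in preferred_nodes and res["cpus"] >= 1 and res["mem_gb"] >= 1:
            launches.append((node, pending_tasks.pop(0)))   # launch one task here
    return launches
```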

  15. Drawbacks • Poor fairness • Jobs with long tasks can dominate • There is NO preemption!! • Sticky slots • Jobs with higher priority can dominate a set of preferred slots • Mesos uses lottery scheduling: the probability of being offered a slot is proportional to the framework's priority (sketched below) • Head-of-line blocking • Mesos offers resources to one framework at a time • Prevents frameworks from trying to use the same slots • Based on the assumption that scheduling decisions are quick • Mesos revokes offers if a scheduler takes too long • Essentially leads to a queue
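A minimal sketch of the lottery step referenced in the bullets above: the allocator picks which framework gets the next offer with probability proportional to its weight. The function and weight dict are illustrative, not the actual Mesos allocation module.

```python
import random

def pick_framework(weights):
    """weights: non-empty dict mapping framework name -> positive priority."""
    total = sum(weights.values())
    ticket = random.uniform(0, total)        # draw a lottery ticket
    running = 0.0
    for fw, weight in weights.items():
        running += weight
        if ticket <= running:
            return fw                         # chance of winning is weight / total
    return fw                                 # guard against floating-point rounding

# Example: pick_framework({"hadoop": 3, "mpi": 1}) offers to Hadoop ~75% of the time.
```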

  16. Omega

  17. Omega • Scales • The central layer only does optimistic conflict resolution (sketched below) • No head-of-line blocking • Allows for flexible and evolvable scheduling • Frameworks can implement any arbitrary form of scheduling • Each framework has a global view • Frameworks can preempt each other
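A minimal sketch, assuming a simple machine-ownership model, of the optimistic conflict resolution described above: every scheduler sees the whole cell, plans against a snapshot, then tries to commit its claims, and only conflicting claims are rejected. All names are illustrative; this is not Google's implementation.

```python
# Omega-style optimistic concurrency over shared cell state (illustrative only).
class CellState:
    def __init__(self, machines):
        self.owner = {m: None for m in machines}    # machine -> framework holding it

    def snapshot(self):
        return dict(self.owner)                     # global view handed to a scheduler

    def commit(self, framework, wanted_machines):
        """Apply a claim optimistically; losers retry from a fresh snapshot."""
        granted, rejected = [], []
        for m in wanted_machines:
            if self.owner.get(m) is None:
                self.owner[m] = framework           # claim wins
                granted.append(m)
            else:
                rejected.append(m)                  # another scheduler got here first
        return granted, rejected
```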

  18. Comparing Sharing Frameworks • Choice of resources • Interference • Allocation Granularity • Cluster-wide behaviors

  19. Comparing Frameworks
