Apache Mesos Design Decisions mesos.apache.org @ApacheMesos Benjamin Hindman – @benh
this is not a talk about YARN, at least not explicitly! this talk is about Mesos!
a little history • Mesos started as a research project at Berkeley in early 2009 by Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica
our motivation increase performance and utilization of clusters
our intuition static partitioning considered harmful
static partitioning considered harmful datacenter
static partitioning considered harmful higher utilization!
our intuition build new frameworks
Apache Mesos is a distributed system for running and building other distributed systems
Mesos replaces static partitioning of resources to frameworks with dynamic resource allocation
Mesos is a distributed system with a master/slave architecture masters slaves
frameworks register with the Mesos master in order to run jobs/tasks frameworks masters slaves
Mesos @Twitter in early 2010 goal: run long-running services elastically on Mesos
Apache Aurora (incubating) Aurora is a Mesos framework that makes it easy to launch services written in Ruby, Java, Scala, Python, Go, etc! masters
Storm, Jenkins, … masters
design decisions • two-level scheduling and resource offers • fair-sharing and revocable resources • high-availability and fault-tolerance • execution and isolation • C++
frameworks get allocated resources from the masters; resources are allocated via resource offers; a resource offer represents a snapshot of available resources (one offer per host) that a framework can use to run tasks. example offer: hostname, 4 CPUs, 4 GB RAM
frameworks use these resources to decide what tasks to run; a task can use a subset of an offer. example task: 3 CPUs, 2 GB RAM
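The relationship between an offer and a task can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not the Mesos API: `Resources`, `Offer`, `contains`, and `minus` are hypothetical names, and the numbers are the ones from the slides (a 4 CPU / 4 GB offer, a 3 CPU / 2 GB task).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource bundle; real offers carry more resource types."""
    cpus: float
    mem_gb: float

    def contains(self, other):
        # A task fits if it needs no more of each resource than is available.
        return other.cpus <= self.cpus and other.mem_gb <= self.mem_gb

    def minus(self, other):
        # What is left after carving the task out of the bundle.
        return Resources(self.cpus - other.cpus, self.mem_gb - other.mem_gb)


@dataclass(frozen=True)
class Offer:
    """Snapshot of the resources available on one host."""
    hostname: str
    resources: Resources


offer = Offer("host1", Resources(cpus=4, mem_gb=4))
task = Resources(cpus=3, mem_gb=2)

fits = offer.resources.contains(task)      # the task uses a subset of the offer
leftover = offer.resources.minus(task)     # unused part of the offer
```

In this sketch the unused part of the offer (`leftover`) is what could be handed back and offered again.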
cluster manager status quo: the application sends a specification to the cluster manager; the specification includes as much information as possible to assist the cluster manager in scheduling and execution
cluster manager status quo: the application waits for the task to be executed
cluster manager status quo: the cluster manager returns the result to the application
problems with specifications • hard to specify certain desires or constraints • hard to update specifications dynamically as tasks execute and finish or fail
an alternative model: a request is a purposely simplified subset of a specification, mainly including the required resources. example request from framework to masters: 3 CPUs, 2 GB RAM
question: what should Mesos do if it can’t satisfy a request?
question: what should Mesos do if it can’t satisfy a request? wait until it can…
question: what should Mesos do if it can’t satisfy a request? wait until it can … offer the best it can immediately
an alternative model framework offer hostname 4 CPUs 4 GB RAM masters
an alternative model framework offer hostname 4 CPUs 4 GB RAM offer hostname 4 CPUs 4 GB RAM offer hostname 4 CPUs 4 GB RAM offer hostname 4 CPUs 4 GB RAM masters
an alternative model framework offer hostname 4 CPUs 4 GB RAM offer hostname 4 CPUs 4 GB RAM offer hostname 4 CPUs 4 GB RAM offer hostname 4 CPUs 4 GB RAM framework uses the offers to perform its own scheduling masters
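One way to picture "the framework performs its own scheduling" is a first-fit pass over the offers it receives. The sketch below is an assumption-laden illustration, not how any real framework or the Mesos scheduler API works: `schedule`, the tuple layouts, and the task names are all hypothetical, and first-fit is just one policy a framework might choose.

```python
def schedule(offers, pending):
    """First-fit placement of pending tasks onto offers.

    offers:  list of (hostname, cpus, mem_gb)
    pending: list of (task_name, cpus, mem_gb)
    Returns (launches, declined_hosts, still_pending).
    """
    launches = []
    declined = []
    pending = list(pending)
    for hostname, cpus, mem_gb in offers:
        placed = []
        waiting = []
        for name, need_cpus, need_mem in pending:
            if need_cpus <= cpus and need_mem <= mem_gb:
                # Carve the task out of what is left of this offer.
                cpus -= need_cpus
                mem_gb -= need_mem
                placed.append(name)
            else:
                waiting.append((name, need_cpus, need_mem))
        pending = waiting
        if placed:
            launches.append((hostname, placed))
        else:
            # Nothing fit (or nothing is pending): hand the offer back.
            declined.append(hostname)
    return launches, declined, pending


# Four identical offers, as on the slide: 4 CPUs and 4 GB RAM per host.
offers = [(f"host{i}", 4, 4) for i in range(1, 5)]
tasks = [("web", 3, 2), ("cache", 1, 2), ("batch", 4, 4)]

launches, declined, still_pending = schedule(offers, tasks)
```

The point of the design is that the placement policy lives in the framework: swapping first-fit for something locality-aware would change only `schedule`, not the party making the offers.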
an analogue: non-blocking sockets application write(s, buffer, size); kernel
an analogue: non-blocking sockets application 42 of 100 bytes written! kernel
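The analogy can be tried directly. The sketch below uses Python's standard `socket` module rather than the C `write` call on the slide; `try_send` and the payload size are illustrative choices. With a non-blocking socket the kernel accepts as many bytes as it currently has room for and reports that count, leaving the application to decide what to do with the rest.

```python
import socket


def try_send(sock, payload):
    """Attempt one non-blocking send; return how many bytes the kernel accepted."""
    try:
        return sock.send(payload)
    except BlockingIOError:
        # No buffer space at all right now.
        return 0


writer, reader = socket.socketpair()
writer.setblocking(False)

# Deliberately larger than a typical socket buffer, so a partial write is likely.
payload = b"x" * (16 * 1024 * 1024)

accepted = try_send(writer, payload)
remaining = payload[accepted:]   # the application decides when to retry this part

print(f"{accepted} of {len(payload)} bytes written")

writer.close()
reader.close()
```

The kernel plays the role of the masters here: it grants what it can right away instead of blocking until the whole request can be satisfied.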
IIUC, even YARN allocates “the best it can” to an application when it can’t satisfy a request
offers represent the currently available resources a framework can use