
David Oppenheimer

A case for resource discovery in shared distributed platforms. David Oppenheimer. UCB ROC Retreat, 12 January 2005.


Presentation Transcript


  1. A case for resource discovery in shared distributed platforms
     David Oppenheimer
     UCB ROC Retreat, 12 January 2005

  2. Introduction
     • Application performance is a function of
        • resources available to the application
        • resources needed by the application
           • or, “application sensitivity to resource constraints”
     • At summer retreat, described SWORD
        • at app deployment time, find best set of nodes given
           • resources available on a set of distributed nodes
           • application sensitivity to resource constraints
        • assumptions
           • (1) available resources vary among nodes enough to matter
              • spare CPU, mem, disk space; inter-node latency, avail. bw; ...
           • (2) applications are sensitive to resource constraints enough to matter
     • Focus of this talk: verify assumption (1)

  3. Introduction (cont.)
     • Questions we will address
        • is there enough variation among nodes at any given (deployment) time to justify service placement?
        • is there enough variation over time on a single node to justify periodic task migration?
        • are there correlations between attributes on a single node, or among nodes at the same site?
     • All of these questions are important in designing a system for resource discovery and service placement (like SWORD)

  4. Outline
     • How much does the available amount of per-node resources vary among nodes at a fixed time?
     • How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?
     • On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?

  5. Experimental environment
     • Per-node attributes: Ganglia, CoMon
        • two-week period (Oct 10-Oct 24, 2004)
        • each node polled every 5 minutes
        • free memory, free swap, free disk, load average, network bytes sent and received/sec, # active slices
     • Inter-node latency: all-pairs pings
        • one-month period ending Oct 24, 2004
        • each pair of nodes measured every 15 minutes
     • Inter-node bandwidth: Iperf
        • one-month period ending Oct 24, 2004
        • each pair of nodes measured 1-2x/week
     • About 250 nodes in the trace each day

  6. Outline (repeat of slide 4)

  7. Resource heterogeneity: averages
     • How much do available resources vary over the trace?

  8. Resource heterogeneity: averages
     • How much do available resources vary over the trace?

  9. Resource heterogeneity: CV vs. time
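
The plot for this slide did not survive in the transcript. As a rough illustration of what "CV vs. time" measures, the sketch below computes the coefficient of variation (standard deviation divided by mean) of one attribute across all nodes at each polling instant. The data layout, the function names, and the sample readings are assumptions made for this example; they are not taken from the talk.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return population std dev divided by mean, or None if the mean is zero."""
    m = mean(values)
    if m == 0:
        return None
    return pstdev(values) / m


def cv_over_time(samples):
    """Map each timestamp to the CV of one attribute across all nodes.

    `samples` uses a hypothetical layout: {timestamp: {node_name: value}}.
    """
    return {
        t: coefficient_of_variation(list(per_node.values()))
        for t, per_node in sorted(samples.items())
    }


# Made-up free-memory readings (MB) for three nodes at two polling instants
# that are 300 seconds (5 minutes) apart.
free_mem = {
    0: {"nodeA": 100.0, "nodeB": 200.0, "nodeC": 300.0},
    300: {"nodeA": 200.0, "nodeB": 200.0, "nodeC": 200.0},
}
cv_series = cv_over_time(free_mem)
for timestamp, cv in cv_series.items():
    print(timestamp, cv)
```

A CV near zero at some instant means the nodes look alike for that attribute, so placement matters little; a large CV means the choice of node matters.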

  10. Outline (repeat of slide 4)

  11. Variability of per-node attributes over time

  12. Variability of per-node attributes over time

  13. Variability of per-node attributes over time

  14. Variability of per-node attributes over time
      • Can rank degree of variability of each attribute
         • disk, swap < mem, load < net bytes; # slices moderate to significant
      • CDF curve shifts to right as interval length increases
         • attributes vary less over short time periods than long
         • migration interval: find “sweet spot” in curve of variability vs. interval length
      • CDF slope decreases as median variability of attribute increases
         • may be able to classify nodes as high/low variability over time for mem, load, net bytes (they have high median variability)
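
The transcript does not say which variability metric the CDFs plot. The sketch below assumes one plausible choice: for each node, take the median CV over consecutive non-overlapping windows of a given length, then build an empirical CDF of that value across nodes. The metric, the function names, and the two sample traces are all assumptions for illustration.

```python
from statistics import mean, median, pstdev


def window_cv(window):
    """CV of the samples in one window (0.0 if the mean is zero)."""
    m = mean(window)
    return pstdev(window) / m if m else 0.0


def node_variability(series, window_len):
    """Median CV over consecutive non-overlapping windows of `window_len` samples.

    Raises statistics.StatisticsError if the series is shorter than one window.
    """
    windows = [
        series[i:i + window_len]
        for i in range(0, len(series) - window_len + 1, window_len)
    ]
    return median(window_cv(w) for w in windows)


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Made-up traces: one perfectly steady node and one that drifts upward.
traces = {
    "steady": [10, 10, 10, 10, 10, 10, 10, 10],
    "drifting": [10, 11, 12, 13, 14, 15, 16, 17],
}
for window_len in (2, 4):
    per_node = [node_variability(s, window_len) for s in traces.values()]
    print(window_len, empirical_cdf(per_node))
```

On the drifting trace the longer window yields the larger value, which is the kind of rightward shift the slide describes as interval length increases.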

  15. Inter-node latency and BW variation over time
      • Most nodes have low latency (and bw) variability even over a month-long trace
         • migration may not be worthwhile

  16. Outline (repeat of slide 4)

  17. Correlation among per-node attributes
      • No strong correlations between different attrs.
         • though some one-hour trace segments had some
      • Some correlation between nodes at same site
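
The slide does not name the correlation measure used. A minimal sketch, assuming the Pearson coefficient and a hypothetical per-node layout of {attribute_name: list_of_samples}, shows how every pair of attributes on one node could be checked. The sample values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length series; None if either is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def attribute_correlations(node_samples):
    """Return {(attr_a, attr_b): r} for every pair of attributes on one node."""
    return {
        (a, b): pearson_r(node_samples[a], node_samples[b])
        for a, b in combinations(sorted(node_samples), 2)
    }


# Invented samples for a single node, aligned by polling instant.
node_samples = {
    "free_mem": [100, 120, 140, 160],
    "load": [4, 3, 2, 1],
    "net_bytes": [5, 9, 4, 8],
}
corr = attribute_correlations(node_samples)
for pair, r in corr.items():
    print(pair, r)
```

The same function could be applied to one attribute measured on two nodes at the same site to look at the cross-node correlation the slide mentions.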

  18. Correlation between latency and avail. BW (r = -0.59)
      • Moderate inverse power law correlation
      • Using latency to estimate BW gives 233% error
         • some nodes are bandwidth-capped, some in weird ways
      • Some node pairs showed strong lat-BW correlation
         • 17% within 25%, 56% within 50%
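
The transcript gives no detail on how the power law was fitted or how the error figures were computed. One common approach, sketched here purely as an assumption, is a least-squares line fit in log-log space (bw ≈ a · latency^b), followed by the relative error of each estimate and the fraction of estimates that fall within a tolerance. The sample measurements are invented and follow an exact inverse law, so they do not reproduce the numbers on the slide.

```python
from math import exp, log


def fit_power_law(latencies, bandwidths):
    """Least-squares fit of log(bw) = log(a) + b * log(latency); returns (a, b)."""
    xs = [log(lat) for lat in latencies]
    ys = [log(bw) for bw in bandwidths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = exp(my - b * mx)
    return a, b


def relative_errors(latencies, bandwidths, a, b):
    """Relative error of the estimate a * latency**b against each measured bandwidth."""
    return [abs(a * lat ** b - bw) / bw for lat, bw in zip(latencies, bandwidths)]


def fraction_within(errors, tolerance):
    """Fraction of estimates whose relative error is at most `tolerance`."""
    return sum(e <= tolerance for e in errors) / len(errors)


# Invented node-pair measurements: latency in ms, bandwidth in Mb/s.
latencies = [10.0, 20.0, 50.0, 100.0]
bandwidths = [100.0, 50.0, 20.0, 10.0]

a, b = fit_power_law(latencies, bandwidths)
errors = relative_errors(latencies, bandwidths, a, b)
print(a, b, fraction_within(errors, 0.25), fraction_within(errors, 0.50))
```

A negative exponent b corresponds to the inverse relationship on the slide. On real traces, bandwidth-capped nodes would show up as points far below the fitted curve.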

  19. Conclusion
      • How much does the available amount of per-node resources vary among nodes at a fixed time?
         • significantly; enough to warrant svc. placement
      • How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?
         • moderate variability; may warrant migration
      • On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?
         • no strong correlation between diff. attrs.
         • some correlation between same attr., same site
         • latency can predict avail. bandwidth

  20. Future work
      • Ask same questions but use application model to answer, rather than analysis of raw data
         • different apps have different resource sensitivities
         • different apps have different migration costs
      • Can we predict attribute values?
         • give warning before migration
         • or just don’t bother to deploy on “bad” nodes
      • How much “better” could we do if SWORD could schedule jobs?
