
Reducing Allocation Errors in Network Testbeds

GSS 2012, USC/ISI. IMC 2012, Boston, USA. Reducing Allocation Errors in Network Testbeds. Jelena Mirkovic, Hao Shi, Alefiya Hussain. National Science Foundation Grant No. 1049758. Outline: Scenario, Problem Statement, Current Practice, Improvements.


Presentation Transcript


  1. Reducing Allocation Errors in Network Testbeds • Jelena Mirkovic, Hao Shi, Alefiya Hussain • GSS 2012, USC/ISI • IMC 2012, Boston, USA • National Science Foundation Grant No. 1049758 • Outline: Scenario, Problem Statement, Current Practice, Improvements

  2. Overview
     • Scenario
         • What is a testbed and how do people use it?
     • Problem Statement
         • Emulab-based practice
     • Allocation Errors
         • A large portion can be avoided
     • Improvement
         • Deterministic-search-based method

  3. Scenario – a use case • How people launch multiple experiment instances in a testbed

  4. Scenario – features of resources • Limited quantities (until Jan 2011) • Heterogeneity: no resource type has an absolute advantage • Network Testbed Mapping Problem: how to allocate resources efficiently?

  5. Problem Statement – Illustration • Network Testbed Mapping Problem

  6. Problem Statement – Goals/Challenges • Economize inter-switch bandwidth • Accommodate heterogeneous nodes • Maximize the chance that future mappings succeed • Generate one solution in a timely fashion
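
The first goal on this slide can be made concrete with a small sketch. It assumes a simplified model that is not taken from the slides: each virtual node is placed on one physical machine, each machine hangs off one switch, and a virtual link consumes inter-switch bandwidth only when its endpoints land on different switches. The function name, node names, and bandwidth values are hypothetical.

```python
from typing import Dict, Tuple


def inter_switch_bandwidth(
    virtual_links: Dict[Tuple[str, str], float],
    placement: Dict[str, str],
    switch_of: Dict[str, str],
) -> float:
    """Sum the bandwidth of virtual links whose endpoints sit on different switches."""
    total = 0.0
    for (a, b), bandwidth in virtual_links.items():
        if switch_of[placement[a]] != switch_of[placement[b]]:
            total += bandwidth
    return total


# Made-up three-node chain v1 - v2 - v3 with 1.0 (e.g. Gbps) links.
virtual_links = {("v1", "v2"): 1.0, ("v2", "v3"): 1.0}
# Made-up physical layout: pc1 and pc2 share switch sw1, pc3 is on sw2.
switch_of = {"pc1": "sw1", "pc2": "sw1", "pc3": "sw2"}

compact = {"v1": "pc1", "v2": "pc2", "v3": "pc3"}  # one link crosses switches
spread = {"v1": "pc1", "v2": "pc3", "v3": "pc2"}   # both links cross switches

compact_cost = inter_switch_bandwidth(virtual_links, compact, switch_of)
spread_cost = inter_switch_bandwidth(virtual_links, spread, switch_of)
```

Under this model the compact placement uses half the inter-switch bandwidth of the spread one, which is the kind of difference a mapper tries to exploit.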

  7. Current Practice – Emulab’s Algorithm (assign)
     • Simulated annealing
         • A heuristic that performs a cost-function-guided exploration
         • Starts from a random solution and scores it using a cost function
         • Perturbs the solution using a generation function to find the next one
             • If better: accept
             • If worse: accept with a small probability controlled by the temperature
         • The cooling schedule makes the algorithm converge to a single “best” solution
         • No guarantee that the best solution will be found
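
The accept/reject loop described on this slide can be sketched generically. This is a textbook simulated-annealing skeleton, not Emulab's assign code: the geometric cooling schedule, the parameter values, and the toy cost and neighbor functions are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def simulated_annealing(
    initial: S,
    cost: Callable[[S], float],
    neighbor: Callable[[S, random.Random], S],
    rng: random.Random,
    t_start: float = 10.0,   # assumed starting temperature
    t_end: float = 1e-3,     # assumed stopping temperature
    cooling: float = 0.95,   # assumed geometric cooling factor
    steps_per_temp: int = 50,
) -> Tuple[S, float]:
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            delta = cost(candidate) - current_cost
            # Better (or equal): always accept.
            # Worse: accept with probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling
    return best, best_cost


# Toy problem (hypothetical): minimize (x - 3)^2 over the integers.
def toy_cost(x: int) -> float:
    return float((x - 3) ** 2)


def toy_neighbor(x: int, rng: random.Random) -> int:
    return x + rng.choice([-1, 1])


start = 40
found, found_cost = simulated_annealing(start, toy_cost, toy_neighbor, random.Random(0))
```

Because worse moves are accepted at random, two runs with different seeds can return different answers, and nothing in the loop proves the result is optimal; this is the limitation the slide's last bullet points out.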

  8. Current Practice – Performance • Allocation errors • 11,176 TEMP errors (out of a total of 24,206 errors) • Substantial room for improvement!
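
As a quick check of the scale involved, the share of TEMP errors follows directly from the two counts on this slide:

```latex
\[
\frac{11{,}176}{24{,}206} \approx 0.462
\]
```

That is, roughly 46% of all recorded allocation errors are TEMP errors.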

  9. Our Strategy – assign+
     • Deterministic search
     • Explores 5 possible solution spaces using expert knowledge of possible network testbed architectures
         • 1) PART: minimizes partitions in the virtual topology
         • 2) SCORE: minimizes the score of the allocation strategy
         • 3) ISW: prefers physical machine classes (pclasses) that have high-bandwidth inter-switch links
         • 4) PREF: prefers pclasses that share a switch with the pclasses hosting neighbors of the node being allocated
         • 5) FRAG: tries to use the smallest number of pclasses
     • Chooses the solution with the lowest inter-switch bandwidth as the best
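
The final selection step on this slide can be sketched as follows. Only the selection rule (lowest inter-switch bandwidth wins) comes from the slide; the five strategies are represented by hypothetical stub functions that return a made-up inter-switch bandwidth figure, or `None` when a strategy finds no mapping. The real search procedures are not reproduced here.

```python
from typing import Callable, Dict, Optional, Tuple

# A strategy returns the inter-switch bandwidth of the mapping it found,
# or None if it could not find one (assumed interface, for illustration only).
Strategy = Callable[[], Optional[float]]


def pick_best(strategies: Dict[str, Strategy]) -> Optional[Tuple[str, float]]:
    """Run every strategy in a fixed order and keep the one with the lowest
    inter-switch bandwidth. No randomness is involved, so the same inputs
    always produce the same choice."""
    candidates = []
    for name, run in strategies.items():
        bandwidth = run()
        if bandwidth is not None:
            candidates.append((name, bandwidth))
    if not candidates:
        return None  # every strategy failed: the allocation fails
    return min(candidates, key=lambda c: c[1])


# Stub results with made-up numbers; PREF is assumed to fail in this example.
strategies: Dict[str, Strategy] = {
    "PART": lambda: 4.0,
    "SCORE": lambda: 3.0,
    "ISW": lambda: 2.0,
    "PREF": lambda: None,
    "FRAG": lambda: 5.0,
}

winner = pick_best(strategies)
```

Running several targeted searches and comparing their results trades some extra computation for repeatability, in contrast to the randomized acceptance step of simulated annealing.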

  10. Our Strategy – Evaluation
     • Reconstruct the DeterLab state on Jan 1, 2011
         • Use virtual topology and state snapshot data from the file system
         • hardware types, OS supported, switch connectivity, …
         • 255 available machines in the pool
     • Replay all successful and failed allocations in 2011
         • start time, end time, experiment size, …
         • Failed allocations: generate their durations from the distribution of past successful allocations
         • Keep only the first instance if instances overlap
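
The two preprocessing rules for the replay can be sketched as below. The slide does not spell out the details, so this makes two labeled assumptions: durations for failed allocations are drawn by resampling the observed durations of successful ones, and "overlapping" means two requests for the same experiment whose time intervals intersect. The `Request` record and its field names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    experiment: str
    start: float               # hours since the start of the trace
    duration: Optional[float]  # None for a failed allocation (no end time logged)


def assign_failed_durations(requests: List[Request], rng: random.Random) -> List[Request]:
    """Give each failed request a duration resampled from the successful ones."""
    pool = [r.duration for r in requests if r.duration is not None]
    return [
        Request(r.experiment, r.start,
                r.duration if r.duration is not None else rng.choice(pool))
        for r in requests
    ]


def drop_overlaps(requests: List[Request]) -> List[Request]:
    """Keep only the first instance when requests for one experiment overlap."""
    kept: List[Request] = []
    last_end: Dict[str, float] = {}
    for r in sorted(requests, key=lambda r: r.start):
        if r.start < last_end.get(r.experiment, float("-inf")):
            continue  # overlaps an earlier kept instance of the same experiment
        kept.append(r)
        last_end[r.experiment] = r.start + r.duration
    return kept


trace = [
    Request("A", 0.0, 10.0),   # successful
    Request("B", 2.0, 3.0),    # successful
    Request("A", 5.0, None),   # failed retry while A is still running
    Request("A", 20.0, None),  # failed, well after the first A ended
]
replay = drop_overlaps(assign_failed_durations(trace, random.Random(1)))
replay_keys = [(r.experiment, r.start) for r in replay]
```

In this made-up trace the retry of experiment A at hour 5 is dropped because it falls inside the first A instance, while the later attempt at hour 20 is kept with a sampled duration.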

  11. Our Strategy – Performance • Allocation failure rates and running time

  12. Other key components in the paper
     • Relaxing virtual topology requirements can give better results
         • OS, node type, hardware, …
     • Most testbed usage patterns show heavy-tailed distributions
         • experiment sizes, duration, …
         • due to human dynamics based on priorities
     • Potential improvements to the allocation policy
         • Take-a-Break: release a long-running instance and queue it
         • Borrow-and-Return: borrow from a long-running instance for 4 hours
     • For more details:
         • http://www-net.cs.umass.edu/imc2012/papers/p495.pdf
         • http://steel.isi.edu/TestbedUsageData
