Large Scale File Distribution Troy Raeder & Tanya Peters

The Problem
  • Distribute a large file to some number of machines
    • useful to deploy new programs, distribute data
    • Chirp_distribute, implemented last year, distributes files using a spanning tree
  • Want to improve upon the existing methods to transfer files more efficiently.
    • Choke points exist – multiple machines will all transfer files through a single router/switch
    • Failures, including permission errors, need to be minimized
The Solution
  • Take advantage of network topology – transfer across routers and switches as soon as possible, and then machines in the same cluster transfer to each other.
    • Using traceroute, we build a graph that represents the network. The graph is built as needed and saved to a file, which is loaded at run time.
  • Access Control Lists: if we know a source machine doesn’t have permissions to transfer to some target, don’t even try
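
The slides do not say how the traceroute output becomes a graph, so the following is only a minimal sketch under stated assumptions: hosts are grouped into a cluster when the last router on the path to them is the same, the graph is stored as JSON, and known permission failures are kept as a set of (source, target) pairs. All function names and data shapes here are hypothetical.

```python
import json
from collections import defaultdict


def build_clusters(routes):
    """Group hosts by the last router on the traceroute path to them.

    routes: dict mapping host -> list of router hops (target excluded).
    Assumption: sharing a final hop means sharing a switch/router.
    """
    clusters = defaultdict(set)
    for host, hops in routes.items():
        # A host with no intermediate hops is on the local segment.
        gateway = hops[-1] if hops else "local"
        clusters[gateway].add(host)
    return dict(clusters)


def clusters_to_json(clusters):
    """Serialize the cluster map so it can be saved and reloaded later."""
    return json.dumps({gw: sorted(hosts) for gw, hosts in clusters.items()})


def clusters_from_json(text):
    """Rebuild the cluster map from its JSON form."""
    return {gw: set(hosts) for gw, hosts in json.loads(text).items()}


def allowed_targets(source, candidates, denied):
    """Drop targets the source is already known to lack permission for.

    denied: set of (source, target) pairs known to fail (hypothetical format).
    Pairs with unknown status are still attempted.
    """
    return [t for t in candidates if (source, t) not in denied]
```

For example, `build_clusters({"a1": ["r0", "r1"], "a2": ["r0", "r1"], "b1": ["r0", "r2"]})` places `a1` and `a2` in one cluster and `b1` in another, so a copy would cross to `r2` once and then spread locally.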
Picking a Target:
  • Check if all clusters in the graph contain a copy of the file.
    • If some cluster does not, we copy to it.
    • Next, if some node within the source's own cluster doesn't have the file, transfer to it.
    • Otherwise, pick some other node that doesn't have the file.
  • If a node is unable to transfer to nodes that don't have the file yet, it is removed from the list of possible sources.
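
The selection order above can be sketched roughly as follows. This is an illustration, not the authors' implementation: the function name, the cluster map, the `have_file` set, and the `denied` set of (source, target) pairs are all assumptions, and ties are broken alphabetically only to keep the example deterministic.

```python
def pick_target(source, clusters, have_file, denied=frozenset()):
    """Return the next host `source` should copy to, or None.

    clusters:  dict mapping cluster id -> set of hosts
    have_file: set of hosts that already hold the file
    denied:    set of (source, target) pairs known to fail on permissions
    """
    def usable(host):
        return host not in have_file and (source, host) not in denied

    # 1. Seed any cluster that does not yet contain a copy.
    for cid in sorted(clusters):
        members = clusters[cid]
        if not (members & have_file):
            for host in sorted(members):
                if usable(host):
                    return host

    # 2. Otherwise prefer a host inside the source's own cluster.
    own = next((m for m in clusters.values() if source in m), set())
    for host in sorted(own):
        if usable(host):
            return host

    # 3. Otherwise fall back to any other host still missing the file.
    for cid in sorted(clusters):
        for host in sorted(clusters[cid]):
            if usable(host):
                return host

    # Nothing left that this source can serve; the caller would then
    # remove `source` from the list of possible sources.
    return None
```

With clusters `{"A": {"a1", "a2"}, "B": {"b1", "b2"}}` and only `a1` holding the file, `pick_target("a1", ...)` returns `b1` first, so the file crosses to cluster B before `a1` serves its own neighbour `a2`.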
Initial Results
  • The current version of the algorithm doesn’t always do better than the existing one
    • As expected, for smaller files and/or smaller numbers of hosts, overhead costs us
    • For larger files and/or larger numbers of hosts, things like timeouts can wash out the relative gains
What's Next...
  • Pick source & target more intelligently
  • If initial attempt to copy from some cluster A to cluster B fails, don't try transferring between these two clusters again unless no other possibilities exist.
  • Try to manage straggler transfers
    • Dynamically set timeout for transferring a single copy: set to some multiple of max or average transfer time seen so far.
  • The end result will hopefully be a significant improvement over the existing algorithm
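
Two of the planned refinements are simple enough to sketch. The code below is a hypothetical illustration only: the multiple of 3, the 60-second default used before any transfer has finished, and both function names are assumptions rather than values from the slides.

```python
def dynamic_timeout(durations, multiple=3.0, use_max=False, default=60.0):
    """Timeout for one copy, as a multiple of transfer times seen so far.

    durations: list of completed transfer times in seconds
    use_max:   base the timeout on the slowest transfer instead of the mean
    """
    if not durations:
        # No history yet, so fall back to a fixed (assumed) default.
        return default
    base = max(durations) if use_max else sum(durations) / len(durations)
    return multiple * base


def order_cluster_pairs(pairs, failed):
    """Put cluster pairs that failed before at the end of the list.

    Failed pairs are kept rather than dropped, so they are still tried
    when no other possibility exists. The sort is stable, so the original
    order is preserved within each group.
    """
    return sorted(pairs, key=lambda pair: pair in failed)
```

For instance, after transfers of 10 and 20 seconds the mean-based timeout is 45 seconds, while `use_max=True` with `multiple=2.0` gives 40 seconds.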