
DiskRouter: A Mechanism for High Performance Large Scale Data Transfers


imueller




Presentation Transcript


  1. DiskRouter: A Mechanism for High Performance Large Scale Data Transfers

  2. Outline • Problem • DiskRouter Overview • Details • Real life DiskRouters • Experiments

  3. Problem • SDSC to NCSA: bottleneck bandwidth 12.5 MBPS, latency 67 ms • Transfer rate achieved by applications for a 1 GB file: Scp 0.66 MBPS; GridFTP (1 stream) 0.85 MBPS; GridFTP (10 streams) 3.52 MBPS
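
The bottleneck bandwidth and latency on the Problem slide determine a bandwidth-delay product, the amount of data that has to be in flight to keep the path full. The sketch below computes it under two assumptions that the slides do not state outright: MBPS is read as megabytes per second (the later GridFTP vs DiskRouter plot labels its axis that way), and the 67 ms latency is treated as a round-trip time. The 64 KiB window is a hypothetical value picked for illustration only.

```python
# Assumptions: 12.5 MBPS = 12.5e6 bytes/second, and 67 ms is a round-trip time.
BOTTLENECK_BYTES_PER_S = 12.5e6
RTT_S = 0.067


def bandwidth_delay_product(bandwidth_bytes_per_s, rtt_s):
    """Bytes that must be in flight to keep the path fully utilized."""
    return bandwidth_bytes_per_s * rtt_s


def window_limited_rate(window_bytes, rtt_s):
    """Upper bound on one stream's rate when at most window_bytes are in flight."""
    return window_bytes / rtt_s


bdp_bytes = bandwidth_delay_product(BOTTLENECK_BYTES_PER_S, RTT_S)
# Hypothetical 64 KiB window, chosen only to show the effect of a small buffer.
rate_64k = window_limited_rate(64 * 1024, RTT_S)

print(f"bandwidth-delay product: {bdp_bytes / 1e6:.2f} MB")
print(f"ceiling for a 64 KiB window: {rate_64k / 1e6:.2f} MB/s")
```

Under these assumptions a single stream needs roughly 0.84 MB in flight to fill the link, so a window much smaller than that caps the rate well below the bottleneck bandwidth.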

  4. DiskRouter Overview • Mechanism to efficiently move large amounts of data (order of terabytes) • Uses disk as a buffer to aid in large scale data transfers • Application-level overlay network used for routing • Ability to use higher level knowledge for data movement

  5. A Simple Case • [Diagram: A → B] • A is transferring a large amount of data to B

  6. A Simple Case • [Diagram: A, B, and C (a DiskRouter)] • C is an intermediate node between A and B

  7. A Simple Case with DiskRouter • [Diagram: A → C → B, comparing the transfer with and without DiskRouter] • Improves performance when the bandwidth fluctuation between A and C is independent of the bandwidth fluctuation between C and B
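
The claim about independent bandwidth fluctuation can be illustrated with a toy model. The sketch below is not the DiskRouter implementation; it assumes discrete time steps, treats an unbuffered end-to-end transfer as limited in each step by the slower of the two links, and gives the intermediate node C an unbounded buffer. The function names and the bandwidth values are hypothetical.

```python
def direct_transfer(a_to_c, c_to_b):
    """Data delivered when each step is limited by the slower of the two links."""
    return sum(min(a, b) for a, b in zip(a_to_c, c_to_b))


def buffered_transfer(a_to_c, c_to_b):
    """Data delivered when C buffers whatever the C-to-B link cannot yet carry."""
    buffered = 0
    delivered = 0
    for a, b in zip(a_to_c, c_to_b):
        buffered += a            # A fills C's buffer at the A-to-C rate
        sent = min(b, buffered)  # C drains it at the C-to-B rate
        buffered -= sent
        delivered += sent
    return delivered


# Hypothetical per-step bandwidths (arbitrary units).
out_of_phase = ([10, 2, 10, 2], [2, 10, 2, 10])
in_phase = ([10, 2, 10, 2], [10, 2, 10, 2])

print(direct_transfer(*out_of_phase), buffered_transfer(*out_of_phase))
print(direct_transfer(*in_phase), buffered_transfer(*in_phase))
```

In this model the buffer helps only when the two links are fast at different times; when they fluctuate together, the buffered and direct transfers deliver the same amount.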

  8. Data Mover/Distributed Cache • [Diagram: Source → DiskRouter cloud → Destination] • The source writes to the closest DiskRouter, and the destination receives the data from its closest DiskRouter

  9. Outline • Problem • DiskRouter Overview • Details • Real life DiskRouters • Experiments

  10. Routing Between DiskRouters • [Diagram: DiskRouter A, DiskRouter B, DiskRouter C] • C need not be in the path between A and B

  11. Network Monitoring • Uses ‘Pathrate’ for estimating network capacity • Performs actual transfers for measurement • Logs the data rate seen by different components • Generates network interface statistics on the machines involved in the transfers

  12. Implementation Details • Uses multiple sockets and explicitly sets TCP buffer sizes • Overlaps disk I/O and socket I/O
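
The two implementation points above (explicit TCP buffer sizes, and overlapping disk I/O with socket I/O) can be sketched with the Python standard library. This is a minimal illustration, not DiskRouter's code: `tuned_tcp_socket` and `overlapped_copy` are hypothetical names, the chunk size and queue depth are arbitrary, and the operating system may adjust or cap the buffer size that is requested.

```python
import queue
import socket
import threading


def tuned_tcp_socket(buffer_bytes):
    """Create a TCP socket and request explicit send/receive buffer sizes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connecting; the kernel may round or limit these values.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


def overlapped_copy(source, sink, chunk_size=1 << 20, depth=4):
    """Copy source to sink while a reader thread keeps reading ahead.

    source needs read(n) and sink needs write(data); for a network sink,
    sock.makefile("wb") provides such an object.
    """
    chunks = queue.Queue(maxsize=depth)  # bounded so reads cannot run far ahead

    def reader():
        while True:
            chunk = source.read(chunk_size)
            chunks.put(chunk)  # an empty chunk marks end of input
            if not chunk:
                return

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total = 0
    while True:
        chunk = chunks.get()
        if not chunk:
            break
        sink.write(chunk)  # writes proceed while the reader fetches the next chunk
        total += len(chunk)
    thread.join()
    return total
```

Using several such sockets in parallel, each carrying part of the data, would correspond to the "multiple sockets" point; that part is left out here to keep the sketch short.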

  13. Client Side • Client library provided • Applications can call library functions for network I/O • Functions provided for common case file transfer (overlaps network I/O and disk I/O) • Third party transfer support

  14. Outline • Problem • DiskRouter Overview • Details • Real life DiskRouters • Experiments

  15. Real Life DiskRouters • [Network map] Sites: UW Milwaukee, UW Madison, StarLight, MCS ANL, NCSA, SDSC, INFN Italy • Link labels (bandwidth, latency): 90 Mbps, 3.3 ms; 411 Mbps, 8 ms; 518 Mbps, 67 ms; 94 Mbps, 2.7 ms; 30 Mbps, 126.6 ms; 90 Mbps, 5.5 ms; 514 Mbps, 0.85 ms

  16. Outline • Overview • Details • Real Life DiskRouters • Experiments

  17. Testing Multiroute • [Network map] Sites: UW Milwaukee, UW Madison, StarLight • Link labels (bandwidth, latency): 90 Mbps, 3.3 ms; 411 Mbps, 8 ms; 90 Mbps, 5.5 ms

  18. Multiroute Improves Performance • [Plot, in megabits/second: total data into StarLight, data from Milwaukee, data from Madison]

  19. SRB to Unitree Transfer Using Stork • Data movement from SDSC to NCSA via Starlight (3 TB of data had to be moved) • Integrated into Stork • Found significant performance gain

  20. Link between SDSC and NCSA • [Network map] Sites: SDSC, StarLight, NCSA • Link labels (bandwidth, latency): 518 Mbps, 67 ms; 94 Mbps, 2.7 ms

  21. StarLight DiskRouter Stats • [Plots: data inflow, data outflow, memory used, disk used]

  22. GridFTP vs DiskRouter • [Plot: end-to-end data rate seen by Stork (MBPS, megabytes/second) vs. time, for DiskRouter and GridFTP]

  23. A Glimpse of Performance • Transfer of a 1 GB file from SDSC (San Diego) to NCSA (Urbana-Champaign) • Transfer rate by tool: Scp 0.66 MBPS; GridFTP (1 stream) 0.85 MBPS; GridFTP (10 streams) 3.52 MBPS; DiskRouter 10.77 MBPS
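
The transfer rates above can be turned into relative figures with a little arithmetic. The sketch below uses only numbers from the slides (the four transfer rates and the 12.5 MBPS bottleneck bandwidth from the Problem slide); the variable names are illustrative.

```python
BOTTLENECK_MBPS = 12.5  # bottleneck bandwidth from the Problem slide

rates_mbps = {
    "Scp": 0.66,
    "GridFTP (1 stream)": 0.85,
    "GridFTP (10 streams)": 3.52,
    "DiskRouter": 10.77,
}

# How many times faster DiskRouter was than each other tool.
speedup = {
    tool: rates_mbps["DiskRouter"] / rate
    for tool, rate in rates_mbps.items()
    if tool != "DiskRouter"
}

# Fraction of the bottleneck bandwidth each tool achieved.
utilization = {tool: rate / BOTTLENECK_MBPS for tool, rate in rates_mbps.items()}

for tool, factor in speedup.items():
    print(f"DiskRouter vs {tool}: {factor:.1f}x")
for tool, share in utilization.items():
    print(f"{tool}: {share:.0%} of bottleneck bandwidth")
```

By these figures DiskRouter reached about 86% of the bottleneck bandwidth, roughly three times the rate of GridFTP with 10 streams.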

  24. Work In Progress • Computation on data streams in the DiskRouter • Ability to perform computation in the nodes attached locally to the DiskRouter • Working together with Stork to add intelligence to data movement

  25. Questions • Thanks for listening
