Achieving Dependable Bulk Throughput in a Hybrid Network


  1. Achieving Dependable Bulk Throughput in a Hybrid Network
     Guy Almes <almes@internet2.edu>
     Aaron Brown <aaronmb@udel.edu>
     Martin Swany <swany@cis.udel.edu>
     Joint Techs Meeting, Univ Wisconsin -- 17 July 2006

  2. Outline
  • Observations:
    • on user needs and technical opportunities
    • on TCP dynamics
  • Notion of a Session Layer:
    • the obvious application
    • a stronger application
  • Phoebus as HOPI experiment:
    • deployment
    • early performance results
  • Phoebus as an exemplar hybrid network

  3. On User Needs
  • In a variety of cyberinfrastructure-intensive applications, dependable high-speed wide-area bulk data flows are of critical value
  • Examples:
    • Terabyte data sets in HPC applications
    • Data-intensive TeraGrid applications
    • Access to the Sloan Digital Sky Survey and similar very large data collections
  • Also, we stress 'dependable' rather than 'guaranteed' performance
  • As science becomes more data-intensive, these needs will be prevalent in many science disciplines

  4. On Technology Drivers
  • Network capacity increases, but user throughput increases more slowly (source: DOE)
  • The cause of this gap relates to TCP dynamics

  5. On TCP Dynamics
  • Consider the Mathis Equation for Reno (shown below)
  • Focus on bulk data flows over wide areas
  • How can we attack it?
    • Reduce non-congestive packet loss (a lot!)
    • Raise the MTU (but this only helps if done end-to-end!)
    • Improve TCP algorithms (e.g., FAST, BIC), though RTT is still a factor
    • Use end-to-end circuits
    • Decrease RTT??
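  For reference, the Mathis steady-state bound for TCP Reno, written out in its standard form (the constant C is approximately sqrt(3/2) under common assumptions):

      BW \;\le\; \frac{MSS}{RTT} \cdot \frac{C}{\sqrt{p}}, \qquad C \approx \sqrt{3/2} \approx 1.22

  Each attack above targets one term: reducing non-congestive loss shrinks p, raising the MTU raises MSS, and circuits (or, as below, depots) reduce the RTT each connection sees.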

  6. Situation for running example

  7. The Transport-Layer Gateway
  [Diagram: the protocol stacks of the two end hosts and an intermediate gateway (user space, session, transport, network, data link, physical), with the session layer spanning the per-segment transport connections]
  • A session is the end-to-end chain of segment-specific transport connections
  • In our early work, each of these transport connections is a conventional TCP connection
  • Each transport-level gateway (depot) receives data from one connection and pipes it to the next connection in the chain (see the sketch below)
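  A minimal sketch of that piping role in Python (the listening port, next-hop name, and threading structure are hypothetical illustrations, not the Phoebus implementation):

      import socket
      import threading

      # Hypothetical next depot in the session chain (illustrative name/port)
      NEXT_HOP = ("depot2.example.net", 5006)
      LISTEN_PORT = 5006

      def relay(src, dst):
          # Pipe bytes from one segment-specific TCP connection to the next
          while True:
              chunk = src.recv(65536)
              if not chunk:
                  break
              dst.sendall(chunk)
          dst.shutdown(socket.SHUT_WR)

      def handle(conn):
          # Each span is an independent TCP connection, so congestion control
          # sees only that span's RTT, MTU, and loss
          nxt = socket.create_connection(NEXT_HOP)
          threading.Thread(target=relay, args=(nxt, conn), daemon=True).start()
          relay(conn, nxt)

      def main():
          srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
          srv.bind(("", LISTEN_PORT))
          srv.listen()
          while True:
              conn, _addr = srv.accept()
              threading.Thread(target=handle, args=(conn,), daemon=True).start()

      if __name__ == "__main__":
          main()

  The key design point is visible in handle(): the upstream and downstream connections are created and congestion-controlled independently, which is what decouples the local spans from the backbone span.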

  8. The Logistical Session Layer

  9. Obvious Application
  • Place a depot half-way between hosts A and B, thus cutting the RTT roughly in half
  • Bad news: only a small factor (a rough estimate follows)
  • Good news: it actually does more
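  To see why it is only a small factor, assume (purely for illustration) that each half-path keeps the full path's MSS and loss rate p. Each half-connection then sees RTT/2, and by the Mathis bound the relayed session runs at the pace of the slower half:

      BW_{session} \approx \min(BW_1, BW_2) = \frac{MSS}{RTT/2} \cdot \frac{C}{\sqrt{p}} = 2\,BW_{direct}

  A factor of roughly two. The larger gains come from placing depots so that the short, lossier local spans are isolated from the long, clean backbone span, as the following slides argue.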

  10. Obvious Application: With one depot to reduce RTT

  11. Stronger Application
  • Place one depot at the HOPI node near the source, and another near the destination
  • Observe:
    • Abilene Measurement Infrastructure throughput: 2nd percentile 950 Mb/s, median 980 Mb/s
    • Backbone MTU = 9000 bytes; loss is very low
    • Local infrastructure: MTU and loss are good, but not always very good; the RTT, however, is very small
  • But with HOPI we can do even better

  12. The HOPI Project
  • The Hybrid Optical and Packet Infrastructure Project (hopi.internet2.edu)
  • Leverage both the 10-Gb/s Abilene backbone and a 10-Gb/s lambda of NLR
  • Explore combining packet infrastructure with dynamically-provisioned lambdas

  13. Stronger application: depots near each host
  • Backbone span: large RTT, 9000-byte MTU, very low non-congestive loss
  • GigaPoP / Campus spans: very small RTT, some 1500-byte MTU, some non-congestive loss

  14. Two Conjectures
  • A small RTT effectively masks moderate imperfections in MTU and loss
  • End-to-end session throughput is (only a little less than) the minimum of the component connection throughputs (see the sketch below)
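  A back-of-the-envelope check of both conjectures, plugging purely illustrative per-span numbers (in the spirit of slide 13, not measurements from this deployment) into the Mathis bound and taking the minimum:

      from math import sqrt

      C = sqrt(3 / 2)  # Mathis constant for Reno

      def mathis_bw_bps(mss_bytes, rtt_s, loss):
          # Mathis steady-state bound, in bits per second, for one TCP span
          return (mss_bytes * 8 / rtt_s) * (C / sqrt(loss))

      # Illustrative spans: lossy 1500-MTU campus edges, clean 9000-MTU backbone
      spans = {
          "campus ingress": mathis_bw_bps(mss_bytes=1460, rtt_s=0.002, loss=1e-4),
          "backbone":       mathis_bw_bps(mss_bytes=8960, rtt_s=0.070, loss=1e-7),
          "campus egress":  mathis_bw_bps(mss_bytes=1460, rtt_s=0.002, loss=1e-4),
      }

      for name, bw in spans.items():
          print(f"{name:14s} {bw / 1e6:7.0f} Mb/s")
      # Conjecture 2: the session runs at (a little less than) the slowest span
      print(f"{'session (min)':14s} {min(spans.values()) / 1e6:7.0f} Mb/s")

  With these numbers the 2-ms campus spans still bound the session above 700 Mb/s despite their 1500-byte MTU and 10^-4 loss, which is conjecture 1 in action: the small RTT masks the imperfections.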

  15. Phoebus
  • Phoebus aims to narrow the performance gap by bringing revolutionary networks like HOPI to users
  • Phoebus is another name for the mythical Apollo in his role as the "sun god"
  • Phoebus stresses the 'session' concept to enable multiple network/transport infrastructures to be catenated
  • Phoebus builds on an earlier project called the Logistical Session Layer (LSL)

  16. Experimental Phoebus Deployment
  • Place Phoebus depots at each HOPI node
  • Ingress/egress spans via ordinary Internet2/Abilene IP infrastructure
  • Backbone span can use either/both of:
    • a 10-Gb/s path through Abilene
    • a dynamic 10-Gb/s lambda
  • Initial test user sites:
    • SDSC host with gigE connectivity
    • Columbia Univ host with gigE connectivity

  17. Initial Performance Results
  • In very early tests:
    • SDSC to losa: about 900 Mb/s
    • losa to nycm: about 5.1 Gb/s
    • nycm to Columbia: about 900 Mb/s
    • direct: 380 ± 88 Mb/s
    • Phoebus: 762 ± 36 Mb/s
  • In later tests with a variety of file sizes, SDSC-to-losa performance became worse

  18. Initial Performance Results

  19. Initial Test Results
  • What about the three components?
    • SDSC to losa depot: 429-491 Mb/s
    • losa depot to nycm depot: 5.13-5.15 Gb/s
    • nycm depot to Columbia: 908-930 Mb/s
  • Whatever caused the weakness in the SDSC-to-losa path slowed the end-to-end session down as well

  20. Plans for Summer 2006
  • 'Experimental production' Phoebus, reaching out to interested users
  • Improve access control and instrumentation:
    • Maintain a log of achieved performance
  • Test use of dynamic HOPI lambdas
  • Evaluate Phoebus as a service within newnet
  • Test use of Phoebus internationally

  21. Comments on the Backbone Span
  • The backbone could ensure flow performance between pairs of backbone depots
  • The backbone could provide a Phoebus Service in addition to its "IP" service
  • It is relatively easy to use dynamic lambdas within the backbone portion of the Phoebus infrastructure
  • Alternatively, the backbone portion could use IP, but with a non-TCP transport protocol!

  22. Comments on the Local (Ingress and Egress) Spans
  • Near the ends, we have good, but not perfect, local/metro-area infrastructure
  • It is relatively hard to deploy dynamic lambdas there
  • Small RTTs allow high-speed TCP flows to be extended to many local sites in a scalable way

  23. Thus, Phoebus leverages both:
  • innovative wide-area infrastructure, and
  • conventional local-area infrastructure
  • It can thereby extend the value of multi-lambda wide-area infrastructure to many science users on high-quality conventional campus networks

  24. Ongoing Work
  • Phoebus deployment on HOPI
    • We're seeking project participants! Please email for information
  • ESP-NP
    • ESP = Extensible Session Protocol
    • Implementation on an IXP network processor from Intel
    • The IXP2800 can forward at 10 Gb/s

  25. Acknowledgements
  • UD students: Aaron Brown, Matt Rein
  • Internet2: Eric Boyd, Rick Summerhill, Matt Zekauskas, ...
  • HOPI Testbed Support Center (TSC) team: MCNC, Indiana Univ NOC, Univ Maryland
  • San Diego Supercomputer Center: Patricia Kovatch, Tony Vu
  • Columbia University: Alan Crosswell, Megan Pengelly, the Unix group
  • Dept of Energy Office of Science: MICS Early Career Principal Investigator program

  26. End
  • Thank you for your attention
  • Questions?
