
Bulk Data Transfer Over End-to-end Lambda Network


Presentation Transcript


  1. Bulk Data Transfer Over End-to-end Lambda Network • Gao Xiang, 2011-03-02

  2. References
  • 1. Marian, T.; Freedman, D. A.; Birman, K.; Weatherspoon, H., "Empirical Characterization of Uncongested Optical Lambda Networks and 10GbE Commodity Endpoints," 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
  • 2. Datta, P.; Feng, W.; Sharma, S., "End-System Aware, Rate-Adaptive Protocol for Network Transport in LambdaGrid Environments," Proceedings of the ACM/IEEE SC 2006 Conference.
  • 3. Tsukishima, Y.; Sameshima, Y.; Hirano, A.; Jinno, M.; Kudoh, T.; Okazaki, F., "Multi-Layer Lambda Grid With Exact Bandwidth Provisioning Over Converged IP and Optical Networks," Journal of Lightwave Technology, vol. 27, no. 12, June 15, 2009.

  3. Paper One • Abstract • The paper examines the end-to-end characteristics of an uncongested lambda network running at high speed over long distances, identifying the scenarios associated with loss, latency variation, and degraded throughput at the attached end-hosts. • Key Words • end-to-end • uncongested lambda network • Cornell NLR (National LambdaRail) rings • loss, latency, throughput

  4. Problem • Impedance mismatch: end-host Network Interface Controllers (NICs) cannot reliably communicate at their maximum data rates with such a high-bandwidth optical network.

  5. Uncongested Lambda Networks • TeraGrid: an optical network interconnecting ten major supercomputing sites throughout the United States. • Where Is the Problem? • Device clutter • End-host loss • Cost-benefit of service

  6. Cornell NLR Rings: • Established four static 10GbE full-duplex routes that begin and end at Cornell but transit various physical lengths: • A tiny ring to New York City and back • A small ring via Chicago, Atlanta, Washington D.C., and New York City • A medium ring via Chicago, Denver, Houston, Atlanta, Washington D.C., and New York City • A large ring across Chicago, Denver, Seattle, Los Angeles, Houston, Atlanta, Washington D.C., and New York City

  7. Experimental Measurements • Experimental Setup: • Generate UDP and TCP Iperf traffic between the two commodity end-hosts over all paths between the ingress and egress end-hosts. • Before and after every experimental run, read kernel counters on both sender and receiver that account for packets dropped at the end-host in the DMA ring, socket buffer, or TCP window (a sketch of this measurement loop follows). • All NLR network segments were uncongested; in fact, the background traffic over each link never exceeded 5% utilization. • Measurements: • Packet Loss • Throughput • Packet Batching
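
The setup above can be approximated with a small harness on the end-hosts. The following Python sketch is illustrative only (not the authors' tooling): it assumes iperf2 is installed, that the receiver is already running `iperf -s -u`, and that `egress-host.example.org` is a stand-in hostname for the real egress end-host. The UDP drop counters come from /proc/net/snmp; NIC-level drops would additionally need `ethtool -S`.

```python
# Hedged sketch of the measurement loop: snapshot kernel UDP drop counters,
# run a 60-second Iperf burst, and report the counter deltas.
import subprocess

def udp_drop_counters():
    """Return the Udp: counters from /proc/net/snmp as a name -> value dict."""
    rows = [line.split() for line in open("/proc/net/snmp")
            if line.startswith("Udp:")]
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, map(int, values)))

def run_udp_burst(server, rate="400M", secs=60):
    """One UDP Iperf run; the same snapshot would be taken on both hosts."""
    before = udp_drop_counters()
    subprocess.run(["iperf", "-c", server, "-u", "-b", rate, "-t", str(secs)],
                   check=True)
    after = udp_drop_counters()
    # RcvbufErrors counts datagrams dropped because a socket buffer was full.
    return {k: after[k] - before[k] for k in ("InErrors", "RcvbufErrors")}

if __name__ == "__main__":
    print(run_udp_burst("egress-host.example.org"))  # hypothetical end-host
```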

  8. Packet Loss • Performed many sequences of 60-second UDP Iperf runs over a period of 48 hours, over all paths (tiny, small, medium, and large) and for data rates from 400 Mbps to 2400 Mbps in 400 Mbps increments. • Interrupts via irqbalance vs. interrupts bound to a single CPU (both configurations are sketched below) • irqbalance is a Linux daemon that distributes interrupts over the processors and cores in the system.
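
As a concrete illustration of the two interrupt configurations compared above, the sketch below either leaves interrupt placement to irqbalance or stops it and pins the NIC's interrupt lines to one core via /proc/irq/<N>/smp_affinity. The IRQ numbers (98 and 99) and the use of the `service` command are assumptions; the real values come from /proc/interrupts.

```python
# Hedged sketch: "distributed" lets irqbalance spread interrupts across
# cores; "single CPU" stops it and binds the NIC IRQs to one core.
import subprocess

NIC_IRQS = (98, 99)  # hypothetical IRQ lines of the 10GbE adapter

def bind_irq_to_cpu(irq: int, cpu: int) -> None:
    """Pin one interrupt line to a single core via its smp_affinity bitmask."""
    mask = 1 << cpu  # CPU 0 -> 0x1, CPU 3 -> 0x8, etc.
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(f"{mask:x}\n")

def configure_interrupts(single_cpu: bool, cpu: int = 0) -> None:
    if single_cpu:
        # Stop irqbalance and bind every NIC IRQ to one core.
        subprocess.run(["service", "irqbalance", "stop"], check=True)
        for irq in NIC_IRQS:
            bind_irq_to_cpu(irq, cpu)
    else:
        # Let the irqbalance daemon distribute interrupts across cores.
        subprocess.run(["service", "irqbalance", "start"], check=True)
```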

  9. Throughput • Many TCP congestion control algorithms have been proposed: FAST TCP, HighSpeed TCP, H-TCP, BIC, CUBIC, Hybla, TCP-Illinois, Westwood, Compound TCP, Scalable TCP, YeAH-TCP. • 60-second Iperf bursts were used to conduct a set of 24-hour bulk TCP transfer tests over all the Cornell NLR Rings. • As path length increases, more data and, importantly, more ACKs are lost, since the TCP windows are enlarged to match the bandwidth-delay product of the longer paths.
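
To make the window-sizing point concrete, the sketch below computes a bandwidth-delay product and shows how a Linux sender could request a matching send buffer and pick the congestion control variant per connection. The 10 Gbps rate and 90 ms round-trip time are illustrative figures, not measurements from the paper.

```python
# Minimal sketch: size the send buffer to the bandwidth-delay product and
# select a TCP congestion control algorithm (Linux, Python 3.6+).
import socket

LINK_BPS = 10e9    # 10GbE lambda (assumed)
RTT_S = 0.090      # round-trip time of a long ring (assumed)

bdp_bytes = int(LINK_BPS / 8 * RTT_S)          # bytes in flight to fill the pipe
print(f"BDP ~ {bdp_bytes / 2**20:.0f} MiB")    # ~107 MiB at 10 Gbps / 90 ms

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# The effective buffer is still capped by net.core.wmem_max / tcp_wmem.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"cubic")
```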

  10. Packet Batching • A CPU is notified of the arrival and departure of packets at a NIC by interrupts. • Batch packets by parameterizing the NIC to generate a single interrupt for a group of packets that arrive during a specified time interval. • Problem: tools that rely on accurate packet inter-arrival measurements to estimate capacity or available bandwidth yield meaningless results when used in conjunction with packet batching.
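
On Linux this kind of packet batching is usually exposed as NIC interrupt coalescing, configurable through `ethtool -C`. The interface name and the 125 µs window in the sketch below are assumptions; the second setting is what measurement tools that need accurate inter-arrival times would prefer.

```python
# Hedged sketch of toggling RX interrupt coalescing (packet batching).
import subprocess

NIC = "eth2"  # hypothetical 10GbE interface name

# Batch packets: at most one RX interrupt every 125 microseconds, so packets
# arriving within that window are delivered under a single interrupt.
subprocess.run(["ethtool", "-C", NIC, "rx-usecs", "125"], check=True)

# Disable batching: interrupt per packet, preserving inter-arrival timing
# for bandwidth-estimation tools at the cost of much higher interrupt load.
subprocess.run(["ethtool", "-C", NIC, "rx-usecs", "0"], check=True)
```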

  11. Conclusion • UDP loss depends on the size of the socket buffers and DMA rings as well as on the interrupt affinity settings of the end-host network adapters. • TCP throughput decreases with an increase in packet (data and acknowledgment) loss, with an increase in path length, and with an increase in window size. The congestion control algorithm is only marginally important in determining the achievable throughput, as most TCP variants behave similarly. • The kernel's built-in NAPI and NIC interrupt throttling improve throughput, although they are detrimental to latency-sensitive measurements. This reinforces the conventional wisdom that there is no "one-size-fits-all" set of parameters and that parameters must be chosen carefully for the task at hand.

  12. Paper Two • Abstract • The authors propose an end-system aware, rate-adaptive protocol for network transport based on end-system performance monitoring. The proposed protocol significantly improves the performance of data transfer over LambdaGrids by intelligently adapting the sending rate to end-system constraints. • Key Words • RBUDP • RAPID+ • MAGNET

  13. RBUDP • Reliable Blast UDP • He, E.; Leigh, J.; Yu, O.; DeFanti, T., "Reliable Blast UDP: Predictable High Performance Bulk Data Transfer," IEEE Cluster Computing 2002, Chicago, Illinois, September 2002.

  14. MAGNET • Monitoring Apparatus for General kerNel-Event Tracing • MAGNET, which stands for Monitoring Apparatus for General kerNel-Event Tracing, is a high-fidelity, low-overhead monitoring mechanism for exporting kernel events to user space. Because it is incorporated into the kernel, MAGNET can monitor unmodified applications, and it exports key parameters from the OS kernel. In much the same way that the behavior of an electric circuit can be observed on an oscilloscope by attaching leads to parts of the circuit, MAGNET allows the behavior of the operating system and applications to be observed and recorded. • http://public.lanl.gov/radiant/research/measurement/magnet.html

  15. RAPID+ • Rate-Adaptive Protocol for Information Delivery

  16. Rate-Adaptation Algorithm
  • B_NIC: the buffer capacity of the NIC at the receiver.
  • RTT: the round-trip time between the sender and the receiver.
  • δ_n: the number of incoming packets read by the receiving application at the receiver (measured using MAGNET) during the current iteration.
  • α_n: the number of packets lost due to buffer overruns at the receiver during the nth iteration.
  • α_(n-1): the number of packets lost at the receiver during the previous ((n-1)th) iteration.
  • R_n: the sending rate at the current (nth) iteration.
  • R_(n+1): the sending rate at the next ((n+1)th) iteration.
  • γ: the average rate of decrement of packet losses over k successive iterations.
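
To make the notation concrete, here is a deliberately simplified Python sketch of one possible update rule built from these quantities. It is not the rule published in the paper: it merely scales the rate down by the fraction of offered packets the receiving application actually read when buffer overruns occur, and ramps up by γ otherwise; B_NIC and RTT, which the real protocol uses to bound bursts, are omitted here.

```python
# Hypothetical update R_{n+1} = f(R_n, alpha_n, alpha_{n-1}, delta_n, gamma).
# Illustration only; see the SC 2006 paper for the actual RAPID+ algorithm.
def next_rate(r_n, alpha_n, alpha_prev, delta_n, gamma, circuit_rate):
    if alpha_n > 0:
        # Buffer overruns at the receiver: back off in proportion to the
        # fraction of offered packets the application managed to consume.
        return r_n * delta_n / (delta_n + alpha_n)
    if alpha_prev > 0:
        # Losses just reached zero: ramp up cautiously, using the average
        # loss decrement gamma over the last k iterations as the step size.
        return min(circuit_rate, r_n + gamma)
    # Sustained loss-free operation: keep the circuit fully utilized.
    return circuit_rate
```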

  17. Experimental Results

  18. Conclusion • RAPID+ has two features that distinguish it from other transport solutions: (1) data is transmitted at a rate adapted to the end-system (receiver) limitations, while attempting to keep the circuit fully utilized under those constraints; (2) it uses dual communication paths: a unidirectional dedicated end-to-end circuit for data transfer, and the Internet for end-system congestion notification and rate adaptation.

  19. Paper Three • Abstract • This paper proposes, for the first time, a multi-layer Lambda Grid: a platform that dynamically establishes a computing environment with a guaranteed level of throughput and bandwidth over converged IP and optical networks, according to each client's reservation request. • Key Words • Converged IP and optical network • Lambda grid • Resource management • Resource virtualization

  20. NRM: Network Resource Manager • CRM: Computing Resource Manager • RC: Resource Coordinator • DER: Domain Edge Router • OXC: Optical Cross Connect • Dynamic-CPF: Dynamic Control of IP Packet Filtering • Dynamic-AEP: Dynamic Allocation of An End-to-end Path

  21. Architecture of multi-layer lambda grid with Dynamic-CPF

  22. Messaging diagram for Dynamic-CPF

  23. Experimental network configuration • SHD: Super High Definition video • FTP: File Transfer Protocol-based software • MPI: Message Passing Interface • NTT: Nippon Telegraph and Telephone Corporation • AIST: Advanced Industrial Science and Technology

  24. Bandwidth Guarantee and Routing Control Test • In the bandwidth guarantee and routing control test, the SHD packet flow was protected from the best-effort traffic by the packet filters and was guaranteed the 250 Mbps that the SHD application requested. • Bandwidth Control Test • In the bandwidth control test, each of the reserved sub-lambdas received exactly the bandwidth requested by the applications.

  25. Discussion • Large-scale experiments are not easy to carry out • Protocol improvement

  26. Thanks
