
Experience with Loss-Based Congestion Controlled TCP Stacks

Experience with Loss-Based Congestion Controlled TCP Stacks. Yee-Ting Li University College London. Introduction. Transport of Data for next generation applications Network hardware is capable of Gigabits per second

Presentation Transcript


  1. Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

  2. Introduction • Transport of Data for next generation applications • Network hardware is capable of Gigabits per second • Current ‘Vanilla’ TCP not capable over long distances and high throughputs • New TCP Stacks have been introduced to rectify problem • Investigation into the performance, bottlenecks and deploy-ability of new algorithms

  3. Transmission Control Protocol • Connection orientated • Reliable Transport of Data • Window based • Congestion and Flow Control to prevent network collapse • Provides ‘fairness’ between competing streams • 20 Years old • Originally designed for kbit/sec pipes

  4. TCP Algorithms • Based on two algorithms to determine rate at which data is to be sent • Slowstart: probe for initial bandwidth • Congestion Avoidance: maintain a steady state transfer rate • Focus on Steady State: probe for increases in available bandwidth, whilst backing off if congestion is detected (through loss). • Maintained through a ‘congestion window’ cwnd that regulates the number of unacknowledged packets allowed on connection. • Size of window approx equals Bandwidth delay product • Determines the appropriate window size to set to obtain a bandwidth under a certain delay • Window = Bandwidth x Delay
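
The relation Window = Bandwidth x Delay can be made concrete with a small calculation. This is a minimal sketch: the function name `bdp_window` and the 1500-byte packet size are assumptions for illustration, and the 1Gb/sec capacity with 6msec and 120msec round-trip times are the figures quoted for the networks under test.

```python
def bdp_window(bandwidth_bps: float, rtt_s: float, packet_bytes: int = 1500):
    """Return (window_bytes, window_packets) needed to fill a path.

    packet_bytes = 1500 is an assumed packet size, not a value from the slides.
    """
    window_bytes = bandwidth_bps * rtt_s / 8  # bits in flight -> bytes
    window_packets = window_bytes / packet_bytes
    return window_bytes, window_packets


if __name__ == "__main__":
    for name, rtt in (("6 msec path", 0.006), ("120 msec path", 0.120)):
        wb, wp = bdp_window(1e9, rtt)
        print(f"{name}: {wb / 1e6:.2f} MB, about {wp:.0f} packets")
```

Under these assumptions the long path needs a window twenty times larger than the short one to sustain the same rate, which is why window growth matters so much over long distances.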

  5. Algorithms • Congestion Avoidance • For every packet (ack) received by sender • cwnd ← cwnd + 1/cwnd • For when loss is detected (through dupacks) • cwnd ← cwnd / 2 • Growth of cwnd determined by: • the RTT of the connection • When RTT is high, cwnd grows slowly (because of acking) • The loss rate on the line • High loss means that cwnd never achieves a large value • Capacity of the link • Allows for large cwnd value (when low loss)
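
The two update rules above can be sketched directly. This is an illustrative model, not kernel code: the function names are hypothetical, one ack is assumed per packet in the window, and the floor of one packet after a loss is an assumption added to keep the model well defined.

```python
def on_ack(cwnd: float) -> float:
    """Congestion avoidance: grow by 1/cwnd for every ack received."""
    return cwnd + 1.0 / cwnd


def on_loss(cwnd: float) -> float:
    """Loss detected through dupacks: halve the window (assumed floor of 1)."""
    return max(cwnd / 2.0, 1.0)


def rtts_to_recover(cwnd_before_loss: float) -> int:
    """Count round trips needed to climb from cwnd/2 back to the old cwnd."""
    cwnd = on_loss(cwnd_before_loss)
    rtts = 0
    while cwnd < cwnd_before_loss:
        for _ in range(int(cwnd)):  # assume one ack per packet in the window
            cwnd = on_ack(cwnd)
        rtts += 1
    return rtts


if __name__ == "__main__":
    # A 500-packet window (small enough to simulate quickly).
    print(rtts_to_recover(500))
```

Because the window gains at most one packet per round trip, recovery takes roughly cwnd/2 round trips. In this model a window of 10000 packets on a 120msec path would therefore need on the order of 5000 round trips, about ten minutes, to recover from a single loss.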

  6. Current Methods of Achieving High Throughput • Advantages • Achieves good throughput • No changes to kernels required • Disadvantages • Have to manually tune the number of flows • May induce extra loss on lossy networks • Need to reprogram/recompile software

  7. New TCP Stacks • Modify the congestion control algorithm to improve response times • All based on modifying the cwnd growth and decrease values • Define: • a = increase of data packets per window of acks • b = decrease factor upon congestion • To maintain compatibility (and hence network stability and fairness), for small cwnd values: • Mode switch from Vanilla to New TCP

  8. HSTCP • Designed by Sally Floyd • Determine a and b as a function of cwnd • a ← a(cwnd) • b ← b(cwnd) • Gradual improvement in throughput as we approach larger bandwidth delay products • Current implementation focused on performance up to 10Gb/sec – set linear relation between loss and throughput (response function)
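
One way to picture a(cwnd) and b(cwnd) is the parameterisation given in RFC 3649 (HighSpeed TCP). The slide does not state the formulas, so everything below is an assumption taken from that RFC's suggested defaults (Low_Window = 38, High_Window = 83000, High_Decrease = 0.1) and should be checked against the RFC rather than read as the implementation tested here.

```python
import math

# Assumed defaults from RFC 3649; not stated on the slide.
LOW_WINDOW = 38
HIGH_WINDOW = 83000
HIGH_DECREASE = 0.1


def loss_rate(w: float) -> float:
    """Loss rate the response function associates with window w."""
    return 0.078 / w ** 1.2


def b(w: float) -> float:
    """Decrease factor: 0.5 at small windows, falling to HIGH_DECREASE."""
    if w <= LOW_WINDOW:
        return 0.5
    frac = (math.log(w) - math.log(LOW_WINDOW)) / (
        math.log(HIGH_WINDOW) - math.log(LOW_WINDOW)
    )
    return (HIGH_DECREASE - 0.5) * frac + 0.5


def a(w: float) -> float:
    """Increase in packets per window of acks: 1 at small windows."""
    if w <= LOW_WINDOW:
        return 1.0
    bw = b(w)
    return w ** 2 * loss_rate(w) * 2 * bw / (2 - bw)


if __name__ == "__main__":
    for w in (38, 1000, 10000, 83000):
        print(f"cwnd={w}: a={a(w):.1f}, b={b(w):.2f}")
```

Below the low window the functions return the Vanilla values (a = 1, b = 0.5), so the mode switch for small cwnd values falls out of the same definition.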

  9. Scalable TCP • Designed by Tom Kelly • Define a and b to be constant: • a: cwnd ← cwnd + a (per ack) • b: cwnd ← cwnd – b × cwnd • Intrinsic scaling property that has the same performance over any link (beyond the initial threshold) • Recommended settings • a = 1/100 • b = 1/8
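
The scaling property can be checked numerically with the recommended settings. This is a rough sketch: the function names are hypothetical, and one round trip is approximated as cwnd acks arriving at once, so the window grows by a factor of (1 + a) per RTT.

```python
A = 1 / 100  # increase per ack (recommended setting)
B = 1 / 8    # decrease factor on loss (recommended setting)


def scalable_on_ack(cwnd: float) -> float:
    """Per-ack increase: cwnd <- cwnd + a."""
    return cwnd + A


def scalable_on_loss(cwnd: float) -> float:
    """Decrease on congestion: cwnd <- cwnd - b * cwnd."""
    return cwnd - B * cwnd


def scalable_recovery_rtts(cwnd_before_loss: float) -> int:
    """Round trips to regain the pre-loss window after a single loss."""
    cwnd = scalable_on_loss(cwnd_before_loss)
    rtts = 0
    while cwnd < cwnd_before_loss:
        cwnd += A * cwnd  # approximation: cwnd acks per RTT, each adding a
        rtts += 1
    return rtts


if __name__ == "__main__":
    for w in (500, 10000, 100000):
        print(w, scalable_recovery_rtts(w))
```

In this approximation the recovery time is the same number of round trips for every window size, in contrast to the roughly cwnd/2 round trips that halving and adding one packet per RTT would need.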

  10. H-TCP • Designed by Doug Leith and Robert Shorten • Define a mode switch so that after congestion we do normal Vanilla • After a predefined period ∆L, switch to a high performance a • ∆i ≤ ∆L: a = 1 • ∆i > ∆L: a = 1 + (∆i − ∆L) + [(∆i − ∆L)/20]^2 • Upon loss drop by • | [B_i^max(k+1) − B_i^max(k)] / B_i^max(k) | > 0.2: b = 0.5 • Else: b = RTTmin/RTTmax
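
The mode switch and the adaptive backoff can be written as two small functions. This sketch follows the expressions exactly as they appear on the slide; the function and argument names are hypothetical, and the units of the elapsed time since the last congestion event are not stated on the slide, so they are left to the caller.

```python
def htcp_increase(delta: float, delta_l: float) -> float:
    """Increase parameter a as a function of time since last congestion.

    delta   : elapsed time since the last congestion event (units assumed)
    delta_l : predefined period during which Vanilla behaviour is used
    """
    if delta <= delta_l:
        return 1.0
    t = delta - delta_l
    return 1.0 + t + (t / 20.0) ** 2


def htcp_decrease(b_max_prev: float, b_max_new: float,
                  rtt_min: float, rtt_max: float) -> float:
    """Decrease factor b chosen from the change in measured maximum throughput."""
    if abs((b_max_new - b_max_prev) / b_max_prev) > 0.2:
        return 0.5  # throughput changed by more than 20%: back off like Vanilla
    return rtt_min / rtt_max


if __name__ == "__main__":
    print(htcp_increase(0.5, 1.0))                # still in Vanilla mode
    print(htcp_increase(21.0, 1.0))               # high performance mode
    print(htcp_decrease(100.0, 110.0, 0.100, 0.125))
```

Keeping a = 1 for the first ∆L after each loss is what lets H-TCP behave like Vanilla on paths where losses are frequent.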

  11. Implementation • All New Stacks have own implementation • Small differences between implementations mean that we are comparing the kernel differences rather than just the algorithmic differences • Led to development of ‘test platform’ kernel: altAIMD • Implements all three stacks via simple sysctl switch. • Also incorporates switches for certain undesirable kernel ‘features’ • moderate_cwnd() • IFQ • Added extra features for testing/evaluation purposes • Appropriate Byte Counting (RFC3465) • Inducible packet loss (at recv) • Web100 TCP logging (cwnd etc)

  12. Networks Under Test • MB-NG (UCL, Manchester): Bottleneck Capacity 1Gb/sec, RTT 6msec • DataTAG (StarLight, CERN): Bottleneck Capacity 1Gb/sec, RTT 120msec • [Diagram: network paths through Cisco 7600 and Juniper routers]

  13. Graph/Demo • Mode switch between stacks on constant packet drop • [Graph regions: Vanilla TCP, Scalable TCP, HS-TCP]

  14. Comparison against theory • Response function

  15. Self Similar Background Tests • Results skewed • Not comparing differences in TCP algorithms! • Not useful results!

  16. SACK … • Look into what’s happening at the algorithmic level: • Strange hiccups in cwnd: only correlation is SACK arrivals • [Figure: Scalable TCP on MB-NG with 200Mbit/sec CBR Background]

  17. SACKS • Supplies the sender information about what segments the recv has • Sender infers the missing packets to resend • Aids recovery during loss and prevents timeouts • Current implementation in 2.4 and 2.6 does a walk through the entire sack list for each SACK • Very cpu intensive • Can be interrupted by arrival of next SACK which causes the SACK implementation to misbehave • Tests conducted with Tom Kelly’s SACK fast-path patch • Improves SACK processing, but still not sufficient

  18. SACK Processing overhead • Periods of web100 silence due to high cpu utilization • Logging done in userspace – kernel time taken up by tcp sack processing • TCP resets cwnd

  19. Congestion Window Moderation • Linux TCP implementation adds ‘feature’ of moderate_cwnd() • Idea is to prevent large bursts of data packets under ‘dubious’ conditions • When an ACK acknowledges more than 3 packets (typically 2) • Adjusts cwnd to known number of packets ‘in-flight’ (plus extra 3 packets) • Under large cwnd sizes (high bandwidth delay products), throughput can be diminished as result
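
The adjustment described above amounts to clamping cwnd to the packets in flight plus a small burst allowance. The following is a simplified model of that behaviour, not the kernel's code; the Python function signature and the constant name are assumptions for illustration.

```python
MAX_BURST = 3  # extra packets allowed beyond those in flight (from the slide)


def moderate_cwnd(cwnd: int, packets_in_flight: int) -> int:
    """Clamp cwnd so that at most MAX_BURST packets can be sent at once."""
    return min(cwnd, packets_in_flight + MAX_BURST)


if __name__ == "__main__":
    # Illustrative numbers: a large window on a high bandwidth delay product
    # path, with an ack that leaves far fewer packets in flight than cwnd.
    print(moderate_cwnd(cwnd=10000, packets_in_flight=4000))
    # A window already close to the in-flight count is left unchanged.
    print(moderate_cwnd(cwnd=100, packets_in_flight=99))
```

With a large window the clamp can discard thousands of packets' worth of cwnd in one step, and that window then has to be regrown through congestion avoidance, which is how throughput is diminished.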

  20. CPU Load and Throughput

  21. moderate_cwnd(): Vanilla TCP • [Figure: CWND and Throughput with moderate_cwnd ON and moderate_cwnd OFF; 90% TCP AF]

  22. moderate_cwnd(): HS-TCP • [Figure: moderate_cwnd OFF and moderate_cwnd ON; 70% TCP AF and 90% TCP AF]

  23. moderate_cwnd(): Scalable-TCP • [Figure: moderate_cwnd OFF and moderate_cwnd ON; 70% TCP AF and 90% TCP AF]

  24. Multiple Streams • [Figure panels: Aggregate BW, CoV]

  25. 10 TCP Flows versus Self-Similar Background • [Figure panels: Aggregate BW, CoV]

  26. 10 TCP Flows versus Self-Similar Background • [Figure panels: BG Loss per TCP BW]

  27. Impact • Fairness: ratio of throughput achieved by one stack against another • Means that fairness against vanilla TCP is defined by how much more throughput a new stack gets than vanilla • Doesn’t really consider deploy-ability of the stacks in real life – how do these stacks affect the existing traffic (mostly vanilla TCP)? • Redefine fairness in terms of the Impact: • Consider the effect on the background traffic only under different stacks • Vary against number of TCP flows to determine impact (vanilla flows) • BW impact = (throughput of n Vanilla flows) / (throughput of (n-1) Vanilla flows + 1 new TCP flow)

  28. Impact of 1 TCP Flow • [Figure panels: Throughput, Throughput Impact]

  29. 1 New TCP Impact • [Figure panel: CoV]

  30. Impact of 10 TCP Flows • [Figure panels: Throughput, Throughput Impact]

  31. 10 TCP Flows Impact • [Figure panel: CoV]

  32. WAN Tests

  33. Summary • Comparison of actual TCP differences through test platform kernel • Problems with SACK implementations mean that it is difficult under loss to maintain high throughput (>500Mbit/sec) • Other problems exist with kernel implementation that hinder performance • Compare stacks under different artificial (and hence repeatable) conditions • Single stream: • Multiple stream: • Need to study over wider range of networks • Move tests onto real production environments
