Understanding TCP Incast Throughput Collapse in Datacenter Network

Presentation Transcript


  1. Understanding TCP Incast Throughput Collapse in Datacenter Network • Offense: Chang Seok Bae, Yi Yang

  2. Offense Outline • Challenge the contributions • Challenge the methodology • Challenge the conclusions • Challenge the details

  3. Inconsistencies • RTO is defined late in the paper • Goodput is defined late in the paper

  4. Topic Not Well Addressed • Larger switch buffers can delay the onset of Incast (doubling the buffer size doubles the number of servers that can be contacted). • Ethernet flow control is effective when the machines are on a single switch. • Your solution lies not in the network but at the end host. What do you think of other approaches, such as traffic engineering? [2]
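
The buffer-size claim on this slide can be illustrated with a deliberately simple model. The sketch below assumes that Incast begins once the synchronized bursts from all servers no longer fit in one switch port buffer, and that every server sends the same burst size. Both assumptions, and the function name `max_servers_before_overflow`, are hypothetical and are not taken from the paper.

```python
def max_servers_before_overflow(buffer_bytes: int, burst_bytes_per_server: int) -> int:
    """Largest number of synchronized senders whose bursts fit in the port buffer.

    Assumed model (not from the paper): overflow starts as soon as
    servers * burst_bytes_per_server exceeds buffer_bytes.
    """
    if buffer_bytes < 0:
        raise ValueError("buffer_bytes must be non-negative")
    if burst_bytes_per_server <= 0:
        raise ValueError("burst_bytes_per_server must be positive")
    return buffer_bytes // burst_bytes_per_server


# Hypothetical burst size, used only to show the scaling.
burst = 8 * 1024
for buffer_kib in (32, 64, 128):
    servers = max_servers_before_overflow(buffer_kib * 1024, burst)
    print(f"{buffer_kib} KiB buffer: up to {servers} servers")
```

Under this model the server count scales linearly with the buffer, which is the relationship the slide quotes. It says nothing about cost or about what happens beyond the onset point.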

  5. Repeated Work • Reproduces the results of prior work • Uses others' workload code • Uses others' Linux kernel modification • Unexpected result! Why? Just because of the different operating system versions, Linux 2.6.28.1 vs. Linux 2.6.18.8?

  6. Lack of Different Workloads and Environments • Refuses to use the latest, more representative workload. • The understanding of Incast should be evaluated under a wide variety of settings, i.e., different applications, environments, network equipment, and network topologies.

  7. Too small a minimum RTO can lead to spurious timeouts for wide-area network traffic • Does not address the case where a large number of short-lived TCP bursts and non-TCP traffic might share the Ethernet fabric, causing severe unfairness to TCP traffic [1]
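
The spurious-timeout concern can be sketched with the standard retransmission timer form from RFC 6298, RTO = max(min RTO, SRTT + 4 × RTTVAR). The code below is a simplified illustration: it ignores clock granularity and timer backoff, the function name `count_spurious_timeouts` is hypothetical, and the RTT samples in the usage lines are made up to resemble a wide-area path with one delay spike.

```python
def count_spurious_timeouts(rtt_samples, min_rto, alpha=1 / 8, beta=1 / 4):
    """Count RTT samples that arrive later than the RTO in force at that time.

    The estimator follows the RFC 6298 update rules in simplified form
    (no clock granularity term, no exponential backoff after a timeout).
    All times are in seconds.
    """
    samples = iter(rtt_samples)
    first = next(samples)
    srtt, rttvar = first, first / 2
    rto = max(min_rto, srtt + 4 * rttvar)
    spurious = 0
    for rtt in samples:
        if rtt > rto:
            # The ACK would arrive after the timer fired: a spurious timeout.
            spurious += 1
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - rtt)
        srtt = (1 - alpha) * srtt + alpha * rtt
        rto = max(min_rto, srtt + 4 * rttvar)
    return spurious


# Made-up trace: a steady 100 ms path followed by a single 300 ms spike.
trace = [0.1] * 10 + [0.3]
print(count_spurious_timeouts(trace, min_rto=0.001))  # very small minimum RTO
print(count_spurious_timeouts(trace, min_rto=1.0))    # conservative minimum RTO
```

With a steady RTT the variance term shrinks, so a tiny minimum RTO leaves little headroom for a delay spike, while a large minimum RTO absorbs it.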

  8. Model • What is the model for the variable-fragment workload? • The model is incomplete and therefore limited • How sure are you that your model works for other networks?

  9. Weakness of Quantitative Models • We want to see a statistical comparison of the measured and predicted results, rather than just a statement that the shapes of the curves are identical.
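
One way to make such a comparison concrete is to report error statistics between the measured and predicted goodput curves. The sketch below computes the root-mean-square error and the coefficient of determination. These two metrics are our choice for illustration, not something the paper reports, and the function names are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def r_squared(measured, predicted):
    """Coefficient of determination of the prediction against the measurement."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot
```

Calling `rmse(measured_goodput, predicted_goodput)` on the per-sender-count data points would give one number in the same units as goodput, which is easier to judge than visual agreement between curve shapes.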

  10. Measurement • What is the timeline reconstruction and analysis tool you built? • How can its correctness be guaranteed when the tools are not sufficiently polished to be released?

  11. What does this figure mean?

  12. Region 3: Goodput decreases again?

  13. References • [1] V. S. Rajanna et al., "XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platform" • [2] T. Benson et al., "The Case for Fine-Grained Traffic Engineering in Data Centers"

  14. Thank you

  15. Some Ethernet switches provide a per-hop mechanism for flow control that operates independently of TCP’s flow control algorithm. When a switch that supports Ethernet Flow Control is overloaded with data, it may send a “pause” frame to the interface sending data to the congested buffer, informing all devices connected to that interface to stop sending or forwarding data for a period of time. During this period, the overloaded switch can reduce the pressure on its queues.
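
The "period of time" carried in a pause frame is a 16-bit count of pause quanta, where one quantum is 512 bit times on the link (IEEE 802.3 Annex 31B). The sketch below converts that field into seconds for a given link speed. The function name `pause_duration_seconds` is hypothetical, and the link speed in the usage line is only an example.

```python
PAUSE_QUANTUM_BITS = 512  # one pause quantum is 512 bit times
MAX_PAUSE_QUANTA = 0xFFFF  # the pause_time field is 16 bits wide


def pause_duration_seconds(pause_quanta: int, link_bits_per_second: float) -> float:
    """Time a sender is asked to stay silent for a given pause_time value."""
    if not 0 <= pause_quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause_quanta must fit in 16 bits")
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    return pause_quanta * PAUSE_QUANTUM_BITS / link_bits_per_second


# Example: the longest single pause on an assumed 1 Gbit/s link.
print(pause_duration_seconds(MAX_PAUSE_QUANTA, 1e9))
```

Because a quantum is defined in bit times, the same field value pauses a faster link for a proportionally shorter time.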
