
Sampling and Stability in TCP/IP Workloads


Presentation Transcript


1. Sampling and Stability in TCP/IP Workloads
Lisa Hsu, Ali Saidi, Nathan Binkert, Prof. Steven Reinhardt
University of Michigan
MoBS 2005

2. Background
• During networking experiments, some runs would inexplicably get no bandwidth
• Searched high and low for what was “wrong”
• Simulator bug?
• Benchmark bug?
• OS bug?
• Answer: none of the above

3. The Real Answer
• Simulation methodology!?
• Tension between speed and accuracy in simulation
• Want to capture representative portions of the workload’s execution WITHOUT running the entire application
• Solution: fast functional simulation
• So what’s the problem here?

4. TCP Tuning
• TCP tunes itself to the performance of the underlying system
• Sets its send rate based on perceived end-to-end bandwidth
• Performance of the network
• Performance of the receiver
• During checkpoint creation (fast functional simulation), TCP had tuned itself to the performance of a meaningless system
• After switching to detailed simulation, the dramatic change in underlying system performance disrupted the flow
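One way to make “perceived end-to-end bandwidth” concrete (a standard approximation, not stated in the deck): a window of cwnd segments of size MSS outstanding per round trip gives an effective send rate of

```latex
\text{rate} \approx \frac{\mathit{cwnd} \times \mathit{MSS}}{\mathit{RTT}}
```

Window state tuned to the functional model’s RTT and loss behavior therefore encodes the wrong bandwidth the moment the detailed model changes both.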

5. Timing Dependence
• The degree to which an application’s performance depends upon execution timing (e.g., memory latencies)
• Three classes:
• Non-timing dependent (like SPEC2000)
• Weakly timing dependent (like multithreaded apps)
• Strongly timing dependent

6. Strongly Timing Dependent
[Diagram: execution path for a packet from the application. Perceived bandwidth high → send it now! Perceived bandwidth low → wait till later]
Application execution depends on stored feedback state from the underlying system (like TCP/IP workloads)

7. Correctness Issue
[Diagram: the same execution path, spanning functional and detailed simulation. The perceived-bandwidth state carried over from functional simulation is MEANINGLESS, so the send-it-now vs. wait-till-later decision is made on bad data]

8. Need to…
[Diagram: the same execution path. Continue detailed simulation until the perceived bandwidth reflects that of the configuration under test; only then is it safe to take data!]

9. Goals
• More rigorous characterization of this phenomenon
• Determine the severity of this tuning problem across a variety of networking workloads
• Network link latency sensitivity?
• Benchmark type sensitivity?
• Functional CPU performance sensitivity?

10. M5 Simulator
• Network-targeted full-system simulator
• Real NIC model: National Semiconductor DP83820 GigE Ethernet controller
• Boots Linux 2.6
• Uses the Linux 2.6 driver for the DP83820
• All systems (and the link) modeled in a single process
• Synchronization between systems managed by a global tick frequency
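The single-process, global-tick design can be pictured with a toy event queue. This is an illustrative sketch of the idea, not M5 source code; every name in it (EventQueue, the latency value, the callbacks) is hypothetical:

```python
import heapq

class EventQueue:
    """One shared queue: every simulated system and the link schedule
    events here, so all components advance in lockstep by global tick."""
    def __init__(self):
        self.tick = 0
        self._q = []
        self._seq = 0  # tie-breaker so same-tick events stay ordered

    def schedule(self, when, callback):
        heapq.heappush(self._q, (when, self._seq, callback))
        self._seq += 1

    def run(self, until):
        while self._q and self._q[0][0] <= until:
            self.tick, _, callback = heapq.heappop(self._q)
            callback()

eq = EventQueue()

def client_sends_packet():
    link_latency = 1_000  # hypothetical ticks; the link is just another event
    eq.schedule(eq.tick + link_latency, server_receives_packet)

def server_receives_packet():
    print(f"server got the packet at tick {eq.tick}")

eq.schedule(0, client_sends_packet)
eq.run(until=10_000)
```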

11. Operating Modes
[Diagram: CPU models ordered by speed]
• PF (pure functional): 1 or 8 IPC, 1-cycle memory (FASTEST)
• FC (functional + caches): 1 IPC, blocking caches (FASTER)
• D (detailed): out-of-order superscalar, non-blocking caches, << 1 IPC (SLOWEST)

12. Benchmarks
• 2-system client/server configuration
• Netperf
• Stream: a transmit microbenchmark
• Maerts: a receive microbenchmark
• SPECWeb99
• NAT configuration (3-system config)
• Netperf maerts with a NAT gateway between client and server

13. Experimental Configuration
[Diagram: a System Under Test connected by a link to a Drive System (x2 drive systems if NAT); roles are receiver/sender, or sender/NAT/receiver in the NAT config. Phases per run: CHECKPOINTING under PF1/PF8, CACHE WARMUP under FC1 (drive system at PF8), then MEASUREMENT under D]
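Read as a schedule, a run might look like the sketch below. The mode names PF8, FC1, and D come from the deck, while the System stub and the tick counts are invented for illustration:

```python
class System:
    """Stand-in for a simulated machine; the real simulator swaps CPU models."""
    def switch_cpu(self, model): print(f"switch CPU model to {model}")
    def simulate(self, ticks):   print(f"  run {ticks:,} ticks")
    def dump_stats(self):        print("dump measurement stats")

PHASES = [
    ("PF8", 500_000_000, "checkpointing / fast-forward"),
    ("FC1",  50_000_000, "cache warmup; absorbs the brunt of TCP re-tuning"),
    ("D",   100_000_000, "detailed measurement; only these stats are reported"),
]

sut = System()  # the System Under Test; the drive system stays at PF8
for model, ticks, purpose in PHASES:
    sut.switch_cpu(model)   # switchover point between operating modes
    sut.simulate(ticks)     # phase purpose: see PHASES entry
sut.dump_stats()            # take data only after the tuning period settles
```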

14. “Graph Theory”
• Tuning periods after CPU model changes?
• How long do they last?
• Which graph minimizes the detailed modeling time necessary?
• Effects of checkpointing PF width?

15. Netperf Maerts
[Bandwidth-over-time graphs, annotated: COV 0.5% and COV 1.66%; “FC cache warmup ends → transition to D”; “PF checkpoints loaded → transition to D or FC”; “No tuning!”; “Tuning period”; the warmup period bears the brunt of the tuning time; the known achievable bandwidth for each system configuration is marked]
Takeaways:
• A shift from a “high performance” CPU to a lower one causes more drastic tuning periods
• A shift from lower performance to higher has a more gentle transition

16. Netperf Stream
• Why no tuning periods?
• Because it is SENDER limited!
• The change in performance is local; no feedback from the network or receiver is required
• Thus changes in send rate can be immediate

17. Netperf Maerts with NAT
[Diagram: sender → NAT → receiver. The NAT gateway is the System Under Test, so CPU changes are applied there]
The “pipe” itself is changing. This feedback takes longer to receive in TCP because it is not explicit → may ruin the simulation

18. TCP Kernel Parameters
• pouts: unACKed packets in flight
• cwnd: congestion window, in packets (reflects the state of the network pipe)
• sndwnd: available receiver buffer space, in bytes (reflects the receiver’s ability to receive)
TCP rules:
• pouts may NOT exceed cwnd
• bytes(pouts) may NOT exceed sndwnd
Deadlock? Solved in the real world by TCP timeouts, but those would take much too long to simulate
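A minimal sketch of those two rules, assuming per-connection state is available as plain variables (the function can_send and the numbers are hypothetical; this is illustrative Python, not kernel code). It also shows how window state carried over from a mistuned functional phase can refuse every send, producing the zero-bandwidth runs described at the start:

```python
# Sketch of the two TCP rules from the slide (not kernel code):
#   1. pouts may not exceed cwnd
#   2. bytes(pouts) may not exceed sndwnd

def can_send(pouts, bytes_in_flight, pkt_bytes, cwnd, sndwnd):
    within_cwnd = pouts + 1 <= cwnd                         # rule 1: packets vs. congestion window
    within_sndwnd = bytes_in_flight + pkt_bytes <= sndwnd   # rule 2: bytes vs. receiver window
    return within_cwnd and within_sndwnd

# Hypothetical numbers: a window tuned under the functional model can
# leave the connection unable to send after the switch to detailed mode.
print(can_send(pouts=10, bytes_in_flight=14_600, pkt_bytes=1_460,
               cwnd=10, sndwnd=65_535))   # False: cwnd exhausted until an ACK arrives
```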

19. SPECWeb99
• Much more complex than Netperf
• Harder to understand the fundamental interactions
• Speculations in the paper, but understanding this more deeply is definitely future work

20. What About Link Delay?
• TCP algorithm: cwnd can only increase upon each receipt of an ACK packet
• Ramp-up of cwnd is therefore limited by RTT
• KEY POINT: tuning time is sensitive to RTT
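A back-of-the-envelope formula makes the RTT sensitivity concrete (my approximation, not from the slides): in slow start the window roughly doubles each round trip, so ramping from an initial window W_0 to a target window W takes about

```latex
t_{\text{ramp}} \approx \mathrm{RTT} \cdot \log_2\!\left(\frac{W}{W_0}\right)
```

so doubling the link latency roughly doubles the tuning time a sample must absorb before measurement.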

21. Conclusions
• TCP/IP workloads require a tuning period relative to the network RTT when receiver limited
• Sender-limited workloads are generally not problematic
• Some cases lead to unstable system behavior
• Tips for minimizing tuning time:
• “Slow” fast-forwarding CPU
• Try different switchover points
• Use a fast-ish cache warmup period to bear the brunt of the transition

22. Future Work
• Identify other strongly timing dependent workloads (feedback-directed optimization?)
• Examine SPECWeb behavior further
• Further investigate the protocol interactions that cause zero-bandwidth periods
• Hopefully lead to a more rigorous avoidance method

23. Questions?

24. Non-Timing Dependent
[Diagram: execution path for a memory access. It either HITs in the L1 or MISSes to a perfect cache; the execution path is the same either way]
Single-threaded, application-only execution (like SPEC2000)

  25. L1 Missidle loop memory access Perfect Cachecontinue Execution Path RAM accessschedule different thread Weakly Timing Dependent Application execution tied to OS decisions (like multi-threaded apps) MoBS 2005

26. Basic TCP Overview
• Congestion control algorithm: match the send rate to the network’s ability to receive it
• Flow control algorithm: match the send rate to the receiver’s ability to receive it
• Overall goal: send data as fast as possible without overwhelming the system, which would effectively cause slowdown
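The two limits combine at the sender. Restated with the parameter names from the TCP Kernel Parameters slide (cwnd counts packets and sndwnd counts bytes, so MSS converts units; this is standard TCP behavior, not something the deck spells out):

```latex
\text{effective window (bytes)} \approx \min\left(\mathit{cwnd} \times \mathit{MSS},\ \mathit{sndwnd}\right)
```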

27. Congestion Control
• Feedback in the form of:
• Timeouts
• Duplicate ACKs
• Feedback dictates the congestion window parameter
• Limits the number of unACKed packets out at a given time (i.e., the send rate)

28. Congestion Control cont.
• Slow start: the congestion window starts at 1; every ACK received gives an exponential increase in the congestion window
• Additive Increase, Multiplicative Decrease (AIMD): every ACK increases the window by 1; losses perceived via DupACKs halve the window
• Timeout recovery: upon timeout, go back to slow start
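The three mechanisms on this slide condense into a toy model. This is the textbook Reno-style behavior the slide summarizes, with window units in segments and an invented event sequence (ssthresh, the slow-start threshold, is implicit in the slide):

```python
def step(cwnd, ssthresh, event):
    """Advance a toy congestion window by one event (units: segments)."""
    if event == "ack":
        if cwnd < ssthresh:
            cwnd += 1            # slow start: +1 per ACK, i.e. doubles per RTT
        else:
            cwnd += 1 / cwnd     # additive increase: ~+1 segment per RTT
    elif event == "dupack":
        ssthresh = max(cwnd / 2, 2)
        cwnd = ssthresh          # multiplicative decrease on a DupACK loss
    elif event == "timeout":
        ssthresh = max(cwnd / 2, 2)
        cwnd = 1                 # timeout recovery: back to slow start
    return cwnd, ssthresh

cwnd, ssthresh = 1.0, 64.0
for ev in ["ack"] * 6 + ["dupack"] + ["ack"] * 3 + ["timeout"]:
    cwnd, ssthresh = step(cwnd, ssthresh, ev)
    print(f"{ev:8s} cwnd={cwnd:6.2f} ssthresh={ssthresh:6.2f}")
```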

29. Flow Control
• Feedback in the form of explicit TCP header notifications
• The receiver tells the sender how much kernel buffer space it has available
• Feedback dictates the send window parameter
• Limits the amount of unACKed data out at any given time

30. Results • Zero Link Delay

31. Non-Timing Dependent
• Single-threaded, application-only simulation (like SPEC2000)
• The execution timing does not affect the commit order of instructions
• Architectural state generated by a fast functional simulator would be the same as that from a detailed simulator

32. Weakly Timing Dependent
• Applications whose performance is tied to OS decisions
• Multi-threaded (CMP, SMT, etc.)
• Execution timing effects like cache hits and misses, memory latencies, etc. can affect scheduling decisions
• However, these execution path variations are all valid and do not pose a correctness problem

33. Strongly Timing Dependent
• Workloads that explicitly tune themselves to the performance of the underlying system
• Tuning to an artificially fast system affects system performance
• When switching to detailed simulation, you may get meaningless results
