Congestion Responsiveness of Internet Traffic (a fresh look at an old problem) Ravi Prasad & Constantine Dovrolis Networking and Telecommunications Group College of Computing, Georgia Tech
TCP and Internet stability • Stable network: the offered load stays below the capacity (ρ<1) • Otherwise, persistent packet losses • Congestion collapse: fully utilized links, but almost zero per-flow goodput • Conventional wisdom #1: the Internet manages to be stable due to TCP congestion control • TCP: more than 90% of Internet traffic • TCP reduces offered load (send window) upon signs of congestion • Negative-feedback loop, stabilizing queueing system • Conventional wisdom #2: stability can be maintained without admission control or resource reservations
TCP-centric congestion control • If all flows use TCP, or TCP-friendly congestion control, then the Internet will be stable • TCP congestion control -> no congestion collapse • “Promoting the use of end-to-end congestion control in the Internet”, Floyd & Fall, ToN’99 • “Congestion control principles”, Floyd, RFC2914, 2000 • Key modeling unit: persistent flows (they last forever!) • “Rate control in communication networks: shadow prices, proportional fairness and stability”, Kelly et al., JORS’98 • “Congestion control for high performance, stability, and fairness in general networks”, Paganini et al., ToN’05 • Number of active flows does not change with time • Infinitely long flows can be effectively controlled
Flows are generated by users/applications, not by the transport layer! • [Figure: sender and receiver stacks (application, transport, network) exchanging a request and a response] • Examples: user clicks web page, p2p movie download, machine-generated periodic FS synchronization • Session: set of finite (i.e., non-persistent) flows, generated by a single user action • Key issue: session arrival process • Does the session arrival rate decrease during congestion?
Two fundamental flow arrival models • [Figure: sources labeled 1, 2, 3, ..., N] • Closed-loop model • Fixed number of users, each user can generate one session at a time • New session arrival: depends on completion of previous session • E.g., ingress traffic in campus network (student downloads) • Open-loop model • Sessions arrive in network independently of congestion • Theoretically, infinite population of users • E.g., egress traffic at popular Web server • Very different models in terms of congestion responsiveness & stability
Related work • Open-loop traffic model • “Statistical bandwidth sharing: a study of congestion at flow level”, Fredj et al., Sigcomm’01 • “Stability and performance analysis of networks supporting elastic services”, Veciana et al., ToN’01 • Closed-loop traffic model • “A new method for the analysis of feedback-based protocols with applications to engineering web traffic over the Internet”, Heyman et al., Sigmetrics’99 • “Dimensioning bandwidth for elastic traffic in high-speed data networks”, Berger & Kogan, ToN’00 • Main open issues: • What do the previous two models imply for the congestion responsiveness of aggregate Internet traffic? • Which of the previous two models is closer to real Internet traffic?
Our contributions • Introduce two new metrics for congestion responsiveness of aggregate Internet traffic • Elasticity and instability coefficient • Examine congestion responsiveness of several traffic models, including open-loop, closed-loop, and mixed traffic • Open-loop TCP traffic is less congestion responsive than even UDP traffic! • Closed-loop traffic is more congestion responsive than persistent flows • Design experimental methodology to measure Closed-loop Traffic Ratio (CTR) • Measure CTR in several Internet packet traces • 70-90% of Internet traffic appears to be closed-loop • Several implications for networking research & practice
Outline • Congestion responsiveness metrics • Elasticity • Instability coefficient • Results for ideal Processor Sharing (PS) server • Closed-loop flow arrival model • Open-loop flow arrival model • Congestion responsiveness of four traffic models • Persistent TCP flows • UDP constant-rate streams • Open-loop TCP flows • Closed-loop TCP flows • Congestion responsiveness of real network traffic • Methodology and measurements • Summary and implications
Elasticity metric • Quantifies the extent to which a traffic aggregate backs off upon a congestion event • U and U′: average throughput of the aggregate traffic before and during the stimulus, respectively • Defined as the fractional change in throughput: f = (U − U′)/U • Depends on the cause of the congestion event • Canonical congestion event: a persistent TCP transfer (stimulus) that is not limited by the receiver’s window
Elasticity • f = 1: completely responsive • f = 0: completely unresponsive • [Figure labels: stimulus, cross-traffic]
Elasticity • Positive elasticity • Negative elasticity: when cross traffic increases its rate upon congestion • [Figure labels: stimulus, cross-traffic]
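The elasticity slides can be illustrated with a small calculation. This is only a sketch: it assumes the "fractional change in throughput" is (U - U')/U, which agrees with f = 1 when the cross traffic backs off completely and f = 0 when it does not back off at all. The function name and the throughput values are hypothetical.

```python
def elasticity(u_before: float, u_during: float) -> float:
    """Fractional drop in aggregate throughput caused by the stimulus.

    u_before: average cross-traffic throughput before the stimulus (U)
    u_during: average cross-traffic throughput during the stimulus (U')
    """
    if u_before <= 0:
        raise ValueError("throughput before the stimulus must be positive")
    return (u_before - u_during) / u_before


if __name__ == "__main__":
    # Hypothetical throughputs in Mbps, chosen only for illustration.
    print(elasticity(80.0, 60.0))  # cross traffic backs off: positive elasticity
    print(elasticity(80.0, 80.0))  # no reaction: zero elasticity
    print(elasticity(80.0, 88.0))  # cross traffic speeds up: negative elasticity
```

A negative result corresponds to the "negative elasticity" case on the slide, where the cross traffic increases its rate upon congestion.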
Instability Coefficient • Quantifies whether (and how fast) a traffic aggregate can lead to congestion collapse upon congestion at time t • Defined as dN(t)/dt, where N(t) is the number of active sessions at time t • Coefficient ≤ 0: fixed or decreasing number of active sessions; stable network • Coefficient > 0: increasing number of active sessions; has the potential to cause congestion collapse • The larger the coefficient, the faster the move towards congestion collapse
Instability Coefficient • Simulation of a stable network: instability coefficient = 0 • Open-loop model: session arrival rate 200/sec
Instability Coefficient • Simulation of an unstable network: instability coefficient > 0 • Open-loop model: session arrival rate 400/sec
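One way to read the instability coefficient off a simulation or a trace is to fit a line to samples of N(t) and take its slope. The sketch below assumes that a least-squares slope is an acceptable estimate of dN(t)/dt; the slides do not say how the coefficient was computed, and the function name and sample values are hypothetical.

```python
def instability_coefficient(times, active_sessions):
    """Least-squares slope of N(t) against t, used as an estimate of dN/dt."""
    n = len(times)
    if n < 2 or n != len(active_sessions):
        raise ValueError("need at least two (t, N) samples of equal length")
    t_mean = sum(times) / n
    n_mean = sum(active_sessions) / n
    cov = sum((t - t_mean) * (x - n_mean)
              for t, x in zip(times, active_sessions))
    var = sum((t - t_mean) ** 2 for t in times)
    if var == 0:
        raise ValueError("time samples must not all be equal")
    return cov / var


if __name__ == "__main__":
    # Hypothetical samples: active sessions counted once per second.
    growing = instability_coefficient([0, 1, 2, 3], [10, 20, 30, 40])
    flat = instability_coefficient([0, 1, 2, 3], [25, 25, 25, 25])
    print(growing, flat)
```

A slope near zero matches the stable case, while a persistently positive slope matches the unstable case.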
Closed-loop model – PS server • N users: cycles of transfer and idle periods • S: average session size • T_T: average transfer duration • T_I: average idle time • T_T increases during congestion • Na: number of active sessions • Elasticity f = 1/(Na+1) • Instability coefficient: cannot be positive indefinitely (Na < N)
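The elasticity value on this slide can be checked numerically. The sketch below assumes that the stimulus behaves as one additional flow at the Processor Sharing server, that the Na active sessions used the whole link before the stimulus, and that Na itself has not yet changed when the throughput is measured. These assumptions and the function name are illustrative, not taken from the slides.

```python
def ps_closed_loop_elasticity(n_active: int, capacity: float = 1.0) -> float:
    """Elasticity of n_active PS sessions when one stimulus flow joins."""
    if n_active < 1:
        raise ValueError("need at least one active session")
    u_before = capacity                                # Na sessions share the link
    u_during = capacity * n_active / (n_active + 1)    # stimulus takes one share
    return (u_before - u_during) / u_before


if __name__ == "__main__":
    for na in (1, 4, 9, 99):
        # Each value should agree with 1 / (Na + 1).
        print(na, ps_closed_loop_elasticity(na), 1 / (na + 1))
```

The result does not depend on the link capacity, which is why the slide can state the elasticity in terms of Na alone.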
Open-loop model – PS server • Poisson session arrivals • S: average session size • λ: session arrival rate • Offered load ρ = λS/C (C: link capacity) • Stable only if ρ < 1 • Expected throughput for new transfer: C(1−ρ), the available bandwidth • Elasticity f = 0 • Instability coefficient: > 0 if ρ > 1
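The open-loop quantities can be computed directly from the arrival rate, the average session size, and the link capacity. In the sketch below the offered load is taken to be arrival rate times average session size divided by capacity, which is the usual definition and an assumption here since the symbols on the slide did not survive extraction. The session size and capacity values are hypothetical; only the arrival rates 200/sec and 400/sec appear in the slides.

```python
def offered_load(arrival_rate: float, mean_session_bits: float,
                 capacity_bps: float) -> float:
    """Offered load: sessions/sec * bits/session / (bits/sec)."""
    return arrival_rate * mean_session_bits / capacity_bps


def is_stable(arrival_rate, mean_session_bits, capacity_bps) -> bool:
    """The open-loop PS server is stable only if the offered load is below 1."""
    return offered_load(arrival_rate, mean_session_bits, capacity_bps) < 1.0


def expected_new_transfer_throughput(arrival_rate, mean_session_bits,
                                     capacity_bps) -> float:
    """Available bandwidth C * (1 - load); zero once the link is overloaded."""
    rho = offered_load(arrival_rate, mean_session_bits, capacity_bps)
    return capacity_bps * (1.0 - rho) if rho < 1.0 else 0.0


if __name__ == "__main__":
    size_bits, capacity = 40_000, 10_000_000  # hypothetical values
    for rate in (200, 400):
        print(rate, offered_load(rate, size_bits, capacity),
              is_stable(rate, size_bits, capacity))
```

With these illustrative numbers the lower arrival rate stays below capacity and the higher one exceeds it, mirroring the stable and unstable simulations.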
Mixed traffic • Internet traffic: mix of open-loop and closed-loop traffic • Mixed traffic can be characterized by the Closed-loop Traffic Ratio (CTR) • f_mix = CTR × f_closed • Instability coefficient of the mix is positive when ρ_open > 1 • Not when ρ_open + ρ_closed > 1
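The two statements on this slide translate into a couple of one-line checks. This sketch assumes the open-loop component contributes zero elasticity, as in the ideal PS result, and that only the open-loop offered load matters for instability. Function names and inputs are hypothetical.

```python
def mixed_elasticity(ctr: float, f_closed: float) -> float:
    """Elasticity of a mix: only the closed-loop fraction backs off."""
    if not 0.0 <= ctr <= 1.0:
        raise ValueError("CTR must lie in [0, 1]")
    return ctr * f_closed


def mix_can_diverge(rho_open: float, rho_closed: float) -> bool:
    """True when the open-loop load alone exceeds capacity.

    rho_closed is accepted only to stress that it does not enter the test.
    """
    return rho_open > 1.0


if __name__ == "__main__":
    print(mixed_elasticity(0.8, 0.2))  # mostly closed-loop traffic
    print(mix_can_diverge(0.7, 0.6))   # total load above 1, open-loop part below 1
    print(mix_can_diverge(1.1, 0.0))   # open-loop part alone above 1
```

The second call shows the point of the last bullet: a total load above capacity does not by itself imply a growing number of sessions, because the closed-loop part slows its own arrivals.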
Persistent TCP transfers • N homogeneous transfers • Stimulus increases RTT and loss rate from (T,p) to (T’,p’) • UMass model to estimate TCP average throughput • Number of transfers remains constant, i.e., instability coefficient = 0
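To get a feel for the elasticity of persistent TCP transfers, the sketch below swaps in the simpler square-root throughput approximation, MSS / (RTT * sqrt(2p/3)), instead of the UMass model named on the slide. It therefore ignores timeouts and window limits, and its numbers are only indicative. Function names, the MSS, and the (T, p) values are hypothetical.

```python
import math


def sqrt_model_throughput(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Square-root approximation of TCP throughput in bytes/sec."""
    if rtt_s <= 0 or not 0.0 < loss_rate < 1.0:
        raise ValueError("need rtt > 0 and 0 < loss_rate < 1")
    return mss_bytes / (rtt_s * math.sqrt(2.0 * loss_rate / 3.0))


def persistent_tcp_elasticity(rtt, loss, rtt_after, loss_after,
                              mss_bytes: float = 1460.0) -> float:
    """Fractional throughput drop when (T, p) changes to (T', p').

    With N homogeneous transfers, N cancels out of the ratio.
    """
    before = sqrt_model_throughput(mss_bytes, rtt, loss)
    after = sqrt_model_throughput(mss_bytes, rtt_after, loss_after)
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical stimulus: RTT grows from 100 ms to 120 ms, loss from 1% to 2%.
    print(persistent_tcp_elasticity(0.100, 0.01, 0.120, 0.02))
```

Under this approximation the elasticity reduces to 1 - (T/T') * sqrt(p/p'), so it depends only on how much the stimulus changes the RTT and the loss rate.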
Constant-rate UDP transfers • Fixed number of constant-rate flows • UDP flows do not react to congestion, and they do not retransmit lost packets • Throughput after stimulus: U’= (1-p)U • Elasticity f = p >0 • Truly congestion responsive traffic should have larger elasticity than loss rate • Instability coefficient is zero • Number of flows does not change during congestion • Cannot cause congestion collapse
Open-loop TCP transfers • Poisson stream of TCP flows • Size uniformly distributed between 16 and 20 packets • Arrival rate chosen to vary offered load • Ideally, f = 0 when ρ < 1 • But negative elasticity is possible with TCP redundant retransmissions • Increased offered load after stimulus • Instability coefficient is positive when ρ > 1 • Possible congestion collapse • Open-loop traffic is the network’s worst enemy
Closed-loop TCP transfers • When loss rate ~ 0 (i.e., small number of sessions) • Stimulus increases RTT from T to T’ • Transfer latency increases from kT to kT’ • With small number of active sessions: • Elasticity: about constant • With large number of active sessions: • Elasticity > 1/(Na+1) • Closed-loop TCP traffic: more elastic than persistent flows
What to measure? • Direct elasticity measurements require packet traces at bottleneck during stimulus • We have access to only a couple of such links • Direct measurements of instability coefficient require packet traces during congestion events • We have access to only a couple of congested links • Alternative: Measure CTR (closed-loop traffic ratio) • Indirect metric for congestion responsiveness • High CTR (close to one): mostly closed-loop traffic • Low CTR (close to zero): mostly open-loop traffic
CTR estimation (overview) • Start with packet trace from Internet link • Per-packet: arrival time, src/dst address & ports, size • Focus only on TCP traffic: HTTP and well-known ports • Identify users: • Downloads: user is associated with unique DST address • Uploads: user is associated with unique SRC address • Multi-user hosts and NATs are a problem (see paper for details) • For each user, identify sessions: • Session: one or more connections (“jobs”) associated with same user action • E.g., Web page download: multiple HTTP connections • Classify sessions as open-loop or closed-loop: • Successive sessions from same user: closed-loop • Session from a new user, or session arriving from known user after a long idle period: open-loop
From Connections to Jobs to Sessions • An HTTP 1.1 connection can stay alive across multiple sessions • Job: segment of a TCP connection that belongs to a single session • Intra-job packet interarrivals: TCP- and network-dependent (short) • Inter-job packet interarrivals: caused by user actions (long) • Classify interarrivals based on Silence Threshold (STH) • Example packet trace (figure annotations: intra-job gap, inter-job gap):
1105126179.423931 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478309 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478438 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478554 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.488433 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.488666 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.488918 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.539748 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.539870 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.539993 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.549085 163.157.239.61 127.207.1.255 80 2290 154 T 114
1105126179.549399 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.611572 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.611702 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612235 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612507 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612752 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.613121 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.672432 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
Silence Threshold (STH) estimation • [Figure annotations: intra-job gap, inter-job gap]
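The step from packets to jobs can be sketched as a split of one connection's packet timestamps wherever the gap exceeds the Silence Threshold. The slides do not give a numeric STH, so the value below is purely a placeholder, and the function name is hypothetical.

```python
def split_into_jobs(timestamps, silence_threshold):
    """Group packet arrival times of ONE connection into jobs.

    A gap longer than silence_threshold (seconds) is treated as an
    inter-job gap; shorter gaps are treated as intra-job gaps.
    """
    jobs = []
    for ts in sorted(timestamps):
        if jobs and ts - jobs[-1][-1] <= silence_threshold:
            jobs[-1].append(ts)   # short gap: same job
        else:
            jobs.append([ts])     # first packet, or long silence: new job
    return jobs


if __name__ == "__main__":
    STH = 0.5  # placeholder threshold in seconds, not taken from the slides
    packets = [0.00, 0.05, 0.10, 2.00, 2.02, 7.50]
    for job in split_into_jobs(packets, STH):
        print(job[0], job[-1], len(job))
```

Each resulting job is a list of timestamps, so its start time, end time, and packet count are available for the session-grouping step.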
Group jobs from same user in sessions • Intuition: jobs from the same session will have short interarrivals (machine-generated) • Minimum Session Interarrival (MSI) threshold • MSI aims to distinguish machine-generated from user-initiated events • MSI = 1-5 seconds • [Figure: packet trace with jobs grouped into sessions 1, 2 and 3; annotations: intra-job gap, inter-job gap, <MSI, >MSI]
Classify sessions as open/closed-loop • First session from a user is always open-loop • Session from a returning user is also open-loop, if it starts more than MTT seconds since completion of last session • MTT: Maximum Think Time • Typically, MTT would be several minutes • [Figure: packet trace with session 1 marked open, session 2 marked open, session 3 marked closed; annotations: intra-job gap, inter-job gap, <MSI, >MSI, <MTT, >MTT]
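The grouping and classification rules can be sketched end to end as follows. Several choices here are assumptions rather than the authors' method: job interarrivals are measured between consecutive job start times, CTR is computed as the byte-weighted share of closed-loop sessions, and the default thresholds are MSI = 1 sec and MTT = 15 min. All names are hypothetical.

```python
from dataclasses import dataclass

MSI = 1.0         # Minimum Session Interarrival, seconds (assumed default)
MTT = 15 * 60.0   # Maximum Think Time, seconds (assumed default)


@dataclass
class Session:
    start: float
    end: float
    size: int
    closed_loop: bool = False   # first session of a user stays open-loop


def build_sessions(jobs, msi=MSI):
    """jobs: list of (start, end, bytes) tuples belonging to ONE user."""
    sessions, last_job_start = [], None
    for start, end, size in sorted(jobs):
        if sessions and start - last_job_start < msi:
            current = sessions[-1]          # machine-generated: same session
            current.end = max(current.end, end)
            current.size += size
        else:
            sessions.append(Session(start, end, size))
        last_job_start = start
    return sessions


def classify(sessions, mtt=MTT):
    """Mark a session closed-loop if it starts within MTT of the previous end."""
    for prev, cur in zip(sessions, sessions[1:]):
        cur.closed_loop = (cur.start - prev.end) <= mtt
    return sessions


def closed_loop_traffic_ratio(jobs_by_user, msi=MSI, mtt=MTT):
    """Byte-weighted fraction of traffic carried by closed-loop sessions."""
    closed = total = 0
    for jobs in jobs_by_user.values():
        for s in classify(build_sessions(jobs, msi), mtt):
            total += s.size
            closed += s.size if s.closed_loop else 0
    return closed / total if total else 0.0
```

For example, a user whose jobs form one initial session, a follow-up session a few seconds later, and a third session after a half-hour pause would contribute only the second session to the closed-loop byte count.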
Robustness to MSI & MTT thresholds • Examined CTR variation in the following ranges: • MSI: 0.1 sec to 2 sec • MTT: 10 min to 25 min • CTR variation < 0.05 • Linear regression slopes: • CTR vs. MSI: -0.0044 per sec • CTR vs. MTT: 0.0037 per min • We use: • MSI = 1 sec • MTT = 15 min
Summary • Persistent transfers have very different congestion responsiveness than finite-size transfers • Focus on open-loop and closed-loop flow arrivals • TCP or TCP-like protocols are not sufficient to avoid congestion collapse • Negative feedback at session/application layer holds key for network stability • Measurements show high CTR values for most Internet links we examined • Possibly why Internet is mostly stable
Is AQM an effective controller? • Active Queue Management (AQM) • Most AQM models assume persistent TCP flows • Provides congestion signal to flows • Stabilizes buffer occupancy • Controls link utilization • However, AQM is an ineffective controller in the presence of open-loop TCP traffic • Flow arrival process does not react to AQM drops • Congestion collapse still possible with AQM
Is admission control necessary? • Admission control is an effective way to control the offered load with open-loop traffic • Avoids flow aborts and reattempts • See proposals by J. Roberts and others • However, admission control is not required with closed-loop traffic • Closed-loop traffic is self-regulating • As long as the maximum possible number of active sessions does not exceed a certain threshold
What about TCP-friendliness? • “TCP friendliness” has been proposed for all non-TCP traffic as a way to avoid congestion collapse • However, like TCP, open-loop TCP friendly sessions can still cause congestion collapse • TCP friendliness is more important for fairness reasons (share bw almost equally with TCP)
Traffic models for simulations-analysis • Time to drop the persistent flows assumption! • It is not realistic • It has very different congestion responsiveness than real Internet traffic • More realistic aggregate traffic models: • Mix of both open-loop and closed-loop finite-size sessions • We need more CTR measurements to characterize the mix • We need mathematical models for closed-loop traffic behavior, considering user behavior under congestion
Session/application congestion control • Several existing applications generate sessions independent of network congestion (bad!) • Example-1: NNTP servers transfer news periodically • Example-2: CDN servers exchange content as needed or periodically • Client-side control mechanism: • Do not start new session before current session completes • Server-side control mechanism: • Use admission control when number of active sessions exceeds threshold
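The server-side mechanism can be sketched as a counter that refuses new sessions once a threshold of active sessions is reached. This is only an illustration of the idea on the slide; the class name, method names, and threshold are hypothetical, and a real server would also need locking and timeouts.

```python
class SessionAdmissionController:
    """Admit a new session only while active sessions stay below a threshold."""

    def __init__(self, max_active_sessions: int):
        if max_active_sessions < 1:
            raise ValueError("threshold must be at least 1")
        self.max_active_sessions = max_active_sessions
        self.active_sessions = 0

    def try_admit(self) -> bool:
        """Return True and count the session if there is room, else False."""
        if self.active_sessions >= self.max_active_sessions:
            return False          # reject: the client should retry later
        self.active_sessions += 1
        return True

    def release(self) -> None:
        """Call when an admitted session completes."""
        if self.active_sessions == 0:
            raise RuntimeError("release() called with no active session")
        self.active_sessions -= 1


if __name__ == "__main__":
    controller = SessionAdmissionController(max_active_sessions=2)
    print([controller.try_admit() for _ in range(3)])  # third attempt is rejected
    controller.release()
    print(controller.try_admit())                      # room again after a completion
```

Capping the number of active sessions bounds N(t) by construction, so the instability coefficient cannot stay positive indefinitely.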