Origins of Long Range Dependence Myths and Legends

Origins of Long Range Dependence Myths and Legends Aleksandar Kuzmanovic 01/08/2001

Outline • Definitions • Why is LRD important? • Heavy tails • Producing self-similar traffic • Physical interpretation in LAN and WAN networks • Different hypotheses from around 10 papers

On the Self-Similar Nature of Ethernet Traffic, W. Willinger, 1994

Definitions • Long range dependent process • its autocorrelation function is nonsummable • Self-similar process • scaling behavior of the finite-dimensional distributions • X = m^(1-H) * X^(m) in distribution, where X^(m) is the process averaged over non-overlapping blocks of size m • Second order self-similar process • aggregated processes possess the same non-degenerate autocorrelation (AC) functions as the original process • X and m^(1-H) * X^(m) have the same AC function • Self-similar processes have hyperbolically decaying autocorrelation functions • LRD can be characterized by a single parameter H
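The scaling relation above implies Var(X^(m)) ~ m^(2H-2), which gives a simple way to estimate H from data (the variance-time method). The following is a minimal sketch using only the Python standard library; the function names and block sizes are illustrative choices, not something taken from the slides. For uncorrelated noise the estimate should come out near H = 0.5, while LRD traffic gives 0.5 < H < 1.

```python
import math
import random

def aggregate(x, m):
    """Block means of size m: the aggregated series X^(m)."""
    n = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n)]

def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def hurst_variance_time(x, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the least-squares slope of log Var(X^(m)) against log m.

    For a second-order self-similar series Var(X^(m)) ~ m^(2H - 2),
    so H = 1 + slope / 2.
    """
    pts = [(math.log(m), math.log(variance(aggregate(x, m)))) for m in block_sizes]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    return 1 + (sxy / sxx) / 2

# Sanity check on short-range (here: independent) data.
rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_white = hurst_variance_time(white_noise)
```

The estimator is crude (it is biased for short traces and sensitive to non-stationarity), so it is best read as an illustration of the definition rather than as a measurement tool.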

Heavy tails (Noah effect) • Heavy-tailed distributions • LLCD (log-log complementary distribution) plot • Pareto is a typical example
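As an illustration of the LLCD idea, the sketch below draws Pareto samples, for which P[X > x] = (k/x)^alpha, and fits a straight line to the upper tail of log10 P[X > x] versus log10 x. The slope should be close to -alpha. The value alpha = 1.2, the sample size, and the 10% tail cut-off are assumptions chosen for the example.

```python
import math
import random

def pareto_sample(alpha, k, rng):
    """Inverse-transform sample with P[X > x] = (k / x)^alpha for x >= k."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return k / u ** (1.0 / alpha)

def llcd_slope(samples, tail_fraction=0.1):
    """Least-squares slope of the empirical LLCD over the upper tail."""
    xs = sorted(samples)
    n = len(xs)
    start = int(n * (1.0 - tail_fraction))
    # Skip the largest sample, where the empirical P[X > x] is zero.
    pts = [(math.log10(xs[i]), math.log10((n - 1 - i) / n))
           for i in range(start, n - 1)]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    return sxy / sxx

rng = random.Random(7)
samples = [pareto_sample(1.2, 1.0, rng) for _ in range(50000)]
slope = llcd_slope(samples)  # expected to be roughly -1.2
```

A tail index between 1 and 2 means finite mean but infinite variance, which is the regime the papers below care about.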

Producing Self-Similar Traffic 1. Multiplexing ON/OFF sources that have a fixed rate in ON periods and ON/OFF period lengths that are heavy tailed. • Aggregate traffic is fBm 2. M/G/∞ queue model • implies that multiplexing constant-rate connections with Poisson connection arrivals and a heavy-tailed distribution for connection lifetimes would result in self-similar traffic 3. Inter-arrival packet times are i.i.d. Pareto • if we then consider the corresponding count process (the number of arrivals in consecutive intervals), we have “pseudo self-similar” traffic (Paxson, Floyd) (or even self-similar (L. Lipsky)?)
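The first construction can be sketched in a few lines: each source alternates between ON (sending at rate 1) and OFF, with Pareto-distributed period lengths, and the aggregate is the per-slot sum over sources. The relation H = (3 - alpha_min)/2 used in `limiting_hurst` is the standard limit result for this model when 1 < alpha < 2; it is not spelled out in the slide text, so treat it here as an assumption. The default tail indices (1.7 for ON, 1.2 for OFF) are the values quoted later for the Ethernet source-level measurements; everything else (slot granularity, source count, function names) is hypothetical.

```python
import random

def pareto(alpha, k, rng):
    """Pareto sample with P[X > x] = (k / x)^alpha for x >= k."""
    return k / (1.0 - rng.random()) ** (1.0 / alpha)

def on_off_source(n_slots, alpha_on, alpha_off, rng):
    """0/1 activity per unit time slot for one fixed-rate ON/OFF source."""
    activity = []
    on = rng.random() < 0.5
    while len(activity) < n_slots:
        alpha = alpha_on if on else alpha_off
        length = max(1, round(pareto(alpha, 1.0, rng)))
        # Cap the period so a single huge draw cannot exhaust memory.
        length = min(length, n_slots - len(activity))
        activity.extend([1 if on else 0] * length)
        on = not on
    return activity

def multiplex(n_sources, n_slots, alpha_on=1.7, alpha_off=1.2, seed=0):
    """Aggregate traffic: number of sources that are ON in each slot."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for t, a in enumerate(on_off_source(n_slots, alpha_on, alpha_off, rng)):
            total[t] += a
    return total

def limiting_hurst(alpha_on, alpha_off):
    """H = (3 - min(alpha)) / 2 for heavy-tailed ON/OFF multiplexing."""
    return (3 - min(alpha_on, alpha_off)) / 2

traffic = multiplex(n_sources=20, n_slots=2000)
h_limit = limiting_hurst(1.7, 1.2)  # about 0.9
```

The heavier of the two tails (the smaller alpha) dominates, which is why the OFF periods set H in this example.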

Questions we want to answer • What physical activity causes LRD? • What is the role of protocols (TCP and MAC layer protocols)? • What is the role of limited resources (i.e. bandwidth)? • What model fits best to each of the assumptions? • What is the largest time-scale over which the correlation is present? • Self-similarity vs. pseudo self-similarity and relevance

Statistical Analysis of Ethernet LAN Traffic at the Source Level, W. Willinger, 1997, I

Statistical Analysis of Ethernet LAN Traffic at the Source Level, W. Willinger, 1997, II • Model 1 (heavy tailed ON/OFF activity at the source level) is widely accepted • Result proven theoretically • Noah effect (heavy-tailed periods) • ON periods alpha = 1.7 • OFF periods alpha = 1.2 • TCP traffic measured most of the time... • H increases with higher load • WAN measurements do not fit into this model • connections typically do not last long

Wide Area Traffic: The Failure of Poisson Modeling, V. Paxson, S. Floyd, 1995 • Summary of ways to produce LRD traffic • WAN (TCP) traffic for TELNET and FTP applications • TELNET connection arrivals appear to be Poisson, but packet arrivals are not • Single TELNET connection is LRD • Model 3: Inter-arrival times are i.i.d. Pareto • Aggregate is also LRD, but there is no analytical proof (*) • FTP traffic is also LRD, yet none of the models fits because of limited resources • Aggregated traffic is not fBm (a single H is not enough)
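Model 3 is easy to reproduce: draw i.i.d. Pareto inter-arrival times and count the arrivals that fall in consecutive fixed-width intervals. The sketch below does exactly that; the shape `alpha`, scale `k`, and bin width are placeholder values, since the slides do not preserve the parameters that were used.

```python
import random

def pareto_interarrival_counts(n_bins, bin_width, alpha, k, seed=0):
    """Count process for a renewal stream with i.i.d. Pareto inter-arrival times."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    horizon = n_bins * bin_width
    t = 0.0
    while True:
        # Next arrival: previous arrival plus a Pareto(alpha, k) gap.
        t += k / (1.0 - rng.random()) ** (1.0 / alpha)
        if t >= horizon:
            break
        counts[min(int(t // bin_width), n_bins - 1)] += 1
    return counts

counts = pareto_interarrival_counts(n_bins=1000, bin_width=1.0, alpha=1.5, k=0.1)
```

Plotting `counts` at several aggregation levels shows bursts that do not smooth out the way Poisson counts do, which is the visual argument the paper makes.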

Explaining WWW Traffic Self-Similarity, M. Crovella, 1995 • WWW traffic is self-similar • but only when load is high (i.e. in busiest hours) • Authors force model 1 (ON/OFF model) • The distribution of: • transfer times (alpha = 1.21) • user requests for documents (alpha = 1.06) • document sizes available in the Web (alpha = 1.05) • user think times (alpha = 1.5) • H increases as the load increases (same as in LAN)

On the Relationship between File Sizes, Transport Protocols, and Self-Similar Network Traffic, M. Crovella, 1996 • Model 1: The success of this simple model is surprising given that it ignores nonlinearities arising in real networks • Hypothesis: • Heavy-tailed file size distributions together with TCP are responsible for LRD • if UDP is used, there is little or no LRD • Explanation • “In some sense, the effect of the unaccounted for nonlinearity is reflected back as a stretching in time effect, thus conforming to the model’s original suppositions” • Other interesting results: mix of Pareto and exponential background traffic

On the Propagation of LRD in the Internet, A. Veres, 2000, I • Not about roots, but about propagation of self-similarity by TCP • A(t) = C - B(t) • TCP is a linear system beyond a characteristic time scale • if it adapts well to a background traffic, it itself becomes self-similar
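The relation A(t) = C - B(t) already explains why the correlation structure carries over: subtracting a series from a constant flips the sign of every deviation from the mean, so all autocovariances, and hence the autocorrelation function, are unchanged. The sketch below checks this numerically; the capacity and background values are made-up numbers, and the idealization that TCP takes exactly the leftover capacity is the slide's assumption, valid only beyond the characteristic time scale.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

capacity = 10.0  # C: bottleneck capacity (hypothetical units)
background = [3.0, 7.0, 6.0, 2.0, 8.0, 9.0, 4.0, 1.0, 5.0, 6.0]  # B(t)
tcp = [capacity - b for b in background]  # A(t) = C - B(t)

# Pairs of (autocorrelation of A, autocorrelation of B) for a few lags.
pairs = [(autocorrelation(tcp, lag), autocorrelation(background, lag))
         for lag in (1, 2, 3)]
```

Each pair in `pairs` agrees up to floating-point error, so if B(t) is LRD, an ideally adapting A(t) is LRD with the same H.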

On the Propagation of LRD in the Internet, A. Veres, 2000, II • Experimental proof: • NY-Budapest file transfer, source is not LRD - traffic is LRD (H=0.76) • Max time scale = 8 min • Also, if there are a number of ON/OFF TCP connections, they can spread LRD • W. Willinger obviously does not like this paper: • “This is a fraud and has no relevance for LRD observed on link level...” • “Protocols have no impact on LRD, they just have to send the data generated by applications...”

TCP Congestion Control and Heavy-Tails, M. Crovella, 2000, I • Switch to Model 3 (Heavy-tailed inter-packet arrivals) • Although heavy-tailed flow lengths are commonly associated with heavy-tailed file sizes, there is no strong correlation between file sizes and transmission times • It has been shown that TCP can show heavy-tailed inter-arrival times under some conditions • Because most of the connections are short lived (!) only slow start and exp. back-off were considered

TCP Congestion Control and Heavy-Tails, M. Crovella, 2000, II • Simple Markov chain model for exponential backoff and slow start with a loss-probability parameter p • State probabilities for different loss rates • For alpha to be between 1 and 2, p has to be between 1/8 and 1/4 ... but for a different model • p increases => H increases

TCP Congestion Control and Heavy-Tails, M. Crovella, 2000, III • Pathological TCP connections: 15 packets • Analytical model not that good (bounds are loose) • For this set-up, correlation up to 1000 sec • For larger file sizes, up to 200-300 sec • Under certain conditions, heavy-tailed transmission times can occur even in the absence of any variability in file sizes • Future work: to consider the variability in round-trip time estimation

On the Autocorrelation Structure of TCP Traffic, Don Towsley, 2000, I • Answer to previous two papers: • TCP can create self-similarity but over a finite range of time scales - “pseudo self-similarity” • but everything in nature is finite (thus “pseudo”) • They also criticize the pathological model of the previous paper, but they themselves use a pathological model of a different kind (always packets model) • Separate Markovian models for Congestion Avoidance (CA) and Time Out (TO) • Simulated these two models with different loss probability parameters

On the Autocorrelation Structure of TCP Traffic, Don Towsley, 2000, II • Range of time scales observed from the simulation (2^6*RTT*(2.5 to 10)) => 2^9*RTT • Explanation of why the aggregate is self-similar • independent bottlenecks (at the edge) • aggregate of independent pseudo-self-similar flows should be self-similar itself (**)

On the Autocorrelation Structure of TCP Traffic, Don Towsley, 2000, III • About the Veres paper • compute loss probability (0.08 to 0.14) • TO model predicts H=0.69-0.72 (really measured 0.74) • Time scale goes up to 2^6 RTO (also near measured value) • Experiments (file transfers) • North-South America • Measurements: p = 0.13, H = 0.77, ts = (2^7 to 2^8)*RTT • TO model: p = 0.12, H = 0.72, ts = (2^7 to 2^9)*RTT • East - West Coast • Measurements: p = 0.018, H = 0.86, ts = 2^6*RTT • CA model: p = 0.018, H = 0.75, ts = 2^4*RTT • One should be careful when attributing the origin of traffic characteristics to a specific cause

Protocols Can Make Traffic Appear Self-Similar, Jon Peha, 1997. I • How a basic retransmission mechanism can cause self-similarity • No model, only experimental investigation • Simple single queue (bottleneck) model • Input traffic - Poisson; retransmissions are bursty • As the time scale gets larger, burstiness from the original Poisson traffic decreases, but burstiness from retransmissions stays the same! • It is unlikely that a retransmission mechanism causes truly self-similar traffic; rather, it causes pseudo self-similarity

Protocols Can Make Traffic Appear Self-Similar, Jon Peha, 1997. II • Pictorial “proof”

Protocols Can Make Traffic Appear Self-Similar, Jon Peha, 1997. III • Cut-off time scales observed: • 150Mbps link rate, 500 bits packets, RTT 60 msec • TS = 5 minutes • 10Mbps Ethernet, No. of retransmissions=5, To=125 • TS in range of minutes • For larger To, it is possible to reach time scales measured at Bellcore • I have computed the cut-off time scale for the Veres paper • 128 Kbps, Tout=10*RTT=2 sec, TS=8min • If this effect is found to be as strong in more complex models, this could be a significant cause

The Second-order Characteristics of TCP, J.Y.Boudec, 1996, I • Pseudo self similarity (TS=20-30 sec) • Minimum bottleneck bandwidth 34Mbps (?) • Two main reasons (both heavy-tailed) • Burst length arrivals • Round trip time • Real network measurements • Figure - missing

The Second-order Characteristics of TCP, J.Y.Boudec, 1996, II • Even for a 34Mbps link and utilization of 25%, the arrival bursts are eliminated and the inter-packet times are dependent on the round trip times • The aggregate of TCP connections has the same H as a single TCP connection (***) • “It seems likely that the heavy tailed distributions observed in Willinger’s work were a result of, among other things, the heavy tailed distribution of a round trip time”

More on RTTs • Why are round trip times heavy-tailed? • Because of TCP congestion control? • Because of retransmissions? • Because of the variety of destinations? • They can be heavy-tailed even without any congestion protocol or different destinations! • Measurement and Analysis of LRD Behavior of Internet Packet Delay, M. Borella, Infocom 97 • Constant UDP transmissions - LRD response • Is cross-traffic heavy-tailed? • Or multiple bottlenecks assumption? • Simple example (not through bandwidth adaptation, but through RTT adaptation)

Summary • Heavy-tailed parameters • File sizes • Connection lifetimes • Inter-arrival packet times • Document sizes available on the Web • User think times • TELNET packet arrivals • Round trip times • Pseudo self-similarity • it should be clear that the range of time scales covered is far beyond dominant time scales, and as far as packet loss is concerned, this is relevant

Conclusions • One should be careful when attributing the origin of traffic characteristics to a specific cause • There is more than one physical activity causing LRD • The influence of protocols (TCP) is more than relevant • Time scales covered are relevant in the generation, time-stretching and propagation hypotheses • Model 3 (inter-arrival times i.i.d. Pareto) plus heavy-tailed file sizes (introducing congestion) is promising • Analytical proof for the aggregate is missing (simulation proof reported in 3 papers) • The round-trip times hypothesis might be promising - it supports Veres' idea in a slightly different way
