Slide 1 Internet Traffic Demand and Traffic Matrix Estimation

Challenges in directly measuring traffic demand or traffic matrix

Granularity and time scale of the traffic demand matrix?

Focus mainly on two studies representing two approaches

Partial (or “sampled”) measurement at ingress/egress points/links

(optional material: will go over only briefly)

Inference of traffic matrix based on link loads (aggregate SNMP link load measurement)

gravity model

tomogravity model (optional material)

Readings: Please do the required readings

CSci5221: Internet Traffic Demand and Traffic Matrix Estimation

Slide 2 ### Traffic Demands

- How to measure and model the traffic demands?
- Know where the traffic is coming from and going to

- Why do we care about traffic demands?
- Traffic engineering utilizes traffic demand matrices in balancing traffic loads and managing network congestion
- Support what-if questions about topology and routing changes
- Handle the large fraction of traffic crossing multiple domains
- Traffic demand matrices are critical inputs to network design, capacity planning, and business planning!

- How to populate the demand model?
- Typical measurements show only the impact of traffic demands
- Active probing of delay, loss, and throughput between hosts
- Passive monitoring of link utilization and packet loss

- Need network-wide direct measurements of traffic demands

- How to characterize the traffic dynamics?
- User behavior, time-of-day effects, and new applications
- Topology and routing changes within or outside your network

Slide 3 ### Traffic Demands

[Figure: a User Site and a Web Site connected across the “Big Internet”.]

Slide 4 ### Traffic Demands

Interdomain Traffic

[Figure: the Web Site and the User Site (U) connected through AS 1, AS 2, AS 3, and AS 4; links carry AS-path annotations toward U such as “AS 3, U” and “AS 4, AS 3, U”.]

- What path will be taken between AS’s to get to the User site?
- Next: What path will be taken within an AS to get to the User site?

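Slide 5 ### Traffic Demands

Zoom in on one AS

[Figure: a single AS on the path between the Web Site and the User Site, with one ingress point (IN) and three egress points (OUT1, OUT2, OUT3); internal links are labeled with numeric values (10, 25, 50, 75, 110, 200, 300).]

Change in internal routing configuration changes flow exit point!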

Slide 6 ### Defining Traffic Demand Matrices

Granularity and time scale:

- Source/destination network prefix pairs, source/destination AS pairs, ingress/egress router pairs, or ingress/egress PoP pairs?
- Finer granularity: traffic demands are likely unstable or fluctuate too widely!

Slide 7 ### Traffic Matrix (TM)

Slide 8 ### Ideal Measurement Methodology

- Measure traffic where it enters the network
- Input link, destination address, # bytes, and time

- Determine where traffic can leave the network
- Set of egress links associated with each network address (forwarding tables)

- Compute traffic demands
- Associate each measurement with a set of egress links

- Even at the PoP level, direct measurement can be too expensive!
- We need to either tap all ingress/egress links or collect NetFlow records at all ingress/egress routers
- May reduce performance at routers
- Large amount of data: router disk space is limited, and exporting NetFlow records consumes bandwidth!
- With either packet-level or flow-level data, records must be mapped to ingress/egress points, and generating the TM takes a lot of processing!
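The three steps of the ideal methodology (measure at ingress, determine the egress set, compute demands) can be sketched in a few lines. This is only an illustration: the record layout, link names, and the `egress_sets` mapping are hypothetical, and a demand is taken here to be the byte total per (ingress link, set of egress links, time bin).

```python
from collections import defaultdict

# Hypothetical ingress measurements: (input link, destination prefix, bytes, time bin).
measurements = [
    ("in1", "12.34.156.0/24", 1200, 0),
    ("in1", "12.34.156.0/24", 800, 0),
    ("in2", "10.1.0.0/16", 500, 0),
    ("in1", "10.1.0.0/16", 300, 1),
]

# Hypothetical reachability data: destination prefix -> set of egress links.
egress_sets = {
    "12.34.156.0/24": frozenset({"out1", "out2"}),
    "10.1.0.0/16": frozenset({"out3"}),
}


def compute_demands(measurements, egress_sets):
    """Sum bytes per (ingress link, egress link set, time bin)."""
    demands = defaultdict(int)
    for in_link, prefix, nbytes, time_bin in measurements:
        egress = egress_sets.get(prefix)
        if egress is None:
            continue  # no known egress for this prefix; skip the record
        demands[(in_link, egress, time_bin)] += nbytes
    return dict(demands)


demands = compute_demands(measurements, egress_sets)
```

Keeping the whole egress set in the key, rather than a single egress link, leaves the demand independent of the current internal routing choice.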

Slide 9 ### Adapted Measurement Methodology: Inter-domain Focus

[F+01] Paper (Optional Material):

Deriving traffic demands from NetFlow measurements on selected links

- A large fraction of the traffic is interdomain
- Interdomain traffic is easiest to capture
- Large number of diverse access links to customers
- Small number of high speed links to peers

- Practical solution
- Flow level measurements at peering links (both directions!)
- Reachability information from all routers

Slide 10 ### Measuring Only at Peering Links

- Why measure only at peering links?
- Measurement support directly in the interface cards
- Small number of routers (lower management overhead)
- Less frequent changes/additions to the network
- Smaller amount of measurement data

- Why is this enough?
- Large majority of traffic is interdomain
- Measurement enabled in both directions (in and out)
- Inference of ingress links for traffic from customers

Slide 11 ### Inbound & Outbound Flows on Peering Links

[Figure: inbound and outbound flows between Peers and Customers.]

Note: Ideal methodology applies for inbound flows.

Slide 12 ### Full Classification of Traffic Types at Peering Links

[Figure: the four traffic types (Inbound, Outbound, Transit, Internal) shown between Peers and Customers.]

Slide 13 ### Identifying Where the Traffic Can Leave

- Traffic flows
- Each flow has a dest IP address (e.g., 12.34.156.5)
- Each address belongs to a prefix (e.g., 12.34.156.0/24)

- Forwarding tables
- Each router has a table to forward a packet to “next hop”
- Forwarding table maps a prefix to a “next hop” link

- Process
- Dump the forwarding table from each edge router
- Identify entries where the “next hop” is an egress link
- Identify the set of all egress links associated with a prefix
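The process above can be sketched with Python's standard `ipaddress` module. The router names, link names, and table contents below are hypothetical, and the set of egress links is assumed to be known from configuration data; the lookup is a plain longest-prefix match.

```python
import ipaddress
from collections import defaultdict

# Hypothetical forwarding-table dumps: router -> list of (prefix, next-hop link).
forwarding_tables = {
    "r1": [("12.34.156.0/24", "peer-link-a"), ("12.34.0.0/16", "core-1")],
    "r2": [("12.34.156.0/24", "peer-link-b"), ("0.0.0.0/0", "core-2")],
}
# Assumed known from configuration data: which links leave the network.
egress_links = {"peer-link-a", "peer-link-b"}


def build_egress_sets(tables, egress_links):
    """Map each prefix to the set of egress links any edge router uses for it."""
    result = defaultdict(set)
    for entries in tables.values():
        for prefix, next_hop in entries:
            if next_hop in egress_links:
                result[ipaddress.ip_network(prefix)].add(next_hop)
    return dict(result)


def lookup(dest_ip, prefix_to_egress):
    """Longest-prefix match of a destination address against the known prefixes."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [p for p in prefix_to_egress if addr in p]
    if not matches:
        return None, set()
    best = max(matches, key=lambda p: p.prefixlen)
    return best, prefix_to_egress[best]


prefix_to_egress = build_egress_sets(forwarding_tables, egress_links)
prefix, links = lookup("12.34.156.5", prefix_to_egress)
```

A linear scan over prefixes is enough for a sketch; a real forwarding table would call for a trie or similar structure.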

Slide 14 ### Flows Leaving at Peer Links

- Single-hop transit
- Flow enters and leaves the network at the same router
- Keep the single flow record measured at ingress point

- Multi-hop transit
- Flow measured twice as it enters and leaves the network
- Avoid double counting by omitting second flow record
- Discard flow record if source does not match a customer

- Outbound
- Flow measured only as it leaves the network
- Keep flow record if source address matches a customer
- Identify ingress link(s) that could have sent the traffic
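The record-handling rules above reduce to a small decision per flow record measured at a peering link. The sketch below is one possible reading of those rules, not the paper's algorithm; the record fields, the return labels, and the customer prefix are all hypothetical.

```python
import ipaddress

# Hypothetical customer address space.
customer_prefixes = [ipaddress.ip_network("192.0.2.0/24")]


def is_customer(ip):
    """True if the address falls inside any known customer prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in prefix for prefix in customer_prefixes)


def classify(record):
    """Decide what to do with a flow record seen on a peering link.

    record: dict with 'direction' ('in' or 'out' relative to our network)
    and 'src' (source IP address).
    """
    if record["direction"] == "in":
        # Inbound or transit traffic measured at its ingress point: keep it.
        return "keep"
    if is_customer(record["src"]):
        # Outbound traffic from a customer: keep, but the ingress must be inferred.
        return "keep-infer-ingress"
    # Leaving record of transit traffic, already counted where it entered.
    return "discard"
```

For example, `classify({"direction": "out", "src": "192.0.2.10"})` marks the record as outbound customer traffic whose ingress link still has to be inferred.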

Slide 15 ### Most Challenging Part: Inferring Ingress Links for Outbound Flows

[Figure: example of an outbound traffic flow measured at a peering link (the output) on its way to the destination; the candidate input links on the Customers side are marked “? input”.]

Use routing simulation to trace back to the ingress links!
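One simplified way to picture the routing simulation: for each candidate ingress router, compute which egress router intradomain shortest-path routing would pick, and keep the candidates whose choice matches the egress where the flow was actually observed. The topology, weights, router names, and the name-based tie-break below are assumptions for illustration; the paper's simulator is more detailed.

```python
import heapq

# Hypothetical router-level topology with IGP weights (undirected links).
edges = [("c1", "core", 10), ("c2", "core", 10), ("core", "p1", 5),
         ("c2", "p2", 3), ("core", "p2", 20)]


def build_graph(edges):
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def shortest_distances(graph, source):
    """Dijkstra's algorithm: IGP distance from source to every reachable router."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def candidate_ingress(graph, ingress_routers, egress_routers, observed_egress):
    """Ingress routers whose closest egress is the one where the flow was seen."""
    candidates = []
    for router in ingress_routers:
        dist = shortest_distances(graph, router)
        # Closest egress wins; ties broken by name (an arbitrary assumption).
        closest = min(egress_routers, key=lambda e: (dist.get(e, float("inf")), e))
        if closest == observed_egress:
            candidates.append(router)
    return candidates


graph = build_graph(edges)
via_p1 = candidate_ingress(graph, ["c1", "c2"], ["p1", "p2"], "p1")
via_p2 = candidate_ingress(graph, ["c1", "c2"], ["p1", "p2"], "p2")
```

When several candidates survive, the flow's traffic has to be split or attributed to a set of ingress links rather than a single one.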

Slide 16 ### Computing the Demands

[Figure: a “researcher in data mining gear” drawing on four data sources from the NETWORK: forwarding tables, configuration files, NetFlow, and SNMP.]

- Data
- Large, diverse, lossy
- Collected at slightly different, overlapping time intervals, across the network.
- Subject to network and operational dynamics. Anomalies explained and fixed via understanding of these dynamics

- Algorithms, details and anecdotes in paper!

Slide 17 ### Experience with Populating the Model

- Largely successful
- 98% of all traffic (bytes) associated with a set of egress links
- 95-99% of traffic consistent with an OSPF simulator

- Disambiguating outbound traffic
- 67% of traffic associated with a single ingress link
- 33% of traffic split across multiple ingress links (typically in the same city!)

- Inbound and transit traffic (uses input measurement)
- Outbound traffic (uses input disambiguation)
- Results are pretty good for traffic engineering applications, but there are limitations
- To improve results, may want to measure at selected or sampled customer links; e.g., links to email, hosting or data centers.

Slide 18 ### Proportion of Traffic in Top Demands (Log Scale)

Zipf-like distribution. Relatively small number of heavy demands dominate.
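To see how strongly a Zipf-like distribution concentrates traffic, one can compute the share of total volume carried by the top k demands. The volumes below are synthetic (the r-th largest demand proportional to 1/r, over 1000 demands), not measured data.

```python
def top_share(volumes, k):
    """Fraction of total volume carried by the k largest demands."""
    ordered = sorted(volumes, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Synthetic Zipf-like volumes: the r-th largest demand is proportional to 1/r.
volumes = [1000.0 / rank for rank in range(1, 1001)]

share_top_10 = top_share(volumes, 10)
share_top_100 = top_share(volumes, 100)
```

With these synthetic volumes, the top 1% of demands carry roughly 39% of the traffic and the top 10% roughly 69%.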

Slide 19 ### Time-of-Day Effects (San Francisco)

[Figure: time-of-day plot with the time axis marked at “midnight EST”.]

Heavy demands at same site may show different time of day behavior

Slide 20 ### Discussion

- Distribution of traffic volume across demands
- Small number of heavy demands (Zipf’s Law!)
- Optimize routing based on the heavy demands
- Measure a small fraction of the traffic (sample)
- Watch out for changes in load and egress links

- Time-of-day fluctuations in traffic volumes
- U.S. business, U.S. residential, & International traffic
- Depends on the time-of-day for human end-point(s)
- Reoptimize the routes a few times a day (three?)

- Stability?

Slide 21 ### TM Estimation Using Link Loads

[M+02] Paper: TM estimation using SNMP link loads

- Available information:
- Link counts from SNMP data.
- Routing information. (Weights of links)
- Additional topological information. (Peerings, access links)
- Assumption on the distribution of demands.

- TM Estimation => using indirect measurements (here link loads), solving an inference problem!
- Y: link load measurements; A: the “routing matrix”; X: the unknown traffic demands
- Given Y and A, solve for X, where Y = AX
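A tiny example makes the relation Y = AX concrete. The three-node line topology, the single-path routing, and the demand values below are hypothetical; entry A[l][k] is 1 when demand k is routed over link l and 0 otherwise.

```python
# Hypothetical line topology n1 -> n2 -> n3 with two directed links.
links = [("n1", "n2"), ("n2", "n3")]
od_pairs = [("n1", "n2"), ("n1", "n3"), ("n2", "n3")]

# Path (list of links) used by each origin-destination pair.
paths = {
    ("n1", "n2"): [("n1", "n2")],
    ("n1", "n3"): [("n1", "n2"), ("n2", "n3")],
    ("n2", "n3"): [("n2", "n3")],
}


def routing_matrix(links, od_pairs, paths):
    """A[l][k] = 1 if OD pair k crosses link l, else 0."""
    return [[1 if link in paths[od] else 0 for od in od_pairs] for link in links]


def link_loads(A, X):
    """Y = A X: the load on each link is the sum of the demands routed over it."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]


A = routing_matrix(links, od_pairs, paths)
X = [10, 5, 20]          # hypothetical demands, in the order of od_pairs
Y = link_loads(A, X)
```

Going from X to Y is trivial; the estimation problem runs the other way, recovering X when only Y and A are known.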

Slide 22 ### Terminology

Slide 23 ### Three Existing Techniques

- Key issue: the linear equations are under-constrained!
- More unknowns (on the order of N^2 X_{ij}’s) than knowns (the Y_{l}’s)

- Linear Programming (LP) approach.
- O. Goldschmidt - ISMA Workshop 2000

- Bayesian estimation.
- C. Tebaldi, M. West - J. of American Statistical Association, June 1998.

- Expectation Maximization (EM) approach.
- J. Cao, D. Davis, S. Vander Wiel, B. Yu - J. of American Statistical Association, 2000
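The key issue behind all three techniques can be made concrete by brute force: with fewer link measurements than demands, many traffic matrices reproduce the same link loads. The routing matrix and loads below are a hypothetical toy case (two links, three demands), and the search is restricted to non-negative integer demands.

```python
from itertools import product

# Hypothetical routing matrix (2 links x 3 demands) and measured link loads.
A = [[1, 1, 0],
     [0, 1, 1]]
Y = [15, 25]


def consistent(A, X, Y):
    """True if demand vector X reproduces every measured link load exactly."""
    return all(sum(a * x for a, x in zip(row, X)) == y for row, y in zip(A, Y))


# Enumerate all non-negative integer demand vectors up to the largest link load.
bound = max(Y)
solutions = [X for X in product(range(bound + 1), repeat=len(A[0]))
             if consistent(A, X, Y)]
```

Here 16 different demand vectors fit the two measurements, which is why each technique needs extra structure (an objective, a prior, or a distributional assumption) to pick one.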

Slide 24 ### Linear Programming

Slide 25 ### Statistical Approaches

Slide 26 ### Bayesian Approach

Slide 27 ### Expectation Maximization (EM)

- Assumes the X_j are independently distributed Gaussian random variables.
- Y=AX implies:
- Requires a prior for initialization.
- Incorporates multiple sets of link measurements.
- Uses EM algorithm to compute MLE.

Slide 28 ### Comparison of Methodologies

- Considers PoP-PoP traffic demands.
- Two different topologies (4-node, 14-node).
- Synthetic TMs. (constant, Poisson, Gaussian, Uniform, Bimodal)
- Comparison criteria:
- Estimation errors yielded.
- Sensitivity to prior.
- Sensitivity to distribution assumptions.

Slide 29 ### 4-node Topology

Slide 30 ### 4-node Topology Results

Slide 31 ### 14-node Topology

Slide 32 ### 14-node Topology Results

Slide 33 ### Marginal Gains of Known Rows

Slide 34 ### New Directions

- Lessons learned:
- Model assumptions do not reflect the true nature of traffic (multimodal behavior)
- Dependence on priors
- Link count is not sufficient (Generally more data is available to network operators.)

- Proposed Solutions:
- Use choice models to incorporate additional information.
- Generate a good prior solution.

Slide 35 ### New Statement of the Problem

- X_{ij} = O_i · α_{ij}
- Solution via Discrete Choice Models (DCM).
- User choices.
- ISP choices.

Slide 36 ### Choice Models

- Decision makers: PoPs
- Set of alternatives: egress PoPs.
- Attributes of decision makers and alternatives: attractiveness (capacity, number of attached customers, peering links).
- Utility maximization with random utility models.

Slide 37 ### Random Utility Model

- U_{ij} = V_{ij} + ε_{ij}: utility of PoP i choosing to send a packet to PoP j.
- Choice problem:
- Deterministic component:
- Random component: multinomial logit (mlogit) model used.
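The formulas for the choice problem and the deterministic component did not survive on this slide, so the sketch below falls back on the textbook multinomial logit form, where the probability of choosing alternative j is exp(V_j) divided by the sum of exp(V_k) over all alternatives. It also assumes that O_i is the total traffic leaving PoP i and that α_{ij} is the fraction of it sent to egress PoP j; the PoP names and utility values are hypothetical.

```python
import math


def logit_shares(utilities):
    """Multinomial logit: share_j = exp(V_j) / sum_k exp(V_k)."""
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {j: math.exp(v - top) for j, v in utilities.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical deterministic utilities V_ij of ingress PoP i for each egress PoP.
V_i = {"popA": 1.0, "popB": 1.0, "popC": 0.0}
O_i = 100.0  # hypothetical total traffic leaving PoP i

alpha_i = logit_shares(V_i)                         # assumed fan-out fractions
X_i = {j: O_i * a for j, a in alpha_i.items()}      # X_ij = O_i * alpha_ij
```

Egress PoPs with equal utility receive equal shares, and a more attractive PoP draws a larger fraction of the outgoing traffic.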

Slide 38 ### Gravity Modeling

- General formula:
- Simple gravity model: Try to estimate the amount of traffic between edge links.
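The general formula is missing from this slide. A common form of the simple gravity model estimates the traffic from edge link i to edge link j as proportional to the traffic entering at i times the traffic leaving at j; the sketch below uses that form, which is an assumption here, with hypothetical link names and volumes.

```python
def simple_gravity(traffic_in, traffic_out):
    """Estimate X[i, j] = T_in(i) * T_out(j) / sum_k T_out(k)."""
    total_out = sum(traffic_out.values())
    return {
        (i, j): t_in * t_out / total_out
        for i, t_in in traffic_in.items()
        for j, t_out in traffic_out.items()
    }


# Hypothetical edge-link totals (total entering traffic equals total leaving traffic).
traffic_in = {"e1": 60.0, "e2": 40.0}
traffic_out = {"e1": 30.0, "e2": 70.0}

tm = simple_gravity(traffic_in, traffic_out)
```

By construction, each row of the estimate sums to the traffic entering at that link, and each column sums to the traffic leaving when the in and out totals match.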

Slide 39 Slide 40 ### Further Improvement: Tomogravity Model (Optional Material)

- Two step modeling.
- Gravity Model: Initial solution obtained using edge link load data and ISP routing policy.
- Tomographic Estimation: Initial solution is refined by applying quadratic programming to minimize distance to initial solution subject to tomographic constraints (link counts).
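The refinement step can be illustrated with the simplest version of the idea: among all demand vectors consistent with the link counts, take the one closest (in unweighted Euclidean distance) to the gravity estimate. For a routing matrix A with independent rows this has the closed form t = t_g + Aᵀ(AAᵀ)⁻¹(x − A t_g). The sketch below is not the paper's method in detail: it uses no weighting, does not enforce non-negativity, and the matrix, loads, and prior are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve(M, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def refine(A, loads, prior):
    """Closest point to `prior` satisfying A t = loads (A must have independent rows)."""
    residual = [y - p for y, p in zip(loads, matvec(A, prior))]
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]  # A A^T
    z = solve(gram, residual)
    transpose = [list(col) for col in zip(*A)]
    correction = matvec(transpose, z)  # A^T z
    return [p + c for p, c in zip(prior, correction)]


A = [[1, 1, 0],
     [0, 1, 1]]
loads = [15.0, 25.0]          # hypothetical link counts
prior = [12.0, 6.0, 18.0]     # hypothetical gravity-model estimate
refined = refine(A, loads, prior)
```

The refined vector reproduces both link counts exactly while moving as little as possible from the gravity prior.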

Slide 41 ### Highlights

- Router to router traffic matrix is computed instead of PoP to PoP.
- Performance evaluation with real traffic matrices.
- Tomogravity method (Gravity + Tomography)

Slide 42 ### Recall: Gravity Model

- General formula:
- Simple gravity model: Try to estimate the amount of traffic between edge links.

Slide 43 ### Generalized Gravity Model

- Four traffic categories
- Transit
- Outbound
- Inbound
- Internal

- Peers: P1, P2, …
- Access links: a1, a2, ...
- Peering links: p1,p2,…

Slide 44 ### Generalized Gravity Model

Slide 45 ### Tomography

- Solution should be consistent with the link counts.

Slide 46 ### Reducing the Computational Complexity

- Hundreds of backbone routers, tens of thousands of unknowns.
- Observations:
- Some elements of the BR to BR matrix are empty. (Multiple BRs in each PoP, shortest paths)
- Topological equivalence. (Reduce the number of IGP simulations)

Slide 47 ### Quadratic Programming

- Problem Definition:
- Use SVD (singular value decomposition) to solve the inverse problem.
- Use Iterative Proportional Fitting (IPF) to ensure non-negativity.
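The slide does not show how IPF is applied to the link constraints, so the sketch below shows only the classic two-marginal form of Iterative Proportional Fitting: rows and columns of a non-negative matrix are rescaled in turn until they match target totals. Because every step multiplies by a non-negative factor, entries never go negative. The seed matrix and targets are hypothetical.

```python
def ipf(matrix, row_targets, col_targets, iterations=50):
    """Alternately rescale rows and columns to match the target totals."""
    m = [[float(v) for v in row] for row in matrix]
    for _ in range(iterations):
        for i, target in enumerate(row_targets):
            row_sum = sum(m[i])
            if row_sum > 0:
                m[i] = [v * target / row_sum for v in m[i]]
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in m)
            if col_sum > 0:
                for row in m:
                    row[j] *= target / col_sum
    return m


# Hypothetical starting estimate and target totals (both sets of targets sum to 100).
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

The fixed iteration count keeps the sketch short; a practical version would stop once the row and column sums are within a tolerance of their targets.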

Slide 48 ### Evaluation of Gravity Models

Slide 49 ### Performance of Proposed Algorithm

Slide 50 ### Comparison

Slide 51 ### Robustness

- Measurement errors
- x = At + ε
- ε = x · N(0, σ)
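A robustness experiment of this kind can be mimicked by perturbing synthetic link loads. The sketch below reads the error model as multiplicative noise on each noise-free link load, which is an interpretation of the two formulas rather than something the slide spells out; the routing matrix, demands, and σ values are hypothetical.

```python
import random


def noisy_link_loads(A, t, sigma, rng):
    """Return x = A t + eps, with eps = (A t) * N(0, sigma) drawn per link."""
    clean = [sum(a * v for a, v in zip(row, t)) for row in A]
    return [load + load * rng.gauss(0.0, sigma) for load in clean]


A = [[1, 1, 0],
     [0, 1, 1]]
t = [10.0, 5.0, 20.0]        # hypothetical true demands
rng = random.Random(42)      # fixed seed so the experiment is repeatable

exact = noisy_link_loads(A, t, 0.0, rng)     # sigma = 0 leaves loads untouched
noisy = noisy_link_loads(A, t, 0.01, rng)    # about 1% relative error per link
```

Feeding `noisy` instead of `exact` into an estimator, for several values of σ, shows how quickly the estimate degrades as measurement error grows.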
