
New Directions in Traffic Measurement and Accounting
Cristian Estan – UCSD
George Varghese – UCSD

Discussion Leaders: Andrew Levine, Jeff Mitchell

Reviewed by: Michela Becchi



Outline

  • Introduction

  • Cisco NetFlow

  • Sample and Hold & Multistage Filters

  • Analytical Evaluation

  • Comparison

  • Measurements

  • Conclusions



Introduction

  • Measuring and monitoring network traffic on Internet backbones

    • Long-term traffic engineering (traffic rerouting and link upgrades)

    • Short-term monitoring (detection of hot spots and DoS attacks)

    • Accounting (usage-based pricing)

  • Scalability problem

    • FixWest and MCI traces: on the order of a million flows per hour between end-host pairs



Cisco NetFlow

  • Flow: unidirectional stream of data identified by

    • Source IP address and port

    • Destination IP address and port

    • Protocol

    • TOS byte

    • Rx router interface

  • An entry in DRAM for each flow

  • Heuristics for end-of-flow detection

  • Flow data exported via UDP packets from routers to collection server for processing
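
To make the flow identity above concrete, here is a minimal sketch (an illustration, not Cisco's implementation; all names are invented) of a NetFlow-style flow key built from the listed fields and a per-flow entry that is created or updated for each packet:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowKey:
    """Fields that identify a NetFlow-style unidirectional flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int        # e.g. 6 = TCP, 17 = UDP
    tos: int             # TOS byte
    input_if: int        # receiving (Rx) router interface

flow_table: dict[FlowKey, list[int]] = {}   # one entry per flow: [packets, bytes]

def account(key: FlowKey, packet_bytes: int) -> None:
    entry = flow_table.setdefault(key, [0, 0])
    entry[0] += 1
    entry[1] += packet_bytes
```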



Cisco NetFlow - problems

  • Processing overhead

    • Interfaces faster than OC-3 (155 Mbps) are slowed down by per-packet updates to the flow cache in DRAM

  • Collection overhead

    • Collection server

    • Network connection

  • NetFlow Aggregation (based on IP prefixes, ASes, ports)

    • Extra “aggregation” cache

    • Only aggregated data exported to collection server

    • Problem: still a large number of aggregates



Sampled NetFlow

  • Only a sample of the packets is processed

  • Per-flow information is built from the sampled packets

  • Problems:

    • Inaccurate (sampling and losses)

    • Memory Intensive

    • Slow (DRAM needed)
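
A minimal sketch of the 1-in-N packet sampling behind Sampled NetFlow (N and all names are illustrative): only sampled packets touch the flow table, and byte counts are later scaled up by N, which is where the inaccuracy comes from.

```python
import random

N = 100                              # sample one packet in N (illustrative value)
sampled_flows: dict[str, int] = {}   # flow_id -> bytes seen in sampled packets

def process_packet(flow_id: str, size: int) -> None:
    if random.randrange(N) == 0:     # only 1 in N packets is processed
        sampled_flows[flow_id] = sampled_flows.get(flow_id, 0) + size

def estimated_bytes(flow_id: str) -> int:
    return N * sampled_flows.get(flow_id, 0)   # scaled estimate, not exact
```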



Idea

  • “A small percentage of flows accounts for a large percentage of the traffic”

    • Algorithms for identifying large flows

  • Use of SRAM instead of DRAM

  • Categorize algorithms depending on:

    • Memory size and memory references

    • False negatives

    • False positives

    • Expected error in traffic estimates



Algorithms

  • Sample and Hold

    • Sample to determine flows to consider

    • Update flow entry for every subsequent packet belonging to the flow

  • Multistage Filters

    • Use multiple tables of counters (stages) indexed by a hash function computed on flow ID

    • Different stages have independent hash functions

    • For each packet, at each stage, hash the flow ID and add the packet size to the corresponding counter

    • A flow is added to flow memory only when its counters in all stages reach the threshold (see the sketch after the filter walkthrough below)


Sample and Hold (example)

[Figure: a stream of transmitted packets F1 F4 F1 F2 F2 F1 F1 F3 F3 is processed with sampling probability 1/3. When a packet is sampled, an entry for its flow is created in flow memory; every later packet of that flow updates the entry, so in the example F1 ends with a count of 3 and F3 with a count of 2.]
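
A minimal Python sketch of sample and hold as illustrated above; the per-byte sampling probability p and the names are illustrative. Each byte is sampled with probability p (so a packet of size s is sampled with probability 1 − (1 − p)^s), and once a flow has an entry every later packet of that flow is counted exactly; the bytes sent before the entry was created are missed, which is why the analytical slide later adds 1/p to the counted bytes c.

```python
import random

p = 1e-4                           # per-byte sampling probability (illustrative)
flow_memory: dict[str, int] = {}   # flow_id -> bytes counted since entry creation

def sample_and_hold(flow_id: str, size: int) -> None:
    if flow_id in flow_memory:
        flow_memory[flow_id] += size           # hold: count every later packet
    elif random.random() < 1.0 - (1.0 - p) ** size:
        flow_memory[flow_id] = size            # sample: create the entry
```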


Multistage Filters (example)

[Figure, shown over several animation steps: each stage is an array of counters; a packet's flow ID is hashed to select one counter per stage and the packet size is added to it. Collisions are OK: small flows that hash to the same counter inflate it, but rarely in every stage. When a flow's counters reach the threshold (stream1, then stream2 in the example), the flow is inserted into flow memory; adding a second stage with an independent hash function keeps out most of the small flows that a single stage would let through.]
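
A minimal sketch of a parallel multistage filter matching the walkthrough above: d stages of b counters each with independent hash functions, and a flow enters flow memory only once its counters in every stage reach the threshold T. The sizes, threshold value, and keyed-digest hashing are illustrative choices, not the paper's.

```python
import hashlib

d, b = 2, 1024                     # stages and counters per stage (illustrative)
T = 1_000_000                      # byte threshold (illustrative)
counters = [[0] * b for _ in range(d)]
flow_memory: dict[str, int] = {}   # flow_id -> bytes counted after admission

def bucket(stage: int, flow_id: str) -> int:
    # One independent hash function per stage (keyed digest as a stand-in).
    digest = hashlib.sha256(f"{stage}:{flow_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % b

def update(flow_id: str, size: int) -> None:
    for s in range(d):             # add the packet to one counter per stage
        counters[s][bucket(s, flow_id)] += size
    if flow_id in flow_memory:
        flow_memory[flow_id] += size
    elif all(counters[s][bucket(s, flow_id)] >= T for s in range(d)):
        flow_memory[flow_id] = size  # all stages reached T: admit the flow
```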



Parallel vs. Serial Multistage Filters

  • Threshold for serial filters: T/d (d = number of stages)

  • Parallel filters perform better on traces of actual traffic



Optimizations

  • Preserving entries

    • Nearly exact measurement of long-lived large flows

    • Requires a bigger flow memory

  • Early removal

    • A threshold R < T determines which of the entries added in the current interval are kept

  • Shielding

    • Do not update the filter counters for packets of flows already in flow memory

    • Reduces false positives

  • Conservative update of the counters (see the example after the next figure)

    • Normally update only the smallest counter; raise the others at most to its new value

    • Introduces no false negatives

    • Reduces false positives


Conservative update of counters (example)

[Figure, shown over several animation steps: gray represents all prior packets. Adding a new packet's size to every counter it hashes to is partly redundant; the conservative update adds it only to the smallest counter and raises the other counters at most up to that counter's new value.]
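
A tiny self-contained illustration of the conservative update rule from the figure above (the counter values and packet size are invented): only the smallest counter takes the full increment, and larger counters are raised only if they fall below the new minimum.

```python
# The three counters a packet of size 10 hashes to (one per stage).
counters = [90, 40, 60]
packet_size = 10

new_min = min(counters) + packet_size              # 40 + 10 = 50
counters = [max(c, new_min) for c in counters]     # -> [90, 50, 60]
# Only the smallest counter was effectively incremented; adding the full
# packet size to the larger counters would have been redundant.
```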



Analytical Evaluation

  • Sample and Hold

    • Prob(false negatives): (1 − p)^T ≈ e^(−O)

    • Best estimate for flow size s: c + 1/p

    • Upper bound on flow memory size: O·C/T

      • With preserving entries: 2·O·C/T

      • With early removal: O·C/T + C/R

  • Parallel Multistage Filters

    • No false negatives

    • Prob(false positives): f(1/k)^d

    • Upper bound for flow size estimate error: f(T,1/k)

    • Bound on memory requirement

      where

      T: threshold, p: sampling probability (O/T), c: bytes counted for the flow,

      C: link capacity, O: oversampling factor, d: filter depth,

      k: stage strength (T·b/C, with b counters per stage)
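
A small worked example of the sample and hold bounds above, using the threshold (0.025%) and oversampling factor (O = 4) from the Measurements slide; the OC-3 link speed and the 5-second measurement interval are illustrative assumptions, and C is taken as the bytes that can cross the link in one interval.

```python
C = 155e6 / 8 * 5        # bytes per 5 s interval on an OC-3 link (~97 MB)
T = 0.00025 * C          # threshold = 0.025% of C
O = 4                    # oversampling factor

p = O / T                            # per-byte sampling probability
p_false_negative = (1 - p) ** T      # ~ e**(-O), about 1.8%
mem_bound = O * C / T                # at most 16,000 flow memory entries
mem_preserving = 2 * O * C / T       # 32,000 entries when preserving entries
print(round(p_false_negative, 3), int(mem_bound), int(mem_preserving))
```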



Comparison w/ Memory Constraint

  • Assumptions:

    • Memory Constraint M

    • The considered flow produces traffic zC (e.g. z=0.01)

  • Observations and Conclusions:

    • Mz ~ oversampling factor

    • S&H and MF offer better accuracy but require more memory accesses

    • S&H and MF can run from SRAM while Sampled NetFlow runs from DRAM, as long as x is larger than the ratio of a DRAM access time to an SRAM access time



Comparison w/o Mem Constraint

  • Observations and Conclusions:

    • By preserving entries, S&H and MF provide exact estimates for long-lived large flows

    • S&H and MF gain accuracy at the cost of a weaker memory bound (u = zC/T)

    • Memory accesses are as in the memory-constrained case

    • S&H provides better accuracy for small measurement intervals => faster detection of new large flows

    • Increase in memory size => greater resource consumption



Dynamic threshold adaptation

  • How to dimension the algorithms

    • Conservative bounds vs. accuracy

    • No a priori knowledge of the flow size distribution

  • Dynamic adaptation

    • Keep decreasing the threshold below the conservative estimate until the flow memory is nearly full

    • “Target usage” of memory

    • “Adjustment ratio” of threshold

    • For stability purposes, adjustments made across 3 intervals

  • Netflow: fixed sampling rate
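
A minimal sketch of the adaptation loop described above, run once per measurement interval; the target usage, the adjustment ratio, and the exact up/down rule are illustrative stand-ins for the slide's "target usage" and "adjustment ratio", and the sketch omits the smoothing over 3 intervals mentioned for stability.

```python
TARGET_USAGE = 0.9     # fraction of flow memory we aim to fill (illustrative)
ADJUST_RATIO = 2.0     # multiplicative threshold step (illustrative)

def adapt_threshold(T: float, entries_used: int, flow_memory_size: int) -> float:
    """Return the threshold to use in the next measurement interval."""
    usage = entries_used / flow_memory_size
    if usage >= 1.0:
        return T * ADJUST_RATIO    # memory overflowed: back off the threshold
    if usage < TARGET_USAGE:
        return T / ADJUST_RATIO    # memory under-used: lower the threshold
    return T                       # close to the target: keep the threshold
```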



Measurement setup

3 unidirectional traces of Internet traffic

3 flow definitions

Traffic in the traces is between 13% and 17% of the link capacities



Measurements

S&H (threshold 0.025%, oversampling 4)

MF (strength=3)

Differences between analytical bounds and actual behavior (lightly loaded links)

Effect of preserving entries and early removal



Measurements

Flow IDs: 5-tuple

MF always better than S&H

Sampled NetFlow is better for medium flows, worse for very large ones

AS: reduced number of flows (~entries in flow memory).

Flow IDs: destination IP

Flow IDs: ASes



Conclusions

  • Focus on identifying large flows, which create the majority of network traffic

  • Proposal of two techniques

    • Providing higher accuracy than Sampled NetFlow

    • Using limited memory resources (SRAM)

  • Mechanism to make the algorithms adaptable

  • Analytical Evaluation providing theoretical bounds

  • Experimental measurements showing the validity of the proposed algorithms



Future work

  • Generalize algorithms to automatically extract flow definitions for large flows

  • Deepen the analysis, especially to explain the discrepancy between the theoretical bounds and the experimental measurements

  • Explore commonalities with other research areas (e.g. data mining, architecture, compilers) that also face high data volumes and high speeds



The End

  • Questions?



Zipf distribution

  • Characteristics:

    • A few elements "score" very high

    • A medium number of elements have a "medium score"

    • A huge number of elements "score" very low

  • Examples

    • Use of words in a natural language

    • Web use (e.g.: www.sun.com website accesses)
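
A small self-contained illustration of this heavy-tailed behavior, assuming a Zipf distribution with exponent 1 over 100,000 items (both numbers invented for the example): the top 1% of items account for roughly 60% of the total.

```python
n = 100_000                                          # number of items (illustrative)
weights = [1.0 / rank for rank in range(1, n + 1)]   # Zipf, exponent 1
total = sum(weights)

top_share = sum(weights[: n // 100]) / total
print(f"top 1% of items carry {top_share:.0%} of the total")   # ~62%
```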
