Trajectory Sampling for Direct Traffic Observation

This paper discusses the use of trajectory sampling for direct traffic observation in networks, and proposes a method using deterministic hash functions to determine a representative subset of packets. It explores implementation issues and compares the results with other common approaches.

Presentation Transcript


  1. Trajectory Sampling for Direct Traffic Observation • N.G. Duffield and Matthias Grossglauser • IEEE/ACM Transactions on Networking, Vol. 9, No. 3, June 2001

  2. Circuit-switched networks (e.g., telephone): per-call state is maintained => trivial • IP networks: don’t maintain per-flow information => Problem: Which (spatial) path does traffic take?

  3. Why is this interesting? • Quality of Service depends on traffic management • Traffic control • Timescale: seconds • no human intervention • Traffic engineering • Timescale: minutes - months • Resource allocation • Pricing • Failover strategies

  4. Options • Indirect measurement • Uses information on network model and network state • Direct measurement • Direct observation of traffic at multiple points in the network

  5. Problems with indirect measurement • Behavior of network elements depends on vendor-specific design choices • Deliberate sources of randomness to avoid collision • Events outside domain (route advertising by neighboring domains) • Interactions may be too complex to predict

  6. Direct measurement: Sampling of packets • Sample packets that traverse each link • Subset of packets used as representative • Problem: How do we get the actual path?

  7. Key idea of the paper • Use a deterministic hash function over the packet’s content to determine subset of packets • Use the same hash function throughout the domain • Use second hash function to label packets

  8. Theory • Measurement domain represented as a directed graph • Packets • enter at ingress node • exit at egress node • Invariance function • Packet content excluding fields that change in transit, e.g., the time-to-live field, which is decremented at each hop

  9. Sampling Hash Function • Decides whether or not a given packet should be sampled • Deterministic function of the invariant packet content • Same function on each link • Results in L-bit binary number

  10. Identification Hash Function • Entire packet content could be used • Aim: limit traffic to measurement collection system • Results in m-bit binary number • Additional information may be included • Length of packet • Source, destination

  11. Invariant content • Header: three categories of fields • Variable fields (not included) • E.g., TTL, header checksum, etc. • Low entropy fields (not included) • Content changes little between packets • E.g., version, header length, protocol • High entropy fields (included) • Source and destination IP, etc. • Part of remainder of packet
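
The field categories above can be illustrated with a minimal sketch that pulls the invariant bytes out of a raw IPv4 packet. The function name `invariant_content`, the `payload_bytes` parameter, and the exact choice of kept fields (identification, source and destination addresses, a payload prefix) are assumptions made for illustration; the slide only names source and destination IP explicitly.

```python
def invariant_content(packet: bytes, payload_bytes: int = 20) -> bytes:
    """Return bytes of a raw IPv4 packet that should not change hop to hop.

    Hypothetical helper: the field selection is an illustrative assumption.
    """
    if len(packet) < 20:
        raise ValueError("packet shorter than a minimal IPv4 header")
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < 20 or len(packet) < header_len:
        raise ValueError("malformed IPv4 header length")
    # Skipped as variable: TTL (byte 8), header checksum (bytes 10-11).
    # Skipped as low entropy: version/IHL (byte 0), protocol (byte 9).
    identification = packet[4:6]   # high entropy, kept
    addresses = packet[12:20]      # source and destination IP, kept
    payload = packet[header_len:header_len + payload_bytes]
    return identification + addresses + payload
```

Two copies of the same packet captured at different hops differ in TTL and checksum, but this function returns the same bytes for both, which is what lets every link make the same sampling decision.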

  12. Ambiguities (f-h)

  13. Dealing with ambiguities • Probability that trajectory can be disambiguated depends on network topology and traffic => renormalization of results necessary • Safer to discard all duplicate labels (greater loss of samples)
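
The "discard all duplicate labels" option can be sketched as follows. This is a minimal illustration, assuming the collector receives `(link, label)` pairs for one measurement period; the function name `unambiguous_trajectories` is hypothetical. It only detects labels that repeat on the same link, so two distinct packets with the same label on disjoint paths would go unnoticed.

```python
from collections import Counter, defaultdict


def unambiguous_trajectories(observations):
    """Group labels into trajectories, dropping labels seen twice on one link.

    observations: iterable of (link, label) pairs (an assumed report format).
    Returns a dict mapping each surviving label to the set of links it crossed.
    """
    observations = list(observations)
    per_link = Counter(observations)
    # A label reported more than once by the same link is ambiguous.
    ambiguous = {label for (_link, label), n in per_link.items() if n > 1}
    trajectories = defaultdict(set)
    for link, label in observations:
        if label not in ambiguous:
            trajectories[label].add(link)
    return dict(trajectories)
```

Discarding costs samples but avoids having to guess which of several candidate paths a duplicated label belongs to.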

  14. Specification of Hash Functions • Ordered bits of invariant part of packet content x are considered as binary integers: f(x) • Sampling hash function h(f(x)) = f(x) mod A • Identification hash function g(f(x)) = f(x) mod B with A, B positive integers
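
A minimal sketch of the two modular hashes defined above. The names `as_integer`, `sampling_hash`, `identification_hash`, and `is_sampled` are hypothetical, and the threshold `r` (sample when the hash falls below `r`) is an assumption for illustration, since the slide does not say how the sampling hash value is turned into a decision.

```python
def as_integer(x: bytes) -> int:
    """f(x): read the ordered invariant bytes as one big binary integer."""
    return int.from_bytes(x, "big")


def sampling_hash(x: bytes, A: int) -> int:
    """h(f(x)) = f(x) mod A."""
    return as_integer(x) % A


def identification_hash(x: bytes, B: int) -> int:
    """g(f(x)) = f(x) mod B, used as the packet label."""
    return as_integer(x) % B


def is_sampled(x: bytes, A: int, r: int) -> bool:
    """Assumed decision rule: sample when h(f(x)) < r, i.e. roughly r/A of packets."""
    return sampling_hash(x, A) < r
```

Because every link applies the same `A` and `r` to the same invariant bytes, a packet is either sampled at every hop or at none.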

  15. Identical Packets • Automatically ambiguous => lead to biased estimators • Question: How much packet content is needed to avoid collisions? • Answer: 40 bytes lead to a collision probability smaller than $10^{-3}$

  16. Implementation of hashing • 40-byte “numbers” are represented by a vector of 16-bit words: $z = (z_k, z_{k-1}, \ldots, z_0) = \sum_{i=0}^{k} z_i 2^{16i}$ • Use 32-bit long division • Iteratively compute $(z_k, z_{k-1}, \ldots, z_0) \bmod A = (z_{k-1} + 2^{16}(z_k \bmod A), z_{k-2}, \ldots, z_0) \bmod A$
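
The iterative reduction can be sketched as below. This is an illustration rather than the authors' implementation: the function name `mod_by_words` is hypothetical, and the restriction of the modulus `A` to less than 2^16 is an assumption that keeps every intermediate value within 32 bits.

```python
def mod_by_words(data: bytes, A: int) -> int:
    """Compute (data as a big-endian integer) mod A, one 16-bit word at a time.

    Assumes 0 < A < 2**16 so each intermediate value fits in 32 bits.
    """
    if not 0 < A < (1 << 16):
        raise ValueError("A must satisfy 0 < A < 2**16")
    if len(data) % 2:
        data = b"\x00" + data  # a leading zero byte does not change the value
    remainder = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]
        # Fold the running remainder into the next word: remainder < A < 2**16,
        # so remainder * 2**16 + word < 2**32.
        remainder = ((remainder << 16) | word) % A
    return remainder
```

Each loop step is one application of the recurrence on the slide, so a 40-byte input needs 20 small divisions instead of one 320-bit division.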

  17. Sampling independent of packet content? • Note: IP address of source and destination are included in the invariant content! • Chi-squared test • 40 byte packet prefix => 95% confidence level • 20 byte packet prefix results in strong dependence
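
A chi-squared independence check of this kind can be sketched as follows. The contingency-table layout (rows for sampled and not sampled, columns for address bins) and the function name `chi_squared_statistic` are assumptions for illustration; the slide does not describe how the authors binned their data.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    table: list of equal-length rows, e.g. one row per sampling outcome and
    one column per address bin (an assumed layout).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof
```

The statistic is then compared with the chi-squared critical value for the given degrees of freedom; a statistic below that value means independence cannot be rejected at the chosen level.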

  18. Optimal Sampling • Tradeoff • More unambiguous samples => more accuracy • More samples => more measurement traffic • Optimize for given measurement traffic mn (m bits per sample, n samples) • Small m increases collisions • Large m means smaller n
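
The tradeoff can be explored numerically with a simplified model, which is an assumption and not necessarily the paper's derivation: labels are treated as uniform and independent, so a sample is unambiguous when none of the other n - 1 samples shares its m-bit label. The function names are hypothetical.

```python
def expected_unique(n: int, m: int) -> float:
    """Expected number of samples whose m-bit label is shared with no other sample."""
    return n * (1.0 - 2.0 ** -m) ** (n - 1)


def best_label_length(budget_bits: int, m_range=range(1, 65)):
    """Pick the label length m that maximizes unambiguous samples for a bit budget.

    With budget m * n fixed, a longer label leaves room for fewer samples.
    Returns (m, n).
    """
    best_m = max(m_range, key=lambda m: expected_unique(budget_bits // m, m))
    return best_m, budget_bits // best_m
```

Calling `best_label_length` with a given measurement budget returns the label length and sample count that balance collisions against sample volume under this model.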

  19. Question to the authors • Doesn’t the measurement traffic itself get sampled and thereby add another source of error? • This may be covered by their future work statement

  20. Example Service provider wants to determine what fraction of packets on a certain backbone link belongs to a certain customer • Compare • customer packets observed both on backbone and on access link • Total number of packets observed on backbone • Real and estimated fractions largely within error bars
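
The comparison in this example can be sketched as a set intersection over the labels reported by the two links. The function name `customer_fraction` is hypothetical, and the sketch assumes duplicate labels have already been discarded.

```python
def customer_fraction(backbone_labels, access_labels) -> float:
    """Estimate the share of backbone packets that came from one customer.

    backbone_labels: labels of sampled packets seen on the backbone link.
    access_labels: labels of sampled packets seen on the customer's access link.
    Assumes labels are unique per packet within the measurement period.
    """
    backbone = set(backbone_labels)
    if not backbone:
        raise ValueError("no sampled packets observed on the backbone link")
    # Packets seen on both links belong to the customer and crossed the backbone.
    return len(backbone & set(access_labels)) / len(backbone)
```

Because both links sample the same packets, the ratio of label counts estimates the ratio of actual packet counts.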

  21. Implementation issues • Can trajectory sampling be part of next generation of high-speed interfaces? • Authors claim “yes”: • Compute both hash functions in parallel • Processor cost negligible compared with cost of interface cards • Processor speed doubles every 18 months, maximum trunk speed every 21 months

  22. Other Common Approaches • Aggregation-based approaches • e.g., sum of packets traversing a link • Sampling-based approaches • Sample a subset of observations

  23. Aggregation-based Approaches • Link measurements (direct) • Traffic statistics (# of bytes / # of packets transferred / dropped) • Measurements reported periodically • Flow aggregation (indirect) • Flow: sequence of packets with common field in header • Relies on emulation of routing protocol

  24. Sampling-based Approaches • Active end-to-end probes (direct) • Hosts send probe packets to one or more other hosts • Packet loss rate • Round-trip delay • End-to-end path characteristics • Variation: collect and exchange measurements of multicast session

  25. Related Work • Measure end-to-end performance of individual flows • ATM cells sampled at ingress and egress points • Determine QoS for a single connection, e.g., delay and loss rate

  26. Extensions and Other Applications • Distributed denial of service attacks • Attackers use packet spoofing • Filtering • A configurable packet filter may allow trajectory sampling for a subset of packets • Probe Packets • Packet content may be constructed to ensure sampling

  27. Conclusions • Simple processing • No Router state required • Packets directly observed
