
Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices

Qi Zhao*, Abhishek Kumar*, Jia Wang+ and Jun (Jim) Xu* (*College of Computing, Georgia Tech; +AT&T Labs - Research)


Presentation Transcript


  1. Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices Qi Zhao*, Abhishek Kumar*, Jia Wang+ and Jun (Jim) Xu* *College of Computing, Georgia Tech +AT&T Labs - Research

  2. Traffic and flow matrices • Flow matrix (FM): FM[i, j, f] = the size of flow f from node i to node j • Useful in: computing usage patterns of ISPs; detecting flapping routes; detecting DDoS attacks • Traffic matrix (TM): TM[i, j] = traffic volume from node i to node j • Useful in: capacity planning and forecasting; routing configuration; network fault/reliability diagnosis; provisioning for SLAs

  3. Existing approaches • Traffic matrix • Indirect inference (holistic): link counts from SNMP; routing matrix; network model • Direct measurement: sampling; our approach • Flow matrix • Not well studied yet • Straightforward approach: sampling

  4. Data streaming algorithms • Data streaming: processing a long stream of data items in one pass, using a small working memory, in order to answer a class of queries about the stream. • Our context • Packet arrival rate is high (e.g., 10-40 Gbps) • Small but fast memory, SRAM (10 ns per access), will be used. • Challenge: how to make full use of SRAM to retain as much information pertinent to the traffic/flow matrix as possible?

  5. Two data streaming schemes • The bitmap-based scheme • Traffic matrix • The counter array-based scheme • Flow matrix • Traffic matrix

  6. System model • [Figure: node i and node j each run an online streaming module; a server runs the data analysis module.]

  7. The bitmap-based scheme • Online streaming module • Data analysis module

  8. Online streaming module • The data digest data-structure is a bit array (bitmap) initially set to all 0’s. • It is updated upon each packet arrival. • Measurement proceeds in epochs.

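  9. Example • [Figure: the invariant packet header plus the first 8 bytes of the payload are hashed by H(.) to an index i in a bitmap of b bits (positions 0 to b-1), and the bit at index i is set to 1.] • U := U-1 (U appears to count the bits that are still 0, so it is decremented when a 0 bit is set). • If U/b < Threshold, save the bitmap and start a new epoch. • [Snoeren et al. SIGCOMM’01] show that these 28 bytes are sufficient to differentiate almost all non-identical packets.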
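
The per-packet update on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `BitmapDigest`, the use of BLAKE2b as the hash H(.), the one-byte-per-bit storage, and the reading of U as "number of bits still 0" are all assumptions made for the example.

```python
import hashlib


class BitmapDigest:
    """Illustrative per-node bitmap digest (hypothetical names throughout)."""

    def __init__(self, num_bits: int, threshold: float):
        self.b = num_bits
        # Assumed rule: roll over to a new epoch once U/b drops below this.
        self.threshold = threshold
        self.bits = bytearray(num_bits)  # one byte per bit, for readability
        self.zeros = num_bits            # U: number of bits that are still 0
        self.saved_epochs = []           # bitmaps saved at epoch boundaries

    def _index(self, invariant_bytes: bytes) -> int:
        # Stand-in for H(.); any uniform hash shared by all nodes would do.
        digest = hashlib.blake2b(invariant_bytes, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.b

    def update(self, invariant_bytes: bytes) -> None:
        """Process one packet: one hash computation, one bit write."""
        i = self._index(invariant_bytes)
        if self.bits[i] == 0:
            self.bits[i] = 1
            self.zeros -= 1
        if self.zeros / self.b < self.threshold:
            # Save the filled bitmap and start a new epoch with a clean one.
            self.saved_epochs.append(bytes(self.bits))
            self.bits = bytearray(self.b)
            self.zeros = self.b
```

Here `invariant_bytes` would be the invariant header fields plus the first 8 payload bytes; extracting them from a real packet is outside the scope of this sketch.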

  10. Complexities • Computational complexity • One hash function computation • One write to the memory • Storage complexity • Each packet only produces a little more than one bit as its digest. • This can be further reduced using sampling.

  11. The bitmap-based scheme • Online streaming module • Data analysis module

  12. Data analysis module • What do we have so far (for TM[i, j])? • BMi, generated by the traffic at node i (Ti), and • BMj, generated by the traffic at node j (Tj) • What we want to estimate: the traffic common to both nodes, TM[i, j] = |Ti ∩ Tj|

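  13. Estimation based on BMi and BMj • [Whang et al. 1990] proposed a method to infer |T| from BM, i.e., $\widehat{D}_T = b \ln(b / U_T)$, where $b$ is the size of BM in bits and $U_T$ is the number of “0”s in BM. • |Ti U Tj| can be inferred from the bitwise-OR of BMi and BMj. • An estimator of TM[i, j] is given by $\widehat{TM}[i, j] = \widehat{D}_{T_i} + \widehat{D}_{T_j} - \widehat{D}_{T_i \cup T_j}$. • We derive the variance of the estimator.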
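
As a rough sketch of this estimation step: linear counting in the style of Whang et al. estimates the number of distinct items hashed into a bitmap of b bits with U zero bits as b·ln(b/U), and inclusion-exclusion then gives the size of the intersection. The function names `linear_count` and `estimate_tm` are hypothetical, and the bitmaps are assumed to be equal-length sequences of 0/1 values built with the same hash function at both nodes.

```python
import math


def linear_count(bitmap) -> float:
    """Linear-counting estimate of distinct items hashed into `bitmap`."""
    b = len(bitmap)
    zeros = sum(1 for bit in bitmap if bit == 0)
    if zeros == 0:
        # The estimate is undefined for a full bitmap.
        raise ValueError("bitmap is saturated; no zero bits left")
    return b * math.log(b / zeros)


def estimate_tm(bm_i, bm_j) -> float:
    """Estimate |Ti ∩ Tj| as |Ti| + |Tj| - |Ti ∪ Tj|."""
    if len(bm_i) != len(bm_j):
        raise ValueError("bitmaps must have the same length")
    # The bitwise OR of the two bitmaps is the bitmap of the union Ti ∪ Tj.
    bm_union = [x | y for x, y in zip(bm_i, bm_j)]
    return linear_count(bm_i) + linear_count(bm_j) - linear_count(bm_union)
```

Because the three terms are noisy estimates, the result can be slightly negative when the true overlap is small; a caller might clamp it at zero.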

  14. Multipaging • [Figure: timeline of bitmap epochs 1 to 4 at node i and epochs 1 to 3 at node j, with times t1 and t2 marked.]

  15. Eliminating the effects of clock offset and packets in transit • [Figure: timeline of epochs 1 to 4 at node i and epochs 1 to 3 at node j, with a time gap t marked.] • T1: a tight upper bound on the clock offset (e.g., 50 ms in an NTP-enabled network). If t < T1, then overlap(1,2) = 1. • Combining with packets in transit: T2 is a tight upper bound on the packet traversal time. If t < T1+T2, then overlap(1,2) = 1.
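
One way to read the overlap rule is sketched below. It assumes, since the figure is not available in this transcript, that t is the time gap between an epoch at node i and an epoch at node j as measured by their local clocks, and that epochs are represented as (start, end) pairs in seconds. The function names and parameters are hypothetical.

```python
def gap(epoch_a, epoch_b) -> float:
    """Time between two (start, end) intervals; 0.0 if they intersect."""
    (a_start, a_end), (b_start, b_end) = epoch_a, epoch_b
    return max(0.0, b_start - a_end, a_start - b_end)


def overlap(epoch_i, epoch_j, max_clock_offset, max_transit=0.0) -> int:
    """Return 1 if the two epochs should be treated as overlapping.

    max_clock_offset plays the role of T1 and max_transit the role of T2;
    both are assumed to be positive upper bounds chosen by the operator.
    """
    t = gap(epoch_i, epoch_j)
    return 1 if t < max_clock_offset + max_transit else 0
```

With T1 = 50 ms, two epochs separated by 30 ms on the local clocks would be treated as overlapping, since the separation could be an artifact of clock offset alone.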

  16. Counter array based scheme • Online streaming module • Data analysis module

  17. Online streaming module • The data digest data-structure is a counter array. • It is updated upon each packet arrival. • Measurement proceeds in epochs.

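  18. Example • [Figure: the packet's flow label is hashed by H(.) to an index i in a counter array of b counters (positions 0 to b-1), and the counter at index i is incremented from n to n+1.]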
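
The counter-array update can be sketched in the same spirit. The class name `CounterArrayDigest` and the choice of BLAKE2b for H(.) are assumptions made for this example; the only behavior taken from the slide is that each packet increments the counter selected by hashing its flow label.

```python
import hashlib


class CounterArrayDigest:
    """Illustrative per-node counter array (hypothetical names)."""

    def __init__(self, num_counters: int):
        self.b = num_counters
        self.counters = [0] * num_counters

    def _index(self, flow_label: bytes) -> int:
        # Stand-in for H(.). All nodes must use the same hash so that a
        # given flow lands at the same index everywhere.
        digest = hashlib.blake2b(flow_label, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.b

    def update(self, flow_label: bytes) -> None:
        """Process one packet: counter at H(flow label) goes from n to n+1."""
        self.counters[self._index(flow_label)] += 1
```

Since every node hashes a flow to the same index, the analysis step can compare counters index by index across ingress and egress nodes.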

  19. Counter array based scheme • Online streaming module • Data analysis module

  20. Data analysis module • Principle: find a good matching of counter values between ingress nodes and egress nodes. • Challenge: hash collisions make one-to-one matching fail. • Method: iterative elephant-first matching. • Accuracy: works well for medium-to-large flow matrix elements, due to the Zipfian nature of Internet traffic.

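  21. Elephant-first matching • [Figure: counter value a1 at node i is matched against counter value a2 at node j.] • If a1 > a2: FM[i, j, f] = a2; the counters become a1-a2 at node i and 0 at node j. • If a1 <= a2: FM[i, j, f] = a1; the counters become 0 at node i and a2-a1 at node j.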
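
A minimal sketch of the iterative matching for a single counter index follows. It assumes the inputs are the counter values observed at that index at each ingress and each egress node, and that iteration repeatedly pairs the largest remaining ingress counter with the largest remaining egress counter. The function name, the dictionary representation, and the `min_size` stopping threshold are hypothetical.

```python
def elephant_first_matching(ingress, egress, min_size=1):
    """Match counters at one hash index, largest ("elephants") first.

    ingress, egress: dicts mapping node name -> counter value.
    Returns a list of (ingress_node, egress_node, estimated_flow_size).
    """
    if min_size <= 0:
        raise ValueError("min_size must be positive so the loop terminates")
    ingress, egress = dict(ingress), dict(egress)  # do not mutate the inputs
    matches = []
    while ingress and egress:
        i = max(ingress, key=ingress.get)  # largest remaining ingress counter
        j = max(egress, key=egress.get)    # largest remaining egress counter
        size = min(ingress[i], egress[j])  # a2 if a1 > a2, otherwise a1
        if size < min_size:
            break
        matches.append((i, j, size))
        ingress[i] -= size                 # becomes a1 - a2, or 0
        egress[j] -= size                  # becomes 0, or a2 - a1
    return matches
```

Each round zeroes at least one counter, so the loop ends after at most as many rounds as there are counters.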

  22. Evaluation • Ideally, the evaluation would require packet-level traces collected simultaneously at hundreds of ingress and egress routers in an ISP over a certain period of time. • Instead, we construct synthetic experiments based on 16 publicly available packet-level traces from NLANR.

  23. Evaluation: traffic matrix • [Figure panels: bitmap scheme; counter array scheme.]

  24. Metric
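
The metric's formula did not survive in this transcript. Assuming that RMSRE on the following slides stands for root mean squared relative error, and that it is computed over matrix elements whose true value exceeds some threshold, it could be computed along these lines. Both the expansion of the acronym and the `min_actual` threshold parameter are assumptions, not taken from the slides.

```python
import math


def rmsre(estimates, actuals, min_actual=0.0):
    """Root mean squared relative error over elements with actual > min_actual."""
    pairs = [(e, a) for e, a in zip(estimates, actuals) if a > min_actual]
    if not pairs:
        raise ValueError("no elements above the threshold")
    squared = sum(((e - a) / a) ** 2 for e, a in pairs)
    return math.sqrt(squared / len(pairs))
```

Restricting the average to larger elements keeps tiny true values from dominating the relative error.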

  25. RMSRE: traffic matrix

  26. RMSRE: flow matrix

  27. Conclusion • A novel data streaming algorithm that produces traffic matrix estimates much more accurate than those of existing approaches. • Another data streaming algorithm that very accurately estimates the flow matrix, a finer-grained characterization than the traffic matrix. • Both algorithms are designed to operate in very high-speed networks.

  28. Thank You! • Questions?
