
A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks


Presentation Transcript


  1. A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks Tian Bu1, Jin Cao1, Aiyou Chen1, Patrick P. C. Lee2 Bell Labs, Alcatel-Lucent1 Columbia University2 May 10, 2007

  2. Outline • Motivation • Why heavy-key detection? • What are the challenges? • Sequential hashing scheme • Allows fast, memory-efficient heavy-key detection in high-speed networks • Results of trace-driven simulation

  3. Motivation • Many anomalies in today’s networks: • Worms, DoS attacks, flash crowds, … • Input: a stream of packets in (key, value) pairs • Key: e.g., srcIPs, flows,… • Value: e.g., data volume • Goal: identify heavy keys that cause anomalies • Heavy hitters: keys with massive data in one period • E.g., flows that violate service agreements • Heavy changers: keys with massive data change across two periods • E.g., sources that start DoS attacks

  4. Challenge • Keeping track of per-key values is infeasible: a full table would need one counter per key (keys 1, 2, 3, …, N with values v1, v2, v3, …, vN) • Number of keys = 2^32 if we keep track of source IPs • Number of keys = 2^104 if we keep track of 5-tuples (srcIP, dstIP, srcPort, dstPort, proto)

  5. Goal • Find heavy keys using a “smart” design: • Fast per-packet update • Fast identification of heavy keys • Memory-efficient • High accuracy

  6. Previous Work • Multi-stage filter [Estan & Varghese, 03] • Covers only heavy hitter detection, but not heavy changer detection • Deltoids [Cormode & Muthukrishnan, 04] • Covers both heavy hitter and heavy changer detections, but is not memory-efficient in general • Reversible sketch [Schweller et al., 06] • Space and time complexities of detection are sub-linear in the key space size

  7. Our Contributions • Derive the minimum memory requirement subject to a targeted error rate • Propose a sequential hashing scheme that is memory-efficient and allows fast detection • Propose an accurate estimation method to estimate the values of heavy keys • Show via trace-driven simulation that our scheme is more accurate than the existing work

  8. Minimum Memory Requirement • How to feasibly keep track of per-key values? • Use a hash array [Estan & Varghese, 2003] • M independent hash tables • K buckets in each table (an M × K array of counters)
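A minimal Python sketch of the hash array on this slide, with M tables of K counters each. The class name HashArray and the use of Python's built-in hash() salted with a table index are illustrative choices, not from the paper.

```python
class HashArray:
    """M independent hash tables, each with K counter buckets."""

    def __init__(self, num_tables, num_buckets):
        self.M = num_tables
        self.K = num_buckets
        # M x K array of counters, all starting at zero
        self.counters = [[0] * num_buckets for _ in range(num_tables)]

    def bucket(self, table_index, key):
        # h_i(x): map key x to one of the K buckets of table i;
        # salting hash() with the table index gives M different hash functions
        return hash((table_index, key)) % self.K
```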

  9. Minimum Memory Requirement - Record step • For each packet with key x and value v: • Find the bucket in Table i by hashing x: hi(x), for i = 1, …, M • Increment the counter of each of these M hashed buckets by v
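A sketch of the record step, reusing the hypothetical HashArray class from the previous sketch; the example packets (source IP, byte count) are made up.

```python
def record(array, key, value):
    # add the packet's value to the hashed bucket in every table
    for i in range(array.M):
        array.counters[i][array.bucket(i, key)] += value

# hypothetical packets: (srcIP, byte count)
arr = HashArray(num_tables=4, num_buckets=1024)
for src_ip, nbytes in [("10.0.0.1", 1500), ("10.0.0.2", 40), ("10.0.0.1", 1500)]:
    record(arr, src_ip, nbytes)
```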

  10. Minimum Memory Requirement - Detection step • Find heavy buckets: buckets whose values (or changes) exceed the threshold • Heavy keys: keys all of whose associated buckets are heavy buckets
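A sketch of the detection step, building on the HashArray and record sketches above: a bucket is heavy if its counter exceeds the threshold, and a key is a candidate only if every one of its M hashed buckets is heavy.

```python
def heavy_buckets(array, threshold):
    # per-table sets of bucket indices whose counters exceed the threshold
    return [{b for b in range(array.K) if array.counters[i][b] > threshold}
            for i in range(array.M)]

def is_candidate_heavy(array, key, heavy):
    # a key is a candidate heavy key only if all of its buckets are heavy
    return all(array.bucket(i, key) in heavy[i] for i in range(array.M))
```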

  11. Minimum Memory Requirement • Input parameters: • N = size of the key space • H = max. number of heavy keys • ε = error rate, Pr(a non-heavy key is treated as a heavy key) • Objective: find all heavy keys subject to the targeted error rate ε • Minimum memory requirement: the size of a hash array, given by M × K, is minimized when • K = H / ln 2 • M = log2(N / (εH))
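A worked instance of the sizing formulas on this slide; the inputs (N = 2^32 source IPs, H = 1000 heavy keys, ε = 0.01) are hypothetical numbers chosen only to illustrate the arithmetic.

```python
import math

N, H, eps = 2**32, 1000, 0.01
K = math.ceil(H / math.log(2))           # buckets per table, ~1443
M = math.ceil(math.log2(N / (eps * H)))  # tables, ~29
print(K, M, K * M)                       # total counters = M * K, ~42,000
```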

  12. How to identify heavy keys? • Challenge: the hash array is irreversible • Many-to-one mapping from keys to buckets • Naive solution: enumerate all keys and keep those whose buckets are all heavy • Computationally expensive for large key spaces

  13. Sequential Hashing Scheme • Basic idea: detect smaller sub-keys first, then extend them to larger keys • Observation: if there are H heavy keys, then there are at most H unique sub-keys with respect to the heavy keys • Find all possible sub-keys of the H heavy keys first; enumerating a sub-key space is much easier • Example: the entire IP space (0.0.0.0 to 255.255.255.255) has size 2^32, while the sub-IP space of first bytes has size only 2^8
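A tiny illustration of the sub-key idea using the two addresses shown on this slide: taking the first byte of a source IP as the sub-key shrinks the candidate space from 2^32 to 2^8.

```python
def first_byte(ip):
    # sub-key: the first of the four bytes of an IPv4 address
    return int(ip.split(".")[0])

print(first_byte("16.128.59.1"))   # 16
print(first_byte("0.135.104.2"))   # 0
```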

  14. Sequential Hashing Scheme - Record step • Input: (key x, value v), where key x is partitioned into D words w1 w2 w3 … wD • Maintain D hash arrays: Array d has Md tables of K buckets each and is keyed on the prefix w1…wd • For each packet, add v to the hashed bucket of the corresponding prefix in every table of every array
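A sketch of the sequential-hashing record step, reusing the earlier HashArray and record sketches. It assumes each key splits into D = 4 byte-sized words (the bytes of an IPv4 source address), with Array d updated on the prefix w1…wd.

```python
D = 4  # four 8-bit words of an IPv4 source address

def seq_record(arrays, key_words, value):
    # arrays: D hash arrays; key_words: the tuple (w1, ..., wD)
    for d, array in enumerate(arrays, start=1):
        record(array, key_words[:d], value)   # Array d is keyed on the prefix w1...wd

arrays = [HashArray(num_tables=4, num_buckets=1024) for _ in range(D)]
seq_record(arrays, (16, 128, 59, 1), 1500)    # hypothetical packet: 16.128.59.1, 1500 bytes
```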

  15. Sequential Hashing Scheme - Detection step • Try all w1's against the heavy buckets of Array 1 → about (1 + ε')H candidate w1's • Extend each candidate with all w2's against Array 2 → about (1 + ε')H candidate w1w2's • Continue through Array D → about (1 + ε')H candidate full keys w1w2…wD • ε' - intermediate error rate, ε - targeted error rate
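A sketch of this detection step that grows candidate sub-keys one word at a time, reusing heavy_buckets and is_candidate_heavy from the earlier sketches; it assumes byte-valued words (0 to 255).

```python
def seq_detect(arrays, threshold, word_values=range(256)):
    candidates = [()]                        # start from the empty prefix
    for array in arrays:                     # Array 1, Array 2, ..., Array D
        heavy = heavy_buckets(array, threshold)
        # keep only the extended prefixes whose buckets in this array are all heavy
        candidates = [prefix + (w,)
                      for prefix in candidates
                      for w in word_values
                      if is_candidate_heavy(array, prefix + (w,), heavy)]
    return candidates                        # candidate full keys (w1, ..., wD)
```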

  16. Estimation • Goal: find the values of heavy keys • Rank the importance of heavy keys • Eliminate more non-heavy keys • Use maximum likelihood • Bucket values due to non-heavy keys ~ Weibull • Estimation is solved by linear programming
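The maximum-likelihood estimation on this slide (Weibull-distributed background traffic, solved by linear programming) is beyond a short sketch. As a much simpler stand-in for heavy-hitter counts, a reported key's value can be estimated by the minimum of its bucket counters, which can only overestimate the true value because collisions add to counters.

```python
def estimate_value(array, key):
    # smallest counter among the key's M buckets; an upper bound on its true value
    return min(array.counters[i][array.bucket(i, key)] for i in range(array.M))
```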

  17. Recap • Record step: the data stream updates the hash arrays (Array 1, …, Array D) • Detection step: given a threshold, the hash arrays are searched for candidate heavy keys • Estimation: the candidate heavy keys are refined into heavy keys with estimated values

  18. Experiments • Traces: • Abilene data collected at an OC-192 link • 1 hour long, ~50 GB traffic • Evaluation approach: • Compare our scheme and Deltoids [Cormode & Muthukrishnan, 04], both of which use the same number of counters • Metrics: • False positive rate • (# of non-heavy keys treated as heavy) / (# of returned keys) • False negative rate • (# of heavy keys missed) / (true # of heavy keys)
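A small helper matching the metric definitions on this slide; returned is the set of keys reported by a detector and true_heavy the ground-truth heavy keys, both hypothetical inputs.

```python
def error_rates(returned, true_heavy):
    false_pos = len(returned - true_heavy) / len(returned)    # non-heavy keys among returned keys
    false_neg = len(true_heavy - returned) / len(true_heavy)  # heavy keys that were missed
    return false_pos, false_neg
```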

  19. Results - Heavy Hitter Detection • (Figure: false positive/negative rates of sequential hashing) • Worst-case error rates: • Sequential hashing: 1.2% false positives and 0.8% false negatives • Deltoids: 10.5% false positives, 80% false negatives

  20. Results - Heavy Changer Detection • (Figure: false positive/negative rates of sequential hashing) • Worst-case error rates: • Sequential hashing: 1.8% false positives, 2.9% false negatives • Deltoids: 1.2% false positives, 70% false negatives

  21. Summary of Results • High accuracy of heavy-key detection while using a memory-efficient data structure • Fast detection • On the order of seconds • Accurate estimation • Provides more accurate estimates than least-square regression [Lee et al., 05]

  22. Conclusions • Derived the minimum memory requirement for heavy-key detection • Proposed the sequential hashing scheme • Using a memory-efficient data structure • Allowing fast detection • Providing small false positives/negatives • Proposed an accurate estimation method to reconstruct the values of heavy keys

  23. Thank you

  24. How to Determine H? • H = maximum number of heavy keys • H ≈ (total data volume) / threshold
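A one-line worked example of this rule of thumb; the 50 GB period volume matches the trace on slide 18, while the 50 MB threshold is a hypothetical choice.

```python
total_volume = 50 * 10**9       # bytes in one measurement period
threshold = 50 * 10**6          # bytes; heavy-key threshold (hypothetical)
H = total_volume // threshold   # at most ~1000 keys can each exceed the threshold
```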

  25. Tradeoff Between Memory and Computation • ε' - intermediate error rate • Large ε': fewer tables, more computation • Small ε': more tables, less computation
