
Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications

Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Yin Zhang, Peter Dinda, Ming-Yang Kao, Gokhan Memik.



Presentation Transcript


  1. Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications. Robert Schweller (1), Zhichun Li (1), Yan Chen (1), Yan Gao (1), Ashish Gupta (1), Yin Zhang (2), Peter Dinda (1), Ming-Yang Kao (1), Gokhan Memik (1). (1) Lab for Internet and Security Technology (LIST), Northwestern University; (2) University of Texas at Austin

  2. The Spread of Sapphire/Slammer Worms

  3. Motivation (online change detection)
     • Online network anomaly/intrusion detection over high-speed links
       • Small memory usage
       • Small number of memory accesses per packet
       • Scalable to large key space size
     • Primitives for online anomaly detection
       • Heavy hitters (lots of prior work)
       • Heavy changes: enabler for aggregate queries over multiple data streams
       • Asymmetric routing demands spatial aggregation
       • Time series analysis (TSA) needs temporal aggregation

  4. Outline • Background on k-ary sketch • Reversible sketch problem • Modular hashing • IP mangling • Reverse hashing • Evaluation • Conclusion

  5. K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]: first to detect flow-level heavy changes in massive data streams at network traffic speeds. [Figure: H hash tables (rows 1 … j … H), each with K buckets (columns 0, 1, …, K-1)]

  6. K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003] APIs:
     • UPDATE(k, u): T_j[h_j(k)] += u (for all j)
     • Estimate v(S, k): sum of updates for key k
     • S = COMBINE(a, S1, b, S2)
     [Figure: key k indexes bucket h_1(k) … h_j(k) … h_H(k) in each of the H tables (buckets 0, 1, …, K-1); COMBINE forms a·S1 + b·S2]
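
The three APIs above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `KarySketch`, the seeded multiply-add hash (a stand-in, since the slide does not specify the hash family), and the default sizes are assumptions. The estimator (bucket value minus the per-bucket mean, rescaled, then the median across tables) follows the k-ary sketch paper cited on the slide.

```python
import random
from statistics import median


class KarySketch:
    """H hash tables with K buckets each (hypothetical minimal version)."""

    PRIME = (1 << 61) - 1  # modulus for the stand-in hash functions

    def __init__(self, H=5, K=4096, seed=0):
        rng = random.Random(seed)
        self.H, self.K = H, K
        # One (a, b) pair per table. Sketches built with the same seed
        # share hash functions, which COMBINE requires.
        self.params = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(H)]
        self.T = [[0.0] * K for _ in range(H)]

    def _h(self, j, key):
        a, b = self.params[j]
        return ((a * key + b) % self.PRIME) % self.K

    def update(self, key, u):
        """UPDATE(k, u): T_j[h_j(k)] += u for all j."""
        for j in range(self.H):
            self.T[j][self._h(j, key)] += u

    def estimate(self, key):
        """ESTIMATE(S, k): median over tables of the per-table estimate."""
        total = sum(self.T[0])  # every table holds the same total
        K = self.K
        per_table = [(self.T[j][self._h(j, key)] - total / K) / (1 - 1 / K)
                     for j in range(self.H)]
        return median(per_table)

    @staticmethod
    def combine(a, s1, b, s2):
        """COMBINE(a, S1, b, S2): bucket-wise linear combination a*S1 + b*S2."""
        assert s1.params == s2.params, "sketches must share hash functions"
        out = KarySketch(s1.H, s1.K)
        out.params = s1.params
        out.T = [[a * x + b * y for x, y in zip(r1, r2)]
                 for r1, r2 in zip(s1.T, s2.T)]
        return out


# Two time intervals; key 42 grows from 100 to 1000.
s1, s2 = KarySketch(seed=1), KarySketch(seed=1)
s1.update(42, 100)
s2.update(42, 1000)
change = KarySketch.combine(1, s2, -1, s1)
print(round(change.estimate(42)))  # 900
```

Because the sketch is linear, the change between two intervals is itself a sketch, so heavy changes can be read from `change` without revisiting the stream.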

  7. Reverse Sketch Problem
     • Main problem
       • Cannot efficiently report keys with heavy change: INFERENCE(S, t)
       • Important function for anomaly detection!
     • Our contribution
       • Determine the set of keys that have “large” estimates in a sketch

  8. Reversible sketch framework
     • Streaming data recording: the key passes through IP mangling and modular hashing, and the value is stored in the reversible k-ary sketch
     • Heavy change detection: given the reversible k-ary sketch and a change threshold, reverse hashing followed by reverse IP mangling yields the heavy change keys

  9. Outline • Background on k-ary sketch • Reversible sketch problem • Modular hashing • IP mangling • Reverse hashing • Evaluation • Conclusion

  10. Taking Intersections • Intersect A1, A2, A3, A4, A5. Parameters: H = 5, K = 2^12, #keys = 2^32 (IP addresses); E[false positives] << 1

  11. The problem with simple intersection • Each set Ai can be very large! With H = 5, K = 2^12, and #keys = 2^32 (IP addresses), |A1| = 2^32 / 2^12 = 2^20
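
The arithmetic behind slides 10 and 11 can be checked directly. The sketch below assumes that each hash function spreads keys uniformly and independently over the K buckets and that each table has a single heavy bucket; neither assumption is stated on the slides.

```python
from fractions import Fraction

H = 5              # number of hash tables
K = 2 ** 12        # buckets per table
N_KEYS = 2 ** 32   # key space: 32-bit IP addresses

# Keys that map to one given bucket of one table (the reverse-mapped set A_i).
set_size = N_KEYS // K

# A non-heavy key lands in the heavy bucket of every table with
# probability (1/K)^H, so the expected number of false positives
# in the intersection of A_1 ... A_H is about N_KEYS / K^H.
expected_fp = Fraction(N_KEYS, K ** H)

print(set_size == 2 ** 20)                  # True
print(expected_fp == Fraction(1, 2 ** 28))  # True
```

So the intersection is accurate (far fewer than one expected false positive), but computing it naively means enumerating five sets of about a million keys each.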

  12. The problem with simple intersection • Each set Ai can be very large! • Solution: Modular hashing

  13. Modular hashing reduces the set size. [Figure: a 32-bit key, 10010100 10101011 10010101 10100011, made of four 8-bit words, is mapped by h() to a 12-bit bucket index, 010 110 001 101]

  14. Modular hashing reduces the set size. [Figure: the 32-bit key 10010100 10101011 10010101 10100011 is split into four 8-bit words; h1(), h2(), h3(), h4() hash the words separately, giving 010 110 001 101.] Greatly reduces the size of reverse-mapped sets

  15. Modular hashing reduces the set size • A1: 2^5 * 2^5 * 2^5 * 2^5 (only 32 elements per word set) • Intersection: [Figure: one heavy bucket b1 … b5 in each of hash tables 1 … 5]

  16. Modular hashing reduces the set size • A1: 2^5 * 2^5 * 2^5 * 2^5 • A2: 2^5 * 2^5 * 2^5 * 2^5 • Intersection: [Figure: one heavy bucket b1 … b5 in each of hash tables 1 … 5]
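
A minimal sketch of modular hashing as described on slides 13 to 16: the 32-bit key is split into four 8-bit words, each word is hashed to 3 bits, and the four results are concatenated into a 12-bit bucket index. The per-word hashes here are seeded random lookup tables, balanced so that every 3-bit value has exactly 2^5 = 32 preimages; that construction and all function names are assumptions made for illustration.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3


def make_word_hashes(seed=0):
    """One balanced lookup table per word: 256 inputs -> 8 outputs, 32 each."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        table = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
        rng.shuffle(table)
        tables.append(table)
    return tables


def modular_hash(key, tables):
    """Hash each 8-bit word separately and concatenate the 3-bit results."""
    index = 0
    for w in range(WORDS):
        shift = WORD_BITS * (WORDS - 1 - w)   # most significant word first
        word = (key >> shift) & 0xFF
        index = (index << OUT_BITS) | tables[w][word]
    return index


def reverse_word(w, bucket, tables):
    """All values of word w that are consistent with a 12-bit bucket index."""
    shift = OUT_BITS * (WORDS - 1 - w)
    target = (bucket >> shift) & ((1 << OUT_BITS) - 1)
    return [x for x in range(1 << WORD_BITS) if tables[w][x] == target]


tables = make_word_hashes()
key = 0b10010100_10101011_10010101_10100011   # the key shown on the slides
bucket = modular_hash(key, tables)
sizes = [len(reverse_word(w, bucket, tables)) for w in range(WORDS)]
print(sizes)  # [32, 32, 32, 32]
```

The reverse-mapped set of a bucket is the Cartesian product of the four word sets, so it can be stored as 4 lists of 32 values instead of 2^20 full keys.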

  17. Problem: Too many collisions. [Figure: 32-bit keys 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... map to 12-bit bucket indices of the form 7.4.0.*]

  18. Problem: Too many collisions. [Figure: 32-bit keys 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... map to 12-bit bucket indices of the form 7.4.0.*]
     • Solution: IP mangling with GF (Galois Extension Field)
     • IP mangling: a bijective mapping function for breaking the key space continuity
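
The bijection requirement can be illustrated with a much simpler stand-in than the Galois field construction named on the slide: multiplication by an odd constant modulo 2^32, which is invertible because an odd number is coprime to 2^32. The constant `A` and the function names are assumptions, and this is not claimed to be the mapping used in the talk.

```python
MASK = 0xFFFFFFFF
A = 0x9E3779B1                # any odd 32-bit constant works (assumed value)
A_INV = pow(A, -1, 1 << 32)   # modular inverse, needs Python 3.8+


def mangle(ip):
    """Bijective scrambling of a 32-bit key (stand-in for GF mangling)."""
    return (A * ip) & MASK


def unmangle(m):
    """Inverse of mangle, used after reverse hashing to recover the key."""
    return (A_INV * m) & MASK


def to_int(dotted):
    a, b, c, d = (int(p) for p in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


for ip in ["129.105.56.23", "129.105.56.28", "129.105.56.109"]:
    x = to_int(ip)
    m = mangle(x)
    assert unmangle(m) == x   # round trip recovers the original address
    print(ip, "->", f"{m:08x}")
```

One limitation of this stand-in: the lowest output byte depends only on the lowest input byte, so it mixes the words less thoroughly than a field multiplication would.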

  19. Outline • Background on k-ary sketch • Reversible sketch problem • Modular hashing • IP mangling • Reverse hashing • Evaluation • Conclusion

  20. Handling Multiple Intersections…
     • 2^H different intersections [Figure: two heavy buckets in each of hash tables 1 … 5]
     • Much more difficult. Solution: reverse hashing algorithms
     • Step 1: Reverse hashing for each module
     • Step 2: Infer the whole key through bucket index matching among candidates from each module

  21. Reverse Hashing for Each Module
     • Take the first word as an example; H = 5, K = 2^12, r = 1 (r: tolerance level)
     • Candidate sets of the first word in the five hash tables: {2,3,5}, {2,6,9,10}, {0,2,3}, {2,3,8,10}, {3,6,7,9}
     • All possible values of the first word in the sketch: {2,3} {2}
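
One way to read the example above: with tolerance level r, a value of the first word is kept if it appears in the candidate sets of at least H - r hash tables. The sketch below applies that rule to the five sets from the slide. The function name and this reading of r (the number of tables a value may miss) are assumptions based on the slide.

```python
from collections import Counter


def candidates_with_tolerance(sets, r):
    """Values that occur in at least len(sets) - r of the candidate sets."""
    H = len(sets)
    counts = Counter(v for s in sets for v in s)
    return {v for v, c in counts.items() if c >= H - r}


# Candidate sets of the first word, one per hash table (from the slide).
sets = [{2, 3, 5}, {2, 6, 9, 10}, {0, 2, 3}, {2, 3, 8, 10}, {3, 6, 7, 9}]

print(sorted(candidates_with_tolerance(sets, r=1)))  # [2, 3]
print(sorted(candidates_with_tolerance(sets, r=0)))  # []
```

With r = 0 the rule reduces to a strict intersection, which is empty here; allowing one miss keeps 2 and 3, each of which appears in four of the five sets.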

  22. Bucket Index Matrix of Candidates (H = 5, r = 1, K = 2^12)
     • For each x in I1, we can get B1(x), a vector of the heavy bucket sets which x hashes to.
     • 192.168.0.1, 192.123.47.62, 192.*.*.* hash to the red heavy buckets
     [Figure: heavy buckets b_i1 and b_i2 in each hash table i = 1 … 5]

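  23. Prefix Extension Algorithm (path discovery algorithm). [Figure: candidates from I1 (150, 47, 236) with bucket vectors B1 are combined with candidates from I2 (72, 104) with bucket vectors B2. 150 + 72 = <150.72> and 236 + 104 = <236.104> are kept; combinations such as <47.72> that mismatch in more than r = 1 hash tables are ignored.]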

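  24. Prefix Extension Algorithm. [Figure: the prefixes <150.72> and <236.104> are extended with candidates from I3 (182, 32, 49) using B3, giving <150.72.182>, <150.72.32>, and <236.104.49>; extending with the candidate from I4 (75) using B4 gives <150.72.182.75> and <236.104.49.75>.]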
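
The prefix extension on slides 23 and 24 can be sketched as follows. Each candidate word carries a vector of H sets of heavy-bucket ids; extending a prefix intersects the vectors table by table, and the extension is dropped if more than r tables end up empty. This matching rule is one reading of the slides, and the bucket vectors below are toy data invented so that the run reproduces the slide's example; they are not taken from the talk.

```python
def extend(prefixes, candidates, r):
    """Extend each (words, vector) prefix by every compatible candidate word."""
    out = []
    for words, vec in prefixes:
        for x, bx in candidates.items():
            merged = [a & b for a, b in zip(vec, bx)]    # per-table intersection
            misses = sum(1 for s in merged if not s)     # tables with no match
            if misses <= r:
                out.append((words + (x,), merged))
    return out


def uniform(bucket_ids, H=5):
    """Same heavy-bucket set in all H tables (helper for the toy data)."""
    return [set(bucket_ids) for _ in range(H)]


# Toy bucket index vectors B1..B4: heavy buckets are labelled 1 and 2 per table.
B1 = {150: uniform({1}), 47: [{1}, {2}, {1}, {2}, set()], 236: uniform({2})}
B2 = {72: uniform({1}), 104: uniform({2})}
B3 = {182: uniform({1}), 32: [{1}, {1}, {1}, {1}, set()], 49: uniform({2})}
B4 = {75: [{1, 2}, {1, 2}, {1, 2}, {2}, {1, 2}]}

prefixes = [((x,), vec) for x, vec in B1.items()]
for module in (B2, B3, B4):
    prefixes = extend(prefixes, module, r=1)

keys = [".".join(map(str, words)) for words, _ in prefixes]
print(keys)  # ['150.72.182.75', '236.104.49.75']
```

In this toy run <47.72> is discarded at the first step and <150.72.32> at the last one, because each accumulates more than r = 1 mismatching tables.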

  25. Recap:
     • Streaming data recording: the key passes through IP mangling and modular hashing, and the value is stored in the reversible k-ary sketch
     • Heavy change detection: given the reversible k-ary sketch and a change threshold, reverse hashing followed by reverse IP mangling yields the heavy change keys
     • n is the size of the key space

  26. Outline • Background on k-ary sketch • Reversible sketch problem • Modular hashing • IP mangling • Reverse hashing • Evaluation • Conclusion

  27. Evaluation
     • Datasets
       • A large US ISP (330M NetFlow records)
       • NU (19M NetFlow records)
     • Efficient data recording (for worst-case traffic, all 40-byte packets)
       • Software: 526 Mbps on a 3.2 GHz P4 PC
       • Hardware: 16 Gbps on a single FPGA board
       • Only a few hundred KB to a couple of MB of memory used
       • Only 15 memory accesses per packet for 48-bit reversible sketches and 16 per packet for 64-bit reversible sketches
     • Efficient heavy change detection and key inference
       • 0.34 seconds for 100 changes; 13.33 seconds for 1000 changes

  28. Key Inference Accuracy • True positives and false positives of 16-bit reversible sketches for 32-bit IP addresses. [Deltoids]: S. Muthukrishnan and Graham Cormode, What's New: Finding Significant Differences in Network Data Streams, INFOCOM 2004

  29. More Results • Still accurate in a stress test with a larger dataset • Scalable to larger key space size: similar results for 64-bit IP pairs • Built an anomaly/intrusion detection system to detect, e.g., SYN flooding and port scans [ICDCS 2006]

  30. Conclusions: proposed the first reversible sketches, which
     • Record high-speed network streams online
     • Detect heavy changes and infer the keys online
     • Use little memory and few memory accesses per packet
     • Scale to large key space sizes

  31. Backup Slides

  32. Related work
     • Compared with [Deltoids]
       • Better accuracy
       • Better scalability to large key spaces
       • Fewer memory accesses
     • [PCF, IMC 2004]: not reversible
     • [Q. Zhao et al., IMC 2005], [S. Venkataraman, NDSS 2005]: unique fan-out (fan-in) estimation

  33. [Figure panels: Modular Hashing; Optimal Hashing]

  34. Reversible sketch problem
     • However, the k-ary sketch is not reversible: it lacks an inference API, INFERENCE(S, t)
       • Important function for anomaly detection!
       • Decouples the recording stage of sketches from the detection stage to enable efficient combine and inference
       • Given a threshold t, report keys whose corresponding sums of updates are larger than the threshold
     • Our contribution: an efficient algorithm for inference

  35. [Figure]

  36. Problem: Too many collisions. [Figure: 32-bit keys 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... map to 12-bit bucket indices of the form 7.4.0.*] Solution: IP Mangling

  37. IP-mangling • Use GF (Galois Extension Field) function for attack resilience

  38. [Figure panels: Modular Hashing; Optimal Hashing; Modular Hashing with IP Mangling]

  39. Reverse Hashing for Each Module (H = 5, r = 1, K = 2^12)
     • Take the first word as an example
     • All possible values of the first word for the No. j heavy bucket in hash table i
     • All possible values of the first word in hash table i
     • All possible values of the first word in the sketch
     [Figure: heavy buckets b_i1 and b_i2 in each hash table i = 1 … 5]

  40. False positive reduction by verifying against the original sketch. [Figure: the candidate <150.72.182.75> is passed to Estimate on the original k-ary sketch, which returns (<150.72.182.75>, 180); with a threshold of 150 the key is verified and reported in the final result.]
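
The verification step on slide 40 amounts to a filter over the inferred keys. In the sketch below, `estimate` is any callable that returns the original k-ary sketch's estimate for a key; here it is backed by a plain dictionary. The function name `verify` and the second candidate with its value are invented for illustration; only the first candidate, its estimate of 180, and the threshold of 150 come from the slide.

```python
def verify(candidates, estimate, threshold):
    """Keep only candidates whose sketch estimate exceeds the threshold."""
    verified = []
    for key in candidates:
        value = estimate(key)
        if value > threshold:
            verified.append((key, value))
    return verified


# Stand-in for ESTIMATE on the original k-ary sketch.
estimates = {"150.72.182.75": 180, "236.104.49.75": 40}

result = verify(estimates.keys(), estimates.get, threshold=150)
print(result)  # [('150.72.182.75', 180)]
```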

  41. K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
     • First to detect flow-level heavy changes in massive data streams at network traffic speeds
     • APIs
       • UPDATE(S, k, u): T_j[h_j(k)] += u (for all j)
       • ESTIMATE(S, k): sum of updates for key k
       • Linear combination: S = COMBINE(a, S1, b, S2) [Figure: a·S1 + b·S2 = S]
