On the Utility of Anonymized Flow Traces for Anomaly Detection

Author: Martin BURKHART, Daniela BRAUCKHOFF, Martin MAY. Journal: ITC SS 2008. Advisor: Yuh-Jye Lee. Reporter: Yi-Hsiang Yang. Email: M9915016@mail.ntust.edu.tw.

Presentation Transcript


  1. On the Utility of Anonymized Flow Traces for Anomaly Detection Author: Martin BURKHART, Daniela BRAUCKHOFF, Martin MAY Journal: ITC SS 2008 Advisor: Yuh-Jye Lee Reporter: Yi-Hsiang Yang Email: M9915016@mail.ntust.edu.tw

  2. Contributions • Introduce a generic methodology for evaluating the impact of anonymization • Quantify the utility of anonymized data for a three-week-long data set • Present an overall estimate for the impact of anonymization

  3. Outline • Introduction • Methodology • Measurement Results • Conclusion

  4. Introduction • Sharing of traffic data is hindered • Releasing data introduces a threat to users’ privacy • Anomaly detection • Has been evaluated with anonymized data • Focus on the anonymization of IP addresses • Blackmarking • Truncation • Random permutation • (Partial) prefix-preserving permutation

  5. Utility of Anonymized Data for Anomaly Detection • Granularity design space has two dimensions • Subset size • The size of the network (subnet) that is to be analyzed • Resolution • The address granularity at which the traffic is analyzed • Assume the whole design space is available

  6. • Cell 1 [00,00]: Select all traffic and set the resolution to the minimum. • Cell 5 [00,16]: Select all traffic and set the resolution to /16 networks.

  7. IP address anonymization techniques • Blackmarking (BM) • Blindly replaces all IP addresses in a trace with the same value • Truncation (TR{t}) • Replaces the t least significant bits of an IP address with 0 • Random permutation (RP) • Translates IP addresses using a random permutation • Partial prefix-preserving permutation (PPP{p}) • Permutes the host and network part of IP addresses independently
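The first three techniques on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: addresses are treated as 32-bit IPv4 integers, the blackmarking value `0` is an arbitrary choice, and the `RandomPermutation` class (a hypothetical name) builds its random one-to-one mapping lazily, only for the addresses it actually sees.

```python
import ipaddress
import random


def to_int(address: str) -> int:
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(address))


def blackmark(ip: int) -> int:
    """BM: replace every address with the same value (0 is an arbitrary choice)."""
    return 0


def truncate(ip: int, t: int) -> int:
    """TR{t}: set the t least significant bits of the address to 0."""
    mask = (0xFFFFFFFF >> t) << t
    return ip & mask


class RandomPermutation:
    """RP: map each observed address to a distinct random 32-bit value."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._mapping = {}
        self._used = set()

    def __call__(self, ip: int) -> int:
        if ip not in self._mapping:
            # Draw until we find a value not yet assigned, so the map stays one-to-one.
            candidate = self._rng.getrandbits(32)
            while candidate in self._used:
                candidate = self._rng.getrandbits(32)
            self._used.add(candidate)
            self._mapping[ip] = candidate
        return self._mapping[ip]
```

Note how the three differ in what survives: blackmarking keeps nothing about the address, truncation keeps the network part but merges all hosts in it, and random permutation keeps address identity but no subnet structure.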

  8. IP address anonymization techniques • Prefix-preserving permutation (PP) • Permutes IP addresses so that two addresses sharing a common real prefix also share an anonymized prefix of the same length
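One way to see the prefix-preserving property is a bit-by-bit construction: whether output bit i is flipped depends only on the i input bits before it. The sketch below uses an HMAC as the pseudo-random function. This is an assumption made for illustration; the slides do not say which scheme the authors used, and the function name and key are hypothetical.

```python
import hashlib
import hmac


def prefix_preserving(ip: int, key: bytes) -> int:
    """Anonymize a 32-bit address so that shared prefixes stay shared."""
    out = 0
    for i in range(32):
        # The first i bits of the original address (0 when i == 0).
        prefix = ip >> (32 - i)
        message = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        # One pseudo-random bit derived from the prefix decides whether to flip.
        flip = hmac.new(key, message, hashlib.sha256).digest()[0] & 1
        bit = (ip >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return out


def common_prefix_len(x: int, y: int) -> int:
    """Number of leading bits two 32-bit addresses have in common."""
    return 32 - (x ^ y).bit_length()
```

Because the flip decision for each bit is a function of the preceding original bits only, two addresses that agree on their first k bits receive the same flips for those bits, and their anonymized forms agree on exactly the same k bits.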

  9. Methodology • Data captured from the four border routers of the Swiss Academic and Research Network • IP address range contains about 2.4 million IP addresses • Traffic volume varies between 60 and 140 million NetFlow records per hour • Analyzed a three-week period (from August 19th to September 10th 2007) 713 Terabytes • Un-sampled and Non-anonymized flow data

  10. Methodology-Ground Truth • Visual inspection of metric timeseries • Computed the timeseries for five well-known metrics • byte, packet, flow counts, unique IP address counts, and the Shannon entropy of flows per IP address • At 15-minute intervals • 2016 data points per metric
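As a rough sketch of the entropy metric: given the IP address of each flow in one interval, the Shannon entropy is computed over the share of flows per address. The function name and the list-of-addresses input format are assumptions made for illustration. The last line checks the interval count, since three weeks of 15-minute intervals give 21 × 24 × 4 = 2016 points.

```python
import math
from collections import Counter


def shannon_entropy(flow_ips) -> float:
    """Shannon entropy (in bits) of the distribution of flows over IP addresses."""
    counts = Counter(flow_ips)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Three weeks sampled every 15 minutes: 21 days * 24 hours * 4 intervals per hour.
intervals_per_metric = 21 * 24 * 60 // 15
```

Flows spread evenly over many addresses give a high value, while flows concentrated on a single address give a value of zero.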

  11. Methodology-Ground Truth • Assigning ground truth to each interval • If the analyzed metric timeseries exposed an unusual event, classified that interval as anomalous • Identifying the anomaly type • Assigned the anomalous events to different types • Volume • A sharp increase or decrease in the volume based metrics • (D)DoS • Drop in the destination IP address entropy

  12. Methodology-Ground Truth • Scan • Increase in the destination IP address count and entropy • Network Fluctuation • Causes an increase or decrease in the IP address counts at the highest resolution • Unknown

  13. Methodology-Anomaly Detection • Use Kalman filter • Efficient recursive filter
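To make the recursive filtering idea concrete, here is a minimal scalar Kalman filter applied to a metric timeseries, flagging intervals whose prediction residual is large. The slides do not give the model or its parameters, so the random-walk state model, the noise variances `q` and `r`, and the fixed `threshold` are all assumptions chosen for illustration.

```python
def kalman_residuals(series, q: float = 1.0, r: float = 4.0):
    """Return the one-step prediction residuals of a scalar random-walk Kalman filter.

    q is the assumed process noise variance, r the assumed measurement noise variance.
    """
    estimate = float(series[0])  # initial state estimate
    variance = 1.0               # initial estimate variance (arbitrary)
    residuals = []
    for measurement in series:
        # Predict: under a random walk the state is unchanged, uncertainty grows by q.
        variance += q
        residual = measurement - estimate
        # Update: blend the prediction and the measurement using the Kalman gain.
        gain = variance / (variance + r)
        estimate += gain * residual
        variance *= 1.0 - gain
        residuals.append(residual)
    return residuals


def detect(series, threshold: float, q: float = 1.0, r: float = 4.0):
    """Flag each interval whose absolute residual exceeds the threshold."""
    return [abs(e) > threshold for e in kalman_residuals(series, q, r)]
```

The filter is recursive in that each step needs only the previous estimate and variance, not the full history, which keeps the per-interval cost constant.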

  14. Methodology • 60 studied metrics are different variants of • Three volume-based metrics (vbm) • Byte, packet and flow counts • Two feature-based metrics (fbm) • Unique IP address count • Shannon entropy of flows per IP address • Total (3[vbm] + (2[fbm] × 2[src/dst] × 3[res])) × 2[in/out] × 2[udp/tcp] = 60 detection metrics
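The count of 60 can be checked by enumerating the combinations. The metric names below follow the slide; the three resolution labels are placeholders, since the slide states only that there are three.

```python
from itertools import product

VOLUME_METRICS = ["bytes", "packets", "flows"]
FEATURE_METRICS = ["unique_ip_count", "flow_entropy"]
ENDPOINTS = ["src", "dst"]
RESOLUTIONS = ["res_a", "res_b", "res_c"]  # placeholder labels, values not given on the slide
DIRECTIONS = ["in", "out"]
PROTOCOLS = ["udp", "tcp"]

# Volume metrics are not split by endpoint or resolution; feature metrics are.
base = [(m,) for m in VOLUME_METRICS]
base += list(product(FEATURE_METRICS, ENDPOINTS, RESOLUTIONS))

# Every base metric is computed per direction and per protocol.
metrics = [b + (d, p) for b, d, p in product(base, DIRECTIONS, PROTOCOLS)]
```

This gives 3 + 2 × 2 × 3 = 15 base metrics, and 15 × 2 × 2 = 60 detection metrics in total.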

  15. Methodology

  16. Measurement Results

  17. Measurement Results • Volume Anomalies • Exposed by volume-based metrics • For TCP, blackmarking and random permutation perform slightly better

  18. Measurement Results • Scanning and denial of service anomalies • Feature-based metrics

  19. Measurement Results • Network fluctuations • Feature-based metrics at lower resolutions

  20. Measurement Results-AUC
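For readers unfamiliar with the measure, AUC (area under the ROC curve) can be computed directly as the probability that a randomly chosen anomalous interval receives a higher detector score than a randomly chosen normal one. The sketch below uses that pairwise definition; the function name and input format are assumptions, and it is not taken from the slides.

```python
def auc(scores, labels) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    scores: detector output per interval (higher means more anomalous).
    labels: truthy for anomalous intervals, falsy for normal ones.
    """
    positives = [s for s, l in zip(scores, labels) if l]
    negatives = [s for s, l in zip(scores, labels) if not l]
    if not positives or not negatives:
        raise ValueError("need at least one anomalous and one normal interval")
    # Count a win for each correctly ordered pair, half a win for each tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

A value of 1.0 means the detector ranks every anomalous interval above every normal one, and 0.5 corresponds to random guessing.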

  21. Measurement Results • Blackmarking • Decreases the utility for detecting anomalies in UDP and TCP traffic, except for volume anomalies • Random permutation • Performs very poorly at detecting anomalies in UDP traffic • Preserves the utility for TCP traffic

  22. Measurement Results • Truncation of 8 or 16 bits • Decreases the utility for detecting anomalies in TCP traffic by roughly 10 percent • Performs well for UDP traffic • (Partial) prefix-preserving permutation • No significant negative impact for detecting anomalies in UDP and TCP traffic

  23. Implicit Traffic Aggregation • Analyzing the count of additional flows for 170 webservers • Truncating a single bit • Around 10% of the webservers see a traffic increase of 100% or more, and 50% see no additional traffic • Unaffected servers: 20% for 2 bits, 5% for 4 bits, and even 0% for 8 bits • Servers with at least a doubling of traffic: 25% for 2 bits, 55% for 4 bits, and 89% for 8 bits
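The aggregation effect can be reproduced on a toy example: after truncating t bits, flows to any neighbouring address that collapses onto the same truncated value become indistinguishable from the server's own flows. The function names and the sample addresses below are made up for illustration and are not data from the study.

```python
import ipaddress


def to_int(address: str) -> int:
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(address))


def truncate(ip: int, t: int) -> int:
    """Set the t least significant bits of a 32-bit address to 0."""
    return ip & ((0xFFFFFFFF >> t) << t)


def traffic_increase(server: int, flow_dsts, t: int) -> float:
    """Relative increase in flows attributed to `server` after truncating t bits."""
    before = sum(1 for d in flow_dsts if d == server)
    if before == 0:
        raise ValueError("server has no flows in this trace")
    # After truncation, every flow whose destination collapses onto the
    # server's truncated address is counted as the server's traffic.
    target = truncate(server, t)
    after = sum(1 for d in flow_dsts if truncate(d, t) == target)
    return (after - before) / before


# Hypothetical trace: destination address of each flow.
server = to_int("10.0.0.2")
flows = [to_int(a) for a in
         ["10.0.0.2", "10.0.0.2", "10.0.0.3", "10.0.0.3", "10.0.0.9"]]
```

In this toy trace, truncating one bit merges `10.0.0.3` into the server and doubles its apparent traffic (an increase of 100%), and truncating four bits also pulls in `10.0.0.9`.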

  24. Conclusion • Anonymization techniques impact statistical anomaly detection • Introduced the detection granularity design space • Analyzed the utility of anonymized traces

  25. Thanks for your attention Q&A
