
The Devil and Packet Trace Anonymization

This article discusses the problem of anonymizing packet traces before they are released, and explores a new tool called tcpmkpub that provides a general framework for anonymization.


Presentation Transcript


  1. The Devil and Packet Trace Anonymization • Authors: Ruoming Pang, Mark Allman, Vern Paxson and Jason Lee • Published: ACM SIGCOMM Computer Communication Review, Volume 36, Issue 1, January 2006 • Presenter: Ping Wang

  2. Overview • Problem • How to anonymize packet traces before they are released • Goal • Preserve as much information as possible

  3. Background • Why share? • Verify previous results • Compare competing ideas on the same data • Provide a broader view • Who shares? • NLANR’s PMA packet traces • CAIDA’s skitter measurements • LBNL’s internal traffic

  4. Background cont. • Available anonymization tools • tcpdpriv • ipsumdump • tcpurify • Limitation: these tools are not general enough, and most of them focus only on header fields, primarily IP addresses

  5. A new tool: tcpmkpub • Provides a general framework for anonymizing traces • It is based on explicit rules for each header field

  6. An example specification • All fields must be specified with a name, a length, and an action (“KEEP”, “ZERO”, or a function)

  7. An example specification cont. • Supports case statements for header fields whose layout can vary
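
The idea of a per-field specification can be sketched in Python. This is not tcpmkpub's actual policy syntax, which the slides do not reproduce; the names `apply_policy`, `UDP_POLICY`, `POLICY_BY_PROTOCOL` and `anonymize_transport` are hypothetical. Each field carries a name, a length in bytes, and an action, and a lookup keyed on the protocol number stands in for the case statement.

```python
from typing import Callable, Dict, List, Tuple, Union

KEEP = "KEEP"
ZERO = "ZERO"
Action = Union[str, Callable[[bytes], bytes]]
FieldSpec = Tuple[str, int, Action]  # (name, length in bytes, action)


def apply_policy(header: bytes, policy: List[FieldSpec]) -> bytes:
    """Rewrite a header field by field according to the policy."""
    if sum(length for _, length, _ in policy) != len(header):
        raise ValueError("policy does not cover the header exactly")
    out = bytearray()
    offset = 0
    for name, length, action in policy:
        field = header[offset:offset + length]
        if action == KEEP:
            out += field
        elif action == ZERO:
            out += bytes(length)
        elif callable(action):
            rewritten = action(field)
            if len(rewritten) != length:
                raise ValueError(f"action for {name} changed the field length")
            out += rewritten
        else:
            raise ValueError(f"unknown action for field {name}")
        offset += length
    return bytes(out)


# Hypothetical policy for an 8-byte UDP header.
UDP_POLICY: List[FieldSpec] = [
    ("src_port", 2, KEEP),
    ("dst_port", 2, KEEP),
    ("length", 2, KEEP),
    ("checksum", 2, ZERO),
]

# Stand-in for a case statement: choose the policy by IP protocol number.
POLICY_BY_PROTOCOL: Dict[int, List[FieldSpec]] = {17: UDP_POLICY}


def anonymize_transport(protocol: int, header: bytes) -> bytes:
    policy = POLICY_BY_PROTOCOL.get(protocol)
    if policy is None:
        raise ValueError(f"no policy for protocol {protocol}")
    return apply_policy(header, policy)
```

Requiring the policy to cover every byte of the header means a field cannot pass through unexamined simply because no rule mentions it.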

  8. Anonymization Policy • Checksums • Link layer • Network layer • Transport layer

  9. Checksums • Replace the original checksum C0 with Cc • When the original checksum cannot be verified • If the packet has been corrupted: insert “1” • If the original packet is truncated: use Cc (and note this in the meta-data) • Where the checksum is optional, as in UDP, use zero as the checksum
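
A minimal sketch of the checksum rules, assuming that Cc denotes a checksum recomputed over the anonymized packet (the slide does not define it). `internet_checksum` is the standard 16-bit one's-complement Internet checksum; `anonymized_checksum` and its status labels are hypothetical names that only mirror the cases listed on the slide.

```python
from typing import Optional, Tuple


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum over data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def anonymized_checksum(status: str, recomputed: int) -> Tuple[int, Optional[str]]:
    """Return (checksum to write, optional meta-data note).

    status is one of the hypothetical labels "valid", "corrupted",
    "truncated" or "optional", following the cases on the slide.
    """
    if status == "valid":
        return recomputed, None
    if status == "corrupted":
        return 1, None
    if status == "truncated":
        return recomputed, "original checksum could not be verified (truncated)"
    if status == "optional":
        return 0, None
    raise ValueError(f"unknown status {status!r}")
```

Recomputing the checksum over a header that already contains a correct checksum yields zero, which gives a quick way to decide between the "valid" and "corrupted" cases when the whole packet was captured.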

  10. Link layer • An Ethernet address is 6 bytes • The high 3 bytes identify the NIC vendor • Scrambling the entire 6-byte address loses vendor information that is useful for research • Scrambling only the lower 3 bytes exposes the NIC vendor • Solution: remap the two parts separately
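
Remapping the two halves of an Ethernet address separately might look like the following sketch. The keyed hash (HMAC-SHA256 truncated to 3 bytes) is an assumption chosen for illustration, not the mapping tcpmkpub uses, and `anonymize_mac` is a hypothetical name.

```python
import hashlib
import hmac


def _remap3(key: bytes, label: bytes, part: bytes) -> bytes:
    """Map a 3-byte value to a keyed pseudorandom 3-byte value."""
    return hmac.new(key, label + part, hashlib.sha256).digest()[:3]


def anonymize_mac(mac: bytes, key: bytes) -> bytes:
    """Remap the vendor half and the device half independently."""
    if len(mac) != 6:
        raise ValueError("an Ethernet address must be 6 bytes")
    vendor, device = mac[:3], mac[3:]
    return _remap3(key, b"vendor", vendor) + _remap3(key, b"device", device)
```

Two NICs from the same vendor still share their first three anonymized bytes, so vendor grouping survives while the vendor identity is hidden. A truncated hash is not a permutation, so distinct values can collide, and this sketch does not preserve the broadcast or multicast bit; a real tool would have to handle both.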

  11. Network layer (1): focus on IP addresses • External addresses • Use the prefix-preserving address anonymization scheme proposed in another paper • Internal addresses • Do not use the prefix-preserving scheme • Use a prefix that is not used by external addresses in the anonymized packets • Subnet and host portions are mapped separately
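
Prefix preservation means that two addresses sharing their first k bits still share exactly their first k bits after anonymization. The slide does not name the scheme, so the following is only a sketch of the general technique, with HMAC-SHA256 assumed as the keyed pseudorandom function; `anonymize_ipv4` and `common_prefix_len` are hypothetical names.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Flip each bit based on a keyed function of the bits before it."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        prefix = value >> (32 - i)  # the i most significant bits
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

Because output bit i depends only on input bits 0 through i, `10.1.2.3` and `10.1.2.200`, which share a 24-bit prefix, map to two addresses that also share exactly a 24-bit prefix under any key.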

  12. Network layer (2) • Scanners • Many organizations run a scanner as part of their security operations • Scanners tend to hit addresses in some order, like a.b.c.1, a.b.c.2, a.b.c.3, etc. • Keep the scanner’s IP address consistent across the trace, and flag it in the meta-data • Use a different mapping for the destinations of the scans • Example: X1 and X2 belong to one subnet Y • Traffic not involving a scanner: mapped to X’1, X’2 in subnet Y’ • Traffic involving a scanner: mapped to X’’1, X’’2 in subnets Z1 and Z2

  13. Network layer (3) • Multicast addresses • Preserved • Private addresses • Preserved • Invalid addresses • Remapped as if the subnet existed, with this noted in the meta-data

  14. Transport layer • Preserve both port numbers and sequence numbers • Rewrite timestamp options • Transform the timestamps into separate increasing counters • Reason: clock drift visible in timestamp options can be leveraged to fingerprint a physical machine
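
The timestamp rewrite can be sketched as one counter per host. The slide says only "separate increasing counters", so keying the counters by source host and assigning values in order of first appearance in the trace are assumptions; `TimestampRewriter` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict


class TimestampRewriter:
    """Replace TCP timestamp values with per-host counters."""

    def __init__(self) -> None:
        # host -> {original timestamp value -> counter value}
        self._maps: Dict[str, Dict[int, int]] = defaultdict(dict)

    def rewrite(self, host: str, tsval: int) -> int:
        mapping = self._maps[host]
        if tsval not in mapping:
            mapping[tsval] = len(mapping) + 1  # next unused counter value
        return mapping[tsval]
```

For example, a host sending timestamp values 1000, 1010, 1000 gets 1, 2, 1. Equal values stay equal and order of appearance is kept, but the gaps between values, which are what reveal clock drift, are gone.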

  15. Testing • Can the transformed traces really be used? • Use p0f to do OS fingerprinting • Use tcpsum to find the number of packets and bytes in both the original and transformed traces

  16. Test cont. • Are the transformed traces really anonymous? • Check tcpmkpub’s own log file • Look for known strings in the anonymized traces • e.g. “Document”, “Setting”, “ConfirmFileOp” • Look for values that resemble the original IP addresses • Look for string versions of IP addresses • Look for MAC addresses • Check timestamps
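
A check of this kind can be sketched as a byte-level search over the anonymized packet data. The function names `leak_patterns` and `find_leaks` are hypothetical, and searching for both the 4-byte binary form and the dotted-decimal text form of each original address is one reading of the slide, not a description of the authors' scripts.

```python
import ipaddress
from typing import Iterable, List, Tuple

Pattern = Tuple[str, bytes]  # (label for reporting, bytes to search for)


def leak_patterns(strings: Iterable[str], ip_addrs: Iterable[str]) -> List[Pattern]:
    """Build search patterns from sensitive strings and original addresses."""
    patterns: List[Pattern] = [(s, s.encode("utf-8")) for s in strings]
    for addr in ip_addrs:
        ip = ipaddress.IPv4Address(addr)
        patterns.append((f"{ip} (raw)", ip.packed))
        patterns.append((f"{ip} (text)", str(ip).encode("ascii")))
    return patterns


def find_leaks(data: bytes, patterns: List[Pattern]) -> List[str]:
    """Return the labels of all patterns that occur in data."""
    return [label for label, needle in patterns if needle in data]
```

A hit on a 4-byte raw address can be a coincidence in a large trace, so matches are candidates for manual review rather than proof of a leak.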

  17. Paper contributions • Develops a tool, tcpmkpub, for implementing arbitrary anonymization policies • Uses meta-data to help researchers deal with lost information • e.g. invalid checksums, scanner IP addresses • Goes beyond IP address obfuscation to explore many other dangerous details • e.g. timestamps, Ethernet addresses

  18. Paper weaknesses • Gives only two experiments to show that the anonymized traces are useful • Could have given some anonymization results to make the policy clearer • For example, in the scanner case, what addresses a.b.c.1, a.b.c.2, a.b.c.3 would look like if they are involved in scanning traffic, and what they would look like if not

  19. Future work • Preserve more consistency between the original and anonymized traces • Study online anonymization • Provide an easy-to-use tool for validating the anonymized traces • Provide a tool for creating an anonymization policy for tcpmkpub

  20. Questions?
