
De-anonymizing the Internet Using Unreliable IDs

By Yinglian Xie, Fang Yu, and Martín Abadi. Presented by Yinzhi Cao and Ionut Trestian. The talk asks to what extent IP addresses can be used to track hosts on a free but troublesome network.


Presentation Transcript


  1. De-anonymizing the Internet Using Unreliable IDs • By Yinglian Xie, Fang Yu, and Martín Abadi • Presented by Yinzhi Cao and Ionut Trestian

  2. Problem • A free but troublesome network • Problems we try to solve: • To what extent can we use IP addresses to track hosts? • Can we use the binding information between hosts and IP addresses to strengthen network security?

  3. Host-Tracking Graph • Formally, we define the host-tracking graph G : H × T → IP, where H is the space of all hosts on the Internet, T is the space of time, and IP is the IP-address space.
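
The definition of G can be read as a lookup: given a host and a time, return the IP address the host was bound to. The sketch below is one minimal way to picture that, assuming bindings are stored as time windows per host. The names `Binding` and `HostTrackingGraph`, the integer timestamps, and the sample addresses are all hypothetical illustrations, not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    """One host-to-IP binding window (hypothetical representation)."""
    host: str
    ip: str
    start: int  # first timestamp of the window, inclusive
    end: int    # last timestamp of the window, inclusive


class HostTrackingGraph:
    """Toy stand-in for G : H x T -> IP, backed by per-host binding windows."""

    def __init__(self, bindings):
        self._by_host = {}
        for b in bindings:
            self._by_host.setdefault(b.host, []).append(b)

    def ip_at(self, host: str, t: int) -> Optional[str]:
        """Return the IP bound to `host` at time `t`, or None if unknown."""
        for b in self._by_host.get(host, []):
            if b.start <= t <= b.end:
                return b.ip
        return None


# Made-up sample data using documentation-reserved address ranges.
graph = HostTrackingGraph([
    Binding("h1", "192.0.2.10", 0, 99),
    Binding("h1", "192.0.2.77", 150, 300),
    Binding("h2", "198.51.100.5", 50, 120),
])
```

In this toy version, a query for a time that falls between two windows returns `None`, which reflects that the graph only covers periods in which a host was actually observed.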

  4. Host-Tracking Graph

  5. Host Representation • Since we lack strong authentication mechanisms, we consider leveraging application-level identifiers such as user email IDs, messenger login IDs, social network IDs, or cookies.

  6. Goals • We would like to generate two outputs: • An identity-mapping table that maps unreliable IDs to hosts • A host-tracking graph that tracks each host’s activity across different IP addresses over time

  7. Tracking Host Activities

  8. Application-ID Grouping • To quantitatively compute the probability of two independent user IDs u1 and u2 appearing consecutively, let us assume that each host’s connection (hence the corresponding user login) to the Internet is a random, independent event.
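
To make the independence assumption concrete, here is a small worked sketch. The slide does not give the formula, so the model below is an assumption: each of the 2·n1 "neighbor slots" next to u1's logins is occupied by u2 with probability n2 / n_total, and the number of adjacent appearances is treated as binomial. The function names and parameters are hypothetical. A very small result suggests that two IDs seen next to each other that often are unlikely to be independent, and so plausibly belong to the same host.

```python
from math import comb


def prob_at_least(k: int, slots: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(slots, p)."""
    if k <= 0:
        return 1.0
    return sum(
        comb(slots, i) * p**i * (1 - p) ** (slots - i)
        for i in range(k, slots + 1)
    )


def chance_of_coincidence(n1: int, n2: int, n_total: int, k: int) -> float:
    """Probability that independent u2 lands next to u1 at least k times.

    Assumed model (not from the slides): u1 has n1 logins, giving 2 * n1
    neighbor slots; each slot holds a u2 login with probability n2 / n_total.
    """
    p = n2 / n_total
    return prob_at_least(k, 2 * n1, p)
```

For example, `chance_of_coincidence(10, 5, 1000, 8)` asks how likely it is that a user with 5 logins out of 1000 would appear next to another user's 10 logins at least 8 times purely by chance.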

  9. Host-Tracking Graph Construction

  10. Resolving Inconsistency • Proxy Identification • To find both types of proxies/NATs, HostTracker gradually expands all the overlapped conflict binding windows associated with a common IP address. • Guest Removal
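
The window-expansion idea in proxy identification can be pictured as an interval merge. The sketch below is an assumption about how such an expansion might work, not HostTracker's actual procedure: it takes the binding windows observed on one IP address, merges every chain of overlapping windows into a single range, and reports ranges that involve more than one host. The function name and tuple layout are hypothetical, and guest removal is not modeled.

```python
def conflict_ranges(windows):
    """Merge overlapping binding windows on ONE IP address.

    `windows` is a list of (host_id, start, end) tuples with inclusive bounds.
    Returns (start, end, hosts) for every merged range shared by 2+ hosts,
    i.e. a candidate proxy/NAT period under this simplified model.
    """
    ranges = []  # each entry: [start, end, set_of_hosts]
    for host, start, end in sorted(windows, key=lambda w: w[1]):
        if ranges and start <= ranges[-1][1]:
            # Overlaps the current range: expand it and record the host.
            current = ranges[-1]
            current[1] = max(current[1], end)
            current[2].add(host)
        else:
            ranges.append([start, end, {host}])
    return [(s, e, hosts) for s, e, hosts in ranges if len(hosts) > 1]
```

With windows `("a", 0, 10)`, `("b", 5, 15)`, `("c", 14, 20)`, and `("d", 30, 40)`, the first three chain together into one conflict range from 0 to 20, while the isolated window for `d` is not reported.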

  11. Input Dataset • A month-long user-login trace collected at a large Web-email service provider in October 2008 (about 330 GB). Each entry has 3 fields: • (1) an anonymized user ID (550 million) • (2) the IP address that was used to perform the email login (220 million) • (3) the timestamp of the login event • For validation: a month-long software-update log collected by a global software provider during the same period of October 2008. Each entry has: • a unique hardware ID for each remote host that performs an update • the IP address of the remote host • the software-update timestamp

  12. Tracked events

  13. Tracked Hosts vs. Active Hosts

  14. Validation results

  15. Tracked User Population

  16. Signup Date

  17. Email sending behavior

  18. Applications – Detecting Malicious Activity • In a previous study, we identified 5.6 million malicious IDs that were used to conduct spam campaigns • The intersection between malicious IDs and tracked IDs (220 million) is small (50k)

  19. Signup Date - Revisited

  20. Host Tracking – Security

  21. Country Code Comparison

  22. Seed Size Analysis

  23. Conclusions • Although email accesses provide only a limited view of the Internet, one can use other information for tracking, such as social network IDs and cookies • It is hard to evade HostTracker and maintain attack effectiveness at the same time
