De-anonymizing the Internet Using Unreliable IDs

By Yinglian Xie, Fang Yu, and Martín Abadi
Presented by Yinzhi Cao and Ionut Trestian

Problem
• A free but troublesome network
• Problems we try to solve:
  • To what extent can we use IP addresses to track hosts?
  • Can we use the binding information between hosts and IP addresses to strengthen network security?

Host-Tracking Graph
• Formally, we define the host-tracking graph G : H × T → IP, where H is the space of all hosts on the Internet, T is the space of time, and IP is the IP-address space; G(h, t) is the IP address that host h uses at time t.
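To make the definition concrete, G can be stored as per-host lists of binding windows, so that evaluating G(h, t) becomes a lookup of the window that contains time t. This is a minimal sketch under our own assumptions, not HostTracker's data layout: the class name `HostTrackingGraph`, integer timestamps, and (start, end, ip) window tuples are hypothetical, and returning `None` when no window covers t is a choice of this sketch.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class HostTrackingGraph:
    """Hypothetical in-memory form of G : H x T -> IP.

    Each host keeps non-overlapping binding windows (start, end, ip),
    sorted by start time.
    """
    bindings: dict[str, list[tuple[int, int, str]]] = field(default_factory=dict)

    def add_binding(self, host: str, start: int, end: int, ip: str) -> None:
        windows = self.bindings.setdefault(host, [])
        windows.append((start, end, ip))
        windows.sort()  # keep windows ordered by start time

    def lookup(self, host: str, t: int) -> str | None:
        """Evaluate G(host, t); None if no binding window covers t."""
        windows = self.bindings.get(host, [])
        starts = [w[0] for w in windows]
        i = bisect_right(starts, t) - 1  # last window starting at or before t
        if i >= 0 and windows[i][0] <= t <= windows[i][1]:
            return windows[i][2]
        return None
```

For example, after `add_binding("host-A", 0, 100, "1.2.3.4")`, `lookup("host-A", 50)` returns `"1.2.3.4"` and `lookup("host-A", 150)` returns `None`.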

Host-Tracking Graph

Host Representation
• Since we lack strong authentication mechanisms, we consider leveraging application-level identifiers such as user email IDs, messenger login IDs, social network IDs, or cookies.

Goals
• We would like to generate two outputs:
  • An identity-mapping table that maps unreliable IDs to hosts
  • The host-tracking graph, which tracks each host's activity across different IP addresses over time
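The identity-mapping table can be pictured as a plain dictionary from IDs to host labels. Under our own assumptions, the sketch uses such a table to turn per-ID login events into a time-ordered list of (timestamp, IP) events per host, which is the raw material the host-tracking graph summarizes. The function name `events_by_host`, the host labels, and the policy of skipping IDs absent from the table (treated as untracked) are hypothetical.

```python
from collections import defaultdict


def events_by_host(identity_map, logins):
    """Lift per-ID login events to per-host events.

    identity_map: dict from an unreliable ID (e.g. an email ID) to a host label.
    logins: iterable of (user_id, ip, timestamp) tuples.
    Returns a dict mapping host -> list of (timestamp, ip), sorted by time.
    """
    per_host = defaultdict(list)
    for user_id, ip, ts in logins:
        host = identity_map.get(user_id)
        if host is None:
            continue  # ID not in the table: treated as untracked
        per_host[host].append((ts, ip))
    return {host: sorted(events) for host, events in per_host.items()}
```

Several IDs may map to the same host (for example, multiple users of one machine), so their logins merge into a single per-host timeline.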

Tracking Host Activities

Application-ID Grouping
• To quantify the probability that two independent user IDs u1 and u2 appear consecutively, we assume that each host's connection to the Internet (and hence the corresponding user login) is a random, independent event.
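The independence assumption can be turned into a number. What follows is our own simplification, not necessarily the exact formula HostTracker uses. Suppose each login on an IP independently belongs to u1 with probability p1 and to u2 with probability p2. A given adjacent pair of logins is then (u1, u2) or (u2, u1) with probability q = 2·p1·p2. Treating the n adjacent pairs as independent trials gives P(X ≥ k) = Σ_{i=k}^{n} C(n, i) q^i (1 − q)^(n − i) for seeing the two IDs adjacent at least k times. The function names and parameters are hypothetical. A very small value suggests the repeated adjacency is unlikely to be chance, which is the intuition for grouping such IDs onto one host.

```python
from math import exp, lgamma, log, log1p


def adjacency_prob(p1: float, p2: float) -> float:
    """Chance that one adjacent pair of logins is (u1, u2) or (u2, u1),
    assuming each login independently belongs to u1 w.p. p1 and u2 w.p. p2."""
    return 2.0 * p1 * p2


def binom_tail(n: int, q: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, q); terms are computed in log space
    so that large n does not overflow."""
    if k <= 0:
        return 1.0
    if k > n or q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(q) + (n - i) * log1p(-q))
        total += exp(log_term)
    return total


def chance_of_repeated_adjacency(p1: float, p2: float, n_pairs: int, k: int) -> float:
    """Approximate chance of seeing u1 and u2 adjacent at least k times among
    n_pairs adjacent login pairs. Adjacent pairs overlap in reality, so
    treating them as independent trials is a simplifying assumption."""
    return binom_tail(n_pairs, adjacency_prob(p1, p2), k)
```

For instance, if two IDs each account for 1% of the logins on an IP with 1,000 adjacent pairs, the expected number of adjacencies is 1000 × 0.0002 = 0.2. Under independence, observing five or more would therefore be very unlikely.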

Host-Tracking Graph Construction

Resolving Inconsistency
• Proxy Identification
  • To find both types of proxies/NATs, HostTracker gradually expands all the overlapping conflict binding windows associated with a common IP address.
• Guest Removal
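Expanding overlapping conflict windows on an IP address behaves much like interval merging. In this minimal sketch, under assumed inputs, each binding window is a hypothetical (ip, host, start, end) tuple. Overlapping windows on the same IP are merged transitively, and only merged groups involving at least `min_hosts` distinct hosts are reported as conflicts, which makes them candidate proxies/NATs. The function name and the `min_hosts` threshold are our own; the slide does not specify HostTracker's actual expansion or proxy criteria.

```python
from collections import defaultdict


def expand_conflict_windows(windows, min_hosts=2):
    """windows: iterable of (ip, host, start, end) binding windows.

    Returns a dict mapping ip -> list of (start, end, hosts), where each entry
    is a transitively merged group of overlapping windows on that IP that
    involves at least min_hosts distinct hosts.
    """
    by_ip = defaultdict(list)
    for ip, host, start, end in windows:
        by_ip[ip].append((start, end, host))

    conflicts = {}
    for ip, ws in by_ip.items():
        ws.sort()  # order windows by start time
        merged = []  # each entry: [start, end, set_of_hosts]
        for start, end, host in ws:
            if merged and start <= merged[-1][1]:
                # Overlaps the current group: expand it and record the host.
                merged[-1][1] = max(merged[-1][1], end)
                merged[-1][2].add(host)
            else:
                merged.append([start, end, {host}])
        groups = [(s, e, hosts) for s, e, hosts in merged if len(hosts) >= min_hosts]
        if groups:
            conflicts[ip] = groups
    return conflicts
```

Merging is transitive: if A overlaps B and B overlaps C, all three end up in one expanded window even if A and C never overlap directly.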

Input Dataset
• A month-long user-login trace collected at a large Web-email service provider in October 2008 (about 330 GB). Each entry has 3 fields:
  • (1) an anonymized user ID (550 million IDs)
  • (2) the IP address used to perform the email login (220 million IP addresses)
  • (3) the timestamp of the login event
• For validation: a month-long software-update log collected by a global software provider during the same period (October 2008), containing:
  • a unique hardware ID for each remote host that performs an update
  • the IP address of the remote host
  • the software-update timestamp
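Each trace entry is a small three-field record, and ID grouping looks at which user IDs log in consecutively on the same IP. The sketch below parses such records and groups them per IP in time order. The tab-separated line layout, integer timestamps, and the names `LoginEvent`, `parse_line`, and `logins_by_ip` are assumptions for illustration only; the slide does not give the trace's storage format. The same record shape could hold the validation log, with the hardware ID in place of the user ID.

```python
from collections import defaultdict
from typing import NamedTuple


class LoginEvent(NamedTuple):
    user_id: str    # anonymized user ID
    ip: str         # IP address used for the login
    timestamp: int  # login time (assumed: integer seconds)


def parse_line(line: str) -> LoginEvent:
    # Hypothetical layout: user_id<TAB>ip<TAB>timestamp
    user_id, ip, ts = line.rstrip("\n").split("\t")
    return LoginEvent(user_id, ip, int(ts))


def logins_by_ip(lines):
    """Group login events by IP, each list sorted by timestamp, so that
    consecutive user IDs on the same IP can be examined."""
    per_ip = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            event = parse_line(line)
            per_ip[event.ip].append(event)
    for events in per_ip.values():
        events.sort(key=lambda e: e.timestamp)
    return dict(per_ip)
```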

Tracked events

Tracked Hosts vs. Active Hosts

Validation results

Tracked User Population

Signup Date

Email sending behavior

Applications – Detecting Malicious Activity
• In a previous study, we identified 5.6 million malicious IDs used to conduct spam campaigns
• The intersection between malicious IDs and tracked IDs (220 million) is small (50k)
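The size comparison reduces to a set intersection and a ratio. The helper below is a hypothetical illustration (the name `overlap_stats` is ours). The last line applies the slide's reported counts and shows that 50k out of 5.6 million malicious IDs is under 1%.

```python
from __future__ import annotations


def overlap_stats(malicious_ids: set[str], tracked_ids: set[str]) -> tuple[int, float]:
    """Return the intersection size and the fraction of malicious IDs it covers."""
    common = malicious_ids & tracked_ids
    fraction = len(common) / len(malicious_ids) if malicious_ids else 0.0
    return len(common), fraction


# Applying the slide's reported counts: 50k tracked out of 5.6 million malicious IDs.
print(f"{50_000 / 5_600_000:.2%}")  # prints 0.89%
```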

Signup Date - Revisited

Host Tracking – Security

Country Code Comparison

Seed Size Analysis

Conclusions
• Although email accesses provide only a limited view of the Internet, other application-level information can also be used for tracking, such as social network IDs and cookies
• It is hard to evade HostTracker and maintain attack effectiveness at the same time
