
(Un)Trustworthy Wireless: What your wireless traffic says about you…

(Un)Trustworthy Wireless: What your wireless traffic says about you…. Jeff Pang with Ben Greenstein, Ramki Gummadi, Tadayoshi Kohno, David Wetherall (UW/Intel Seattle), and Srini. What are we trying to achieve?. Time to rethink privacy implications of wireless networks


Presentation Transcript


  1. (Un)Trustworthy Wireless: What your wireless traffic says about you… Jeff Pang with Ben Greenstein, Ramki Gummadi, Tadayoshi Kohno, David Wetherall (UW/Intel Seattle), and Srini

  2. What are we trying to achieve? • Time to rethink privacy implications of wireless networks • Identify the shortcomings of current designs and how an adversary might exploit them • Propose some directions for thwarting these attacks • Initial focus on Wi-Fi, but aim is to address other protocols as well, e.g., Bluetooth, RFID, GSM

  3. What is wireless privacy? • Traditional notions: • data encryption • user authentication • Anonymity is also important • Traditional notion not quite right: it covers end-to-end (e2e) privacy only • Data encryption doesn’t preserve anonymity • A third party can still track where a user goes, with whom he might be communicating, what sorts of data he might be exchanging, and what sorts of applications he might be running • traditionally known as traffic analysis, but much easier to do with ubiquitous wireless

  4. What information is being leaked? • The link between a wireless card and its associated AP • Where a user has been • Thug tracks a user from the bank’s network to the dark alley’s network • Who has been in an area • Jealous boyfriend monitors girlfriend’s apartment network • Timestamps of user transmissions • When are people talking and how much are they saying (chatter) • Who is talking to whom? (assumes monitors at both edges) • A dermatologist shares records with an oncologist near patient X, ergo X may have melanoma

  5. An initial problem statement • Adversary: • Can passively sniff all 802.11 traffic at various locations (e.g., café, library, your home, conference) • Goal: • Wants to know where you were and when you were there • Question: • Given a traffic sample, how accurately can an adversary classify it as belonging to you or not? • Assumptions: • Adversary has some traffic samples “known” to come from you (e.g., collected while sitting next to you) • Adversary has collected a library of traffic samples from other (random) users in the targeted locations

  6. The obvious answer • Yes! • Trivially, by looking at MAC addresses • globally unique • always transmitted in the clear • But that is also trivially thwarted • Can change MAC address each time you associate to an AP • Suppose the next wireless driver patch does this • Knowledgeable users can do this themselves, of course • But is this a sufficient fix to advertise “improved privacy”? • Revised question: • How accurately can the adversary classify a traffic sample if MAC addresses change, say, each hour?
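The per-association MAC change mentioned on this slide could look roughly like the sketch below. It only generates an address; how a driver would actually apply it is platform specific and not covered here. The function name `random_mac` is hypothetical, and setting the locally administered bit is an assumption about how such a patch would avoid colliding with vendor-assigned addresses.

```python
import secrets


def random_mac() -> str:
    """Return a random unicast, locally administered MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the locally administered bit (0x02) and clear the
    # multicast/group bit (0x01) in the first octet.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)


# A fresh address would be drawn before each association to an AP.
print(random_mac())
```

Even with such randomization in place, the revised question on the slide still stands: the rest of the traffic may identify the user.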

  7. Initial approach • Fairly generic machine learning algorithm: • Compute a “profile” based on known traffic from target user • Based on profile, generate features for each traffic sample • Use known traffic samples to train a naïve Bayes model (e.g., generate a probability table for each feature) • Given a new sample, model outputs a probability p that sample came from target user • Assume positive match if p > T, for some T • Two types of profile features: • 802.11 specific (control packet contents, driver timing behavior, etc.) • Ben Greenstein working on this • 802.11 agnostic (IP/application traffic features) • I’m working on this
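A minimal sketch of the classification step described on this slide, assuming each traffic sample has already been reduced to a tuple of discrete feature values (for example, similarity scores bucketed into "high" and "low"). The class name `NaiveBayes`, the bucketing, and the add-one smoothing are illustrative assumptions, not details taken from the talk.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Two-class naive Bayes over discrete features (True = target user)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant (assumption)
        self.class_counts = {True: 0, False: 0}
        # counts[label][feature_index][value] -> number of training samples
        self.counts = {
            True: defaultdict(lambda: defaultdict(int)),
            False: defaultdict(lambda: defaultdict(int)),
        }
        self.values = defaultdict(set)  # distinct values seen per feature

    def fit(self, samples, labels):
        """Build the per-feature probability tables from labeled samples."""
        for feats, label in zip(samples, labels):
            self.class_counts[label] += 1
            for i, v in enumerate(feats):
                self.counts[label][i][v] += 1
                self.values[i].add(v)

    def prob_target(self, feats):
        """Return p, the posterior probability that feats came from the target."""
        total = sum(self.class_counts.values())
        log_score = {}
        for label in (True, False):
            n = self.class_counts[label]
            lp = math.log((n + self.alpha) / (total + 2 * self.alpha))
            for i, v in enumerate(feats):
                k = len(self.values[i] | {v})  # number of possible values
                count = self.counts[label][i][v]
                lp += math.log((count + self.alpha) / (n + self.alpha * k))
            log_score[label] = lp
        # Normalize in log space to avoid underflow.
        m = max(log_score.values())
        num = math.exp(log_score[True] - m)
        return num / (num + math.exp(log_score[False] - m))


# Toy training data: two samples from the target, three from other users.
train = [("high", "high"), ("high", "low"),
         ("low", "low"), ("low", "low"), ("low", "high")]
labels = [True, True, False, False, False]

model = NaiveBayes()
model.fit(train, labels)

T = 0.5  # classification threshold
p = model.prob_target(("high", "high"))
print(p > T)
```

Raising T makes the classifier more conservative: fewer samples are attributed to the target user, which lowers both true and false positives.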

  8. Initial features • Conjecture: the sites you visit identify you • e.g., only you visit slashdot, cnn, joe’s blog, etc. • Profile P: • Set of IP destinations we observe you talking to • Feature: • Set similarity of the IPs seen in the traffic sample S and your profile; i.e., • |intersection(P, S)| / |union(P, S)| • Higher scores mean the traffic sample visited more of the same sites
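The set-similarity feature on this slide is the Jaccard index of the two destination sets. A small sketch follows; the function name `jaccard` and the addresses (taken from the reserved documentation ranges) are illustrative assumptions.

```python
def jaccard(profile: set, sample: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = profile | sample
    if not union:
        return 0.0  # both sets empty: define similarity as 0
    return len(profile & sample) / len(union)


# Profile P: destinations seen in the target user's known traffic.
P = {"192.0.2.10", "192.0.2.20", "198.51.100.7"}
# Sample S: destinations seen in one unlabeled traffic sample.
S = {"192.0.2.20", "198.51.100.7", "203.0.113.5"}

print(jaccard(P, S))  # 2 shared / 4 total = 0.5
```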

  9. Initial features (2) • Problem: User can mask IP packet contents • AP can use WEP/WPA • User might tunnel traffic through a VPN • Attempt to use other exposed features • Object sizes: previous work shows that the object sizes from a website identify it accurately • use packet timings to group packets into “objects” • feature: set similarity based on the set of object sizes users accessed • challenges: overlapping flows, dynamic web content • Other possibilities: infer site RTT, site bandwidth, etc. • Question: how well can we do?
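One simple way to "use packet timings to group packets into objects" is to start a new object whenever the gap between consecutive packets exceeds some idle threshold. The sketch below assumes that heuristic; the function name `group_objects` and the 0.5 second gap are hypothetical choices, and the talk does not say which grouping rule was actually used. It also ignores the overlapping-flow problem the slide lists as a challenge.

```python
def group_objects(packets, gap=0.5):
    """Group (timestamp_seconds, payload_bytes) pairs into object sizes.

    packets must be sorted by timestamp. A pause longer than `gap`
    seconds is treated as the boundary between two objects.
    """
    sizes = []
    current = 0
    last_ts = None
    for ts, nbytes in packets:
        if last_ts is not None and ts - last_ts > gap:
            sizes.append(current)  # close the previous object
            current = 0
        current += nbytes
        last_ts = ts
    if last_ts is not None:
        sizes.append(current)  # close the final object
    return sizes


trace = [(0.00, 1460), (0.01, 1460), (0.02, 300), (2.00, 1460), (2.01, 200)]
print(group_objects(trace))  # [3220, 1660]
```

The resulting set of object sizes can then be compared against a profile with the same set-similarity measure used for IP destinations.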

  10. Initial results • Setup • SIGCOMM ’04 Wireless traces • Wireless traffic from ~200 users across 3 days at the conference • Limitations: homogeneous location, biased user population, limited timeframe • Looking for volunteers to collect better data! • Build profiles and train model using traffic on the first day • For each hourly traffic sample in the 2nd and 3rd day: • For each user: • Can we determine if a sample comes from that user or not? • Metrics: • True positive rate • the fraction of samples from that user that are correctly classified • False positive rate • the fraction of samples not from that user that are misclassified • Tune the classification threshold T to trade off one for the other
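The two metrics on this slide can be computed from the model's output probabilities as sketched below. The function name `rates` and the toy scores are illustrative assumptions, not data from the SIGCOMM traces.

```python
def rates(scores, is_target, threshold):
    """Return (true positive rate, false positive rate) at a threshold.

    scores: model probabilities p, one per traffic sample.
    is_target: True where the sample really came from the target user.
    """
    tp = fp = pos = neg = 0
    for p, truth in zip(scores, is_target):
        predicted = p > threshold  # positive match if p > T
        if truth:
            pos += 1
            tp += predicted
        else:
            neg += 1
            fp += predicted
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr


scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
truth = [True, True, True, False, False, False]

# Sweeping T shows the trade-off between the two rates.
for T in (0.5, 0.8):
    print(T, rates(scores, truth, T))
```

In this toy data, raising T from 0.5 to 0.8 removes the single false positive but also drops one true positive.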

  11. Mean accuracy

  12. Some profiles better than others (figure panels: IP Destinations, Object Sizes)

  13. Summary + Near Future Work • Using sites visited is one promising feature to identify users • Current inference of object sizes is insufficient as a stand-in when IP traffic is encrypted • But for some users, does give positive information gain • Next steps: • Combine with other application traffic features like inferred RTT • Combine with 802.11 specific features. E.g., SSID broadcasts: • 43% of sources had at least one unique SSID
