
Presentation Transcript


  1. SPACE, WEB MINING, and PRIVACY: foes or friends? – with special attention to location privacy. Bettina Berendt, Dept. of Computer Science, K.U. Leuven

  2. SPACE WEB MINING PRIVACY

  3. BASICS

  4. SPACE WEB MINING PRIVACY

  5. What is Web Mining? And who am I? Knowledge discovery (aka data mining): "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." Web mining: the application of data mining techniques to the content, (hyperlink) structure, and usage of Web resources. Web mining areas: Web content mining, Web structure mining, and Web usage mining (navigation, queries, content access & creation).

  6. Why Web / data mining? “the database of Intentions“ (J. Battelle)

  7. SPACE WEB MINING PRIVACY

  8. Location-based services and augmented reality www.poynt.com

  9. Semiotically augmented reality: semapedia and related ideas

  10. Mobile Social Web

  11. SPACE WEB MINING PRIVACY

  12. What's special about spatial information? 1. Interpreting. Rich inferences from spatial position to personal properties and/or identity are possible: Pos(A, 9-17) = P1 → workplace(A, P1); Pos(A, 20-6) = P2 → home(A, P2). An even richer „database of intentions“?! Pos(A, now) = P3 & temp(P3, now, hot) → wants(A, ice-cream) (location-based services); Pos(A, t in 13-18) = Pos(Demonstration, 13-18) → suspicious(A) (ex. Dresden phone surveillance case 2011).
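To make the flavour of such rules concrete, here is a minimal Python sketch of position-to-property inference; the observations, time windows, and predicates are illustrative assumptions, not data from the talk.

"""Minimal sketch of rule-based inference from timestamped positions (slide 12)."""
from collections import Counter

# observed positions of person A: (hour_of_day, place) -- fabricated example data
observations = [
    (9, "P1"), (11, "P1"), (15, "P1"), (17, "P1"),   # daytime observations
    (21, "P2"), (23, "P2"), (2, "P2"), (6, "P2"),    # nighttime observations
]

def infer_places(obs):
    """Pos(A, 9-17) = P1 -> workplace(A, P1);  Pos(A, 20-6) = P2 -> home(A, P2)."""
    work = Counter(p for h, p in obs if 9 <= h <= 17)
    home = Counter(p for h, p in obs if h >= 20 or h <= 6)
    return {
        "workplace": work.most_common(1)[0][0] if work else None,
        "home": home.most_common(1)[0][0] if home else None,
    }

print(infer_places(observations))   # {'workplace': 'P1', 'home': 'P2'}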

  13. What's special about spatial information? 2. Sending, or: Opt-out impossible?! Physically: you cannot be nowhere; corollary: you cannot be in two places at once → limits on identity-building. Contractually: rental car with tracking, ... Culturally I: opt-out may preclude basics of identity construction (no mobile phone/internet communication). Culturally II: opt-out considered suspicious in itself (ex. A. Holm surveillance case 2007).

  14. FOES ?

  15. SPACE WEB MINING PRIVACY

  16. Behaviour on the Web (and elsewhere) Data

  17. (Web) data analysis and mining Data Privacy problems!

  18. Technical background of the problem: • The dataset allows for Web mining (e.g., which search queries lead to which site choices), • it violates k-anonymity (e.g. "Lilburn" → a likely k = #inhabitants of Lilburn)
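To see operationally what "violates k-anonymity" means: group the released records by their quasi-identifiers and take the smallest group size as k. The records and the choice of quasi-identifiers in the sketch below are made up for illustration.

"""Minimal k-anonymity check (slide 18): k = size of the smallest equivalence class."""
from collections import Counter

records = [  # e.g. released rows reduced to their quasi-identifying fields
    {"town": "Lilburn", "age_range": "60-70"},
    {"town": "Lilburn", "age_range": "60-70"},
    {"town": "Atlanta", "age_range": "30-40"},
    {"town": "Atlanta", "age_range": "30-40"},
    {"town": "Atlanta", "age_range": "30-40"},
]

def k_anonymity(rows, quasi_identifiers=("town", "age_range")):
    """Group rows by their quasi-identifier values and return the smallest group size."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

print(k_anonymity(records))   # 2: the Lilburn group is the smallest equivalence class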

  19. SPACE WEB MINING PRIVACY

  20. Inferences Data mining / machine learning: inductive learning of models („knowledge“) from data Privacy-relevant (Re-)identification: inferences towards identity Profiling: inferences towards properties Application of the inferred knowledge

  21. What is identity merging? Or: Is this the same person?

  22. Data integration: an example. Paper published by the MovieLens team (collaborative-filtering movie ratings), who were considering publishing a ratings dataset, see http://movielens.umn.edu/ Public dataset: users mention films in forum posts. Private dataset (may be released e.g. for research purposes): users' ratings. Film IDs can easily be extracted from the posts. Observation: every user will talk about items from a sparse relation space (those – generally few – films s/he has seen). [Frankowski, D., Cosley, D., Sen, S., Terveen, L., & Riedl, J. (2006). You are what you say: Privacy risks of public mentions. In Proc. SIGIR'06] Generalisation with more robust de-anonymization attacks and different data: [Narayanan, A., & Shmatikov, V. (2009). De-anonymizing social networks. In Proc. 30th IEEE Symposium on Security and Privacy 2009]

  23. Merging identities – the computational problem. Given a target user t from the forum users, find similar users (in terms of which items they related to) in the ratings dataset. Rank these users u by their likelihood of being t. Evaluate: if t is in the top k of this list, then t is k-identified. Count the percentage of users who are k-identified. E.g. measure likelihood by TF.IDF (m: item).
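One plausible reading of this procedure as code, heavily simplified: rank ratings-dataset users by an IDF-style overlap score with the target's mentioned items. The toy datasets and the exact weighting below are assumptions, not the formula used by Frankowski et al.

"""Minimal sketch of the k-identification experiment on slides 22-23."""
import math

ratings = {              # private dataset: user -> set of rated item ids (fabricated)
    "u1": {"m1", "m2", "m9"},
    "u2": {"m1", "m3"},
    "u3": {"m2", "m9", "m7"},
}
mentions_of_t = {"m2", "m9"}   # public dataset: items the target user t mentioned in the forum

# IDF-style weight: items rated by few users are more identifying
n_users = len(ratings)
idf = {m: math.log(n_users / sum(m in items for items in ratings.values()))
       for items in ratings.values() for m in items}

def score(user_items):
    """Higher when the user rated many of t's (rarely rated) mentioned items."""
    return sum(idf[m] for m in mentions_of_t & user_items)

ranked = sorted(ratings, key=lambda u: score(ratings[u]), reverse=True)
print(ranked)
# t is "k-identified" if t's true ratings-dataset account appears within the top k
# of this ranking; slide 23 then counts the share of users who are k-identified.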

  24. Results

  25. What do you think helps?

  26. What is classification (and prediction)?

  27. Predicting political affiliation from Facebook profile and link data (1): Most Conservative Traits Lindamood et al. 09 & Heatherly et al. 09

  28. Predicting political affiliation from Facebook profile and link data (2): Most Liberal Traits per Trait Name Lindamood et al. 09 & Heatherly et al. 09

  29. What is collaborative filtering? "People like what people like them like"

  30. User-based Collaborative Filtering • Idea: People who agreed in the past are likely to agree again • To predict a user’s opinion for an item, use the opinion of similar users • Similarity between users is decided by looking at their overlap in opinions for other items

  31. Example: User-based Collaborative Filtering

  32. Similarity between users • How similar are users 1 and 2? • How similar are users 1 and 4? • How do you calculate similarity?

  33. Popular similarity measures: cosine-based similarity, adjusted cosine-based similarity, correlation-based similarity
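The formulas themselves did not survive in the transcript; for reference, these are the textbook definitions (written for users u, v over co-rated items I, ratings r, and mean ratings \bar{r}; adjusted cosine is usually stated between items i, j), which may differ in notation from the original slide:

% cosine-based similarity
\mathrm{sim}_{\cos}(u,v) = \frac{\sum_{i \in I} r_{u,i}\, r_{v,i}}
  {\sqrt{\sum_{i \in I} r_{u,i}^2}\;\sqrt{\sum_{i \in I} r_{v,i}^2}}

% correlation-based (Pearson) similarity
\mathrm{sim}_{\mathrm{corr}}(u,v) = \frac{\sum_{i \in I} (r_{u,i}-\bar{r}_u)(r_{v,i}-\bar{r}_v)}
  {\sqrt{\sum_{i \in I} (r_{u,i}-\bar{r}_u)^2}\;\sqrt{\sum_{i \in I} (r_{v,i}-\bar{r}_v)^2}}

% adjusted cosine similarity (between items i and j, subtracting each user's mean rating)
\mathrm{sim}_{\mathrm{adj}}(i,j) = \frac{\sum_{u \in U} (r_{u,i}-\bar{r}_u)(r_{u,j}-\bar{r}_u)}
  {\sqrt{\sum_{u \in U} (r_{u,i}-\bar{r}_u)^2}\;\sqrt{\sum_{u \in U} (r_{u,j}-\bar{r}_u)^2}}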

  34. Algorithm 1: using the entire matrix. Aggregation function: often a weighted sum; the weight depends on similarity. [Slide figure: worked example on a small ratings matrix]

  35. Algorithm 2: K-Nearest-Neighbour. Neighbours are people who have historically had the same taste as our user. Aggregation function: often a weighted sum; the weight depends on similarity. [Slide figure: worked example on a small ratings matrix]
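A minimal Python sketch of Algorithm 2 under the weighted-sum aggregation described on slides 30-35; the ratings matrix and parameter choices are illustrative, not the example from the slides.

"""User-based collaborative filtering with KNN and similarity-weighted aggregation."""
import numpy as np

# rows = users, columns = items; 0 means "not rated" (fabricated example matrix)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(u, v):
    """Cosine similarity over items rated by both users."""
    mask = (u > 0) & (v > 0)
    if not mask.any():
        return 0.0
    a, b = u[mask], v[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict(R, user, item, k=2):
    """Predict R[user, item] as a similarity-weighted sum over the
    k most similar users who have rated the item."""
    candidates = [(cosine_sim(R[user], R[v]), R[v, item])
                  for v in range(R.shape[0])
                  if v != user and R[v, item] > 0]
    neighbours = sorted(candidates, reverse=True)[:k]
    num = sum(s * r for s, r in neighbours)
    den = sum(abs(s) for s, r in neighbours)
    return num / den if den else 0.0

print(predict(R, user=0, item=2))   # predicted rating for an item user 0 has not rated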

  36. SPACE WEB MINING PRIVACY

  37. Summary: Lots of data → lots of privacy threats (and opportunities) The Web incites one of the semiotically richest (and often machine-processable) types of interaction Space incites data-rich types of interaction → two rich sources of „the database of intentions“

  38. SPACE WEB MINING PRIVACY

  39. How many people see an ad? Television: sample viewers, extrapolate to population Web: count viewers/clickers through clickstream City streets: count pedestrians / motorists? Too many streets! → Solution intuition: sample streets, predict

  40. Fraunhofer IAIS (2007): predict frequencies based on similar streets. Street segments modelled as vectors: spatial/geometric information; type of street, direction, speed class, …; demographic, socio-economic data about the vicinity; nearby points of interest (buffer around the segment, count #POI). KNN algorithm: the frequency of a street segment = weighted sum of the frequencies of the k most similar segments in the sample. Dynamic + selective calculation of distances to counter the huge number of segments and measurements.
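A compact sketch of the KNN idea (not the actual IAIS implementation or feature set): represent each segment as a feature vector and predict its frequency as a distance-weighted average over the k most similar sampled segments. Feature names and values below are hypothetical.

"""KNN regression over street-segment feature vectors (slide 40, simplified)."""
import numpy as np

# each row: [speed_class, lanes, population_density, nearby_POIs] -- made-up features
sampled_features = np.array([
    [3, 2, 1200, 15],
    [1, 4,  300,  2],
    [3, 2,  900, 22],
    [2, 3,  600,  8],
], dtype=float)
sampled_frequencies = np.array([5400.0, 800.0, 6100.0, 2300.0])  # counted passers-by per day

def predict_frequency(segment, k=2):
    """Weighted k-nearest-neighbour regression over the sampled street segments."""
    scale = sampled_features.max(axis=0)        # scale so no attribute dominates the distance
    dists = np.linalg.norm(sampled_features / scale - segment / scale, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-9)     # closer (more similar) segments weigh more
    return float(weights @ sampled_frequencies[nearest] / weights.sum())

# frequency estimate for an unsampled segment described by the same features
print(predict_frequency(np.array([3, 2, 1000, 18], dtype=float)))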

  41. SPACE WEB MINING PRIVACY

  42. IP filtering: a deterministic classification model, IP → country
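As a sketch, such a deterministic model is just a range-lookup table; the networks below are hypothetical stand-ins for a real geolocation database.

"""Deterministic IP -> country classification by range lookup (slide 42, sketch)."""
import ipaddress

# (network, country) pairs standing in for a geolocation table -- hypothetical examples
IP_RANGES = [
    (ipaddress.ip_network("134.58.0.0/16"), "BE"),
    (ipaddress.ip_network("18.0.0.0/8"), "US"),
]

def classify_country(ip: str) -> str:
    """Return the country of the first range containing the address, else 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for net, country in IP_RANGES:
        if addr in net:
            return country
    return "unknown"

print(classify_country("134.58.1.1"))   # -> "BE" under the table above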

  43. Where do people live who will buy the Koran soon? Technical background of the problem: a mashup of different data sources (Amazon wishlists, Yahoo! People addresses, Google Maps), each with insufficient k-anonymity, allows for attribute matching and thereby inferences.
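The matching step can be sketched as a naive join on shared quasi-identifiers; all records below are fabricated placeholders, and real sources would need fuzzier matching plus a geocoding step for the map.

"""Attribute matching across two public sources (slide 43, simplified sketch)."""

wishlist = [  # source A: who wants which item (fabricated)
    {"name": "J. Doe", "city": "Lilburn", "item": "Qur'an (translation)"},
    {"name": "A. Smith", "city": "Leuven", "item": "Cookbook"},
]
directory = [  # source B: names with street addresses (fabricated)
    {"name": "J. Doe", "city": "Lilburn", "address": "12 Example Rd"},
    {"name": "B. Jones", "city": "Leuven", "address": "3 Sample St"},
]

def link(a, b, keys=("name", "city")):
    """Naive join on exact quasi-identifier match; each match merges the attributes."""
    index = {tuple(rec[k] for k in keys): rec for rec in b}
    for rec in a:
        match = index.get(tuple(rec[k] for k in keys))
        if match:
            yield {**rec, **match}   # merged record: item + address

for person in link(wishlist, directory):
    print(person)   # the merged record could then be placed on a map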

  44. Traffic prediction: space data + Web data + ... Multiple views on traffic: weather, major events, incident reports (e.g. Operator ID: Nick, Heading: INCIDENT, Message: INCIDENT INFORMATION Cleared 1637: I-405 SB JS I-90 ACC BLK RL CCTV 1623 – WSP, FIR ON SCENE). • Event store • Learning • Reasoning. E.g. the LARKC project: I. Celino, D. Dell'Aglio, E. Della Valle, R. Grothmann, F. Steinke, & V. Tresp (2011). Integrating Machine Learning in a Semantic Web Platform for Traffic Forecasting and Routing. IRMLeS 2011 Workshop at ESWC 2011.

  45. PEACEFUL COEXISTENCE ?

  46. Recall (a simple view): Cryptographic privacy solutions. [Slide figure: data, annotated "not all!"]

  47. "Privacy-preserving data mining" Data not all !

  48. Privacy-preserving data mining (PPDM). Database inference problem: "The problem that arises when confidential information can be derived from released data by unauthorized users." Objective of PPDM: "develop algorithms for modifying the original data in some way, so that the private data and private knowledge remain private even after the mining process." Approaches: (1) Data distribution: decentralized holding of data. (2) Data modification: aggregation/merging into coarser categories; perturbation or blocking of attribute values; swapping values of individual records; sampling. (3) Data or rule hiding: push the support of sensitive patterns below a threshold.
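As one concrete instance of the "data modification" family, here is a minimal additive-noise perturbation sketch; the data, column meaning, and noise scale are illustrative assumptions.

"""Perturbation of attribute values before release (one PPDM approach from slide 48)."""
import numpy as np

rng = np.random.default_rng(seed=0)

ages = np.array([23, 35, 41, 52, 29, 64, 38], dtype=float)  # sensitive attribute (fabricated)

def perturb(values, noise_std=5.0):
    """Additive-noise perturbation: release values plus Gaussian noise."""
    return values + rng.normal(0.0, noise_std, size=values.shape)

released = perturb(ages)
print("original mean :", ages.mean())
print("released mean :", released.mean())            # aggregate stays roughly usable
print("released values:", np.round(released, 1))     # individual values are distorted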

  49. Example 1: Collaborative filtering

  50. Collaborative filtering: idea and architecture. Basic idea of collaborative filtering: "Users who liked this also liked ..." → generalize from "similar profiles". Standard solution, at the community site / centralized: compute, from all users and their ratings/purchases etc., a global model; to derive a recommendation for a given user, find "similar profiles" in this model and derive a prediction. Mathematically: depends on simple vector computations in the user-item space.
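Those "simple vector computations" can be sketched in a few lines: the centralized site computes a full user-user cosine similarity matrix from its rating matrix in one shot. The matrix below is illustrative, not real community data.

"""Centralized global model for collaborative filtering (slide 50, sketch)."""
import numpy as np

# rows = users, columns = items (0 = not rated); held centrally by the community site
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

norms = np.linalg.norm(R, axis=1, keepdims=True)
S = (R @ R.T) / (norms @ norms.T)       # user-user cosine similarities in one matrix product
np.fill_diagonal(S, 0.0)                # ignore self-similarity

# most similar profile for each user: the basis for "users who liked this also liked ..."
print(S.argmax(axis=1))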
