This presentation explores the role of data mining in privacy, beyond just hiding information. It discusses the impact of data mining on our perception of reality and the need for privacy feedback and awareness tools. The presentation also examines case studies, such as the FreeBu tool for audience management on Facebook, and the relationship between privacy and discrimination/fairness.
Four years a SPY - Lessons learned in the interdisciplinary project SPION (Security and Privacy in Online Social Networks) Bettina Berendt Department of Computer Science, KU Leuven, Belgium www.berendt.de , www.spion.me
Thanks to (in more or less chronological order) • Sarah Spiekermann • Seda Gürses • Sören Preibusch • Bo Gao • Ralf De Wolf • Brendan Van Alsenoy • Rula Sayaf • Thomas Peetz • Ellen Vanderhoven • my other SPION colleagues • and many other co-authors and collaborators! [All references for these slides are at the end of the slide set.]
Overview: 1. What can data mining do for privacy? 2. Beyond privacy: discrimination/fairness, democracy. 3. Towards sustainable solutions
3. DM can be modified to avoid privacy violations. Is that sufficient?
... because: What is privacy? • Privacy is not only hiding information: • “dynamic boundary regulation processes […] a selective control of access to the self or to one's group” (Altman/Petronio) • Different research traditions are relevant to CS. And: privacy vis-à-vis whom? Social vs. institutional privacy
AND: Privacy vis-à-vis whom? Social privacy, institutional privacy, freedom from surveillance
... because: What is privacy? ... and what is data mining?
Goal (AD ~ 2008): From the simple view... towards a more comprehensive view
4. DM can affect our perception of reality – also enhancing awareness & reflection?! Privacy feedback and awareness tools
• encrypted content, unobservable communication • selectivity by access control • offline communities: social identities, social requirements • identification of information flows • legal aspects • profiling • feedback & awareness tools • educational materials and communication design • cognitive biases and nudging interventions
Complementary technical approaches in SPION • “Only these friends should see it” • “Nobody else should even know I communicated with them” • “Who are (groups of) recipients in this network anyway?” • “What happens with my data? What can I do about this?” DTAI is one of the technical partners (with COSIC and DistriNet): developing a software tool for privacy feedback and awareness, and collaborating with the other partners on general interdisciplinary questions, requirements, and evaluation. What is Privacy Feedback and Awareness? Examples ...
1. What can data mining do for privacy? Case study FreeBu: a tool that uses community-detection algorithms for helping users perform audience management on Facebook
FreeBu is interactive, but does it give a good starting point? Testing against 3 ground-truth groupings to find “the best” community-detection algorithm
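The kind of grouping FreeBu starts from can be sketched with off-the-shelf community detection on the user's ego network. A minimal illustration, assuming NetworkX's greedy modularity maximisation as a stand-in for the hierarchical modularity-maximisation algorithm the slides mention; the friend graph is invented:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Ego network: nodes are the user's friends, edges are friendships among them.
# (Hypothetical data; a real tool would read this from the Facebook API.)
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"), ("alice", "carol"),   # e.g. family
    ("dave", "erin"), ("erin", "frank"), ("dave", "frank"),   # e.g. colleagues
    ("carol", "dave"),                                        # one weak tie
])

# Each detected community becomes a suggested audience group
# that the user can then refine interactively.
communities = greedy_modularity_communities(G)
for i, group in enumerate(communities, 1):
    print(f"suggested group {i}: {sorted(group)}")
```

On this toy graph the two densely connected triangles are returned as two suggested groups, with the weak tie left intact.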
FreeBu: better than Facebook Smart Lists for access control • User experiment, n=16 • 2 groups, same interface (circle); algorithm: hierarchical modularity-maximisation vs. Facebook Smart Lists • Task: think of 3 posts that you wouldn't want everybody to see, then select from the given groups those who should see them
FreeBu: What do users think? • Two user studies with a total of 12 / 147 participants • Method: exploratory, mixed methods (interview, questionnaire, log analysis) • Results: • Affordances: grouping for access control, reflection/overview, (unfriending) • Visual effects on attention – examples: “map” & “rank” visualisations
More observations • No relationship of tool appreciation with privacy concerns • “don't tell my friends I am using your tool to spy on them” • “don't give these data to your colleague” • “how can you show these photos [in an internal presentation] without getting your friends' consent first?” • Trust in Facebook > trust in researchers & colleagues? • Or: machines / abstract people vs. concrete people? • Recognition of privacy interdependencies? (cf. discussion of “choice” earlier today) • Feedback tools are themselves spying tools ...
Lessons learned • Social privacy trumps institutional privacy • Change in attitudes or behaviour takes time • No graceful degradation w.r.t. usability: • Tools that are <100% usable are NOT used AT ALL. • What is GOOD? What is BETTER?
“Privacy is not the problem“ • Privacy, social justice, and democracy • View 1: Privacy is a problem (partly) because its violation may lead to discrimination. • View 2: Privacy is one of a set of social issues.
Discrimination-aware data mining (Pedreschi, Ruggieri, & Turini, 2008, + many since then) • PD and PND items: potentially (not) discriminatory • Goal: detect & block mined rules such as purpose=new_car & gender=female → credit=no • Measures of the discriminatory power of a rule include elift(B&A → C) = conf(B&A → C) / conf(B → C), where A is a PD item and B a PND item • Note: 2 uses/tasks of data mining here: Descriptive: “In the past, women who got a loan for a new car often defaulted on it.” Prescriptive: (Therefore) “Women who want a new car should not get a loan.”
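The elift measure on the slide can be computed directly from support counts. A minimal sketch, with invented counts for the running credit example (B = purpose=new_car, A = gender=female, C = credit=no):

```python
def conf(n_body_and_head, n_body):
    """Confidence of a rule body -> head: P(head | body)."""
    return n_body_and_head / n_body

def elift(n_BA_C, n_BA, n_B_C, n_B):
    """Extended lift elift(B&A -> C) = conf(B&A -> C) / conf(B -> C):
    how much adding the PD item A to the PND context B raises the
    confidence of concluding C (e.g. credit=no)."""
    return conf(n_BA_C, n_BA) / conf(n_B_C, n_B)

# Hypothetical counts: of 100 new-car applicants, 40 were denied credit;
# of the 30 applicants who were also female, 18 were denied.
e = elift(n_BA_C=18, n_BA=30, n_B_C=40, n_B=100)
print(round(e, 2))  # 0.6 / 0.4 = 1.5: being female raises the denial rate 1.5x
```

A value above a chosen threshold (e.g. elift ≥ 1) is what a DADM system would flag or block.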
Exploratory DADM: DCUBE-GUI • Left: rule count (size) vs. PD/non-PD (colour) • Right: rule count (size) vs. AD-measure (rainbow colour scale)
Evaluation: Comparing cDADM & eDADM • cDADM: “hiding bad patterns”, black box • eDADM: “highlighting bad patterns”, white box
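The contrast between the two treatments can be sketched in a few lines: constrained DADM suppresses rules containing a PD item, while exploratory DADM keeps them but flags them for the user. The rule records below are invented for illustration (the actual tools were DCUBE / DCUBE-GUI):

```python
# Hypothetical PD items and mined rules.
PD_ITEMS = {"gender=female", "nationality=foreign"}

rules = [
    {"body": {"purpose=new_car", "gender=female"}, "head": "credit=no"},
    {"body": {"savings<100"}, "head": "credit=no"},
]

def cdadm(rules):
    """Constrained DADM: hide potentially discriminatory rules (black box)."""
    return [r for r in rules if not (r["body"] & PD_ITEMS)]

def edadm(rules):
    """Exploratory DADM: keep all rules but flag PD ones (white box)."""
    return [dict(r, flagged=bool(r["body"] & PD_ITEMS)) for r in rules]

print(len(cdadm(rules)))                        # the PD rule is gone
print([r["flagged"] for r in edadm(rules)])     # the PD rule is marked
```

The user study then asks which presentation leads to better decisions in which scenario.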
Online experiment with 215 US mTurkers • Framing • Prevention: bank • Detection: agency • $6.00 show-up fee • Tasks • 3 exercise tasks • 6 assessed tasks • $0.25 performance bonus per assessed task • Questionnaire • Demographics • Quant/bank job • Experience with discrimination • Example vignette: “Dabiku is a Kenyan national. She is single and has no children. She has been employed as a manager for the past 10 years. She now asks for a loan of $10,000 for 24 months to set up her own business. She has $100 in her checking account and no other debts. There have been some delays in paying back past loans.”
Decision-making scenario • Task structure • Vignette, describing applicant and application • Rules: positive/negative risks, flagged • Decision and motivation, optional comment • Required competencies • Discard discrimination-indexed rules • Aggregate rule certainties • Justify decision by categorising risk factors
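The required competencies above can be illustrated with a toy decision procedure. The slides do not specify the aggregation rule, so summing signed rule certainties after discarding discrimination-indexed rules is an assumed, illustrative choice (as is all the data):

```python
# Hypothetical rules applicable to one vignette, with signed certainties.
rules = [
    {"factor": "delays in past loans", "certainty": -0.40, "flagged": False},
    {"factor": "stable employment",    "certainty": +0.55, "flagged": False},
    {"factor": "gender=female",        "certainty": -0.67, "flagged": True},  # PD rule
]

def decide(rules, threshold=0.0):
    """Discard flagged (discrimination-indexed) rules, aggregate the rest,
    and return a decision with the risk factors that motivate it."""
    usable = [r for r in rules if not r["flagged"]]
    score = sum(r["certainty"] for r in usable)
    decision = "grant" if score > threshold else "deny"
    motivation = [r["factor"] for r in usable]
    return decision, round(score, 2), motivation

print(decide(rules))
```

Here the flagged -0.67 rule must not enter the score or the motivation, which is exactly the competency the experiment assesses.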
Rule visualisation by treatment • (not DA)DM: neither flagged nor hidden • Constrained DADM: hide bad features (prevention scenario) • Exploratory DADM: flag bad features (detection scenario) • (example attributes shown: residence, savings, foreigner)
Results: Actionability and decision quality • Decisions and Motivations • DA versus DADM • More correct decisions in DADM • More correct motivations in DADM • No performance impact • Relative merits • Constrained DADM better for prevention • Exploratory DADM better for detection • Biases • Discrimination persistent in cDADM • ‘‘I dropped the -.67 number a little bit because it included her being a female as a reason.’’ Berendt & Preibusch. Better decision support through exploratory discrimination-aware data mining. in: ARTI, 2014
“Privacy is not the problem“ • Privacy, social justice, and democracy • View 1: Privacy is a problem (partly) because its violation may lead to discrimination. • View 2: Privacy is one of a set of social issues. • View 3: Heightened privacy concerns are just a symptom of something more general being wrong. (e.g. Discrimination – underlying definition of fairness – who gets to decide?)
Discrimination-aware data mining (Pedreschi, Ruggieri, & Turini, 2008, + many since then) 2 uses/tasks of data mining: Descriptive “In the past, women who got a loan for a new car often defaulted on it.“ Prescriptive (Therefore) “Women who want a new car should not get a loan.“ Goal: detect the first AND/OR block the second (= push it below a threshold)
What we did • an interactive tool, DCUBE-GUI • a conceptual analysis of • (anti-)discrimination as modelled in data mining (“DADM”) • unlawful discrimination as modelled in law • a framework: constraint-oriented vs. exploratory DADM • two user studies (n=20, 215) with DADM as decision support, which showed • DADM can help make better decisions & motivations • cDADM / eDADM are better for different settings • Sanitized patterns are not sufficient to make sanitized minds
Lessons learned: Privacy by design?! • A systems approach is needed, from algorithms up to “multi-stakeholder information systems”: • Algorithms (AI / data mining): no people; “solutionism” • Interactive systems (e.g. exploratory analysis): users; HCI • Information systems: experts; software development, IS science, law • “Multi-stakeholder information systems”: diverse stakeholders; value-sensitive design, sociology, politics, education
Effectiveness of “ethical apps”? Hudson et al. (2013): What makes people buy a fair-trade product? • An informational film shown before the buying decision? NO • Having to make the decision in public? NO • Some prior familiarity with the goals and activities of fair-trade campaigns, as well as a broader understanding of national and global political issues that are only peripherally related to fair trade? YES