
Anti-discrimination and privacy protection in released datasets

Sara Hajian, Josep Domingo-Ferrer. Data mining carries negative social perceptions, among them potential privacy invasion and potential discrimination.


Presentation Transcript


  1. Anti-discrimination and privacy protection in released datasets • Sara Hajian, Josep Domingo-Ferrer

  2. Data mining • There are negative social perceptions about data mining, among them: • Potential privacy invasion • Potential discrimination

  3. Discrimination • Discrimination is the unfair or unequal treatment of people based on their membership in a category or minority, without regard to individual merit.

  4. Discrimination • Example: U.S. federal laws prohibit discrimination on the basis of: • Race, color, religion, nationality, sex, marital status, age, pregnancy • In a number of settings: • Credit/insurance scoring • Sale, rental, and financing of housing • Personnel selection and wages • Access to public accommodations, education, nursing homes, adoptions, and health care

  5. Discrimination • Discrimination can be either direct or indirect: • Direct discrimination occurs when decisions are made based on sensitive attributes. • Indirect discrimination occurs when decisions are made based on non-sensitive attributes which are strongly correlated with biased sensitive ones.
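
To make "strongly correlated" concrete, here is a minimal Python sketch that measures how well a non-sensitive attribute predicts a sensitive one. The attribute names (`zip`, `group`), the toy records, and the helper `proxy_strength` are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def proxy_strength(records, proxy_attr, sensitive_attr):
    """For each proxy value, return the share of records that carry
    its most frequent sensitive value (1.0 means a perfect proxy)."""
    counts = {}
    for r in records:
        counts.setdefault(r[proxy_attr], Counter())[r[sensitive_attr]] += 1
    return {value: max(c.values()) / sum(c.values()) for value, c in counts.items()}

# Hypothetical toy data: the ZIP code is not sensitive, but it largely
# reveals group membership, so decisions based on it can discriminate indirectly.
records = (
    [{"zip": "A", "group": "minority"} for _ in range(4)]
    + [{"zip": "A", "group": "majority"}]
    + [{"zip": "B", "group": "minority"}]
    + [{"zip": "B", "group": "majority"} for _ in range(4)]
)

strength = proxy_strength(records, "zip", "group")
print(strength)  # {'A': 0.8, 'B': 0.8}
```

A value close to 1.0 for every proxy value suggests that a rule using only the non-sensitive attribute could act as a stand-in for the sensitive one.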

  6. Discrimination in Data mining • Automated data collection and data mining techniques such as classification rule mining have paved the way for automated decision-making: • Loan granting/denial • Insurance premium computation • Personnel selection and wages

  7. Discrimination in Data mining • If the training datasets are biased with regard to discriminatory attributes such as gender, race, or religion, discriminatory decisions may ensue. • Anti-discrimination techniques have been introduced in data mining: • Discrimination discovery • Discrimination prevention

  8. Discrimination in Data mining • Discrimination discovery • Consists of supporting the discovery of discriminatory decisions hidden, either directly or indirectly, in a dataset of historical decision records.

  9. Discrimination Discovery • Different measures of the discriminatory power of mined decision rules can be defined, according to the provisions of different anti-discrimination regulations: • Extended lift (elift) • Selection lift (slift)
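
The slides name the two measures without defining them. The sketch below assumes the definitions commonly used in the discrimination-discovery literature for a rule A, B → C, where A is the protected itemset and B the context: elift = conf(A, B → C) / conf(B → C), and slift = conf(A, B → C) / conf(¬A, B → C). The toy records and all function names are illustrative assumptions.

```python
from typing import Dict, List

Record = Dict[str, str]

def matches(record: Record, items: Record) -> bool:
    """True if the record contains every attribute=value pair in items."""
    return all(record.get(k) == v for k, v in items.items())

def confidence(records: List[Record], premise: Record, outcome: Record) -> float:
    """conf(premise -> outcome): share of records covered by the premise
    that also show the outcome."""
    covered = [r for r in records if matches(r, premise)]
    if not covered:
        raise ValueError("premise matches no records")
    return sum(matches(r, outcome) for r in covered) / len(covered)

def elift(records, protected, context, outcome) -> float:
    """Assumed definition: conf(A,B -> C) / conf(B -> C).
    Raises ZeroDivisionError if the outcome never occurs in the context."""
    return (confidence(records, {**protected, **context}, outcome)
            / confidence(records, context, outcome))

def slift(records, protected, context, outcome) -> float:
    """Assumed definition: conf(A,B -> C) / conf(not A,B -> C)."""
    rest = [r for r in records if not matches(r, protected)]
    return (confidence(records, {**protected, **context}, outcome)
            / confidence(rest, context, outcome))

def rec(sex: str, credit: str) -> Record:
    return {"sex": sex, "city": "NYC", "credit": credit}

# Hypothetical historical decisions: 3 of 4 women and 1 of 4 men are denied.
records = ([rec("female", "deny") for _ in range(3)] + [rec("female", "grant")]
           + [rec("male", "deny")] + [rec("male", "grant") for _ in range(3)])

A, B, C = {"sex": "female"}, {"city": "NYC"}, {"credit": "deny"}
print(elift(records, A, B, C))  # 1.5
print(slift(records, A, B, C))  # 3.0
```

Under these assumed definitions, a value of 1 means the protected group is treated like the comparison group; a rule is typically flagged when the measure exceeds a threshold fixed by the analyst or by regulation.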

  10. Discrimination in Data mining • Discrimination prevention • Consists of inducing patterns that do not lead to discriminatory decisions even if trained from a dataset containing them.

  11. Discrimination Prevention • How can we train an unbiased classifier when the training data is biased? • As for privacy, the challenge is to find an optimal trade-off between (measurable) protection against unfair discrimination, and (measurable) utility of the data/models for data mining.

  12. Discrimination Prevention • Methods: • Transform the source data • Modify the data mining methods • Modify the discriminatory models

  13. The framework • The framework for discrimination prevention can be described in terms of two phases: • Discrimination Measurement • Data Transformation

  14. Data transformation • The purpose is to transform the original data DB in such a way as to remove direct and/or indirect discriminatory biases, with minimum impact • on the data and • on legitimate decision rules, • so that no unfair decision rule can be mined from the transformed data.

  15. Data transformation • As part of this effort, metrics should be developed that specify • which records should be changed, • how many records should be changed, • and how those records should be changed during data transformation.
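
The three questions above (which records, how many, and how) can be illustrated with a deliberately simplified Python sketch. It is not the authors' algorithm: it greedily relabels decisions for protected-group records until an assumed measure, elift = conf(A, B → C) / conf(B → C), drops below a hypothetical threshold `alpha`. All names and the toy data are assumptions.

```python
def matches(record, items):
    return all(record.get(k) == v for k, v in items.items())

def confidence(records, premise, outcome):
    covered = [r for r in records if matches(r, premise)]
    if not covered:
        return 0.0
    return sum(matches(r, outcome) for r in covered) / len(covered)

def elift(records, protected, context, outcome):
    base = confidence(records, context, outcome)
    if base == 0:
        return 0.0  # the outcome never occurs in this context
    return confidence(records, {**protected, **context}, outcome) / base

def relabel_until_fair(records, protected, context, bad, good, alpha):
    """Return (transformed copy, number of changed records).
    Which records: protected group, in context, with the bad decision.
    How many: as few as needed to bring elift below alpha.
    How: overwrite the bad decision with the good one."""
    data = [dict(r) for r in records]  # leave the original untouched
    target = {**protected, **context, **bad}
    changed = 0
    for r in data:
        if elift(data, protected, context, bad) < alpha:
            break
        if matches(r, target):
            r.update(good)
            changed += 1
    return data, changed

def rec(sex, credit):
    return {"sex": sex, "city": "NYC", "credit": credit}

# Hypothetical biased data: 3 of 4 women and 1 of 4 men are denied credit.
records = ([rec("female", "deny") for _ in range(3)] + [rec("female", "grant")]
           + [rec("male", "deny")] + [rec("male", "grant") for _ in range(3)])

A, B = {"sex": "female"}, {"city": "NYC"}
deny, grant = {"credit": "deny"}, {"credit": "grant"}

fixed, changed = relabel_until_fair(records, A, B, deny, grant, alpha=1.2)
print(changed)                    # 2
print(elift(fixed, A, B, deny))   # 1.0
```

Stopping as soon as the threshold is met is one way to keep the impact on the data small; a real method would also have to check that legitimate rules survive the change.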

  16. Utility measures • Measuring direct discrimination removal • Measuring indirect discrimination removal • Measuring data quality: • Misses Cost (MC) • Ghost Cost (GC)
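
The slide lists MC and GC without definitions. The sketch below assumes the usual reading of these data-quality measures: MC is the percentage of rules extractable from the original data that can no longer be extracted after transformation, and GC is the percentage of rules extractable from the transformed data that were not extractable from the original. The rule strings are hypothetical placeholders.

```python
def misses_cost(original_rules, transformed_rules):
    """Assumed definition: % of original rules lost after transformation."""
    original = set(original_rules)
    if not original:
        return 0.0
    lost = original - set(transformed_rules)
    return 100.0 * len(lost) / len(original)

def ghost_cost(original_rules, transformed_rules):
    """Assumed definition: % of transformed-data rules that are new artifacts."""
    transformed = set(transformed_rules)
    if not transformed:
        return 0.0
    ghosts = transformed - set(original_rules)
    return 100.0 * len(ghosts) / len(transformed)

# Hypothetical rule sets mined before and after data transformation.
before = {"r1", "r2", "r3", "r4"}
after = {"r1", "r2", "r3", "r5"}

print(misses_cost(before, after))  # 25.0
print(ghost_cost(before, after))   # 25.0
```

Both values are 0 when the transformation leaves the set of minable rules unchanged, which is the ideal case for data quality.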

  17. Thanks for your attention
