
Developing AI (Responsibly)


Presentation Transcript


  1. Developing AI (Responsibly) Sarah Bird

  2. Agenda

  3. AI development: where are we going?

  4. Machine Learning is Hard: It’s fragile • We don’t understand the full system • We don’t have good abstractions • Our development tools are primitive • Our process is ad hoc

  5. The Machine Learning Process is Maturing: Preparation • Analysis • Dissemination

  6. Development Tools are Catching Up: Scaling and Performance • Flexible Experimentation and Development • Programming Languages and Testing • MLOps

  7. Major Themes in Development Tools

  8. End-to-End: Full ML Life Cycle / Research to Production • Seamless • Fast. Adaptivity: Responsiveness to Changes • Environment • New information • User behavior • Adversarial conditions. Automation: Automate the Process • Feature Selection • Architecture Search • Automated Machine Learning • Performance Optimizers • Reinforcement Learning

  9. Beyond Accuracy: Performance • Cost • Fairness and Bias • Privacy • Security • Safety • Robustness

  10. Fairness and Machine Learning

  11. "[H]iring could become faster and less expensive, and […] lead recruiters to more highly skilled people who are better matches for their companies. Another potential result: a more diverse workplace. The software relies on data to surface candidates from a wide variety of places and match their skills to the job requirements, free of human biases." Miller (2015) [Barocas & Hardt 2017]

  12. "But software is not free of human influence. Algorithms are written and maintained by people, and machine learning algorithms adjust what they do based on people’s behavior. As a result […] algorithms can reinforce human prejudices." Miller (2015) [Barocas & Hardt 2017]

  13. Do Better • Avoid Harm [Cramer et al 2019]

  14. More positive outcomes & avoiding harmful outcomes of automated systems for groups of people [Cramer et al 2019]

  15. Types of Harm: Harms of allocation (withhold opportunity or resources) • Harms of representation (reinforce subordination along the lines of identity and stereotypes) [Cramer et al 2019; Shapiro et al. 2017; Kate Crawford, “The Trouble With Bias” keynote, N(eur)IPS ’17]

  16. Legally Recognized Protected Classes: Race (Civil Rights Act of 1964); Color (Civil Rights Act of 1964); Sex (Equal Pay Act of 1963; Civil Rights Act of 1964); Religion (Civil Rights Act of 1964); National origin (Civil Rights Act of 1964); Citizenship (Immigration Reform and Control Act); Age (Age Discrimination in Employment Act of 1967); Pregnancy (Pregnancy Discrimination Act); Familial status (Civil Rights Act of 1968); Disability status (Rehabilitation Act of 1973; Americans with Disabilities Act of 1990); Veteran status (Vietnam Era Veterans' Readjustment Assistance Act of 1974; Uniformed Services Employment and Reemployment Rights Act); Genetic information (Genetic Information Nondiscrimination Act) [Barocas & Hardt 2017]

  17. Other Categories: Societal categories (e.g., political ideology, language, income, location, topical interests, (sub)culture, physical traits) • Intersectional subpopulations (e.g., women in tech) • Application-specific subpopulations (e.g., device type)

  18. Different Motivations: Better product and serving a broader population • Responsibility and social impact • Legal and policy • Competitive advantage and brand [Barocas & Hardt 2017]

  19. Bias, Discrimination & Machine Learning: Isn’t bias a technical concept? (Selection, sampling, and reporting bias; bias of an estimator; inductive bias.) Isn’t discrimination the very point of machine learning? (The concern is an unjustified basis for differentiation.)
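
For the technical sense of bias listed on this slide, the textbook definition of the bias of an estimator (added here for context; it is not part of the original slide) is:

```latex
% Bias of an estimator \hat{\theta}: the systematic gap between its
% expected value and the true parameter value \theta.
\[
  \operatorname{Bias}(\hat{\theta}) = \mathbb{E}\big[\hat{\theta}\big] - \theta
\]
```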

  20. Discrimination is not a general concept: It is domain specific (concerned with important opportunities that affect people’s life chances) • It is feature specific (concerned with socially salient qualities that have served as the basis for unjustified and systematically adverse treatment in the past) [Barocas & Hardt 2017]

  21. Discrimination Law and Legal Terms: Treatment (Disparate Treatment, Equality of Opportunity, Procedural Fairness) • Outcome (Disparate Impact, Distributive Justice, Minimized Inequality of Outcome)
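
To make the outcome-side terms concrete, here is a minimal sketch (hypothetical predictions, not data from the talk) of two common disparate-impact style measurements: the selection-rate ratio behind the "four-fifths rule" and the demographic parity gap.

```python
# Minimal sketch, hypothetical data: compare favorable-outcome rates per group.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # 1 = favorable decision
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
disparate_impact_ratio = min(rates.values()) / max(rates.values())
demographic_parity_gap = max(rates.values()) - min(rates.values())

print(rates)                    # per-group selection rates
print(disparate_impact_ratio)   # values below 0.8 are often flagged (four-fifths rule)
print(demographic_parity_gap)
```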

  22. Fairness is Political

  23. Decisions will depend on the product, company, laws, country, etc. Someone must decide

  24. Fairness in Practice

  25. Good ML Practices Go a Long Way

  26. Breadth and Depth Required

  27. Process Best Practices: Identify product goals • Get the right people in the room • Identify stakeholders • Select a fairness approach • Analyze and evaluate your system • Mitigate issues • Monitor continuously and have escalation plans • Auditing and transparency

  28. Repeat for every new feature, product change, etc.

  29. Analyze and evaluate your system: Consider the complete system end-to-end, including people, technology, and processes • Break your system into components • Analyze each component to understand the decisions made and their impact • Determine how well it matches up to your selected fairness approach
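
As a minimal sketch of the "analyze each component" step, the snippet below (hypothetical labels and predictions) breaks one component’s accuracy and false positive rate down per group so the gaps can be compared against the fairness approach you selected.

```python
# Minimal sketch, hypothetical data: per-group error analysis for one component.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

for g in np.unique(group):
    mask = group == g
    accuracy = (y_pred[mask] == y_true[mask]).mean()
    negatives = mask & (y_true == 0)   # this group's examples whose true label is negative
    fpr = y_pred[negatives].mean() if negatives.any() else float("nan")
    print(f"group={g}  accuracy={accuracy:.2f}  false_positive_rate={fpr:.2f}")
```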

  30. Engineering for equity during all phases of ML design: Is an algorithm an ethical solution to our problem? • Is the algorithm misusable in other contexts? • Does the model encourage feedback loops that can produce increasingly unfair outcomes? • Does our data include enough minority samples? • Is the data skewed? • Can we collect more data or reweight? • Are there missing or biased features? • Was our historical data generated by a biased process that we reify? • Do our labels reinforce stereotypes? • Do we need to apply debiasing algorithms to preprocess our data? • Are we deploying our model on a population that we did not train or test on? • Is the objective function in line with ethics? • Do we need to include fairness constraints in the function? • Do our proxies really measure what we think they do? • Do we need to model minority populations separately? • Have we evaluated the model using relevant fairness metrics? • Do our selected fairness metrics capture our customers’ needs? • Can we evaluate the model on other datasets beyond the test set? (Credit: K. Browne & J. Draper)
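
For the "can we collect more data or reweight?" question, one commonly cited preprocessing option is reweighing (Kamiran & Calders): weight each (group, label) cell so that group membership and the label look statistically independent in the training data. The sketch below uses hypothetical data and is only illustrative; it is not a method prescribed by the talk.

```python
# Minimal sketch of reweighing on hypothetical data: weight = P(group) * P(label)
# divided by P(group, label), so the weighted data looks independent of group.
import numpy as np

labels = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

weights = np.empty(len(labels))
for g in np.unique(group):
    for y in np.unique(labels):
        cell = (group == g) & (labels == y)
        expected = (group == g).mean() * (labels == y).mean()   # if independent
        observed = cell.mean()                                   # in the data
        weights[cell] = expected / observed if observed > 0 else 0.0

# These can be passed as sample weights when training, e.g.
# model.fit(X, labels, sample_weight=weights) in scikit-learn-style APIs.
print(weights)
```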

  31. Measurement • Fairness Tooling • Preparation • Analysis • Dissemination

  32. Open Problems: Mitigating Issues • Automated Testing and Alerting • Auditing and Transparency
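
To make "Automated Testing and Alerting" concrete, here is a minimal sketch of a recurring fairness check: recompute a metric on fresh predictions and raise an alert when it drifts past an agreed threshold. The metric, threshold, and function names are illustrative assumptions, not a specific tool’s API.

```python
# Minimal sketch: automated fairness check that logs an alert when the
# demographic parity gap exceeds a threshold agreed on by the team.
import logging

def demographic_parity_gap(y_pred, group):
    rates = {}
    for g in set(group):
        members = [p for p, gg in zip(y_pred, group) if gg == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values())

def check_and_alert(y_pred, group, threshold=0.1):
    gap = demographic_parity_gap(y_pred, group)
    if gap > threshold:
        # In production this could page an on-call engineer or open a ticket.
        logging.warning("Fairness alert: demographic parity gap %.3f > %.3f",
                        gap, threshold)
        return False
    return True

# Example run on hypothetical predictions for two groups.
print(check_and_alert([1, 0, 1, 1, 1, 0, 0, 0], ["a", "a", "a", "a", "b", "b", "b", "b"]))
```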

  33. Fairness is Hard: It’s fragile • We don’t understand the full system • We don’t have good abstractions • Our development tools are primitive • Our process is ad hoc

  34. What Does This Mean for Systems?

  35. Key Open Problems in Applied Fairness

  36. End-to-End • Automation • Adaptivity: Full ML Life Cycle • Seamless • Fast • Understand Sensitive Attributes • Automatically Analyze/Test and Surface Issues • Context Aware • Self-Mitigating • Integration

  37. Beyond Accuracy: Performance • Cost • Fairness and Bias • Privacy • Security • Safety • Robustness

  38. slbird@microsoft.com
