Resources

Scholarship

  • Barreno, Marco, Blaine Nelson, Anthony D Joseph, and J D Tygar. “The Security of Machine Learning.” Machine Learning 81, no. 2 (May 20, 2010): 121–48. doi:10.1007/s10994-010-5188-5
  • Berendt, Bettina, and Soren Preibusch. “Better Decision Support Through Exploratory Discrimination-Aware Data Mining: Foundations and Empirical Evidence.” Artificial Intelligence and Law 22, no. 2 (January 10, 2014): 175–209. doi:10.1007/s10506-013-9152-0
  • Berendt, Bettina, and Soren Preibusch. “Exploring Discrimination: a User-Centric Evaluation of Discrimination-Aware Data Mining.” 2012 IEEE 12th International Conference on Data Mining Workshops (December 10, 2012): 344–351. doi:10.1109/ICDMW.2012.109
  • Calders, Toon, and Sicco Verwer. “Three Naive Bayes Approaches for Discrimination-Free Classification.” Data Mining and Knowledge Discovery 21, no. 2 (July 27, 2010): 277–292. doi:10.1007/s10618-010-0190-x
  • Calders, Toon, Faisal Kamiran, and Mykola Pechenizkiy. “Building Classifiers with Independency Constraints.” 2009 IEEE 9th International Conference on Data Mining Workshops (December 6, 2009): 13–18. doi:10.1109/ICDMW.2009.83
  • Custers, Bart, and Bart Schermer. "Responsibly Innovating Data Mining and Profiling Tools: A New Approach to Discrimination Sensitive and Privacy Sensitive Attributes." Responsible Innovation 1: Innovative Solutions for Global Issues. (2014): 335-350. doi:10.1007/978-94-017-8956-1_19
  • Custers, Bart, Toon Calders, Bart Schermer, and Tal Z Zarsky. Discrimination and Privacy in the Information Society: Data Mining and Profiling in Large Databases. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. doi:10.1007/978-3-642-30487-3
  • Datta, Anupam, Shayak Sen, and Yair Zick. “Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems.” Proceedings of the 37th IEEE Symposium on Security and Privacy (May 2016). https://www.andrew.cmu.edu/user/danupam/datta-sen-zick-oakland16.pdf
  • DeDeo, Simon. "Wrong Side of the Tracks: Big Data and Protected Categories" (May 27, 2015). arXiv:1412.4643v2
  • Dwork, Cynthia, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. “Fairness Through Awareness,” 2012 Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (January 9, 2012): 214–226. doi:10.1145/2090236.2090255
  • El-Arini, Khalid, Ulrich Paquet, Ralf Herbrich, Jurgen Van Gael, and Blaise Agüera y Arcas. “Transparent User Models for Personalization,” 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (August 8, 2012): 678-686. doi:10.1145/2339530.2339639
  • Feldman, Michael, Sorelle Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. "Certifying and Removing Disparate Impact" (July 16, 2015). arXiv:1412.3756v3
  • Freitas, Alex A. “Comprehensible Classification Models - A Position Paper.” ACM SIGKDD Explorations Newsletter 15, no. 1 (March 17, 2014): 1–10. doi:10.1145/2594473.2594475
  • Hajian, Sara, and Josep Domingo-Ferrer. “A Methodology for Direct and Indirect Discrimination Prevention in Data Mining.” IEEE Transactions on Knowledge and Data Engineering 25, no. 7 (May 21, 2013): 1445–1459. doi:10.1109/TKDE.2012.72
  • Hajian, Sara, and Josep Domingo-Ferrer. “A Study on the Impact of Data Anonymization on Anti-Discrimination.” 2012 IEEE 12th International Conference on Data Mining Workshops (December 10, 2012): 352–359. doi:10.1109/ICDMW.2012.19
  • Hajian, Sara, Josep Domingo-Ferrer, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. "Discrimination- and Privacy-Aware Patterns." Data Mining and Knowledge Discovery. Forthcoming. doi:10.1007/s10618-014-0393-7
  • Hajian, Sara, Anna Monreale, Dino Pedreschi, Josep Domingo-Ferrer, and Fosca Giannotti. “Fair Pattern Discovery.” 2014 Proceedings of the 29th Annual ACM Symposium on Applied Computing (March 24, 2014): 113-120. doi:10.1145/2554850.2555043
  • Hajian, Sara, Anna Monreale, Dino Pedreschi, Josep Domingo-Ferrer, and Fosca Giannotti. “Injecting Discrimination and Privacy Awareness Into Pattern Discovery.” 2012 IEEE 12th International Conference on Data Mining Workshops (December 10, 2012): 360–369. doi:10.1109/ICDMW.2012.51
  • Hajian, Sara, Josep Domingo-Ferrer, and Antoni Martinez-Balleste. “Discrimination Prevention in Data Mining for Intrusion and Crime Detection.” 2011 IEEE Symposium on Computational Intelligence in Cyber Security (April 11-15, 2011): 47–54. doi:10.1109/CICYBS.2011.5949405
  • Hajian, Sara, Josep Domingo-Ferrer, and Oriol Farràs. “Generalization-Based Privacy Preservation and Discrimination Prevention in Data Publishing and Mining.” Data Mining and Knowledge Discovery 28, no. 5-6 (January 25, 2014): 1158-1188. doi:10.1007/s10618-014-0346-1
  • Hajian, Sara. “Simultaneous Discrimination Prevention and Privacy Protection in Data Publishing and Mining.” PhD Thesis. Universitat Rovira i Virgili. (June 28, 2013). arXiv:1306.6805
  • Herlocker, Jonathan L, Joseph A Konstan, and John Riedl. “Explaining Collaborative Filtering Recommendations,” Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work. (December 1, 2000): 241-250. doi:10.1145/358916.358995.
  • Kamiran, Faisal, and Toon Calders. “Classifying Without Discriminating.” 2009 2nd International Conference on Computer, Control and Communication. (February 17-18, 2009): 1–6. doi:10.1109/IC4.2009.4909197
  • Kamiran, Faisal, and Toon Calders. “Data Preprocessing Techniques for Classification Without Discrimination.” Knowledge and Information Systems 33, no. 1 (December 3, 2011): 1–33. doi:10.1007/s10115-011-0463-8
  • Kamiran, Faisal, Asim Karim, Sicco Verwer, and Heike Goudriaan. “Classifying Socially Sensitive Data Without Discrimination: an Analysis of a Crime Suspect Dataset.” 2012 IEEE 12th International Conference on Data Mining Workshops (December 10, 2012): 370–377. doi:10.1109/ICDMW.2012.117
  • Kamiran, Faisal, Indrė Žliobaitė, and Toon Calders. “Quantifying Explainable Discrimination and Removing Illegal Discrimination in Automated Decision Making.” Knowledge and Information Systems 35, no. 3 (November 18, 2012): 613–44. doi:10.1007/s10115-012-0584-8
  • Kamiran, Faisal, Toon Calders, and Mykola Pechenizkiy. “Discrimination Aware Decision Tree Learning.” 2010 IEEE 10th International Conference on Data Mining (December 13-17, 2010): 869–874. doi:10.1109/ICDM.2010.50
  • Kamishima, Toshihiro, Shotaro Akaho, and Jun Sakuma. “Fairness-Aware Learning Through Regularization Approach.” 2011 IEEE 11th International Conference on Data Mining Workshops (December 11, 2011): 643–650. doi:10.1109/ICDMW.2011.83
  • Kamishima, Toshihiro, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. “Considerations on Fairness-Aware Data Mining." 2012 IEEE 12th International Conference on Data Mining Workshops. (December 10, 2012): 378–385. doi:10.1109/ICDMW.2012.101
  • Kamishima, Toshihiro, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. “Fairness-Aware Classifier with Prejudice Remover Regularizer.” 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. (September 24-28, 2012): 35–50. doi:10.1007/978-3-642-33486-3_3
  • Kamishima, Toshihiro, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. “The Independence of Fairness-Aware Classifiers." 2013 IEEE 13th International Conference on Data Mining Workshops. (December 7-10, 2013): 849–58. doi:10.1109/ICDMW.2013.133
  • Letham, Benjamin, Cynthia Rudin, Tyler McCormick and David Madigan. "Building Interpretable Classifiers with Rules using Bayesian Analysis: Building a Better Stroke Prediction Model." (August 2013). http://web.mit.edu/rudin/www/LethamRuMcMa14.pdf
  • Lowd, Daniel, and Christopher Meek. “Adversarial Learning,” 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. (August 21, 2005): 641-647. doi:10.1145/1081870.1081950
  • Luong, Binh Thanh, Salvatore Ruggieri, and Franco Turini. “K-NN as an Implementation of Situation Testing for Discrimination Discovery and Prevention.” 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (July 21, 2011): 502–510. doi:10.1145/2020408.2020488
  • Mancuhan, Koray, and Chris Clifton. “Combating Discrimination Using Bayesian Networks.” Artificial Intelligence and Law 22, no. 2 (February 17, 2014): 211–238. doi:10.1007/s10506-014-9156-4
  • Mancuhan, Koray, and Chris Clifton. “Discriminatory Decision Policy Aware Classification." 2012 IEEE 12th International Conference on Data Mining Workshops. (December 10, 2012): 386–393. doi:10.1109/ICDMW.2012.96
  • Martens, David, and Bart Baesens. “Building Acceptable Classification Models.” Annals of Information Systems 8 (2010): 53-74. doi:10.1007/978-1-4419-1280-0_3
  • Martens, David, and Foster Provost. “Explaining Data-Driven Document Classifications.” MIS Quarterly 38, no. 1 (March 2014): 73–99. http://misq.org/explaining-data-driven-document-classifications.html
  • Martens, David, Jan Vanthienen, Wouter Verbeke, and Bart Baesens. "Performance of Classification Models from a User Perspective." Decision Support Systems 51, no. 4 (November 2011): 782–793. doi:10.1016/j.dss.2011.01.01
  • Mascetti, Sergio, Annarita Ricci, and Salvatore Ruggieri. “Introduction to Special Issue on Computational Methods for Enforcing Privacy and Fairness in the Knowledge Society.” Artificial Intelligence and Law 22, no. 2 (February 11, 2014): 109–11. doi:10.1007/s10506-014-9153-7
  • Hardt, Moritz. “A Study of Privacy and Fairness in Sensitive Data Analysis.” PhD Thesis. Princeton University. (2011). http://arks.princeton.edu/ark:/88435/dsp01vq27zn422
  • Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. “A Study of Top-K Measures for Discrimination Discovery.” 2012 Proceedings of the 27th Annual ACM Symposium on Applied Computing. (March 26, 2012): 126–131. doi:10.1145/2245276.2245303
  • Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. “Measuring Discrimination in Socially-Sensitive Decision Records.” Proceedings of the 2009 SIAM International Conference on Data Mining. (2009): 581-592. doi:10.1137/1.9781611972795.50
  • Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. “Discrimination-Aware Data Mining.” 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (July 24, 2008): 560-568. doi:10.1145/1401890.1401959
  • Pope, Devin G, and Justin R Sydnor. “Implementing Anti-Discrimination Policies in Statistical Profiling Models.” American Economic Journal: Economic Policy 3, no. 3 (August 2011): 206–31. doi:10.1257/pol.3.3.206
  • Romei, Andrea, and Salvatore Ruggieri. “A Multidisciplinary Survey on Discrimination Analysis.” The Knowledge Engineering Review 29, no. 5 (April 3, 2013): 1–57. doi:10.1017/S0269888913000039
  • Romei, Andrea, Salvatore Ruggieri, and Franco Turini. “Discovering Gender Discrimination in Project Funding." 2012 IEEE 12th International Conference on Data Mining Workshops. (December 10, 2012): 394–401. doi:10.1109/ICDMW.2012.39
  • Romei, Andrea, Salvatore Ruggieri, and Franco Turini. “Discrimination Discovery in Scientific Project Evaluation: A Case Study.” Expert Systems with Applications 40, no. 15 (November 2013): 6064–79. doi:10.1016/j.eswa.2013.05.016
  • Ruggieri, Salvatore, Dino Pedreschi, and Franco Turini. “Data Mining for Discrimination Discovery.” ACM Transactions on Knowledge Discovery From Data 4, no. 2 (May 1, 2010): 1–40. doi:10.1145/1754428.1754432
  • Ruggieri, Salvatore, Dino Pedreschi, and Franco Turini. “DCUBE: Discrimination Discovery in Databases." Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (June 6, 2010): 1127–1130. doi:10.1145/1807167.1807298
  • Ruggieri, Salvatore, Dino Pedreschi, and Franco Turini. “Integrating Induction and Deduction for Finding Evidence of Discrimination.” Artificial Intelligence and Law 18, no. 1 (June 5, 2010): 1–43. doi:10.1007/s10506-010-9089-5
  • Ruggieri, Salvatore, Sara Hajian, Faisal Kamiran, and Xiangliang Zhang. “Anti-discrimination Analysis Using Privacy Attack Strategies.” 2014 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (2014): 694-710. doi:10.1007/978-3-662-44851-9_44
  • Ruggieri, Salvatore. “Data Anonymity Meets Non-Discrimination.” 2013 IEEE 13th International Conference on Data Mining Workshops (December 7-10, 2013): 875–882. doi:10.1109/ICDMW.2013.56
  • Sinha, Rashmi, and Kirsten Swearingen. “The Role of Transparency in Recommender Systems.” CHI '02 Extended Abstracts on Human Factors in Computing Systems. (April 20, 2002): 830-831. doi:10.1145/506443.506619.
  • Ustun, Berk, and Cynthia Rudin. "Methods and Models for Interpretable Linear Classification" (October 1, 2014). arXiv:1405.4047
  • Zemel, Rich, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. “Learning Fair Representations.” 30th International Conference on Machine Learning (June 16-21, 2013) http://jmlr.org/proceedings/papers/v28/zemel13.html
  • Žliobaitė, Indre, Faisal Kamiran, and Toon Calders. “Handling Conditional Discrimination." 2011 IEEE 11th International Conference on Data Mining. (December 11-14, 2011): 992–1001. doi:10.1109/ICDM.2011.72

Principles for Accountable Algorithms and a Social Impact Statement for Algorithms

Principles for Accountable Algorithms

Automated decision-making algorithms are now used throughout industry and government, underpinning many processes from dynamic pricing to employment practices to criminal sentencing. Given that such algorithmically informed decisions have the potential for significant societal impact, the goal of this document is to help developers and product managers design and implement algorithmic systems in publicly accountable ways. Accountability in this context includes an obligation to report, explain, or justify algorithmic decision-making, as well as to mitigate any negative social impacts or potential harms.

We begin by outlining five equally important guiding principles that follow from this premise:

Algorithms and the data that drive them are designed and created by people. There is always a human ultimately responsible for decisions made or informed by an algorithm. "The algorithm did it" is not an acceptable excuse if algorithmic systems make mistakes or have undesired consequences, including those arising from machine-learning processes.

Responsibility

Make available externally visible avenues of redress for adverse individual or societal effects of an algorithmic decision system, and designate an internal role for the person who is responsible for the timely remedy of such issues.

Explainability

Ensure that algorithmic decisions as well as any data driving those decisions can be explained to end-users and other stakeholders in non-technical terms.

Accuracy

Identify, log, and articulate sources of error and uncertainty throughout the algorithm and its data sources so that expected and worst case implications can be understood and inform mitigation procedures.

Auditability

Enable interested third parties to probe, understand, and review the behavior of the algorithm through disclosure of information that enables monitoring, checking, or criticism, including through provision of detailed documentation, technically suitable APIs, and permissive terms of use.

Fairness

Ensure that algorithmic decisions do not create discriminatory or unjust impacts when comparing across different demographics (e.g., race or sex).

We have left some of the terms above purposefully under-specified to allow these principles to be broadly applicable. Applying these principles well should include understanding them within a specific context. We also suggest that these issues be revisited and discussed throughout the design, implementation, and release phases of development. Two important principles for consideration were purposefully left off of this list as they are well-covered elsewhere: privacy and the impact of human experimentation. We encourage you to incorporate those issues into your overall assessment of algorithmic accountability as well.

Social Impact Statement for Algorithms

In order to ensure their adherence to these principles and to publicly commit to associated best practices, we propose that algorithm creators develop a Social Impact Statement using the above principles as a guiding structure. This statement should be revisited and reassessed (at least) three times during the design and development process:

  • design stage,
  • pre-launch,
  • and post-launch.

When the system is launched, the statement should be made public as a form of transparency so that the public knows what social impact to expect from the system.

The Social Impact Statement should minimally answer the questions below. Included below are concrete steps that can be taken, and documented as part of the statement, to address these questions. These questions and steps make up an outline of such a social impact statement.

Responsibility

Guiding Questions

  • Who is responsible if users are harmed by this product?
  • What will the reporting process and process for recourse be?
  • Who will have the power to decide on necessary changes to the algorithmic system during design stage, pre-launch, and post-launch?

Initial Steps to Take

  • Determine and designate a person who will be responsible for the social impact of the algorithm.
  • Make contact information available so that, if there are issues, it’s clear to users how to proceed.
  • Develop a plan for what to do if the project has unintended consequences. This may be part of a maintenance plan and should involve post-launch monitoring plans.
  • Develop a sunset plan for the system to manage algorithm or data risks after the product is no longer in active development.

Explainability

Guiding Questions

  • Who are your end-users and stakeholders?
  • How much of your system / algorithm can you explain to your users and stakeholders?
  • How much of the data sources can you disclose?

Initial Steps to Take

  • Have a plan for how decisions will be explained to users and subjects of those decisions. In some cases it may be appropriate to develop an automated explanation for each decision.
  • Allow data subjects visibility into the data you store about them and access to a process in order to change it.
  • If you are using a machine-learning model:

    • consider whether a directly interpretable or explainable model can be used.
    • describe the training data including how, when, and why it was collected and sampled.
    • describe how and when test data about an individual that is used to make a decision is collected or inferred.
  • Disclose the sources of any data used and as much as possible about the specific attributes of the data. Explain how the data was cleaned or otherwise transformed.
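
One way to produce an automated explanation for each decision, assuming a directly interpretable linear scoring model is acceptable for the task, is to report how much each input moved the score. The Python sketch below is illustrative only: the feature names, weights, bias, and threshold are hypothetical and do not describe any real system.

```python
# Hypothetical linear scoring model. Every name and number below is an
# illustrative assumption, not part of any real system.
WEIGHTS = {"income_thousands": 0.04, "years_at_address": 0.30, "missed_payments": -1.20}
BIAS = -2.0
THRESHOLD = 0.0


def explain_decision(applicant):
    """Return the decision, the score, and the per-feature contributions."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    # Sort so the features that moved the score most come first.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return {"approved": score >= THRESHOLD, "score": score, "contributions": ranked}


def render_explanation(result):
    """Turn the contributions into plain-language sentences."""
    outcome = "approved" if result["approved"] else "declined"
    lines = [f"The application was {outcome}."]
    for name, value in result["contributions"]:
        if value == 0:
            continue  # a feature with no effect needs no sentence
        direction = "raised" if value > 0 else "lowered"
        lines.append(f"- {name.replace('_', ' ')} {direction} the score by {abs(value):.2f}")
    return "\n".join(lines)


# Usage with a made-up applicant.
example = explain_decision(
    {"income_thousands": 50, "years_at_address": 2, "missed_payments": 1}
)
print(render_explanation(example))
```

A more complex model would need a different explanation technique; the point of the sketch is only that an interpretable model makes per-decision explanations cheap to generate.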

Accuracy

Guiding Questions

  • What sources of error do you have and how will you mitigate their effect?
  • How confident is your algorithmic system in the decisions it outputs?
  • What are realistic worst case scenarios in terms of how errors might impact society, individuals, and stakeholders?
  • Have you evaluated the provenance and veracity of data and considered alternative data sources?

Initial Steps to Take

  • Assess the potential for errors in your system and the resulting potential for harm to users.
  • Undertake a sensitivity analysis to assess how uncertainty in the output of the algorithm relates to uncertainty in the inputs.
  • Develop a process by which people can correct errors in input data, training data, or in output decisions.
  • Perform a validity check by randomly sampling a portion of your data (e.g., input and/or training data) and manually checking its correctness. This check should be performed early in your development process before derived information is used. Report the overall data error rate on this random sample publicly.
  • Determine how to communicate the uncertainty / margin of error for each decision.
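
Two of the steps above, the random-sample validity check and the sensitivity analysis, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `is_correct` stands in for the manual check of a record, the 95% Wilson score interval is one common choice for reporting a sampled proportion, and the Gaussian relative noise model is an assumption that may not suit your inputs. All function names and the toy data are hypothetical.

```python
import math
import random


def sampled_error_rate(records, is_correct, sample_size, seed=0):
    """Estimate the data error rate from a simple random sample.

    Returns (error_rate, lower, upper) with a 95% Wilson score interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)
    errors = sum(1 for record in sample if not is_correct(record))
    n = len(sample)
    p = errors / n
    z = 1.96  # normal quantile for a two-sided 95% interval
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def output_sensitivity(model, inputs, relative_noise=0.05, trials=1000, seed=0):
    """Perturb every input with Gaussian relative noise and measure output spread.

    Returns (baseline_output, sample standard deviation of perturbed outputs).
    """
    rng = random.Random(seed)
    baseline = model(inputs)
    outputs = []
    for _ in range(trials):
        noisy = [x * (1 + rng.gauss(0, relative_noise)) for x in inputs]
        outputs.append(model(noisy))
    mean = sum(outputs) / trials
    variance = sum((o - mean) ** 2 for o in outputs) / (trials - 1)
    return baseline, math.sqrt(variance)


# Toy data: every tenth record is marked invalid (an assumption for the demo).
records = [{"id": i, "valid": i % 10 != 0} for i in range(1000)]
rate, low, high = sampled_error_rate(records, lambda r: r["valid"], sample_size=200)
print(f"sampled error rate {rate:.3f} (95% interval {low:.3f} to {high:.3f})")

# Toy model: a weighted sum of two inputs.
baseline, spread = output_sensitivity(lambda x: 2.0 * x[0] + 0.5 * x[1], [10.0, 4.0])
print(f"baseline {baseline:.2f}, output standard deviation {spread:.2f}")
```

Reporting the interval alongside the point estimate makes clear how much the published error rate depends on the size of the sample that was checked.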

Auditability

Guiding Questions

  • Can you provide for public auditing (i.e., probing, understanding, and reviewing system behavior), or is there sensitive information that would necessitate auditing by a designated third party?
  • How will you facilitate public or third-party auditing without opening the system to unwarranted manipulation?

Initial Steps to Take

  • Document and make available an API that allows third parties to query the algorithmic system and assess its response.
  • If data is needed to properly audit your algorithm, as in the case of a machine-learning algorithm, make sure that sample (e.g., training) data is made available.
  • Make sure your terms of service allow the research community to perform automated public audits.
  • Have a plan for communication with outside parties that may be interested in auditing your algorithm, such as the research and development community.
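
As a rough illustration of an audit interface, the Python sketch below wraps a decision function so that outside parties can query it, with every query logged and a per-client quota as one possible guard against manipulation. The class name, the quota mechanism, the log format, and the toy decision rule are all assumptions made for this example, not a recommended design.

```python
import time
from collections import defaultdict


class AuditEndpoint:
    """Wrap a decision function so third parties can query it for audits."""

    def __init__(self, decide, max_queries_per_client=1000):
        self._decide = decide
        self._max = max_queries_per_client
        self._counts = defaultdict(int)
        self.log = []  # one entry per answered query, kept for later review

    def query(self, client_id, features):
        # The quota is a simple, assumed safeguard against bulk probing.
        if self._counts[client_id] >= self._max:
            raise PermissionError(f"query quota exhausted for {client_id}")
        self._counts[client_id] += 1
        decision = self._decide(features)
        self.log.append({"time": time.time(), "client": client_id,
                         "features": dict(features), "decision": decision})
        return decision


def paired_probe(endpoint, client_id, features, attribute, alternative):
    """Query twice with inputs that differ only in one attribute."""
    changed = dict(features, **{attribute: alternative})
    return endpoint.query(client_id, features), endpoint.query(client_id, changed)


# Toy decision rule that depends on the postcode, so the probe has something to find.
endpoint = AuditEndpoint(
    lambda f: f["income"] > 30 and f["postcode"] != "X1",
    max_queries_per_client=10,
)
original, counterfactual = paired_probe(
    endpoint, "auditor-1", {"income": 40, "postcode": "X1"}, "postcode", "Y2"
)
print(original, counterfactual)
```

A paired probe of this kind lets an auditor test whether a single attribute changes the outcome without needing access to the system's internals.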

Fairness

Guiding Questions

  • Are there particular groups which may be advantaged or disadvantaged, in the context in which you are deploying, by the algorithm / system you are building?
  • What is the potential damaging effect of uncertainty / errors to different groups?

Initial Steps to Take

  • Talk to people who are familiar with the subtle social context in which you are deploying. For example, you should consider whether the following aspects of people’s identities will have impacts on their equitable access to and results from your system:

    • Race
    • Sex
    • Gender identity
    • Ability status
    • Socio-economic status
    • Education level
    • Religion
    • Country of origin

  • If you are building an automated decision-making tool, you should deploy a fairness-aware data mining algorithm. (See, e.g., the resources gathered at http://fatml.org).

  • Calculate the error rates and types (e.g., false positives vs. false negatives) for different sub-populations and assess the potential differential impacts.
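
The last step can be sketched as follows. This Python example computes false positive and false negative rates separately for each sub-population; the record layout, the group labels, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    Each record is a (group, actual, predicted) triple with boolean labels.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # None marks a rate that is undefined because the group has
            # no actual negatives (or no actual positives) in the data.
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Toy labelled outcomes for two hypothetical groups, "A" and "B".
toy_records = [
    ("A", True, True), ("A", True, False), ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", True, True), ("B", False, True), ("B", False, False),
]
group_rates = error_rates_by_group(toy_records)
for group, group_rate in sorted(group_rates.items()):
    print(group, group_rate)
```

In this toy data the two groups have the same overall error count but different error types, which is the kind of differential impact an aggregate accuracy figure would hide.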

Authors

Nicholas Diakopoulos, University of Maryland, College Park

Sorelle Friedler, Haverford College

Marcelo Arenas, Pontificia Universidad Catolica de Chile, CL

Solon Barocas, Microsoft Research

Michael Hay, Colgate University

Bill Howe, University of Washington

H. V. Jagadish, University of Michigan

Kris Unsworth, Drexel University

Arnaud Sahuguet, Cornell Tech

Suresh Venkatasubramanian, University of Utah

Christo Wilson, Northeastern University

Cong Yu, Google

Bendert Zevenbergen, University of Oxford
