Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez and Krishna Gummadi
As the use of automated decision making systems becomes wide-spread, there is a growing concern about their potential unfairness towards people with certain traits. Anti-discrimination laws in various countries prohibit unfair treatment of individuals based on specific traits, also called sensitive attributes (e.g., gender, race). In many learning scenarios, the trained algorithms (classifiers) make decisions with certain inaccuracy (misclassification rate). As learning mechanisms target minimizing the error rate for all decisions, it is quite possible that the optimally trained algorithm makes decisions for users belonging to different sensitive attribute groups with different error rates (e.g., decision errors for females are higher than for males). To account for and avoid such unfairness when learning, in this paper, we introduce a new notion of unfairness, disparate mistreatment, which is defined in terms of misclassification rates. We then propose an intuitive measure of disparate mistreatment for decision boundary-based classifiers, which can be easily incorporated into their formulation as a convex-concave constraint. Experiments on synthetic as well as real world datasets show that our methodology is effective at avoiding disparate mistreatment, often at a small cost in terms of accuracy.