Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern and Aaron Roth
We introduce the study of fairness in the setting of reinforcement learning. We adapt the notion of fairness from Joseph et al. (2016), which requires that, over the course of its learning process, a learning algorithm never favors a worse action over a better one, with high probability. Working in the setting of Markov Decision Processes, we begin by proving an exponential separation in the time required for fair and unfair algorithms to achieve near-optimal performance. This separation motivates a relaxation to a new notion of approximate fairness, for which we prove more favorable lower and upper bounds.