Costs and Benefits of Fair Regression
| Main Author | |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | 16.06.2021 |
| DOI | 10.48550/arxiv.2106.08812 |
| Summary: | Real-world applications of machine learning tools in high-stakes domains are
often regulated to be fair, in the sense that the predicted target should
satisfy some quantitative notion of parity with respect to a protected
attribute. However, the exact tradeoff between fairness and accuracy with a
real-valued target is not entirely clear. In this paper, we characterize the
inherent tradeoff between statistical parity and accuracy in the regression
setting by providing a lower bound on the error of any fair regressor. Our
lower bound is sharp, algorithm-independent, and admits a simple
interpretation: when the moments of the target differ between groups, any fair
algorithm has to make an error on at least one of the groups. We further extend
this result to give a lower bound on the joint error of any (approximately)
fair algorithm, using the Wasserstein distance to measure the quality of the
approximation. With our novel lower bound, we also show that the price paid by
a fair regressor that does not take the protected attribute as input is less
than that of a fair regressor with explicit access to the protected attribute.
On the upside, we establish the first connection between individual fairness,
accuracy parity, and the Wasserstein distance by showing that if a regressor is
individually fair, it also approximately satisfies accuracy parity, where
the gap is given by the Wasserstein distance between the two groups. Inspired
by our theoretical results, we develop a practical algorithm for fair
regression through the lens of representation learning, and conduct experiments
on a real-world dataset to corroborate our findings. |
|---|---|
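
The interpretation given in the summary (when the target distributions of the two groups differ, a fair predictor must incur error on at least one group) can be illustrated numerically. The following is a minimal sketch on synthetic data, not the paper's theorem or its algorithm. It checks only a triangle-inequality consequence for the 1-Wasserstein distance and mean absolute error: the sum of the two group errors plus the Wasserstein gap between the group-wise predictions is at least the Wasserstein distance between the group-wise targets. The helper names `empirical_w1` and `mean_abs_error`, the Gaussian targets, and the constant predictor are all assumptions made for illustration.

```python
import random
import statistics


def empirical_w1(xs, ys):
    """1-Wasserstein distance between two equal-size samples on the real line.

    For equal-size empirical distributions in one dimension, the optimal
    coupling matches sorted values, so the distance is the mean absolute
    difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return statistics.fmean(abs(a - b) for a, b in zip(sorted(xs), sorted(ys)))


def mean_abs_error(targets, preds):
    """Mean absolute error of paired predictions."""
    return statistics.fmean(abs(t - p) for t, p in zip(targets, preds))


# Synthetic targets (assumption): the two groups differ in their mean.
random.seed(0)
n = 2000
y0 = [random.gauss(0.0, 1.0) for _ in range(n)]
y1 = [random.gauss(2.0, 1.0) for _ in range(n)]

# A constant predictor (the pooled mean) satisfies statistical parity exactly,
# because its output distribution is identical for both groups.
c = statistics.fmean(y0 + y1)
pred0 = [c] * n
pred1 = [c] * n

w_target = empirical_w1(y0, y1)      # gap between the groups' targets
w_pred = empirical_w1(pred0, pred1)  # parity gap of the predictor (zero here)
err0 = mean_abs_error(y0, pred0)
err1 = mean_abs_error(y1, pred1)

print(f"W1 between group targets:     {w_target:.3f}")
print(f"W1 between group predictions: {w_pred:.3f}")
print(f"sum of group errors:          {err0 + err1:.3f}")
```

With an exactly fair predictor the parity gap is zero, so the combined error of the two groups cannot fall below the distance between their target distributions. Relaxing parity (a larger `w_pred`) is what allows the error sum to drop.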