Machine learning explainability via microaggregation and shallow decision trees
Published in | Knowledge-Based Systems, Vol. 194, p. 105532
---|---
Format | Journal Article
Language | English
Published | Amsterdam: Elsevier B.V., 22.04.2020
ISSN | 0950-7051; 1872-7409
DOI | 10.1016/j.knosys.2020.105532
Summary: Artificial intelligence (AI) is being deployed in missions that are increasingly critical for human life. To build trust in AI and avoid an algorithm-based authoritarian society, automated decisions should be explainable. This is not only a right of citizens, enshrined for example in the European General Data Protection Regulation, but also a desirable goal for engineers, who want to know whether their decision algorithms are capturing the relevant features. For explainability to be scalable, it should be possible to derive explanations in a systematic way. A common approach is to use simpler, more intuitive decision algorithms to build a surrogate model of the black-box model (for example, a deep learning algorithm) used to make a decision. Yet there is a risk that the surrogate model is too large to be truly comprehensible to humans. We focus on explaining black-box models with decision trees of limited depth as surrogate models. Specifically, we propose an approach based on microaggregation to achieve a trade-off between the comprehensibility and the representativeness of the surrogate model on the one hand and the privacy of the subjects used for training the black-box model on the other.
Highlights:
• We give explanations of deep learning decisions using shallow decision trees.
• Decision trees are computed on clusters obtained via microaggregation.
• The cluster size trades off comprehensibility, representativeness and privacy.
• We present experiments on large numerical and categorical data sets.
• For categorical data sets, we use ontologies for semantic consistency.
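The abstract and highlights describe the pipeline only at a high level. As a rough illustration, the Python sketch below shows one way the pieces could fit together for numerical data: a deliberately simplified MDAV-style microaggregation step partitions the training records into clusters of at least k, and a shallow decision tree is fitted per cluster on the black-box model's predictions. Everything here is an assumption for illustration: the function names (microaggregate, build_surrogates, explain_instance), the nearest-centroid lookup, and the simplified clustering are not the paper's published implementation.

```python
# Illustrative sketch: microaggregate the training data into clusters of
# >= k records, then fit one shallow decision tree per cluster on the
# black-box model's predictions. Names are hypothetical, not from the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def microaggregate(X, k):
    """Simplified MDAV-style microaggregation for numerical data:
    repeatedly seed a cluster with the record farthest from the global
    centroid and group it with its k-1 nearest unassigned neighbours."""
    remaining = list(range(len(X)))
    clusters = []
    while len(remaining) >= 2 * k:
        centroid = X[remaining].mean(axis=0)
        # Farthest remaining record from the centroid seeds the next cluster.
        d_centroid = np.linalg.norm(X[remaining] - centroid, axis=1)
        seed = remaining[int(np.argmax(d_centroid))]
        d_seed = np.linalg.norm(X[remaining] - X[seed], axis=1)
        nearest = [remaining[i] for i in np.argsort(d_seed)[:k]]
        clusters.append(nearest)
        remaining = [i for i in remaining if i not in nearest]
    if remaining:  # leftover records (between k and 2k-1) form the last cluster
        clusters.append(remaining)
    return clusters

def build_surrogates(black_box, X, k, max_depth=3):
    """One shallow tree per microaggregated cluster, trained to mimic
    the black-box decisions on that cluster's records."""
    surrogates = []
    for idx in microaggregate(X, k):
        Xc = X[idx]
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(Xc, black_box.predict(Xc))  # labels come from the black box
        surrogates.append((Xc.mean(axis=0), tree))
    return surrogates

def explain_instance(surrogates, x):
    """Pick the surrogate whose cluster centroid is closest to x;
    its depth-limited, human-readable tree is the local explanation."""
    centroid, tree = min(surrogates,
                         key=lambda ct: np.linalg.norm(ct[0] - x))
    return tree

# Usage sketch: `model` is any fitted classifier playing the black-box role,
# X_train is a 2-D numpy array, x_query a single record.
# surrogates = build_surrogates(model, X_train, k=50)
# print(export_text(explain_instance(surrogates, x_query)))
```

The single parameter k mirrors the trade-off stated in the abstract: larger clusters give more representative surrogates and stronger k-anonymity-style privacy for the training subjects, while smaller clusters keep each depth-limited tree local and easy to read.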