Invariant Policy Learning: A Causal Perspective

Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One re...

Full description

Saved in:

Bibliographic Details
Published in	IEEE transactions on pattern analysis and machine intelligence Vol. 45; no. 7; pp. 8606 - 8620
Main Authors	Saengkyongam, Sorawit, Thams, Nikolaj, Peters, Jonas, Pfister, Niklas
Format	Journal Article
Language	English
Published	United States IEEE 01.07.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms Causality contextual bandits distributional shift Extraterrestrial measurements Heuristic algorithms Interactive systems Invariance Invariants Machine learning off-policy learning Particle measurements Random variables Recommender systems Reinforcement learning Training Visualization
Online Access	Get full text
ISSN	0162-8828 1939-3539 2160-9292 1939-3539
DOI	10.1109/TPAMI.2022.3232363

Cover

More Information
Summary:	Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved variables are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23
ISSN:	0162-8828 1939-3539 2160-9292 1939-3539
DOI:	10.1109/TPAMI.2022.3232363