Metalearners for estimating heterogeneous treatment effects using machine learning

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences (PNAS), Vol. 116, no. 10, pp. 4156-4165
Main Authors	Künzel, Sören R.; Sekhon, Jasjeet S.; Bickel, Peter J.; Yu, Bin
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 05.03.2019
Series	PNAS Plus
Subjects	Algorithms; Artificial intelligence; Bayesian analysis; Computer simulation; conditional average treatment effect; ENGINEERING; Estimation; Field tests; heterogeneous treatment effects; Learning algorithms; Machine learning; minimax optimality; Neural networks; Observational studies; Physical Sciences; PNAS Plus; Political science; Political Sciences; randomized controlled trials; Regression analysis; Response functions; Social Sciences; Statistics
ISSN	0027-8424 1091-6490
DOI	10.1073/pnas.1804597116

More Information
Summary:	There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms—such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks—to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods.
Bibliography:	Grant numbers: AC02-05CH11231; N00014-17-1-2176; N00014-15-2367; N00014-16-1-2664; DMS 1713083; W911NF-17-10005; CCF-0939370. Funders: USDOE Office of Science (SC); National Science Foundation (NSF); US Department of the Navy, Office of Naval Research (ONR); US Army Research Office (ARO). Contributed by Bin Yu, December 18, 2018 (sent for review March 16, 2018; reviewed by Jake Bowers and Dylan Small). Reviewers: J.B., University of Illinois at Urbana–Champaign; and D.S., Wharton School, University of Pennsylvania. Author contributions: S.R.K., J.S.S., P.J.B., and B.Y. designed research, performed research, analyzed data, and wrote the paper.
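
Illustration (not part of the catalog record or the authors' software package): the summary describes the X-learner as a metaalgorithm that wraps any regression method to estimate the CATE. A minimal sketch of that idea follows, assuming the three-stage structure of the published method: fit the two response functions, regress imputed treatment effects, then combine the two CATE estimates with a weight function. The base learner `fit_line`, the synthetic noiseless data, and the constant propensity of 0.2 are all hypothetical choices made for the example; in practice the base learner could be an RF, BART, or a neural network.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Hypothetical base learner: a one-covariate least-squares line.

    Returns a prediction function. Any regression method could be swapped in.
    """
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def x_learner(x, w, y, propensity, fit=fit_line):
    """Sketch of an X-learner for a single covariate.

    x: covariate values, w: 0/1 treatment indicators, y: observed outcomes,
    propensity: function used as the weight g(x) in the final stage (assumed
    here to be the treatment probability).
    """
    x0 = [xi for xi, wi in zip(x, w) if wi == 0]
    y0 = [yi for yi, wi in zip(y, w) if wi == 0]
    x1 = [xi for xi, wi in zip(x, w) if wi == 1]
    y1 = [yi for yi, wi in zip(y, w) if wi == 1]

    # Stage 1: estimate the control and treatment response functions.
    mu0 = fit(x0, y0)
    mu1 = fit(x1, y1)

    # Stage 2: impute individual treatment effects, then regress them on x.
    d1 = [yi - mu0(xi) for xi, yi in zip(x1, y1)]  # treated units
    d0 = [mu1(xi) - yi for xi, yi in zip(x0, y0)]  # control units
    tau1 = fit(x1, d1)
    tau0 = fit(x0, d0)

    # Stage 3: weighted combination of the two CATE estimates.
    return lambda v: propensity(v) * tau0(v) + (1 - propensity(v)) * tau1(v)


# Synthetic, noiseless, unbalanced data: 4 treated units and 16 controls.
# The true CATE in this toy setup is 0.5 + x.
x = [i / 20 for i in range(20)]
w = [1 if i % 5 == 0 else 0 for i in range(20)]
y = [1 + 2 * xi + wi * (0.5 + xi) for xi, wi in zip(x, w)]

cate = x_learner(x, w, y, propensity=lambda v: 0.2)
print(cate(0.3))
```

Because the toy data are linear and noise-free, the linear base learner recovers the CATE essentially exactly; with real data the quality of the estimate depends on the base learner and on how well the weight function is chosen.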