Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting

Bibliographic Details
Published in	Journal of Biomedical Informatics, Vol. 61, pp. 119-131
Main Authors	Vock, David M., Wolfson, Julian, Bandyopadhyay, Sunayan, Adomavicius, Gediminas, Johnson, Paul E., Vazquez-Benitez, Gabriela, O’Connor, Patrick J.
Format	Journal Article
Language	English
Published	United States Elsevier Inc 01.06.2016
Subjects	Algorithms; Bayes Theorem; Censored data; Cluster Analysis; Electronic health data; Electronic Health Records; Frames; Health; Health care; Humans; Inverse; Inverse probability weighting; Machine Learning; Mathematical models; Patients; Probability; Risk prediction; Survival analysis; Weighting
ISSN	1532-0464, 1532-0480
DOI	10.1016/j.jbi.2016.03.009

Summary:
• Right-censored outcomes are common in biomedical prediction problems.
• We discuss adapting machine learning (ML) algorithms to these outcomes using IPCW.
• IPCW is a general-purpose approach that can be applied to many ML techniques.
• ML with IPCW leads to more accurate predictive probabilities than ad hoc approaches.

Models for predicting the probability of experiencing various health outcomes or adverse events over a certain time frame (e.g., having a heart attack in the next 5 years) based on individual patient characteristics are important tools for managing patient care. Electronic health data (EHD) are appealing sources of training data because they provide access to large amounts of rich individual-level data from present-day patient populations. However, because EHD are derived by extracting information from administrative and clinical databases, some fraction of subjects will not be under observation for the entire time frame over which one wants to make predictions; this loss to follow-up is often due to disenrollment from the health system. For subjects without complete follow-up, whether or not they experienced the adverse event is unknown, and in statistical terms the event time is said to be right-censored. Most machine learning approaches to the problem have been relatively ad hoc; for example, common approaches for handling observations in which the event status is unknown include (1) discarding those observations, (2) treating them as non-events, or (3) splitting each such observation into two observations: one where the event occurs and one where it does not. In this paper, we present a general-purpose approach to account for right-censored outcomes using inverse probability of censoring weighting (IPCW). We illustrate how IPCW can easily be incorporated into a number of existing machine learning algorithms used to mine big health care data, including Bayesian networks, k-nearest neighbors, decision trees, and generalized additive models. We then show that our approach leads to better calibrated predictions than the three ad hoc approaches when applied to predicting the 5-year risk of experiencing a cardiovascular adverse event, using EHD from a large U.S. Midwestern healthcare system.
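
The weighting idea in the summary can be illustrated with a minimal sketch. It is not the authors' code: the function names `censoring_survival` and `ipcw_weights`, the toy data, and the horizon `tau` are all hypothetical, and the sketch assumes the common IPCW form in which a subject with known event status at the horizon receives weight 1 / G(min(T, tau)), where G is a Kaplan-Meier estimate of the probability of remaining uncensored, and a subject censored before the horizon receives weight 0. It also assumes no ties between event and censoring times.

```python
import numpy as np

def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the 'event'.

    times  : observed follow-up time per subject (event or censoring time)
    events : 1 if the adverse event was observed, 0 if the subject was censored
    """
    times = np.asarray(times, dtype=float)
    censored = 1 - np.asarray(events, dtype=int)
    cens_times = np.unique(times[censored == 1])  # sorted distinct censoring times
    # One Kaplan-Meier factor per censoring time: 1 - (number censored) / (number at risk)
    factors = np.array([
        1.0 - np.sum((times == u) & (censored == 1)) / np.sum(times >= u)
        for u in cens_times
    ])
    # surv[k] is G after the k-th censoring time; surv[0] = 1 before any censoring
    surv = np.concatenate(([1.0], np.cumprod(factors)))

    def G(t):
        # Number of censoring times <= t selects the step of the survival curve
        return surv[np.searchsorted(cens_times, t, side="right")]

    return G

def ipcw_weights(times, events, tau):
    """Binary outcome 'event by time tau' and IPCW weights (assumed form, see lead-in)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    G = censoring_survival(times, events)
    # Status at tau is known if the event was observed or follow-up extends past tau
    known = (events == 1) | (times > tau)
    outcome = ((events == 1) & (times <= tau)).astype(int)
    weights = np.zeros_like(times)
    weights[known] = 1.0 / G(np.minimum(times, tau))[known]
    return outcome, weights

# Toy data (made up for illustration): follow-up in years, 5-year horizon
times = np.array([2.0, 3.0, 4.0, 6.0, 7.0, 8.0])
events = np.array([1, 0, 1, 0, 1, 0])
outcome, weights = ipcw_weights(times, events, tau=5.0)
print(outcome, weights)
print("weighted 5-year event proportion:", np.sum(weights * outcome) / np.sum(weights))
```

Under these assumptions, the subject censored at year 3 gets weight 0, and the remaining subjects are up-weighted to stand in for it. The resulting `weights` could then be supplied to any learner that accepts per-observation weights, for example through the `sample_weight` argument that many scikit-learn estimators take in `fit`.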
Author contact:	julianw@umn.edu (Julian Wolfson), band0064@umn.edu (Sunayan Bandyopadhyay), gedas@umn.edu (Gediminas Adomavicius), johns021@umn.edu (Paul E. Johnson), gabriela.x.vazquezbenitez@healthpartners.com (Gabriela Vazquez-Benitez), patrick.j.oconnor@healthpartners.com (Patrick J. O’Connor)