GRU-DF: A Temporal Model with Dynamic Imputation for Missing Target Values in Longitudinal Patient Data

Bibliographic Details
Published in	Proceedings (IEEE International Conference on Healthcare Informatics. Online) pp. 1-7
Main Authors	Zhao, Yijun, Berretta, Matias, Wang, Tong, Chitnis, Tanuja
Format	Conference Proceeding
Language	English
Published	IEEE 01.11.2020
Subjects	Data models; disease progression; gated recurrent unit (GRU); longitudinal data; missing value imputation; Multiple sclerosis; Performance gain; Predictive models; recurrent neural network (RNN); temporal model; time series; Time series analysis; Training
ISSN	2575-2634
DOI	10.1109/ICHI48887.2020.9374359

Summary:	Temporal models are desirable in studying progressive diseases because the data are typically collected at regular time intervals. However, such clinical data often contain many missing entries, including those from the target variable that we are interested in predicting. Standard imputation techniques (e.g., linear interpolation) are inappropriate in treating missing target observations because they approximate the missing entries before the onset of model training and, thus, would inevitably lead to training a self-fulfilling model. The absence of target observations is particularly problematic for time series data where their availability at each time step is indispensable in building a temporal model. We propose a novel approach that incorporates the missing target value imputation into the training process of the Gated Recurrent Unit (GRU) model. We evaluate our new model in our motivating domain of predicting disease progression of multiple sclerosis patients using a real-world dataset of 508 subjects. The goal is to forecast patients' disability levels based on data collected in six-month intervals. Our model demonstrates a 27.9% performance gain over a GRU model with a standard forward-fill treatment for the missing target observations. Additionally, our model displays a 21.6% advantage over a non-temporal approach for our machine learning task.
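The summary contrasts two ways of handling missing target observations: filling them before training (forward-fill), or imputing them inside the training process. The sketch below illustrates that contrast under loudly stated assumptions. It is not the authors' GRU-DF implementation, whose internals this record does not describe. A hypothetical linear recurrence y_hat[t] = w * x[t-1] + b stands in for the GRU, and `forward_fill`, `masked_mse`, `rollout` and `train_dynamic` are illustrative names. In `train_dynamic`, a missing target is replaced at every epoch by the model's current prediction when it is fed forward as the next input, and the loss is computed only over observed targets. This means the model is never scored against a value it was handed before training began.

```python
from typing import List, Optional, Tuple

# A target series with gaps; None marks a missing observation.
Series = List[Optional[float]]


def forward_fill(series: Series) -> Series:
    """Baseline treatment: copy the last observed value into each gap
    *before* training. Gaps before the first observation stay None."""
    filled: Series = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def masked_mse(preds: List[float], targets: Series) -> float:
    """Mean squared error over observed targets only; gaps contribute nothing."""
    pairs = [(p, t) for p, t in zip(preds, targets) if t is not None]
    if not pairs:
        return 0.0
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


def rollout(series: Series, w: float, b: float) -> Tuple[List[float], List[float]]:
    """Run the toy recurrence y_hat[t] = w * x[t-1] + b (a hypothetical GRU
    stand-in). x is the observed value when present, otherwise the model's own
    current prediction (dynamic imputation). Returns (inputs, predictions)
    aligned with series[1:]. Assumes series[0] is observed."""
    x = series[0]
    inputs: List[float] = []
    preds: List[float] = []
    for target in series[1:]:
        y_hat = w * x + b
        inputs.append(x)
        preds.append(y_hat)
        # Observed target: feed it forward. Missing: feed the current estimate.
        x = target if target is not None else y_hat
    return inputs, preds


def train_dynamic(series: Series, epochs: int = 3000, lr: float = 0.01) -> Tuple[float, float]:
    """Gradient descent on masked MSE, re-imputing gaps at every epoch.
    Imputed inputs are treated as constants when differentiating, loosely
    analogous to detaching them from the graph in an autodiff framework."""
    w, b = 0.0, 0.0
    targets = series[1:]
    for _ in range(epochs):
        inputs, preds = rollout(series, w, b)
        grad_w = grad_b = 0.0
        n = 0
        for x, y_hat, t in zip(inputs, preds, targets):
            if t is None:
                continue  # no loss term for a missing target
            err = y_hat - t
            grad_w += 2.0 * err * x
            grad_b += 2.0 * err
            n += 1
        if n:
            w -= lr * grad_w / n
            b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    scores = [1.0, 2.0, None, 4.0, 5.0, None, 7.0]
    print(forward_fill(scores))  # each gap copies the previous visit's value
    w, b = train_dynamic(scores)
    _, preds = rollout(scores, w, b)
    print(round(w, 3), round(b, 3), masked_mse(preds, scores[1:]))
```

On the example series, `forward_fill` turns the gap at index 2 into 2.0, and a model trained on that series would be scored against 2.0 as if it had been observed. `train_dynamic` only ever compares predictions with genuinely observed values. For this particular toy sequence, the fit should approach w = 1, b = 1, which is the step-of-one trend.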