Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests

A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to c...

Full description

Saved in:
Bibliographic Details
Published inJournal of biopharmaceutical statistics Vol. 28; no. 2; pp. 333 - 349
Main Authors Hu, Chen, Steingrimsson, Jon Arni
Format Journal Article
LanguageEnglish
Published England Taylor & Francis 04.03.2018
Taylor & Francis Ltd
Subjects
Online AccessGet full text
ISSN1054-3406
1520-5711
1520-5711
DOI10.1080/10543406.2017.1377730

Cover

More Information
Summary:A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1054-3406
1520-5711
1520-5711
DOI:10.1080/10543406.2017.1377730