Censoring Sensitivity Analysis for Benchmarking Survival Machine Learning Methods

Bibliographic Details
Published in: Sci, Vol. 7, No. 1, p. 18
Main Authors: Báskay, János; Mezei, Tamás; Banczerowski, Péter; Horváth, Anna; Joó, Tamás; Pollner, Péter
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.03.2025
ISSN: 2413-4155
DOI: 10.3390/sci7010018

More Information
Summary: (1) Background: Survival analysis models in clinical research must effectively handle censored data, where complete survival times are unknown for some subjects. While established methodologies exist for validating standard machine learning models, current benchmarking approaches rarely assess model robustness under varying censoring conditions. This limitation creates uncertainty about model reliability in real-world applications where censoring patterns may differ from training data. We address this gap by introducing a systematic benchmarking methodology focused on censoring sensitivity. (2) Methods: We developed a benchmarking framework that assesses survival models through controlled modification of censoring conditions. Five models were evaluated: Cox proportional hazards, survival tree, random survival forest, gradient-boosted survival analysis, and mixture density networks. The framework systematically reduced observation periods and increased censoring rates while measuring performance through multiple metrics following Bayesian hyperparameter optimization. (3) Results: Model performance showed greater sensitivity to increased censoring rates than to reduced observation periods. Non-linear models, especially mixture density networks, exhibited higher vulnerability to data quality degradation. Statistical comparisons became increasingly challenging with higher censoring rates due to widened confidence intervals. (4) Conclusions: Our methodology provides a new standard for evaluating survival analysis models, revealing the critical impact of censoring on model performance. These findings offer practical guidance for model selection and development in clinical applications, emphasizing the importance of robust censoring handling strategies.
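To make the censoring-sensitivity idea in (2) concrete, the sketch below shows one way an observation period can be progressively shortened to raise the censoring rate in a synthetic cohort. This is a minimal Python illustration using assumed synthetic data (exponential event and censoring times); it is not the authors' benchmarking framework, and the function apply_censoring and all parameter values are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic cohort: true event times (in months) and an
# independent censoring process; not the paper's data or code.
n = 1000
event_time = rng.exponential(scale=24.0, size=n)
censor_time = rng.exponential(scale=48.0, size=n)

def apply_censoring(event_time, censor_time, max_follow_up):
    """Administratively censor at a reduced observation period.

    Returns observed times and event indicators (1 = event observed,
    0 = censored), mimicking one axis of a censoring-sensitivity sweep.
    """
    cutoff = np.minimum(censor_time, max_follow_up)
    observed = np.minimum(event_time, cutoff)
    event = (event_time <= cutoff).astype(int)
    return observed, event

# Sweep progressively shorter observation periods and report the
# resulting censoring rate; a benchmark could pair each setting with
# model refitting and metric evaluation (e.g., concordance index).
for max_follow_up in (60, 36, 24, 12, 6):
    t_obs, e = apply_censoring(event_time, censor_time, max_follow_up)
    print(f"follow-up <= {max_follow_up:>2} months: "
          f"censoring rate = {1 - e.mean():.2f}")

In a full benchmark, each censoring setting would be followed by hyperparameter optimization and evaluation of every candidate model, so that performance can be compared across censoring conditions rather than at a single fixed rate.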