Data processing pipeline for cardiogenic shock prediction using machine learning

Bibliographic Details
Published in	Frontiers in Cardiovascular Medicine, Vol. 10, Article 1132680
Main Authors	Jajcay, Nikola, Bezak, Branislav, Segev, Amitai, Matetzky, Shlomi, Jankova, Jana, Spartalis, Michael, El Tahlawi, Mohammad, Guerra, Federico, Friebel, Julian, Thevathasan, Tharusan, Berta, Imrich, Pölzl, Leo, Nägele, Felix, Pogran, Edita, Cader, F. Aaysha, Jarakovic, Milana, Gollmann-Tepeköylü, Can, Kollarova, Marta, Petrikova, Katarina, Tica, Otilia, Krychtiuk, Konstantin A., Tavazzi, Guido, Skurk, Carsten, Huber, Kurt, Böhm, Allan
Format	Journal Article
Language	English
Published	Frontiers Media S.A., Switzerland, 23 March 2023
Subjects	cardiogenic shock; Cardiovascular Medicine; classification; machine learning; missing data imputation; prediction model; processing pipeline
ISSN	2297-055X
DOI	10.3389/fcvm.2023.1132680

More Information
Summary:	Recent advances in machine learning offer new ways to process and analyse observational patient data in order to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC-III database of intensive cardiac care unit patients with acute coronary syndrome. Identifying high-risk patients could allow pre-emptive measures to be taken and thus prevent the development of CS. We focus mainly on the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations (MICE). After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of a gradient-boosting framework with a tree-based classifier for CS prediction. Thanks to data cleaning and imputation, we achieved good classification performance (cross-validated mean area under the curve of 0.805) without hyperparameter optimization. We believe our pre-processing pipeline could also prove helpful for other classification and regression experiments.
Notes:	Edited by: Benedikt Schrage, University Medical Center Hamburg-Eppendorf, Germany. Reviewed by: Stefania Sacchi, San Raffaele Scientific Institute (IRCCS), Italy; Meraj Neyazi, University Medical Center Hamburg-Eppendorf, Germany; Kishore Surendra, University Medical Center Hamburg-Eppendorf, Germany. Specialty section: This article was submitted to Heart Failure and Transplantation, a section of the journal Frontiers in Cardiovascular Medicine.
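The summary describes comparing multivariate imputation algorithms (k-nearest neighbours, SVD-based methods, MICE). As a minimal sketch, assuming NumPy only and synthetic low-rank data, the block below shows one common way to set up such a comparison: hide a random fraction of the already observed entries, impute them, and score the root-mean-square error (RMSE) on the hidden entries. The helper names (`mean_impute`, `knn_impute`, `svd_impute`, `masked_rmse`), the hard-impute style SVD variant, and the masking-based evaluation are hypothetical illustrations, not the authors' code. MICE and the gradient-boosted classifier are omitted; scikit-learn's `IterativeImputer` is one existing MICE-inspired option.

```python
import numpy as np


def mean_impute(X):
    """Baseline: replace each NaN with its column (feature) mean."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def knn_impute(X, k=5):
    """Hypothetical helper: fill each NaN with the mean of that feature over the
    k nearest rows, using mean squared difference on co-observed features."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    for i in np.where(~observed.all(axis=1))[0]:
        shared = observed & observed[i]              # features observed in both rows
        diff = np.where(shared, X - X[i], 0.0)       # ignore non-shared features
        n_shared = shared.sum(axis=1)
        dist = (diff ** 2).sum(axis=1) / np.maximum(n_shared, 1)
        dist[n_shared == 0] = np.inf                 # no overlap: not comparable
        dist[i] = np.inf                             # never use the row itself
        for j in np.where(~observed[i])[0]:
            donors = np.where(observed[:, j] & np.isfinite(dist))[0]
            if donors.size == 0:                     # fall back to the column mean
                out[i, j] = np.nanmean(X[:, j])
                continue
            nearest = donors[np.argsort(dist[donors])[:k]]
            out[i, j] = X[nearest, j].mean()
    return out


def svd_impute(X, rank=2, n_iter=100, tol=1e-6):
    """Hypothetical helper: iterative low-rank SVD imputation (hard-impute style).
    Start from column means, then repeatedly overwrite only the missing entries
    with a rank-`rank` reconstruction; observed entries stay fixed."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change <= tol * max(np.linalg.norm(filled), 1.0):
            break
    return filled


def masked_rmse(X, impute, frac=0.1, seed=0):
    """Hide a random fraction of the observed entries, impute, and return the
    RMSE between imputed and true values on the hidden entries only."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    obs = np.argwhere(~np.isnan(X))
    hide = obs[rng.choice(len(obs), size=max(1, int(frac * len(obs))), replace=False)]
    rows, cols = hide[:, 0], hide[:, 1]
    X_masked = X.copy()
    X_masked[rows, cols] = np.nan
    X_hat = impute(X_masked)
    return float(np.sqrt(np.mean((X_hat[rows, cols] - X[rows, cols]) ** 2)))


# Synthetic stand-in for a patient-by-variable matrix (assumption: low-rank + noise).
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 8))
X_full = latent @ loadings + 0.1 * rng.normal(size=(200, 8))
X_obs = X_full.copy()
X_obs[rng.random(X_obs.shape) < 0.1] = np.nan  # roughly 10% "real" missingness

methods = {
    "mean": mean_impute,
    "knn": lambda X: knn_impute(X, k=5),
    "svd": lambda X: svd_impute(X, rank=2),
}
scores = {name: masked_rmse(X_obs, fn, frac=0.1, seed=0) for name, fn in methods.items()}
for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>4}: RMSE on hidden entries = {rmse:.3f}")
```

Because every method receives the same `seed`, the same entries are hidden each time, so the RMSE values are directly comparable. Reconstruction error is only a proxy; a pipeline like the one summarised above would typically also check how each imputed dataset affects the downstream classifier.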