Data processing pipeline for cardiogenic shock prediction using machine learning

Bibliographic Details
Published in Frontiers in Cardiovascular Medicine Vol. 10; p. 1132680
Main Authors Jajcay, Nikola; Bezak, Branislav; Segev, Amitai; Matetzky, Shlomi; Jankova, Jana; Spartalis, Michael; El Tahlawi, Mohammad; Guerra, Federico; Friebel, Julian; Thevathasan, Tharusan; Berta, Imrich; Pölzl, Leo; Nägele, Felix; Pogran, Edita; Cader, F. Aaysha; Jarakovic, Milana; Gollmann-Tepeköylü, Can; Kollarova, Marta; Petrikova, Katarina; Tica, Otilia; Krychtiuk, Konstantin A.; Tavazzi, Guido; Skurk, Carsten; Huber, Kurt; Böhm, Allan
Format Journal Article
Language English
Published Switzerland Frontiers Media S.A 23.03.2023
ISSN 2297-055X
DOI 10.3389/fcvm.2023.1132680

Summary: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC III database of intensive cardiac care unit patients with acute coronary syndrome. Identifying high-risk patients could allow pre-emptive measures to be taken and thus prevent the development of CS. We focus mainly on techniques for the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of a gradient-boosted framework that uses a tree-based classifier for CS prediction. Thanks to data cleaning and imputation, we achieved good classification performance (cross-validated mean area under the curve 0.805) without hyperparameter optimization. We believe our pre-processing pipeline would also prove helpful for other classification and regression experiments.
Edited by: Benedikt Schrage, University Medical Center Hamburg-Eppendorf, Germany
Specialty Section: This article was submitted to Heart Failure and Transplantation, a section of the journal Frontiers in Cardiovascular Medicine
Reviewed by: Stefania Sacchi, San Raffaele Scientific Institute (IRCCS), Italy; Meraj Neyazi, University Medical Center Hamburg-Eppendorf, Germany; Kishore Surendra, University Medical Center Hamburg-Eppendorf, Germany
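
The summary describes comparing imputation algorithms and then scoring a gradient-boosted tree classifier by cross-validated area under the curve. The sketch below illustrates that general idea only; it is not the authors' code. It assumes scikit-learn, uses synthetic data in place of MIMIC III, substitutes scikit-learn's GradientBoostingClassifier for the gradient-boosted framework (which the summary does not name), and uses IterativeImputer as a single-imputation stand-in for Multiple Imputation by Chained Equations. The SVD-based methods are omitted, and the function names make_incomplete_data and compare_imputers are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def make_incomplete_data(n_samples=300, n_features=8, missing_rate=0.2, seed=0):
    """Synthetic binary-classification data with values removed at random.

    Assumption: this only stands in for real patient data with missing entries.
    """
    rng = np.random.default_rng(seed)
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=4,
        random_state=seed,
    )
    X = X.copy()
    X[rng.random(X.shape) < missing_rate] = np.nan  # knock out ~20% of entries
    return X, y


def compare_imputers(X, y, seed=0):
    """Return the mean cross-validated ROC AUC for each imputation strategy."""
    imputers = {
        "knn": KNNImputer(n_neighbors=5),
        # Chained-equations style imputation; returns one imputed dataset only.
        "chained_equations": IterativeImputer(max_iter=10, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, imputer in imputers.items():
        # Default classifier settings, i.e. no hyperparameter optimization.
        model = make_pipeline(imputer, GradientBoostingClassifier(random_state=seed))
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        results[name] = float(scores.mean())
    return results


if __name__ == "__main__":
    X, y = make_incomplete_data()
    for name, auc in compare_imputers(X, y).items():
        print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```

Placing the imputer inside the pipeline means it is refit on each training fold, so held-out rows do not influence the imputation. That is a design choice of this sketch and may differ from the order of steps in the paper, where subjects and variables are selected after imputation.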