Patient contrastive learning: A performant, expressive, and practical approach to electrocardiogram modeling

Bibliographic Details
Published in: PLoS Computational Biology, Vol. 18, No. 2, p. e1009862
Main Authors: Diamant, Nathaniel; Reinertsen, Erik; Song, Steven; Aguirre, Aaron D.; Stultz, Collin M.; Batra, Puneet
Format: Journal Article
Language: English
Published: United States: Public Library of Science (PLoS), 01.02.2022
ISSN: 1553-7358, 1553-734X
DOI: 10.1371/journal.pcbi.1009862

More Information
Summary: Supervised machine learning applications in health care are often limited by a scarcity of labeled training data. To mitigate the effect of small sample sizes, we introduce a pre-training approach, Patient Contrastive Learning of Representations (PCLR), which uses contrastive learning to create latent representations of electrocardiograms (ECGs) from a large number of unlabeled examples. The resulting representations are expressive, performant, and practical across a wide spectrum of clinical tasks. We develop PCLR using a large health care system with over 3.2 million 12-lead ECGs and demonstrate that training linear models on PCLR representations achieves a 51% performance increase, on average, over six training set sizes and four tasks (sex classification, age regression, and the detection of left ventricular hypertrophy and atrial fibrillation), relative to training neural network models from scratch. We also compared PCLR to three other ECG pre-training approaches (supervised pre-training, unsupervised pre-training with an autoencoder, and pre-training using a contrastive multi-ECG-segment approach) and show significant performance benefits in three out of four tasks. We found an average performance benefit of 47% over the other models and an average 9% performance benefit over the best alternative model for each task. We release PCLR to enable others to extract ECG representations at https://github.com/broadinstitute/ml4h/tree/master/model_zoo/PCLR.
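The evaluation protocol described in the summary — freezing the pre-trained representations and training only a linear model per downstream task — is a standard "linear probe". The sketch below illustrates that protocol with scikit-learn; the random vectors stand in for PCLR embeddings, which in practice would be extracted with the released model at the repository above (no specific API from that repository is assumed here).

```python
# Linear-probe evaluation on frozen representations, as in the summary:
# the encoder is not fine-tuned; only a linear classifier is trained.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for PCLR embeddings: random vectors made linearly separable.
# Real embeddings would come from the pre-trained PCLR encoder.
n, dim = 200, 64
labels = rng.integers(0, 2, size=n)      # e.g. a sex-classification task
reps = rng.normal(size=(n, dim))
reps[:, 0] += 3.0 * labels               # inject class signal for the demo

# Train the linear probe on the first 150 examples, test on the rest.
probe = LogisticRegression(max_iter=1000).fit(reps[:150], labels[:150])
acc = probe.score(reps[150:], labels[150:])
print(f"held-out probe accuracy: {acc:.2f}")
```

Because only the linear layer is fit, this evaluation is cheap and measures how much task-relevant information the frozen representations already contain, which is how the paper compares PCLR against the other pre-training approaches.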
The authors have declared that no competing interests exist.