Mining multi-center heterogeneous medical data with distributed synthetic learning

Bibliographic Details
Published in	Nature communications Vol. 14; no. 1; pp. 5510 - 16
Main Authors	Chang, Qi; Yan, Zhennan; Zhou, Mu; Qu, Hui; He, Xiaoxiao; Zhang, Han; Baskaran, Lohendran; Al’Aref, Subhi; Li, Hongsheng; Zhang, Shaoting; Metaxas, Dimitris N.
Format	Journal Article
Language	English
Published	London Nature Publishing Group UK 07.09.2023 Nature Publishing Group Nature Portfolio
Subjects	631/114/1305; 631/114/2164; 631/114/2400; 692/700/1421; Angiography; Brain cancer; Brain tumors; Computed tomography; Datasets; Health care; Health care facilities; Heterogeneity; Histopathology; Humanities and Social Sciences; Image quality; Image segmentation; Learning; Medical imaging; multidisciplinary; Performance evaluation; Privacy; Science; Science (multidisciplinary); Synthetic data
ISSN	2041-1723
DOI	10.1038/s41467-023-40687-y

Summary:	Using multi-center data for medical analytics is challenging because of privacy protection requirements and data heterogeneity in the healthcare system. In this study, we propose the Distributed Synthetic Learning (DSL) architecture to learn across multiple medical centers while protecting sensitive personal information. DSL enables the building of a homogeneous dataset of entirely synthetic medical images via a form of GAN-based synthetic learning. The proposed DSL architecture has three key functionalities: multi-modality learning, missing modality completion learning, and continual learning. We systematically evaluate the performance of DSL on different medical applications using cardiac computed tomography angiography (CTA), brain tumor MRI, and histopathology nuclei datasets. Extensive experiments demonstrate the superior performance of DSL as a provider of high-quality synthetic medical images, as measured by a synthetic quality metric called Dist-FID. We show that DSL can be adapted to heterogeneous data and outperforms the real misaligned modalities segmentation model by 55% and the temporal datasets segmentation model by 8%. Here the authors present Distributed Synthetic Learning, a system that addresses data privacy, isolated data islands, and heterogeneity concerns in healthcare analytics by learning to generate state-of-the-art synthetic data for downstream tasks.