Learning from vertically distributed data across multiple sites: An efficient privacy-preserving algorithm for Cox proportional hazards model with variable selection


Bibliographic Details
Published in Journal of Biomedical Informatics, Vol. 149; p. 104581
Main Authors Miao, Guanhong; Yu, Lei; Yang, Jingyun; Bennett, David A.; Zhao, Jinying; Wu, Samuel S.
Format Journal Article
Language English
Published United States Elsevier Inc 01.01.2024
ISSN 1532-0464
1532-0480
DOI 10.1016/j.jbi.2023.104581

Summary: The objective is to develop a lossless distributed algorithm for the regularized Cox proportional hazards model with variable selection, to support federated learning on vertically distributed data. We propose a novel distributed algorithm for fitting a regularized Cox proportional hazards model when data sharing among data providers is restricted. Based on cyclical coordinate descent, the proposed algorithm computes intermediary statistics at each site and exchanges them to update the model parameters at the other sites, without accessing individual patient-level data. We evaluate the algorithm with (1) a simulation study and (2) a real-world data analysis predicting the risk of Alzheimer’s dementia from the Religious Orders Study and Rush Memory and Aging Project (ROSMAP), and we compare its performance with existing privacy-preserving models. Our algorithm achieves privacy-preserving variable selection for time-to-event data in the vertically distributed setting, without degradation of accuracy compared with a centralized approach. The simulation study shows that the algorithm is highly efficient in analyzing high-dimensional datasets. The real-world data analysis shows that our distributed Cox model predicts the risk of Alzheimer’s dementia more accurately than conventional Cox models built by each data provider without data sharing. Moreover, our algorithm is computationally more efficient than existing privacy-preserving Cox models, with or without a regularization term. The proposed algorithm is lossless, privacy-preserving, and highly efficient for fitting a regularized Cox model to vertically distributed data. It provides a suitable and convenient approach for modeling time-to-event data in a distributed manner.
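The "lossless" property described in the summary can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it does not implement cyclical coordinate descent, the regularization term, or any privacy protocol. It only shows, under stated assumptions, why a vertically partitioned Cox model can match a centralized one: each site shares an aggregate (its partial linear predictor) rather than its covariate columns, and the resulting gradient blocks equal the centralized gradient. The function names, the two-site split, the Breslow form of the partial likelihood, and the assumption that survival times and event indicators are available to whoever computes the shared gradient are all hypothetical choices made for illustration.

```python
import numpy as np

def cox_eta_gradient(eta, time, event):
    """Gradient of the Breslow partial log-likelihood w.r.t. the linear predictor."""
    w = np.exp(eta - eta.max())  # shift for stability; it cancels in the ratios below
    # at_risk[i, j] = 1 when subject j is still at risk at subject i's time
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    risk_sum = at_risk @ w       # S_i: sum of w_j over the risk set of subject i
    # every event i contributes w_j / S_i to each subject j in its risk set
    expected = w * ((event / risk_sum) @ at_risk)
    return event - expected

def site_partial_predictor(X_k, beta_k):
    """Intermediary statistic a site could share: its part of the linear predictor."""
    return X_k @ beta_k

rng = np.random.default_rng(0)
n = 50
# Hypothetical vertical split: same patients, different covariates at each site.
X_sites = [rng.normal(size=(n, 3)), rng.normal(size=(n, 2))]
beta_sites = [0.1 * rng.normal(size=3), 0.1 * rng.normal(size=2)]
time = rng.exponential(size=n)
event = rng.integers(0, 2, size=n).astype(float)

# Distributed computation: only n-vectors cross site boundaries (assumption).
eta = sum(site_partial_predictor(X, b) for X, b in zip(X_sites, beta_sites))
g = cox_eta_gradient(eta, time, event)
grad_sites = [X.T @ g for X in X_sites]  # each site uses only its own columns

# Centralized reference on the pooled covariate matrix.
X_all = np.hstack(X_sites)
beta_all = np.concatenate(beta_sites)
grad_central = X_all.T @ cox_eta_gradient(X_all @ beta_all, time, event)

print(np.allclose(np.concatenate(grad_sites), grad_central))
```

Because the per-site gradient blocks agree with the centralized gradient up to floating-point rounding, any update rule driven by them would follow the same path as a pooled fit. Whether sharing these particular aggregates meets a given privacy requirement is a separate question that this sketch does not address.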
Guanhong Miao: methodology, formal analysis, investigation, writing - original draft preparation. Lei Yu: writing - review & editing. Jingyun Yang: writing - review & editing. David A. Bennett: supervision, resources, writing - review & editing. Jinying Zhao: supervision, resources, writing - review & editing. Samuel S. Wu: conceptualization, supervision, resources, writing - review & editing, funding acquisition.