Multi-modality machine learning predicting Parkinson’s disease

Bibliographic Details
Published in	NPJ Parkinson's Disease Vol. 8; no. 1; article 35 (13 pp.)
Main Authors	Makarious, Mary B., Leonard, Hampton L., Vitale, Dan, Iwaki, Hirotaka, Sargent, Lana, Dadu, Anant, Violich, Ivo, Hutchins, Elizabeth, Saffo, David, Bandres-Ciga, Sara, Kim, Jonggeol Jeff, Song, Yeajin, Maleknia, Melina, Bookman, Matt, Nojopranoto, Willy, Campbell, Roy H., Hashemi, Sayed Hadi, Botia, Juan A., Carter, John F., Craig, David W., Van Keuren-Jensen, Kendall, Morris, Huw R., Hardy, John A., Blauwendraat, Cornelis, Singleton, Andrew B., Faghri, Faraz, Nalls, Mike A.
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK (Nature Portfolio), 01.04.2022
Subjects	631/114/2413; 631/208/212; 692/499; 692/53/2423; Algorithms; Artificial intelligence; Automation; Biobanks; Biomarkers; Biomedical and Life Sciences; Biomedicine; Datasets; Genomes; Genomics; Machine learning; Neurology; Neurosciences; Parkinson's disease; Preventive medicine; Research centers; Standard deviation
ISSN	2373-8057
DOI	10.1038/s41531-022-00288-w

More Information
Summary:	Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson’s disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort. We investigated top features, constructed hypothesis-free disease-relevant networks, and investigated drug–gene interactions. We performed automated ML on multimodal data from the Parkinson’s Progression Markers Initiative (PPMI). After selecting the best-performing algorithm, all PPMI data were used to tune the selected model. The model was validated in the Parkinson’s Disease Biomarkers Program (PDBP) dataset. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then validated on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy and other metrics. Finally, networks were built to identify gene communities specific to PD. Combining data modalities outperforms the single biomarker paradigm. The University of Pennsylvania Smell Identification Test (UPSIT) and the polygenic risk score (PRS) contributed most to the predictive power of the model, but their accuracy is supplemented by many smaller-effect transcripts and risk SNPs. Our model is best suited to identifying, within a health registry or biobank, large groups of individuals to monitor and prioritize for further testing. This approach allows complex predictive models to be reproducible and accessible to the community, with the package, code, and results publicly available.
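
The summary reports model quality as AUC and notes that optimizing the classification threshold improved accuracy. The sketch below illustrates both ideas on a small made-up dataset, using only the Python standard library. It is not the GenoML implementation: the function names, the toy labels and scores, and the choice of Youden's J statistic as the threshold criterion are all assumptions made for illustration, since the summary does not say which criterion the authors used.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def accuracy_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Accuracy when every score >= threshold is classified as a case."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return correct / len(labels)


def youden_threshold(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.
    Candidate thresholds are the observed scores; the lowest one wins ties."""
    n_cases = sum(1 for y in labels if y == 1)
    n_controls = len(labels) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_cases + tn / n_controls - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical data: 1 = PD case, 0 = control; scores are predicted probabilities.
toy_labels = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90]

if __name__ == "__main__":
    t, j = youden_threshold(toy_labels, toy_scores)
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print("accuracy at default 0.5:", accuracy_at(toy_labels, toy_scores, 0.5))
    print("chosen threshold:", t, "accuracy:", accuracy_at(toy_labels, toy_scores, t))
```

On this toy data the default cutoff of 0.5 misses two low-scoring cases, while the threshold chosen by Youden's J recovers them at the cost of one false positive. AUC is unaffected by the cutoff because it depends only on how cases rank relative to controls.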