CCS Predictor 2.0: An Open-Source Jupyter Notebook Tool for Filtering Out False Positives in Metabolomics

Bibliographic Details
Published in Analytical chemistry (Washington) Vol. 94; no. 50; pp. 17456-17466
Main Authors Rainey, Markace A., Watson, Chandler A., Asef, Carter K., Foster, Makayla R., Baker, Erin S., Fernández, Facundo M.
Format Journal Article
Language English
Published United States American Chemical Society 20.12.2022
ISSN 0003-2700
1520-6882
DOI 10.1021/acs.analchem.2c03491

Summary: Metabolite annotation continues to be the widely accepted bottleneck in nontargeted metabolomics workflows. Annotation of metabolites typically relies on a combination of high-resolution mass spectrometry (MS) with parent and tandem measurements, isotope cluster evaluations, and Kendrick mass defect (KMD) analysis. Chromatographic retention time matching with standards is often used at the later stages of the process, which can also be followed by metabolite isolation and structure confirmation utilizing nuclear magnetic resonance (NMR) spectroscopy. The measurement of gas-phase collision cross-section (CCS) values by ion mobility (IM) spectrometry also adds an important dimension to this workflow by generating an additional molecular parameter that can be used for filtering unlikely structures. The millisecond timescale of IM spectrometry allows the rapid measurement of CCS values and allows easy pairing with existing MS workflows. Here, we report on a highly accurate machine learning algorithm (CCSP 2.0) in an open-source Jupyter Notebook format to predict CCS values based on linear support vector regression models. This tool allows customization of the training set to the needs of the user, enabling the production of models for new adducts or previously unexplored molecular classes. CCSP produces predictions with accuracy equal to or greater than existing machine learning approaches such as CCSbase, DeepCCS, and AllCCS, while being better aligned with FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Another unique aspect of CCSP 2.0 is its inclusion of a large library of 1613 molecular descriptors via the Mordred Python package, further encoding the fine aspects of isomeric molecular structures. CCS prediction accuracy was tested using CCS values in the McLean CCS Compendium with median relative errors of 1.25, 1.73, and 1.87% for the 170 [M – H]−, 155 [M + H]+, and 138 [M + Na]+ adducts tested. For superclass-matched data sets, CCS predictions via CCSP allowed filtering of 36.1% of incorrect structures while retaining a total of 100% of the correct annotations using a ΔCCS threshold of 2.8% and a mass error of 10 ppm.
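The filtering step described at the end of the summary can be illustrated with a minimal sketch. It is not taken from the CCSP 2.0 notebook: the `Candidate` class, the function names, and the example values are hypothetical, and the error definitions (relative CCS error against the measured value, mass error in ppm against the theoretical m/z) are assumptions about how the 2.8% and 10 ppm thresholds might be applied.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate annotation with a predicted CCS value."""
    name: str
    theoretical_mz: float   # m/z computed from the candidate formula and adduct
    predicted_ccs: float    # CCS predicted by some model, in square angstroms


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million (assumed definition)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def ccs_error_percent(measured_ccs: float, predicted_ccs: float) -> float:
    """Relative CCS error in percent (assumed definition)."""
    return (predicted_ccs - measured_ccs) / measured_ccs * 100.0


def filter_candidates(candidates, measured_mz, measured_ccs,
                      max_ppm=10.0, max_ccs_percent=2.8):
    """Keep candidates whose mass error and CCS error are both within tolerance."""
    kept = []
    for c in candidates:
        mass_ok = abs(ppm_error(measured_mz, c.theoretical_mz)) <= max_ppm
        ccs_ok = abs(ccs_error_percent(measured_ccs, c.predicted_ccs)) <= max_ccs_percent
        if mass_ok and ccs_ok:
            kept.append(c)
    return kept


# Made-up values for illustration only; they are not from the paper.
example_candidates = [
    Candidate("candidate_A", theoretical_mz=300.0010, predicted_ccs=172.0),
    Candidate("candidate_B", theoretical_mz=300.0020, predicted_ccs=180.0),
    Candidate("candidate_C", theoretical_mz=300.0100, predicted_ccs=170.0),
]
example_kept = filter_candidates(example_candidates,
                                 measured_mz=300.0000, measured_ccs=170.0)
```

With these made-up inputs, only `candidate_A` passes both checks: `candidate_B` exceeds the CCS tolerance and `candidate_C` exceeds the mass tolerance.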
Author Contributions: M.R. and C.W. developed and optimized the CCSP 2.0 machine learning algorithm. C.K.A. and M.F. provided input on the platform’s user interactivity. M.R. evaluated CCSP 2.0 prediction quality and completed performance comparisons across CCS prediction platforms. F.M.F. and E.S.B. supervised the algorithm development and performance evaluations, and M.R. and F.M.F. wrote the manuscript.