Early detection of squamous cell carcinoma of the oral tongue using multidimensional plasma protein analysis and interpretable machine learning

Bibliographic Details
Published in Journal of oral pathology & medicine Vol. 52; no. 7; pp. 637-643
Main Authors Gu, Xiaolian, Salehi, Amir, Wang, Lixiao, Coates, Philip J., Sgaramella, Nicola, Nylander, Karin
Format Journal Article
Language English
Published Denmark Wiley Subscription Services, Inc 01.08.2023
ISSN 0904-2512
1600-0714
DOI 10.1111/jop.13461

Summary: Background: Interpretable machine learning (ML) for early detection of cancer has the potential to improve risk assessment and early intervention. Methods: Data on 261 proteins related to inflammation and/or tumor processes were analyzed in 123 blood samples collected from healthy persons, a sub‐group of whom later developed squamous cell carcinoma of the oral tongue (SCCOT). Samples from people who developed SCCOT in less than 5 years were classified as tumor‐to‐be and all other samples as tumor‐free. The optimal ML algorithm for feature selection was identified, and feature importance was computed by the SHapley Additive exPlanations (SHAP) method. Five popular ML algorithms (AdaBoost, Artificial Neural Network [ANN], Decision Tree [DT], eXtreme Gradient Boosting [XGBoost], and Support Vector Machine [SVM]) were applied to establish prediction models, and the decisions of the optimal models were interpreted by SHAP. Results: Using the 22 selected features, the SVM prediction model showed the best performance (sensitivity = 0.867, specificity = 0.859, balanced accuracy = 0.863, area under the receiver operating characteristic curve [ROC‐AUC] = 0.924). SHAP analysis revealed that the 22 features had varying person‐specific impacts on the model decision, and the top three contributors to prediction were Interleukin 10 (IL10), TNF Receptor Associated Factor 2 (TRAF2), and Kallikrein Related Peptidase 12 (KLK12). Conclusion: Using multidimensional plasma protein analysis and interpretable ML, we outline a systematic approach for early detection of SCCOT before the appearance of clinical signs.
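For readers checking the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and balanced accuracy relate, using the standard definitions (balanced accuracy as the mean of sensitivity and specificity). The function names and the confusion-matrix counts are illustrative assumptions and are not taken from the article; only the values 0.867 and 0.859 come from the summary.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of tumor-to-be samples correctly flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of tumor-free samples correctly cleared."""
    return tn / (tn + fp)


def balanced_accuracy(sens: float, spec: float) -> float:
    """Mean of sensitivity and specificity, robust to class imbalance."""
    return (sens + spec) / 2


# Hypothetical counts, for illustration only (not the study's data).
example_sens = sensitivity(tp=8, fn=2)
example_spec = specificity(tn=9, fp=1)
print(example_sens, example_spec)

# Consistency check on the values reported in the summary.
print(round(balanced_accuracy(0.867, 0.859), 3))
```

Under these definitions, the reported sensitivity and specificity average to the reported balanced accuracy of 0.863.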