Early detection of squamous cell carcinoma of the oral tongue using multidimensional plasma protein analysis and interpretable machine learning
| Published in | Journal of oral pathology & medicine Vol. 52; no. 7; pp. 637 - 643 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Denmark, Wiley Subscription Services, Inc, 01.08.2023 |
| ISSN | 0904-2512, 1600-0714 |
| DOI | 10.1111/jop.13461 |
| Summary: | Background
Interpretable machine learning (ML) for early detection of cancer has the potential to improve risk assessment and early intervention.
Methods
Data on 261 proteins related to inflammation and/or tumor processes were analyzed in 123 blood samples collected from healthy individuals, a sub‐group of whom later developed squamous cell carcinoma of the oral tongue (SCCOT). Samples from people who developed SCCOT in less than 5 years were classified as tumor‐to‐be and all other samples as tumor‐free. The optimal ML algorithm for feature selection was identified, and feature importance was computed by the SHapley Additive exPlanations (SHAP) method. Five popular ML algorithms (AdaBoost, Artificial Neural Networks [ANNs], Decision Tree [DT], eXtreme Gradient Boosting [XGBoost], and Support Vector Machine [SVM]) were applied to establish prediction models, and the decisions of the optimal models were interpreted by SHAP.
Results
Using the 22 selected features, the SVM prediction model showed the best performance (sensitivity = 0.867, specificity = 0.859, balanced accuracy = 0.863, area under the receiver operating characteristic curve [ROC‐AUC] = 0.924). SHAP analysis revealed that the 22 features had varying person‐specific impacts on the model's decisions, and that the top three contributors to prediction were Interleukin 10 (IL10), TNF Receptor Associated Factor 2 (TRAF2), and Kallikrein Related Peptidase 12 (KLK12).
Conclusion
Using multidimensional plasma protein analysis and interpretable ML, we outline a systematic approach for early detection of SCCOT before the appearance of clinical signs. |
|---|---|
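
The labeling rule and the balanced accuracy reported in the summary can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `label_sample` and `balanced_accuracy`, the use of years between sampling and diagnosis as input, and `None` for a person with no SCCOT diagnosis are all assumptions made for illustration. Balanced accuracy is taken here in its standard binary form, the mean of sensitivity and specificity.

```python
from typing import Optional


def label_sample(years_to_diagnosis: Optional[float]) -> str:
    """Label a sample following the rule stated in the summary.

    Assumption: `years_to_diagnosis` is the time from blood sampling to
    SCCOT diagnosis, or None if the person was never diagnosed.
    """
    if years_to_diagnosis is not None and years_to_diagnosis < 5:
        return "tumor-to-be"
    return "tumor-free"


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Standard binary balanced accuracy: mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Hypothetical follow-up times, not data from the study.
    for years in (3.2, 5.0, None):
        print(years, "->", label_sample(years))

    # Values reported for the SVM model in the summary.
    print(round(balanced_accuracy(0.867, 0.859), 3))  # prints 0.863
```

With the reported sensitivity (0.867) and specificity (0.859), the mean is 0.863, which matches the balanced accuracy given in the Results.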