Machine learning algorithms predict breast cancer incidence risk: a data-driven retrospective study based on biochemical biomarkers
| Published in | BMC cancer Vol. 25; no. 1; pp. 1061 - 11 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | London: BioMed Central, 01.07.2025; BioMed Central Ltd; Springer Nature B.V; BMC |
| Subjects | |
| ISSN | 1471-2407 |
| DOI | 10.1186/s12885-025-14444-x |
| Summary: | Background
Current breast cancer prediction models typically rely on personal information and medical history, with limited inclusion of blood-based biomarkers. This study aimed to identify novel breast cancer risk factors using machine learning algorithms. By integrating both personal clinical factors and peripheral blood biochemical biomarkers, it sought to enhance the understanding of breast cancer risk.
Methods
Data were screened and normalized according to predefined inclusion and exclusion criteria. Logistic regression with forward selection and six other machine learning algorithms were employed to identify variables associated with breast cancer incidence. The performance of the models was evaluated using the area under the curve (AUC) through 5-fold cross-validation.
Results
The data were divided into a training cohort of 17,360 cases and a testing cohort of 8,551 cases. Logistic regression analysis revealed that breast cancer incidence was increased with age (odds ratio [OR]: 1.136, 95% confidence interval [CI]: [1.130, 1.142], P < 0.001), gamma-glutamyl transferase (GGT) (OR: 1.002, 95% CI: [1.000, 1.004], P = 0.014), and alanine transaminase (ALT) (OR: 1.005, 95% CI: [1.001, 1.008], P = 0.008). Furthermore, the six machine learning algorithms consistently identified GGT and ALT as the most significant predictive features. The AUC values obtained from the six models after 5-fold cross-validation ranged from 0.779 to 0.862, with accuracy ranging from 0.780 to 0.841.
Conclusions
Our study identified two biochemical biomarkers (GGT and ALT) as promising indicators for breast cancer prediction. Incorporating these findings into a tailored breast cancer risk prediction model remains a task for our future research. |
|---|---|
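
The odds ratios in the Summary are close to 1, which can make them look negligible. A logistic regression is linear in the log-odds, so a per-unit odds ratio compounds multiplicatively over larger changes in the predictor. The sketch below illustrates that arithmetic using the values quoted in the Summary. It is not the authors' code: the function names are hypothetical, the Wald-type 95% interval (z = 1.96) is an assumption, and the record does not state the units of each predictor (the Methods mention normalization), so "one unit" here may not mean one year or one U/L.

```python
import math


def scale_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a change of `delta` units, given the per-unit odds ratio.

    Assumes a model that is linear in the log-odds, as in logistic regression.
    """
    return math.exp(math.log(or_per_unit) * delta)


def approx_standard_error(ci_low: float, ci_high: float, z: float = 1.96) -> float:
    """Approximate standard error of the log-odds coefficient.

    Assumes the reported interval is a symmetric Wald interval on the
    log-odds scale; this is an assumption, not stated in the record.
    """
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


# Values quoted in the Summary (units of the predictors are not stated there).
age_or, age_ci = 1.136, (1.130, 1.142)
ggt_or = 1.002

# A 10-unit increase in age multiplies the odds by 1.136 ** 10.
print(f"age, +10 units: OR = {scale_odds_ratio(age_or, 10):.2f}")

# A 50-unit increase in GGT multiplies the odds by 1.002 ** 50.
print(f"GGT, +50 units: OR = {scale_odds_ratio(ggt_or, 50):.3f}")

# Rough standard error of the age coefficient, recovered from its interval.
print(f"age coefficient SE ~ {approx_standard_error(*age_ci):.4f}")
```

Under these assumptions, a per-unit odds ratio of 1.136 corresponds to roughly a 3.6-fold increase in odds over ten units, while the GGT and ALT effects remain modest even over wide ranges of the marker.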