Examining the Relationship of Breast Cancer Data With Survival Chance and Comparison of Algorithms on Breast Cancer Prediction

Bibliographic Details
Published in: International Journal of Applied Methods in Electronics and Computers
Main Authors: Tiryaki, Ali Murat; Mermer, Ahmet Can; Ugurlu, Bora
Format: Journal Article
Language: English
Published: 31.03.2025
ISSN: 3023-4409
DOI: 10.58190/ijamec.2025.117

Summary: This article compares the performance of machine learning algorithms on breast cancer data, with the aim of predicting the survival status of breast cancer patients and contributing to the development of clinical decision support systems. Using a dataset obtained from the National Cancer Institute, the XGBoost, Random Forest, Support Vector Machine (SVM), and Logistic Regression algorithms were compared. After data preprocessing and correlation analysis, XGBoost showed the best performance once its hyperparameters were optimized, reaching an overall accuracy of 92%. The optimized model performs well on class 0 (precision 92%, recall 98%), but its recall for class 1 remains at 54%. The article discusses the effect of data imbalance on these results and offers suggestions for future studies.
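The gap between a high overall accuracy and a low recall for one class is typical of imbalanced data. The following is a minimal, self-contained sketch that illustrates the effect. It computes overall accuracy and per-class precision and recall on hypothetical labels (90 samples of class 0, 10 of class 1), which are not the article's data. The helper names `accuracy` and `per_class_metrics` are illustrative only. In practice, a library routine such as scikit-learn's `classification_report` would typically be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Precision and recall for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted as label
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as label
    fn = sum(1 for t, p in pairs if t == label and p != label)  # label samples that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical, illustrative labels (NOT the article's data):
# 90 samples of class 0 followed by 10 samples of class 1.
y_true = [0] * 90 + [1] * 10
# A hypothetical classifier that gets 88/90 class-0 samples right
# but only 5/10 class-1 samples right.
y_pred = [0] * 88 + [1] * 2 + [0] * 5 + [1] * 5
# A trivial baseline that always predicts the majority class 0.
baseline = [0] * len(y_true)

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
for label in sorted(set(y_true)):
    p, r = per_class_metrics(y_true, y_pred, label)
    print(f"class {label}: precision {p:.2f}, recall {r:.2f}")

_, baseline_recall_1 = per_class_metrics(y_true, baseline, 1)
print(f"baseline accuracy: {accuracy(y_true, baseline):.2f}, class 1 recall: {baseline_recall_1:.2f}")
```

On these toy labels the classifier reaches 0.93 accuracy and 0.98 recall on class 0, while recalling only 0.50 of class 1. The always-0 baseline already scores 0.90 accuracy with a class-1 recall of 0.00. This is why per-class recall, not accuracy alone, is the more informative measure when one class is rare.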