Examining the Relationship of Breast Cancer Data With Survival Chance and Comparison of Algorithms on Breast Cancer Prediction

Bibliographic Details
Published in	International Journal of Applied Methods in Electronics and Computers
Main Authors	Tiryaki, Ali Murat; Mermer, Ahmet Can; Ugurlu, Bora
Format	Journal Article
Language	English
Published	31.03.2025
ISSN	3023-4409
DOI	10.58190/ijamec.2025.117

Summary:	This article compares the performance of machine learning algorithms on breast cancer data, with the aim of predicting the survival status of breast cancer patients and contributing to the development of clinical decision support systems. Using a dataset obtained from the National Cancer Institute, the XGBoost, Random Forest, Support Vector Machine (SVM), and Logistic Regression algorithms were compared. After data preprocessing and correlation analysis, XGBoost with hyperparameter optimization was found to perform best. The optimized XGBoost model achieves an overall accuracy of 92%. It performs well on class 0 (precision 92%, recall 98%), but recall for class 1 remains at 54%. The article discusses the effect of data imbalance on these results and offers suggestions for future studies.
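The gap between 92% overall accuracy and 54% recall for class 1 is typical of an imbalanced dataset. The minimal sketch below shows how overall accuracy and per-class precision and recall are derived from a two-class confusion matrix. The counts are hypothetical, chosen only for illustration and not taken from the article. The sketch assumes class 1 is the smaller class; the summary's reference to data imbalance suggests this but does not state it.

```python
def per_class_metrics(cm):
    """Compute accuracy and per-class precision/recall.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n_classes))  # diagonal = correct predictions
    metrics = {}
    for k in range(n_classes):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n_classes))  # column sum: all predicted as k
        actual_k = sum(cm[k])  # row sum: all truly in class k (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        metrics[k] = {"precision": precision, "recall": recall, "support": actual_k}
    return correct / total, metrics


# Hypothetical counts (NOT from the article): 850 samples in class 0, 150 in class 1.
cm = [
    [833, 17],  # true class 0: 833 predicted as 0, 17 predicted as 1
    [69, 81],   # true class 1: 69 predicted as 0, 81 predicted as 1
]
accuracy, metrics = per_class_metrics(cm)
print(f"accuracy = {accuracy:.3f}")
for k, m in metrics.items():
    print(f"class {k}: precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} support={m['support']}")
```

With these illustrative counts, accuracy is about 0.91 even though only 81 of 150 class-1 samples are recovered. The majority class dominates the overall figure, which is why per-class recall is reported alongside accuracy.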