Breast cancer diagnosis using supervised machine learning classification algorithms

Bibliographic Details
Published in	AIP conference proceedings Vol. 2839; no. 1
Main Authors	Fanoos, Zahra Q.; Abdulhadi, Jumana
Format	Journal Article; Conference Proceeding
Language	English
Published	Melville: American Institute of Physics, 29.09.2023
Subjects	Algorithms; Artificial neural networks; Breast cancer; Classification; Classifiers; Data compression; Datasets; Decision trees; Deep learning; Diagnosis; Discriminant analysis; Independent component analysis; Linear algebra; Machine learning; Medical imaging; Neural networks; Performance measurement; Principal components analysis; Scientific visualization; Supervised learning; Support vector machines; Visualization
ISSN	0094-243X 1551-7616
DOI	10.1063/5.0167964

More Information
Summary:	Breast cancer is one of the most common and most serious diseases affecting women. It can be fatal, and early diagnosis can substantially improve the chances of successful treatment. Machine learning algorithms applied to medical imaging data are widely used to diagnose breast cancer accurately. In our research, we use the Wisconsin breast cancer dataset: we preprocess and prepare the data and then apply multiple classifiers, namely Logistic Regression, K-nearest neighbors (KNN), Support Vector Classifier, Naive Bayes, Decision Tree, and Artificial Neural Network (ANN), to predict a patient's diagnosis and find the algorithm with the best classification results. The classifiers label each image as either Benign or Malignant. We recorded each classifier's performance and found that the neural network classifier performed best, reaching an accuracy of 98% and outperforming the Support Vector Machine classifier. We also try several dimensionality-reduction (data-compression) methods to produce smaller, more efficient datasets: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and an autoencoder. We then compare the results and produce visualizations that make the differences between the algorithms and techniques clear. We use several performance metrics: accuracy, recall, precision, and F1-score. Because these measures are close to one another, we focus on accuracy and F1-score.
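The summary reports accuracy, recall, precision, and F1-score. As a minimal pure-Python sketch (not the authors' code), the four metrics can be computed from a binary confusion matrix. Treating Malignant as the positive class (label 1) and the toy labels below are assumptions for illustration, not data from the paper, and `confusion_counts` and `classification_metrics` are hypothetical helper names.

```python
# Sketch: accuracy, precision, recall and F1 from binary labels.
# Assumption for illustration: 1 = Malignant (positive class), 0 = Benign.


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif t == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels only, not results from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

On this toy input the metrics differ (precision 0.6, recall 0.75), which illustrates why F1 can be a useful complement to accuracy when false positives and false negatives are not balanced.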
The code is developed in Python using several of its most powerful and common libraries: numpy for mathematical operations and linear algebra, pandas for data manipulation, scikit-learn for machine learning algorithms and techniques, tensorflow and keras for deep learning and neural network functionality, to deal with the user's files, and matplotlib and seaborn for data visualization and polishing.
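A minimal scikit-learn sketch of the kind of comparison the summary describes, under stated assumptions and not the authors' implementation. scikit-learn's bundled Wisconsin Diagnostic Breast Cancer data (`load_breast_cancer`) is assumed to stand in for the paper's dataset. The stratified 75/25 split, standard scaling, default hyperparameters, and two components for PCA and ICA are likewise assumptions. `MLPClassifier` is a swapped-in stand-in for the paper's tensorflow/keras ANN, and the autoencoder is omitted. `evaluate`, `CLASSIFIERS`, and `REDUCERS` are hypothetical names.

```python
# Sketch only: compares the classifier families named in the summary, with and
# without dimensionality reduction. Dataset, split, scaling and hyperparameters
# are assumptions, not the paper's settings.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Factories so every (reducer, classifier) pair gets fresh, unfitted objects.
CLASSIFIERS = {
    "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
    "KNN": lambda: KNeighborsClassifier(),
    "SVC": lambda: SVC(),
    "Naive Bayes": lambda: GaussianNB(),
    "Decision Tree": lambda: DecisionTreeClassifier(random_state=0),
    # Stand-in for the paper's tensorflow/keras ANN.
    "MLP": lambda: MLPClassifier(max_iter=2000, random_state=0),
}

REDUCERS = {
    "none": lambda: None,
    "PCA": lambda: PCA(n_components=2, random_state=0),
    "ICA": lambda: FastICA(n_components=2, random_state=0),
    # A binary problem allows at most 1 LDA component (n_classes - 1).
    "LDA": lambda: LinearDiscriminantAnalysis(n_components=1),
}


def evaluate(test_size=0.25, seed=0):
    """Return {(reducer, classifier): (accuracy, f1)} on a held-out split."""
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    results = {}
    for r_name, make_reducer in REDUCERS.items():
        for c_name, make_clf in CLASSIFIERS.items():
            # Scale, optionally reduce, then classify; every step is fitted
            # on the training split only.
            steps = [StandardScaler()]
            reducer = make_reducer()
            if reducer is not None:
                steps.append(reducer)
            steps.append(make_clf())
            model = make_pipeline(*steps).fit(X_tr, y_tr)
            pred = model.predict(X_te)
            # scikit-learn encodes malignant as 0, so score F1 for that class.
            results[(r_name, c_name)] = (
                accuracy_score(y_te, pred),
                f1_score(y_te, pred, pos_label=0),
            )
    return results


if __name__ == "__main__":
    ranked = sorted(evaluate().items(), key=lambda kv: kv[1][0], reverse=True)
    for (r_name, c_name), (acc, f1) in ranked:
        print(f"{r_name:>4} + {c_name:<19} accuracy={acc:.3f} f1={f1:.3f}")
```

Wrapping the scaler and reducer in a pipeline keeps PCA, ICA, or LDA from seeing the test split during fitting. Any scores this prints come from the assumed setup and should not be read as reproducing the paper's 98% figure.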