Breast cancer diagnosis using supervised machine learning classification algorithms

Bibliographic Details
Published in: AIP Conference Proceedings, Vol. 2839, no. 1
Main Authors: Fanoos, Zahra Q.; Abdulhadi, Jumana
Format: Journal Article; Conference Proceeding
Language: English
Published: Melville: American Institute of Physics, 29.09.2023
ISSN: 0094-243X; 1551-7616
DOI: 10.1063/5.0167964

Summary: Breast cancer is one of the most common and most serious diseases affecting women. Many cases are fatal, which makes it one of the most dreaded diseases for women, and early diagnosis can substantially improve the chances of successful treatment. Machine learning algorithms applied to medical imaging data are widely used to diagnose breast cancer accurately. In this research, we use the Wisconsin breast cancer dataset. We preprocess and prepare the data and then apply multiple classifiers to predict each patient's diagnosis and find the algorithm with the best classification results. These classifiers are Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Classifier, Naive Bayes, Decision Tree, and an Artificial Neural Network (ANN). The classifiers are then used to identify each image as either benign or malignant. After recording and comparing the classifiers' performance, we found that the neural network classifier performed best. It reached a diagnostic accuracy of 98% and outperformed the Support Vector Machine classifier. We also try several dimensionality-reduction (data-compression) methods to produce smaller, more efficient datasets: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and an autoencoder. We then compare the results and use visualizations to make the differences between the algorithms and techniques clear. Performance is measured with accuracy, recall, precision, and F1-score. Because these measures are close to one another, we focus on accuracy and F1-score.
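The classifier comparison described above can be sketched as follows. This is a minimal sketch built on several assumptions:

- It uses scikit-learn's bundled copy of the Wisconsin Diagnostic Breast Cancer data (`load_breast_cancer`).
- It uses a stratified 80/20 train/test split and standard scaling.
- The hyperparameters are illustrative choices.
- `MLPClassifier` stands in for the ANN. The authors built theirs with TensorFlow/Keras, and this record gives neither their architecture nor their preprocessing.

Because of these substitutions, the printed scores will not necessarily match the reported 98%.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

# Wisconsin Diagnostic Breast Cancer data as bundled with scikit-learn:
# 569 samples, 30 numeric features; target 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative models; hyperparameters are assumptions, not the paper's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVC": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    # Stand-in for the authors' Keras ANN (hypothetical layer sizes)
    "ANN (MLP stand-in)": MLPClassifier(
        hidden_layer_sizes=(32, 16), max_iter=2000, random_state=42
    ),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model behind a StandardScaler and return {name: (accuracy, f1)}."""
    results = {}
    for name, model in models.items():
        # The scaler is fit on the training split only, avoiding test-set leakage
        pipe = make_pipeline(StandardScaler(), model)
        pipe.fit(X_train, y_train)
        y_pred = pipe.predict(X_test)
        # pos_label=0 scores F1 for the malignant class under this label encoding
        results[name] = (
            accuracy_score(y_test, y_pred),
            f1_score(y_test, y_pred, pos_label=0),
        )
    return results


results = evaluate(models, X_train, X_test, y_train, y_test)
# Print models from highest to lowest test accuracy
for name, (acc, f1) in sorted(results.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:20s} accuracy={acc:.3f}  f1={f1:.3f}")
```

Reporting accuracy together with F1 for the malignant class mirrors the summary's focus on those two metrics. Swapping the MLP for a small Keras model would bring the sketch closer to the authors' stated toolset.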
Our code is written in Python using some of its most powerful and widely used libraries: NumPy for mathematical operations and linear algebra, pandas for data manipulation, scikit-learn for machine learning algorithms and techniques, TensorFlow and Keras for deep learning and neural network functionality, to deal with the user's files, and Matplotlib and seaborn for data visualization and polishing.