Breast cancer diagnosis using supervised machine learning classification algorithms
| Published in | AIP conference proceedings Vol. 2839; no. 1 |
|---|---|
| Main Authors | , |
| Format | Journal Article Conference Proceeding |
| Language | English |
| Published | Melville: American Institute of Physics, 29.09.2023 |
| ISSN | 0094-243X 1551-7616 |
| DOI | 10.1063/5.0167964 |
| Summary: | Breast cancer is a common disease in women and one of the most serious. Many cases are fatal, so early diagnosis can greatly improve the chances of successful treatment. Machine learning algorithms applied to medical imaging data are widely used to diagnose breast cancer accurately. In our research, we use the Wisconsin breast cancer dataset: we preprocess and prepare the data, then apply multiple classifiers, namely Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Classifier, Naive Bayes, Decision Tree, and Artificial Neural Network (ANN), to predict a patient's diagnosis and find the algorithm with the best classification results. We then use the classifiers to identify each image as either benign or malignant. We recorded each classifier's performance and found that the neural network classifier performed very well, with a diagnostic accuracy of 98%, higher than that of the Support Vector Machine classifier. We also try several dimensionality reduction (data compression) methods to produce smaller, more efficient datasets: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and an autoencoder. We then compare the results and use visualizations to differentiate clearly between the algorithms and techniques used. We use several performance metrics: accuracy, recall, precision, and F1-score. Because these measures are close to one another, we focus on accuracy and F1-score. The code is developed in Python using some of its most powerful and common libraries: NumPy for mathematical operations and linear algebra, pandas for data manipulation, scikit-learn for machine learning algorithms and techniques, TensorFlow and Keras for deep learning and neural network functionality, utilities for handling the user's files, and Matplotlib and seaborn for data visualization and polishing. |
|---|---|
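
The workflow outlined in the summary (prepare the data, fit several classifiers, optionally compress the features, then compare accuracy and F1-score) can be sketched roughly as follows. This is a minimal sketch and not the authors' code. It assumes the dataset is the Wisconsin Diagnostic set bundled with scikit-learn (`load_breast_cancer`). The 80/20 split, default hyperparameters, and the use of PCA as the only reduction method are illustrative choices, and the neural network, ICA, LDA, and autoencoder are omitted. The function name `compare_classifiers` and its parameters are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(n_components=None, seed=0):
    """Fit several classifiers and return accuracy and F1 per model.

    n_components: if given, PCA reduces the scaled features to this many
    components before classification (assumed setting, not from the paper).
    """
    # Assumption: scikit-learn's bundled copy stands in for the paper's data.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "Support Vector Classifier": SVC(),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
    }

    results = {}
    for name, model in models.items():
        # Scale first; the scaler and PCA are fitted on training data only.
        steps = [StandardScaler()]
        if n_components is not None:
            steps.append(PCA(n_components=n_components, random_state=seed))
        steps.append(model)
        pipeline = make_pipeline(*steps)
        pipeline.fit(X_train, y_train)
        predictions = pipeline.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predictions),
            # In scikit-learn's copy, label 0 is malignant and 1 is benign,
            # so pos_label=0 reports F1 for the malignant class.
            "f1": f1_score(y_test, predictions, pos_label=0),
        }
    return results


if __name__ == "__main__":
    for label, kwargs in [("all features", {}), ("PCA, 5 components", {"n_components": 5})]:
        print(label)
        for name, scores in compare_classifiers(**kwargs).items():
            print(f"  {name}: accuracy={scores['accuracy']:.3f}, f1={scores['f1']:.3f}")
```

Running the script prints one accuracy and F1 pair per classifier, first on all 30 features and then on a 5-component PCA projection. The exact figures depend on the split and the library version, so they should not be expected to match the numbers reported in the paper.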