Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods

Bibliographic Details
Published in	BMC Bioinformatics, Vol. 23, no. 1, pp. 1-9
Main Authors	Taghizadeh, Eskandar; Heydarheydari, Sahel; Saberi, Alihossein; JafarpoorNesheli, Shabnam; Rezaeijo, Seyed Masoud
Format	Journal Article
Language	English
Published	London: BioMed Central, 01.10.2022 (BioMed Central Ltd; Springer Nature B.V.; BMC)
Subjects	Accuracy; Algorithms; Analysis; Artificial intelligence; Bayesian analysis; Bioinformatics; Biological markers; Biomarkers; Biomedical and Life Sciences; Breast cancer; Business metrics; Care and treatment; Classification; Classifiers; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Datasets; Decision trees; Diagnosis; Discriminant analysis; Feature extraction; Feature selection; Gene expression; Health aspects; Hybrid systems; Iran; Learning algorithms; Life Sciences; Machine learning; Methods; Microarrays; MicroRNAs; Multilayer perceptrons; Prediction; Principal components analysis; Statistical analysis; Support vector machines; Transcriptome profiling; Transcriptomes; Variance analysis
ISSN	1471-2105
DOI	10.1186/s12859-022-04965-8

Summary:	Background: We used a hybrid machine learning system (HMLS) strategy that searches extensively for the optimal HMLS, combining feature selection algorithms, a feature extraction algorithm, and classifiers for diagnosing breast cancer. This study aims to obtain a high-importance transcriptome profile, linked with classification procedures, that can facilitate the early detection of breast cancer.
Methods: In the present study, 762 breast cancer patients and 138 solid tissue normal subjects were included. Three groups of machine learning (ML) algorithms were employed: (i) four feature selection procedures, compared to select the most valuable features: (1) ANOVA; (2) Mutual Information; (3) Extra Trees Classifier; and (4) Logistic Regression (LGR); (ii) a feature extraction algorithm (Principal Component Analysis); and (iii) 13 classification algorithms with automated ML hyperparameter tuning: (1) LGR; (2) Support Vector Machine; (3) Bagging; (4) Gaussian Naive Bayes; (5) Decision Tree; (6) Gradient Boosting Decision Tree; (7) K-Nearest Neighbors; (8) Bernoulli Naive Bayes; (9) Random Forest; (10) AdaBoost; (11) ExtraTrees; (12) Linear Discriminant Analysis; and (13) Multilayer Perceptron (MLP). Balanced accuracy and area under the curve (AUC) were used to evaluate the proposed models' performance.
Results: The LGR feature selection procedure combined with the MLP classifier achieved the highest balanced accuracy and AUC (balanced accuracy: 0.86, AUC: 0.94), followed by LGR feature selection with the LGR classifier (balanced accuracy: 0.84, AUC: 0.94). The AUC of the LGR + LGR combination was obtained with the following 20 biomarkers: TMEM212, SNORD115-13, ATP1A4, FRG2, CFHR4, ZCCHC13, FLJ46361, LY6G6E, ZNF323, KRT28, KRT25, LPPR5, C10orf99, PRKACG, SULT2A1, GRIN2C, EN2, GBA2, CUX2, and SNORA66.
Conclusions: The best performance was achieved using the LGR feature selection procedure and the MLP classifier. The 20 biomarkers listed above had the highest scores or rankings in breast cancer detection.
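
Illustrative sketch (not the authors' code): the best-performing combination named in the summary, LGR-based feature selection followed by an MLP classifier with tuned hyperparameters, could be assembled roughly as follows with scikit-learn. The synthetic data, the 70/30 split, the 3-fold grid search, the hyperparameter grid, and the choice to keep exactly 20 features are all assumptions made for illustration; the summary does not state these details.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the expression matrix (NOT the study's data):
# 900 samples with roughly the 762:138 tumor-to-normal ratio from the summary.
X, y = make_classification(
    n_samples=900, n_features=200, n_informative=20,
    weights=[138 / 900], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # LGR-based selection: keep the 20 features with the largest coefficients
    # (assumed count, chosen to mirror the 20 reported biomarkers).
    ("select", SelectFromModel(
        LogisticRegression(max_iter=1000),
        max_features=20, threshold=-np.inf,
    )),
    ("clf", MLPClassifier(max_iter=500, random_state=0)),
])

# Small hypothetical grid standing in for automated hyperparameter tuning.
search = GridSearchCV(
    pipeline,
    param_grid={
        "clf__hidden_layer_sizes": [(16,), (32,)],
        "clf__alpha": [1e-4, 1e-2],
    },
    scoring="balanced_accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate with the two metrics named in the summary.
pred = search.predict(X_test)
prob = search.predict_proba(X_test)[:, 1]
bal_acc = balanced_accuracy_score(y_test, pred)
auc = roc_auc_score(y_test, prob)
n_selected = int(search.best_estimator_.named_steps["select"].get_support().sum())
print(f"features kept = {n_selected}, balanced accuracy = {bal_acc:.2f}, AUC = {auc:.2f}")
```

Balanced accuracy is the mean of per-class recall, which matters with a class split like 762 to 138: a model that labels every sample as cancer would reach about 0.85 plain accuracy but only 0.50 balanced accuracy.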