Improving the Mann–Whitney statistical test for feature selection: An approach in breast cancer diagnosis on mammography

Bibliographic Details
Published in	Artificial Intelligence in Medicine, Vol. 63, no. 1, pp. 19–31
Main Authors	Pérez, Noel Pérez; Guevara López, Miguel A.; Silva, Augusto; Ramos, Isabel
Format	Journal Article
Language	English
Published	Netherlands: Elsevier B.V., 01.01.2015
Subjects	Algorithms; Bayes Theorem; Breast; Breast cancer CADx; Breast Neoplasms - diagnostic imaging; Cancer; Chi-Square Distribution; Classification; Databases, Factual; Diagnosis; Diagnosis, Computer-Assisted - methods; Discretization; Discriminant Analysis; Feature selection methods; Female; Humans; Internal Medicine; Linear Models; Machine Learning; Machine learning algorithms; Mammography - methods; Mann–Whitney U-test; Models, Statistical; Neural networks; Other; Predictive Value of Tests; Radiographic Image Interpretation, Computer-Assisted - methods; Redundancy analysis; Reproducibility of Results; Statistical tests; uFilter method
ISSN	0933-3657, 1873-2860
DOI	10.1016/j.artmed.2014.12.004

Summary:	• An innovative feature selection method (named uFilter) is proposed.
• A set of image-based features from mammography lesions was explored and successfully ranked.
• The classification performance of four different machine learning algorithms increased in almost all scenarios when using the uFilter method.
• The uFilter method statistically improved breast cancer classification in mammography.
• The efficiency of the uFilter method was confirmed by the Wilcoxon statistical test.

This work presents the theoretical description and experimental evaluation of a new feature selection method named uFilter. uFilter improves on the Mann–Whitney U-test for reducing dimensionality and ranking features in binary classification problems. The work also presents a practical application of uFilter to breast cancer computer-aided diagnosis (CADx). A total of 720 datasets (ranked subsets of features) were formed by applying chi-square (CHI2) discretization, information gain (IG), one-rule (1Rule), Relief, uFilter, and its theoretical basis method (named U-test). Each dataset was used to train feed-forward backpropagation neural network, support vector machine, linear discriminant analysis, and naive Bayes machine learning algorithms, producing classification scores for further statistical comparison. A head-to-head comparison against the U-test method, based on the mean area under the receiver operating characteristic curve (AUC), showed that uFilter significantly outperformed the U-test for almost all classification schemes (p < 0.05): it was superior in 50%, tied in 37.5%, and lost in 12.5% of the 24 comparative scenarios. In addition, when compared with the CHI2 discretization, IG, 1Rule and Relief methods, uFilter performed better or at least statistically similarly on the explored datasets while requiring fewer features.
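The record does not describe what uFilter changes in the Mann–Whitney U-test, so the sketch below covers only the baseline idea the abstract calls the U-test method: score each feature by how well it separates the two classes and rank features by the resulting p-value. It is a minimal sketch under stated assumptions, not the authors' implementation: labels are assumed to be 1 for the positive class (e.g. malignant) and 0 otherwise, the p-value uses the normal approximation with a tie-corrected variance, and the names `average_ranks`, `mann_whitney_u` and `rank_features` are hypothetical. It also reports AUC = U / (n1 · n2), which links the U statistic to the AUC metric used in the comparisons.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(pos: Sequence[float], neg: Sequence[float]) -> tuple[float, float]:
    """Return (U for the positive class, two-sided p-value).

    The p-value uses the normal approximation with a tie-corrected variance,
    so it is only reliable for moderately sized samples.
    """
    n1, n2 = len(pos), len(neg)
    pooled = list(pos) + list(neg)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: subtract sum(t^3 - t) over groups of t tied values.
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def rank_features(X: list[list[float]], y: list[int]) -> list[tuple[int, float, float]]:
    """Rank feature columns by ascending U-test p-value.

    Returns (feature index, p-value, AUC) tuples, where AUC = U / (n1 * n2) is
    the probability that a positive case scores higher than a negative one
    (ties counting one half).
    """
    results = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        u1, p = mann_whitney_u(pos, neg)
        results.append((j, p, u1 / (len(pos) * len(neg))))
    return sorted(results, key=lambda r: r[1])
```

Calling `rank_features` on a feature matrix with 0/1 labels returns the most discriminative features first. When a feature has no tied values, ordering by p-value is the same as ordering by |AUC − 0.5|, because n1 and n2 are fixed across features.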
The experimental results indicated that the uFilter method statistically outperformed the U-test method and performed similarly to, but not better than, traditional feature selection methods (CHI2 discretization, IG, 1Rule and Relief). uFilter showed competitive and appealing cost-effectiveness in selecting relevant features, as a support tool for breast cancer CADx methods, especially in the context of unbalanced datasets. Finally, redundancy analysis as a complementary step to uFilter provided an effective way to find optimal feature subsets without decreasing classification performance.
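The record does not say how the redundancy analysis was carried out. The sketch below assumes one common approach: walk the ranked feature list from best to worst and drop any feature that is strongly correlated with a feature already kept. The Pearson-correlation criterion, the 0.9 threshold, and the names `pearson` and `prune_redundant` are all hypothetical choices for illustration, not the paper's procedure.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def prune_redundant(
    columns: Sequence[Sequence[float]], ranked_ids: Sequence[int], threshold: float = 0.9
) -> list[int]:
    """Walk features from best to worst rank and keep a feature only if its
    absolute correlation with every already-kept feature is <= threshold.

    The 0.9 default is a hypothetical cut-off, not a value from the paper.
    """
    kept: list[int] = []
    for j in ranked_ids:
        if all(abs(pearson(columns[j], columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept
```

Because the walk follows the rank order produced by the filter step, a redundant feature is always dropped in favour of the better-ranked feature it correlates with.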