A novel deep machine learning algorithm with dimensionality and size reduction approaches for feature elimination: thyroid cancer diagnoses with randomly missing data

Bibliographic Details
Published in	Briefings in bioinformatics Vol. 25; no. 4
Main Authors	Tutsoy, Onder; Sumbul, Hilmi Erdem
Format	Journal Article
Language	English
Published	England: Oxford University Press; Oxford Publishing Limited (England), 23.05.2024
Subjects	Algorithms; Big Data; Cancer; Case Study; Cluster Analysis; Clustering; Computational efficiency; Computing time; Deep Learning; Humans; Learning algorithms; Machine Learning; Medical diagnosis; Missing data; Size reduction; Thyroid; Thyroid cancer; Thyroid Neoplasms - diagnosis; dimension reduction; feature selection
ISSN	1467-5463, 1477-4054
DOI	10.1093/bib/bbae344

Summary:	Thyroid cancer incidence continues to rise even though a large number of inspection tools have been developed recently. Because there is no standard, definitive procedure for diagnosing thyroid cancer, clinicians must conduct various tests. This process yields multi-dimensional big data, and the lack of a common approach leads to randomly distributed missing (sparse) data; both are formidable challenges for machine learning algorithms. This paper aims to develop an accurate and computationally efficient deep learning algorithm to diagnose thyroid cancer. To this end, the singularity in the learning problem caused by randomly distributed missing data is treated, and dimensionality reduction with inner and target similarity approaches is developed to select the most informative input datasets. In addition, size reduction with the hierarchical clustering algorithm is performed to eliminate considerably similar data samples. Four machine learning algorithms are trained and then tested with unseen data to validate their generalization and robustness. The results yield 100% training and 83% testing preciseness for the unseen data. The computational time efficiency of the algorithms is also examined under equal conditions.
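
The summary names three preprocessing ideas: treating missing data, reducing dimensionality by inner and target similarity, and reducing sample count by hierarchical clustering. The record does not give the paper's formulas, so the sketch below is only an illustration under stated assumptions: missing values are filled with column means, "target similarity" is read as the absolute Pearson correlation between a feature and the label, "inner similarity" as the absolute correlation between two features, and the clustering is single-linkage with a distance cut-off. All function names and thresholds (`impute_column_means`, `select_features`, `reduce_samples`, `inner_max`, `target_min`, `max_dist`) are hypothetical and are not taken from the paper.

```python
import math


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: mean imputation stands in for the paper's missing-data treatment.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_features(rows, target, inner_max=0.95, target_min=0.1):
    """Return indices of features that relate to the target and are not near-duplicates.

    Features are visited from strongest to weakest target correlation; a feature is
    kept only if its correlation with every already-kept feature is below inner_max.
    """
    cols = [list(c) for c in zip(*rows)]
    order = sorted(range(len(cols)), key=lambda j: -abs(pearson(cols[j], target)))
    kept = []
    for j in order:
        if abs(pearson(cols[j], target)) < target_min:
            continue
        if all(abs(pearson(cols[j], cols[k])) < inner_max for k in kept):
            kept.append(j)
    return sorted(kept)


def reduce_samples(rows, max_dist):
    """Single-linkage agglomerative clustering, stopped once the closest pair of
    clusters is farther apart than max_dist. Returns one sample index per cluster
    (the smallest index), so near-identical samples collapse to one representative.
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > max_dist:
            break
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    return sorted(min(c) for c in clusters)


# Toy data (made up for illustration): four samples, three features, two gaps.
samples = [[1.0, 2.0, 5.0], [1.1, None, 5.1], [4.0, 8.0, 3.0], [4.1, 8.3, None]]
labels = [0, 0, 1, 1]
filled = impute_column_means(samples)
features = select_features(filled, labels)
reduced = [[row[j] for j in features] for row in filled]
representatives = reduce_samples(reduced, max_dist=0.5)
print(features, representatives)
```

The pairwise loops are quadratic or worse and are meant only to keep the sketch dependency-free; for data of realistic size, a library implementation of hierarchical clustering would be the more sensible choice.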