A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics

Bibliographic Details
Published in	EURASIP Journal on Bioinformatics & Systems Biology, Vol. 2010, no. 1, pp. 1-14
Main Author	Oyana, Tonny J
Format	Journal Article
Language	English
Published	Cham: Springer International Publishing; Springer Nature B.V., 2010
Subjects	Algorithms; Bioinformatics; Biology; Biomedical Engineering and Bioengineering; Blood; Blood levels; Cluster analysis; Computational Biology/Bioinformatics; Data mining; Elevated; Engineering; Field study; Gene expression; Human subjects; Mathematical analysis; Mean square errors; Methods; Quality; Research Article; Signal, Image and Speech Processing; Systems Biology; Visual; Cluster Center; Housing Unit; Synthetic Dataset; Neighbor Query; Mean Square Error
ISSN	1687-4145, 1687-4153
DOI	10.1155/2010/746021

Summary:	The central purpose of this study is to further evaluate the performance of a new algorithm, and to provide additional evidence on it. The algorithm was designed to increase the overall efficiency of the original k-means clustering technique: the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). FES-k-means uses a hybrid approach that combines three components: a k-d tree data structure, which speeds up nearest-neighbor queries; the original k-means algorithm; and an adaptation rate proposed by Mashor. The algorithm was tested on two real datasets and one synthetic dataset. It was applied twice to each of the three datasets: first to data trained with the MIL-SOM method, and then to the actual untrained data, in order to evaluate its effectiveness. This two-step approach of training the data before clustering provides a solid foundation for knowledge discovery and data mining that clustering methods alone do not offer. The method has two benefits. First, runtime comparisons show that it produces clusters similar to those of the original k-means method at a much faster rate. Second, it provides efficient analysis of large geospatial data, with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
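The summary names the ingredients of FES-k-means (a k-d tree for nearest-neighbor queries, the original k-means algorithm, and Mashor's adaptation rate) but not how they are wired together. As a rough illustration of the first two only, here is a minimal Python sketch of Lloyd's k-means in which each assignment step finds a point's nearest center through a k-d tree built over the current centers. This is a sketch under stated assumptions, not the author's implementation. Which point set the tree indexes, the median-split construction, and the names `build_kdtree`, `nearest`, and `kdtree_kmeans` are all hypothetical choices. Mashor's adaptation rate and the MIL-SOM training step are omitted because the summary does not give their formulas.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


@dataclass
class KDNode:
    index: int                   # position of this node's point in the indexed list
    axis: int                    # coordinate used to split this subtree
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(points: List[Point], indices: Optional[List[int]] = None,
                 depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting on the median along cycling axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(indices[mid], axis,
                  build_kdtree(points, indices[:mid], depth + 1),
                  build_kdtree(points, indices[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[KDNode], points: List[Point], query: Point,
            best: Optional[Tuple[int, float]] = None) -> Optional[Tuple[int, float]]:
    """Exact nearest-neighbor query; returns (index, squared distance)."""
    if node is None:
        return best
    d = sq_dist(points[node.index], query)
    if best is None or d < best[1]:
        best = (node.index, d)
    diff = query[node.axis] - points[node.index][node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, points, query, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[1]:
        best = nearest(far, points, query, best)
    return best


def kdtree_kmeans(data: List[Point], k: int, max_iter: int = 100,
                  seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means whose assignment step queries a k-d tree of centers.

    Assumption: Mashor's adaptation rate is NOT modeled; centers are plain means.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [-1] * len(data)
    dim = len(data[0])
    for _ in range(max_iter):
        tree = build_kdtree(centers)            # rebuilt because the centers moved
        new_labels = [nearest(tree, centers, x)[0] for x in data]
        if new_labels == labels:                # assignments are stable: converged
            break
        labels = new_labels
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for x, j in zip(data, labels):
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += x[d]
        for j in range(k):
            if counts[j]:                       # an empty cluster keeps its old center
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    centers, labels = kdtree_kmeans(pts, k=2)
    # The first three points share one label and the last three share the other.
    print(labels, centers)
```

Building the tree over the k centers keeps the sketch short. With only k centers, the tree pays off mainly when k is large. A tree over the data points, as in filtering-style k-means variants, is another common design, and the summary does not say which one FES-k-means uses.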