A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics
      
    
| Published in | EURASIP Journal on Bioinformatics & Systems Biology, Vol. 2010, no. 1, pp. 1-14 |
|---|---|
| Format | Journal Article | 
| Language | English | 
| Published | Cham: Springer International Publishing, 2010; Springer Nature B.V; Springer |
| ISSN | 1687-4145, 1687-4153 |
| DOI | 10.1155/2010/746021 | 
| Summary | The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm, the Fast, Efficient, and Scalable k-means algorithm (FES-k-means), which was designed to increase the overall efficiency of the original k-means clustering technique. The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate, as shown by runtime comparison data, and that it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines. |
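
The summary describes a hybrid of a k-d tree nearest neighbor query and the original k-means algorithm. The following is a minimal sketch of that general idea only, not the authors' FES-k-means implementation. It assumes the k-d tree is built over the current centroids at each iteration, which the summary does not specify. Mashor's adaptation rate and the MIL-SOM training step are omitted because the record gives no formula for them. All function names (`build_kdtree`, `nearest`, `kmeans_kdtree`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree from (coords, index) pairs.

    Each node is a tuple (coords, index, axis, left, right).
    """
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    coords, idx = points[mid]
    return (coords, idx, axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (squared distance, index) of the stored point nearest to target."""
    if node is None:
        return best
    coords, idx, axis, left, right = node
    d = sq_dist(coords, target)
    if best is None or d < best[0]:
        best = (d, idx)
    diff = target[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def kmeans_kdtree(data, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; assignment uses a k-d tree over the centroids.

    Assumption: indexing the centroids (not the data) is an illustrative choice.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    dims = len(data[0])
    for _ in range(iterations):
        tree = build_kdtree([(c, i) for i, c in enumerate(centroids)])
        labels = [nearest(tree, p)[1] for p in data]
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(data, labels):
            counts[lab] += 1
            for j in range(dims):
                sums[lab][j] += p[j]
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(s / counts[c] for s in sums[c]) if counts[c] else centroids[c]
            for c in range(k)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, assignment = kmeans_kdtree(points, k=2)
    print(sorted(centers), assignment)
```

With only k centroids in the tree, the speedup over a linear scan is modest for small k. The sketch is meant to show where the nearest neighbor query fits into the assignment step rather than to reproduce the runtime gains reported in the summary.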