V-cluster algorithm: A new algorithm for clustering molecules based upon numeric data
| Published in | Molecular diversity Vol. 10; no. 3; pp. 463 - 478 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Netherlands, Springer Nature B.V, 01.08.2006 |
| ISSN | 1381-1991 1573-501X |
| DOI | 10.1007/s11030-006-9023-7 |
| Summary: | Clustering molecules based on numeric data, such as gene-expression data, physicochemical properties, or theoretical data, is very important in drug discovery and other life sciences. Most approaches use hierarchical clustering algorithms, non-hierarchical algorithms (for example, K-means and K-nearest neighbor), and other similar methods (for example, the Self-Organizing Map (SOM) and the Support Vector Machine (SVM)). These approaches are non-robust (their results are not consistent) and computationally expensive. This paper reports a new, non-hierarchical algorithm called the V-Cluster (V stands for vector) Algorithm. This algorithm produces rational, robust results while reducing computational complexity. Similarity measurement and data normalization rules are also discussed, along with case studies. When molecules are represented as a set of numeric vectors, the V-Cluster Algorithm clusters the molecules in three steps: (1) ranking the vectors based upon their overall intensity levels, (2) computing cluster centers based upon neighboring density, and (3) assigning molecules to their nearest cluster center. The program is written in C/C++ and runs on Windows 95/NT and UNIX platforms. With the V-Cluster program, the user can quickly complete the clustering process and easily examine the results using thumbnail graphs, superimposed intensity curves of vectors, and spreadsheets. Multi-functional query tools have also been implemented. |
|---|---|
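
The three steps named in the summary (rank by intensity, derive centers from neighbor density, assign to the nearest center) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary gives no formulas, so the intensity measure (sum of components), the Euclidean distance, the `radius` and `min_neighbors` parameters, the rule for picking centers, and the function name `v_cluster_sketch` are all assumptions made for illustration only.

```python
import math


def intensity(vector):
    # Assumption: "overall intensity level" is taken as the sum of components.
    return sum(vector)


def distance(a, b):
    # Assumption: Euclidean distance stands in for the paper's similarity measure.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def v_cluster_sketch(vectors, radius, min_neighbors=1):
    """Return (center_indices, labels) for a list of equal-length numeric vectors."""
    if not vectors:
        return [], []

    # Step 1: rank vectors by overall intensity, highest first.
    ranked = sorted(range(len(vectors)),
                    key=lambda i: intensity(vectors[i]),
                    reverse=True)

    # Step 2: choose centers from neighbor density (hypothetical rule):
    # a vector becomes a center if it has at least `min_neighbors` other
    # vectors within `radius` and is farther than `radius` from every
    # center chosen so far.
    def density(i):
        return sum(1 for j in range(len(vectors))
                   if j != i and distance(vectors[i], vectors[j]) <= radius)

    centers = []
    for i in ranked:
        dense_enough = density(i) >= min_neighbors
        far_from_centers = all(distance(vectors[i], vectors[c]) > radius
                               for c in centers)
        if dense_enough and far_from_centers:
            centers.append(i)
    if not centers:
        # Fall back to the highest-intensity vector so every point gets a label.
        centers = [ranked[0]]

    # Step 3: assign each vector to its nearest center.
    labels = [min(range(len(centers)),
                  key=lambda k: distance(v, vectors[centers[k]]))
              for v in vectors]
    return centers, labels
```

Calling `v_cluster_sketch(vectors, radius=1.0)` on a list of equal-length vectors returns the indices of the vectors chosen as centers and one cluster label per input vector. Because the ranking and the scan over ranked vectors are deterministic, repeated runs on the same input give the same result, which is the kind of consistency the summary calls robust.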