V-cluster algorithm: A new algorithm for clustering molecules based upon numeric data

Clustering molecules based on numeric data such as, gene-expression data, physiochemical properties, or theoretical data is very important in drug discovery and other life sciences. Most approaches use hierarchical clustering algorithms, non-hierarchical algorithms (for examples, K-mean and K-neares...

Full description

Saved in:
Bibliographic Details
Published inMolecular diversity Vol. 10; no. 3; pp. 463 - 478
Main Authors Xu, Jun, Zhang, Qiang, Shih, Chen-Kon
Format Journal Article
LanguageEnglish
Published Netherlands Springer Nature B.V 01.08.2006
Subjects
Online AccessGet full text
ISSN1381-1991
1573-501X
DOI10.1007/s11030-006-9023-7

Cover

More Information
Summary:Clustering molecules based on numeric data such as, gene-expression data, physiochemical properties, or theoretical data is very important in drug discovery and other life sciences. Most approaches use hierarchical clustering algorithms, non-hierarchical algorithms (for examples, K-mean and K-nearest neighbor), and other similar methods (for examples, the Self-Organization Mapping (SOM) and the Support Vector Machine (SVM)). These approaches are non-robust (results are not consistent) and, computationally expensive. This paper will report a new, non-hierarchical algorithm called the V-Cluster (V stands for vector) Algorithm. This algorithm produces rational, robust results while reducing computing complexity. Similarity measurement and data normalization rules are also discussed along with case studies. When molecules are represented in a set of numeric vectors, the V-Cluster Algorithm clusters the molecules in three steps: (1) ranking the vectors based upon their overall intensity levels, (2) computing cluster centers based upon neighboring density, and (3) assigning molecules to their nearest cluster center. The program is written in C/C++ language, and runs on Window95/NT and UNIX platforms. With the V-Cluster program, the user can quickly complete the clustering process and, easily examine the results by use of thumbnail graphs, superimposed intensity curves of vectors, and spreadsheets. Multi-functional query tools have also been implemented.
Bibliography:ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
content type line 23
ObjectType-Article-1
ObjectType-Feature-2
ISSN:1381-1991
1573-501X
DOI:10.1007/s11030-006-9023-7