V-cluster algorithm: A new algorithm for clustering molecules based upon numeric data

Bibliographic Details
Published in	Molecular diversity Vol. 10; no. 3; pp. 463 - 478
Main Authors	Xu, Jun; Zhang, Qiang; Shih, Chen-Kon
Format	Journal Article
Language	English
Published	Netherlands Springer Nature B.V. 01.08.2006
Subjects	Algorithms; Cluster Analysis; Gene Expression Profiling; Models, Biological; Models, Theoretical; Pattern Recognition, Automated; Studies
ISSN	1381-1991 1573-501X
DOI	10.1007/s11030-006-9023-7

Summary:	Clustering molecules based on numeric data, such as gene-expression data, physicochemical properties, or theoretical data, is very important in drug discovery and other life sciences. Most approaches use hierarchical clustering algorithms, non-hierarchical algorithms (for example, K-means and K-nearest neighbor), or other similar methods (for example, the Self-Organizing Map (SOM) and the Support Vector Machine (SVM)). These approaches are non-robust (their results are not consistent) and computationally expensive. This paper reports a new, non-hierarchical algorithm called the V-Cluster (V stands for vector) Algorithm. This algorithm produces rational, robust results while reducing computational complexity. Similarity measurement and data normalization rules are also discussed, along with case studies. When molecules are represented as a set of numeric vectors, the V-Cluster Algorithm clusters the molecules in three steps: (1) ranking the vectors by their overall intensity levels, (2) computing cluster centers based on neighboring density, and (3) assigning molecules to their nearest cluster center. The program is written in C/C++ and runs on Windows 95/NT and UNIX platforms. With the V-Cluster program, the user can quickly complete the clustering process and easily examine the results using thumbnail graphs, superimposed intensity curves of vectors, and spreadsheets. Multi-functional query tools have also been implemented.
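
The three steps named in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the summary does not define "overall intensity", "neighboring density", or the similarity measure, so the sketch assumes intensity is the sum of a vector's components, density is the number of other vectors within a Euclidean distance `radius`, and centers are picked greedily from the densest vectors. The names `v_cluster_sketch`, `euclidean`, and `radius` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def v_cluster_sketch(vectors, radius):
    """Illustrative three-step clustering; every detail below is an assumption.

    Returns (centers, labels): centers holds indices into `vectors`,
    and labels[i] is the position in `centers` of the center nearest to vector i.
    """
    n = len(vectors)

    # Step 1 (assumed): rank vectors by overall intensity, taken here as the
    # sum of components, highest first.
    ranked = sorted(range(n), key=lambda i: sum(vectors[i]), reverse=True)

    # Step 2 (assumed): neighboring density = number of other vectors within
    # `radius`. Visit vectors from densest to sparsest (ties keep the intensity
    # ranking because Python's sort is stable) and accept a vector as a center
    # only if it lies farther than `radius` from every center accepted so far.
    density = {
        i: sum(
            1
            for j in range(n)
            if j != i and euclidean(vectors[i], vectors[j]) <= radius
        )
        for i in ranked
    }
    centers = []
    for i in sorted(ranked, key=lambda i: density[i], reverse=True):
        if all(euclidean(vectors[i], vectors[c]) > radius for c in centers):
            centers.append(i)

    # Step 3: assign every vector to its nearest center.
    labels = [
        min(range(len(centers)), key=lambda k: euclidean(v, vectors[centers[k]]))
        for v in vectors
    ]
    return centers, labels
```

Calling `v_cluster_sketch(vectors, radius=0.5)` on a list of equal-length tuples returns the chosen center indices and one label per input vector. Under these assumptions the result depends only on the data and `radius`, not on a random initialization, which is one plausible reading of the robustness the summary claims.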