Initializing Partition-Optimization Algorithms

Bibliographic Details
Published in	IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 6, no. 1, pp. 144-157
Main Author	Maitra, R.
Format	Journal Article
Language	English
Published	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2009
Subjects	Algorithms; Arabidopsis - genetics; Arabidopsis - metabolism; Arrays; Assessments; Bioinformatics; Biology; Chemical Hazard Release - statistics & numerical data; Circadian Rhythm - genetics; Cluster Analysis; Clustering; Clustering algorithms; Clustering, classification, and association rules; Computational Biology - methods; Data Interpretation, Statistical; Degradation; Escherichia coli Proteins - genetics; Gene expression; Humans; Industrial Waste - statistics & numerical data; Iterative algorithms; Mercury; Methylmercury Compounds; Minimization methods; Multivariate statistics; Normal Distribution; Oligonucleotide Array Sequence Analysis; Partitioning algorithms; Pattern Recognition, Automated - methods; Proteins; Public healthcare; Singular value decomposition; Staged; Starch - biosynthesis; Starch - genetics; Statistical methods; Testing
ISSN	1545-5963; 1557-9964
DOI	10.1109/TCBB.2007.70244

Summary:	Clustering datasets is a challenging problem needed in a wide array of applications. Partition-optimization approaches, such as k-means or expectation-maximization (EM) algorithms, are sub-optimal and find solutions in the vicinity of their initialization. This paper proposes a staged approach to specifying initial values by finding a large number of local modes and then obtaining representatives from the most separated ones. Results on test experiments are excellent. We also provide a detailed comparative assessment of the suggested algorithm with many commonly-used initialization approaches in the literature. Finally, the methodology is applied to two datasets on diurnal microarray gene expressions and industrial releases of mercury.
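
The staged idea in the summary (find many local modes, then keep representatives of the most separated ones) can be illustrated with a rough sketch. This is not the paper's algorithm: the record gives no procedural detail, so the sketch substitutes plain k-means over-clustering for the mode-finding stage and a greedy maximin (farthest-point) rule for the selection stage. The names `sq_dist`, `lloyd`, and `staged_init`, and the default of `5 * k` candidates, are all hypothetical choices made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, iters=50):
    """Plain k-means (Lloyd) iterations from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                new_centers.append(tuple(sum(col) / len(g) for col in zip(*g)))
            else:
                new_centers.append(c)  # an empty cluster keeps its old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def staged_init(points, k, n_candidates=None, seed=0):
    """Return k starting centers chosen from many over-clustered candidates.

    Assumption: k-means with far more than k centers stands in for the
    "large number of local modes"; the paper's actual procedure may differ.
    """
    rng = random.Random(seed)
    m = n_candidates or min(len(points), 5 * k)

    # Stage 1: over-cluster to obtain many candidate local modes.
    candidates = lloyd(points, rng.sample(points, m))
    candidates = list(dict.fromkeys(candidates))  # drop exact duplicates

    # Stage 2: greedy maximin selection of k mutually distant candidates,
    # starting from the candidate farthest from the overall mean.
    mean = tuple(sum(col) / len(points) for col in zip(*points))
    chosen = [max(candidates, key=lambda c: sq_dist(c, mean))]
    while len(chosen) < k:
        chosen.append(
            max(candidates, key=lambda c: min(sq_dist(c, s) for s in chosen))
        )
    return chosen
```

A typical use would be `lloyd(points, staged_init(points, k))`, where `points` is a list of equal-length tuples. The maximin rule is only one way to read "most separated"; it favors spread over density, so outlying candidates can be selected.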