Accelerating Bayesian Hierarchical Clustering of Time Series Data with a Randomised Algorithm


Bibliographic Details
Published in: PLoS ONE, Vol. 8, no. 4, p. e59795
Main Authors: Darkins, Robert; Cooke, Emma J.; Ghahramani, Zoubin; Kirk, Paul D. W.; Wild, David L.; Savage, Richard S.
Format: Journal Article
Language: English
Published: United States, Public Library of Science (PLoS), 02.04.2013
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0059795

Summary: We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper, at https://sites.google.com/site/randomisedbhc/.
Competing Interests: The authors have declared that no competing interests exist.
Current address: London Centre for Nanotechnology, University College London, London, United Kingdom
Conceived and designed the experiments: RD RSS EJC ZG PDWK DLW. Performed the experiments: RD. Analyzed the data: RD RSS EJC PDWK DLW. Contributed reagents/materials/analysis tools: RD RSS. Wrote the paper: RD RSS EJC ZG PDWK DLW.
Current address: Centre for Bioinformatics, Imperial College London, London, United Kingdom