diceR: an R package for class discovery using an ensemble driven approach

Bibliographic Details
Published in BMC Bioinformatics Vol. 19; no. 1; pp. 11 - 4
Main Authors Chiu, Derek S., Talhouk, Aline
Format Journal Article
Language English
Published London BioMed Central 15.01.2018
BioMed Central Ltd
BMC
ISSN 1471-2105
DOI 10.1186/s12859-017-1996-y

Summary:
Background: Given a set of features, researchers are often interested in partitioning objects into homogeneous clusters. In health research, and cancer research in particular, high-throughput data is collected with the aim of segmenting patients into sub-populations to aid in disease diagnosis, prognosis or response to therapy. Cluster analysis, a class of unsupervised learning techniques, is often used for class discovery. It has several limitations: the algorithm and the number of clusters to generate must both be selected up front, and several groupings may be consistent with the data, which makes a final solution very difficult to validate. Ensemble clustering is a technique used to mitigate these limitations and to facilitate the generalization and reproducibility of findings in new cohorts of patients.

Results: We introduce diceR (diverse cluster ensemble in R), a software package available on CRAN: https://CRAN.R-project.org/package=diceR

Conclusions: diceR is designed to provide a set of tools that guide researchers through a general cluster analysis process that relies on minimizing subjective decision-making. Although developed in a biological context, the tools in diceR are data-agnostic and can therefore be applied in other contexts.