RNA-Seq Count Data Modelling by Grey Relational Analysis and Nonparametric Gaussian Process

Bibliographic Details
Published in	PloS one Vol. 11; no. 10; p. e0164766
Main Authors	Nguyen, Thanh; Bhatti, Asim; Yang, Samuel; Nahavandi, Saeid
Format	Journal Article
Language	English
Published	Public Library of Science (PLoS), United States, 26 October 2016
Subjects	Algorithms; Analysis; Artificial intelligence; Bayes Theorem; Bayesian analysis; Biology and life sciences; Biomarkers; Cancer; Cervical cancer; Classification; Classifiers; Data analysis; Data processing; Deoxyribonucleic acid; Discriminant analysis; DNA; Entropy; Experiments; Fuzzy logic; Gaussian process; Gaussian processes; Gene expression; Genes; Information management; Mathematical models; Medical treatment; Medicine and Health Sciences; Neural networks; Nonparametric statistics; Normal Distribution; Pathogenesis; Physical Sciences; Production methods; Research and analysis methods; Ribonucleic acid; RNA; RNA sequencing; Sequence Analysis, RNA; Statistical analysis; Statistical methods; Statistics as Topic - methods; Statistics, Nonparametric; Australia; Victoria Australia
ISSN	1932-6203
DOI	10.1371/journal.pone.0164766

More Information
Summary:	This paper introduces an approach to classifying RNA-seq read counts using grey relational analysis (GRA) and Bayesian Gaussian process (GP) models. Read counts are transformed into microarray-like data so that normal-based statistical methods can be applied. GRA is designed to select differentially expressed genes by integrating the outcomes of five individual feature selection methods: the two-sample t-test, entropy test, Bhattacharyya distance, Wilcoxon test and receiver operating characteristic (ROC) curve. GRA acts as an aggregate filter method, combining the advantages of the individual methods to produce significant feature subsets that are then fed into a nonparametric GP model for classification. The proposed approach is verified on two real benchmark datasets using five-fold cross-validation. Experimental results show that both the GRA-based feature selection method and the GP classifier outperform their competing methods. Moreover, GRA-GP considerably outperforms the sparse Poisson linear discriminant analysis classifiers, which were introduced specifically for read counts, across different numbers of features. The proposed approach can therefore be applied effectively in practice to read count data analysis, which is useful in many applications, including understanding disease pathogenesis, diagnosis and treatment monitoring at the molecular level.
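The GRA aggregation step described in the summary can be illustrated with a minimal sketch. It assumes the textbook grey relational formulation: each criterion is min-max normalised as "larger is better", every gene is compared against an ideal reference sequence of ones, the distinguishing coefficient is ρ = 0.5, and all criteria are weighted equally. The summary does not state these settings, so they are assumptions rather than the authors' exact configuration. The function names and the toy score matrix are hypothetical; each row holds one gene's scores under the five criteria, already oriented so that a larger value means a more discriminative gene.

```python
from typing import List, Sequence


def normalise_larger_better(column: Sequence[float]) -> List[float]:
    """Min-max scale one criterion to [0, 1]; a larger raw score is better."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant criterion cannot discriminate genes, so it earns no credit.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def grey_relational_grades(scores: Sequence[Sequence[float]],
                           rho: float = 0.5) -> List[float]:
    """Return one grey relational grade per gene (row) of `scores`.

    scores[i][k] is the k-th criterion score of gene i, e.g. |t|, an entropy
    statistic, Bhattacharyya distance, a Wilcoxon statistic, |AUC - 0.5|.
    Assumption: standard GRA with equal weights and reference sequence x0(k) = 1.
    """
    n_genes, n_crit = len(scores), len(scores[0])
    # Normalise column by column, then transpose back to one row per gene.
    cols = [normalise_larger_better([row[k] for row in scores]) for k in range(n_crit)]
    norm = [[cols[k][i] for k in range(n_crit)] for i in range(n_genes)]
    # Absolute deviation of each gene from the ideal reference value 1.
    delta = [[abs(1.0 - v) for v in row] for row in norm]
    d_min = min(min(row) for row in delta)
    d_max = max(max(row) for row in delta)
    if d_max == 0.0:
        return [1.0] * n_genes  # every gene coincides with the ideal
    grades = []
    for row in delta:
        # Grey relational coefficient for each criterion.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Grade = equally weighted mean of the coefficients.
        grades.append(sum(coeffs) / n_crit)
    return grades


def rank_genes(scores: Sequence[Sequence[float]], top_k: int,
               rho: float = 0.5) -> List[int]:
    """Indices of the top_k genes, ordered by decreasing grey relational grade."""
    grades = grey_relational_grades(scores, rho)
    order = sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)
    return order[:top_k]


# Hypothetical toy scores: 4 genes x 5 criteria.
scores = [
    [4.0, 0.9, 1.2, 30.0, 0.40],  # gene 0: best on every criterion
    [1.0, 0.1, 0.2, 10.0, 0.05],  # gene 1: worst on every criterion
    [3.0, 0.5, 1.0, 25.0, 0.30],  # gene 2
    [2.0, 0.7, 0.4, 12.0, 0.10],  # gene 3
]
print(rank_genes(scores, top_k=2))  # → [0, 2]
```

Because the grade averages per-criterion coefficients, a gene ranked highly by only one criterion is pulled down by the others. This is one way an aggregate filter can combine the strengths of several individual tests. The selected indices would then define the feature subset passed to the classifier.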
Bibliography:	Conceptualization: TN, AB, SY, SN. Data curation: TN, AB, SN. Formal analysis: TN, AB, SY, SN. Methodology: TN, AB, SY, SN. Resources: TN, AB. Software: TN, AB. Validation: TN, SY, SN. Writing – original draft: TN, AB. Writing – review & editing: TN, AB, SY, SN. Competing Interests: The authors have declared that no competing interests exist.