A Bootstrap Based Measure Robust to the Choice of Normalization Methods for Detecting Rhythmic Features in High Dimensional Data

Bibliographic Details
Published in Frontiers in Genetics Vol. 9; p. 24
Main Authors Larriba, Yolanda; Rueda, Cristina; Fernández, Miguel A.; Peddada, Shyamal D.
Format Journal Article
Language English
Published Switzerland, Frontiers Media S.A., 02.02.2018
ISSN 1664-8021
DOI 10.3389/fgene.2018.00024

Summary: Gene-expression data obtained from high-throughput technologies are subject to various sources of noise, and accordingly the raw data are pre-processed before being formally analyzed. Normalization is a key pre-processing step, since it removes systematic variation across arrays. Numerous normalization methods are available in the literature. In our experience, in the context of oscillatory systems such as the cell cycle and the circadian clock, the choice of normalization method may substantially affect whether a gene is determined to be rhythmic. Thus the apparent rhythmicity of a gene can be purely an artifact of how the data were normalized. Since the determination of rhythmic genes is an important component of modern toxicological and pharmacological studies, it is important to identify truly rhythmic genes in a way that is robust to the choice of normalization method. In this paper we introduce a rhythmicity measure and a bootstrap methodology to detect rhythmic genes in an oscillatory system. Although the proposed methodology can be used for any high-throughput gene-expression data, we illustrate it using several publicly available circadian clock microarray gene-expression datasets. We demonstrate that the choice of normalization method has very little effect on the proposed methodology. Specifically, for any pair of normalization methods considered in this paper, the resulting values of the rhythmicity measure are highly correlated. This suggests that the proposed measure is robust to the choice of normalization method. Consequently, the rhythmicity of a gene, as assessed by this measure, is potentially not a mere artifact of the normalization method used. Lastly, as demonstrated in the paper, the proposed bootstrap methodology can also be used to simulate data for genes participating in an oscillatory system using a reference dataset.
User-friendly code implemented in R can be downloaded from http://www.eio.uva.es/~miguel/robustdetectionprocedure.html.
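
The robustness check described in the summary (score each gene under two normalizations, then correlate the scores) can be illustrated with a small Python sketch. This is not the authors' measure or their R code: the record does not define the rhythmicity measure, so the sketch substitutes the R^2 of a single-harmonic cosine fit averaged over a residual bootstrap. The 24 h period, the simulated data, the two normalizations (per-array median centering and per-array z-scoring), and every function name below are assumptions made for illustration only.

```python
import math
import random
import statistics

PERIOD = 24.0  # hours; assumed circadian period (not taken from the record)


def cosinor_r2(times, values, period=PERIOD):
    """Return (R^2, fitted values) for y = m + a*cos(wt) + b*sin(wt).

    Stand-in rhythmicity score, not the paper's measure. The closed-form
    coefficients are exact least squares only when the time points are
    equally spaced and span a whole number of periods.
    """
    n = len(values)
    w = 2.0 * math.pi / period
    mean = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(w * t) for t, y in zip(times, values))
    b = 2.0 / n * sum(y * math.sin(w * t) for t, y in zip(times, values))
    fitted = [mean + a * math.cos(w * t) + b * math.sin(w * t) for t in times]
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    ss_tot = sum((y - mean) ** 2 for y in values)
    return (1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0), fitted


def bootstrap_rhythmicity(times, values, n_boot=200, rng=None):
    """Mean R^2 over residual-bootstrap replicates of one gene's profile."""
    rng = rng or random.Random(0)
    _, fitted = cosinor_r2(times, values)
    residuals = [y - f for y, f in zip(values, fitted)]
    scores = []
    for _ in range(n_boot):
        # Resample residuals with replacement and add them to the fitted curve.
        resampled = [f + rng.choice(residuals) for f in fitted]
        scores.append(cosinor_r2(times, resampled)[0])
    return sum(scores) / len(scores)


def median_center(matrix):
    """Subtract each array's (column's) median; rows are genes."""
    medians = [statistics.median(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, medians)] for row in matrix]


def zscore_arrays(matrix):
    """Center and scale each array (column) to mean 0 and unit variance."""
    cols = list(zip(*matrix))
    means = [sum(c) / len(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(1)
    times = [4.0 * k for k in range(12)]  # every 4 h for 48 h (two periods)
    # Simulated genes: the first 20 oscillate, the last 20 are noise only.
    genes = []
    for g in range(40):
        amp = 1.0 if g < 20 else 0.0
        phase = rng.uniform(0.0, 2.0 * math.pi)
        genes.append([amp * math.cos(2.0 * math.pi * t / PERIOD - phase)
                      + rng.gauss(0.0, 0.3) for t in times])
    # Array-specific offsets mimic systematic variation across arrays.
    offsets = [rng.gauss(0.0, 0.5) for _ in times]
    raw = [[v + o for v, o in zip(row, offsets)] for row in genes]

    scores = {}
    for name, normalize in [("median", median_center), ("zscore", zscore_arrays)]:
        data = normalize(raw)
        scores[name] = [bootstrap_rhythmicity(times, row, rng=rng) for row in data]
    print("correlation between normalizations:",
          round(pearson(scores["median"], scores["zscore"]), 3))
```

A correlation close to 1 between the two score vectors would indicate that, for this toy measure and these simulated data, the ranking of genes by rhythmicity changes little with the normalization. Whether the same holds for real data and for the measure proposed in the paper is what the article itself examines.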
Reviewed by: Xianwen Ren, Peking University, China; Tiejun Tong, Hong Kong Baptist University, Hong Kong
This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics
Edited by: Madhuchhanda Bhattacharjee, University of Hyderabad, India