Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
| Published in | PLoS ONE Vol. 7; no. 10; p. e46128 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | United States: Public Library of Science (PLoS), 02.10.2012 |
| ISSN | 1932-6203 |
| DOI | 10.1371/journal.pone.0046128 |
| Summary: | When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible. |
|---|---|
| Bibliography: | Competing Interests: The authors have declared that no competing interests exist. Contributed reagents/materials/analysis tools: JSC JHC. Conceived and designed the study: GM YD SE JSC JHC. Assembled the data: GM JSC. Analyzed and interpreted the data: GM YD SE JHC. Drafted the paper: GM YD. Participated in the critical revision of the manuscript and gave final approval of the article: GM YD SE JSC JHC. |
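
The confounding argument in the summary can be illustrated with a small simulation. The following is a minimal sketch and not the authors' implementation: the simulated data, the coefficient values, the helper names (`sigmoid`, `solve`, `fit_logistic`), and the choice of the differential expression indicator as the response variable are all assumptions made for illustration. Standardized log transcript length drives both category membership and the chance of being called differentially expressed, while membership has no direct effect. A logistic regression without the length covariate should then show a spurious positive membership coefficient, which shrinks toward zero once length is included.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, y, n_iter=15):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            mu = sigmoid(sum(b * xj for b, xj in zip(beta, x)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += w * x[j] * x[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical simulated data: z is a standardized log transcript length.
random.seed(0)
n_genes = 4000
z = [random.gauss(0.0, 1.0) for _ in range(n_genes)]
# Longer genes are more likely to belong to the category (assumed effect size).
in_category = [1 if random.random() < sigmoid(-1.0 + 1.5 * zi) else 0 for zi in z]
# Longer genes are more likely to be called DE; membership has NO direct effect.
is_de = [1 if random.random() < sigmoid(-1.0 + 1.5 * zi) else 0 for zi in z]

# Model 1: DE ~ membership (length ignored).
unadjusted = fit_logistic([[1.0, g] for g in in_category], is_de)[1]
# Model 2: DE ~ membership + length (length as a covariate).
adjusted = fit_logistic([[1.0, g, zi] for g, zi in zip(in_category, z)], is_de)[1]

print(f"membership coefficient without length covariate: {unadjusted:.3f}")
print(f"membership coefficient with length covariate:    {adjusted:.3f}")
```

The membership coefficient in the second model estimates the association conditional on transcript length, which is the quantity the summary describes. In a real analysis a continuous significance measure could replace the binary indicator, and an established statistics package would normally be used instead of the hand-written Newton-Raphson fit shown here.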