Use of Partial Least Squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data
Published in | Genomics (San Diego, Calif.) Vol. 111; no. 4; pp. 893 - 898 |
---|---|
Format | Journal Article |
Language | English |
Published | United States: Elsevier Inc, 01.07.2019 |
ISSN | 0888-7543 1089-8646 |
DOI | 10.1016/j.ygeno.2018.05.018 |
Summary: | RNA-Seq technology has revolutionized gene expression profiling by generating read count data that measure the transcript abundance of each queried gene across multiple experimental subjects. On the downside, the underlying technical artefacts and hidden biological profiles of the samples generate a wide variety of latent effects that may distort the actual transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables and lead to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (PLS), implemented in the R package ‘SVAPLSseq’, to correct for the traces of extraneous variability in RNA-Seq data. A novel and thorough comparative analysis of the PLS-based method is presented alongside other widely used approaches for latent variable correction in RNA-Seq. Overall, the method is found to achieve a substantially improved estimation of the hidden effect signatures in the RNA-Seq transcriptome expression landscape compared to other available techniques.
• RNA-Seq technology provides a deep-resolution view of the transcriptomic expression pattern.
• Unknown factors of hidden variation in RNA-Seq data confound the primary signals of differential expression between two sample types.
• The R package SVAPLSseq provides two methods (supervised and unsupervised) to correct for these hidden factors of variation in RNA-Seq.
• The method is found to perform better than other competing approaches in several situations. |
---|---|