Use of Partial Least Squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data

Bibliographic Details
Published in: Genomics (San Diego, Calif.) Vol. 111; no. 4; pp. 893-898
Main Author: Chakraborty, Sutirtha
Format: Journal Article
Language: English
Published: United States, Elsevier Inc, 01.07.2019
ISSN: 0888-7543, 1089-8646
DOI: 10.1016/j.ygeno.2018.05.018

Summary: RNA-Seq technology has revolutionized gene expression profiling by generating read count data that measure the transcript abundance of each queried gene across multiple experimental subjects. On the downside, the underlying technical artefacts and hidden biological profiles of the samples generate a wide variety of latent effects that can distort the actual transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables and lead to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (PLS), implemented in the R package ‘SVAPLSseq’, to correct for extraneous variability in RNA-Seq data. A novel and thorough comparative analysis of the PLS-based method is presented alongside other popular approaches for latent variable correction in RNA-Seq. Overall, the method is found to achieve a substantially improved estimation of the hidden effect signatures in the RNA-Seq transcriptome expression landscape compared to other available techniques.

• RNA-Seq technology provides a deep-resolution view of the transcriptomic expression pattern.
• Unknown factors of hidden variation in RNA-Seq data confound the primary signals of differential expression between two sample types.
• The R package SVAPLSseq provides two methods (supervised and unsupervised) to correct for these hidden factors of variation in RNA-Seq data.
• The method is found to perform better than other competing approaches in several situations.
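The general idea in the summary, estimating hidden factors from what remains after the known signal is modelled and then adjusting for them, can be illustrated with a small sketch. This is not the SVAPLSseq implementation: it assumes log-scale expression values, uses a simplified SVD form of PLS, and runs on simulated data. The function name `pls_surrogate` and all simulation parameters are hypothetical.

```python
import numpy as np

def pls_surrogate(log_expr, design, n_comp=1):
    """Illustrative only: estimate latent sample-level scores.

    log_expr : genes x samples matrix of log-scale expression
    design   : samples x p matrix of known covariates (with intercept)
    """
    # Step 1: gene-wise least squares on the known design; keep residuals.
    coef, *_ = np.linalg.lstsq(design, log_expr.T, rcond=None)
    resid = log_expr.T - design @ coef            # samples x genes
    # Step 2: SVD-form PLS between residuals (X) and centered expression (Y).
    X = resid - resid.mean(axis=0)
    Y = log_expr.T - log_expr.T.mean(axis=0)
    U, _, _ = np.linalg.svd(X.T @ Y, full_matrices=False)
    # Scores of the leading PLS directions serve as surrogate variables.
    return X @ U[:, :n_comp]                      # samples x n_comp

# Simulated data (assumption): two groups plus one hidden, balanced batch.
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 12
group = np.repeat([0.0, 1.0], 6)                  # known primary variable
batch = np.tile([0.0, 1.0], 6)                    # hidden factor
design = np.column_stack([np.ones(n_samples), group])
log_expr = (rng.normal(5, 1, (n_genes, 1))
            + np.outer(rng.normal(0, 1, n_genes), group)
            + np.outer(rng.normal(0, 1, n_genes), batch)
            + rng.normal(0, 0.3, (n_genes, n_samples)))

sv = pls_surrogate(log_expr, design)
corr = np.corrcoef(sv[:, 0], batch)[0, 1]
print(f"|correlation| between surrogate and hidden batch: {abs(corr):.3f}")
```

In a differential expression analysis, the estimated scores would typically be appended to the design matrix as extra covariates before testing the primary variable. In this simulation the hidden factor is balanced across groups, which makes it easy to recover; confounded designs are harder.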