Analyzing gene expression data in terms of gene sets: methodological issues

Bibliographic Details
Published in Bioinformatics Vol. 23; no. 8; pp. 980-987
Main Authors Goeman, Jelle J.; Bühlmann, Peter
Format Journal Article
Language English
Published Oxford Oxford University Press 15.04.2007
Oxford Publishing Limited (England)
ISSN 1367-4803, 1367-4811, 1460-2059
DOI 10.1093/bioinformatics/btm051

Summary:
Motivation: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferential methodology of gene set testing.

Results: We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.

Contact: j.j.goeman@lumc.nl
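
The claim in the Results that gene-sampling P-values can be anti-conservative when genes are correlated can be illustrated with a small simulation. The sketch below is not the authors' simulation experiment; it is a minimal illustration under assumptions chosen here: a two-group design with no true differential expression, unit-variance Gaussian expression values, a gene set whose members share a latent factor (pairwise correlation `rho`), and a simple z-test that compares the mean gene score inside the set with the mean outside it while treating genes as independent sampling units. All function names and parameter values are hypothetical.

```python
import math
import random


def normal_two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_gene_scores(rng, n_per_group, n_genes, set_size, rho):
    """Per-gene z-scores for a two-group comparison with NO true effect.

    Genes 0..set_size-1 form the gene set. Within each subject they share
    one latent factor, which gives them pairwise correlation rho
    (an illustrative assumption). All other genes are independent.
    """
    w_shared = math.sqrt(rho)
    w_noise = math.sqrt(1.0 - rho)
    sums = [[0.0] * n_genes for _ in range(2)]
    for group in range(2):
        for _ in range(n_per_group):
            shared = rng.gauss(0.0, 1.0)
            for g in range(n_genes):
                noise = rng.gauss(0.0, 1.0)
                if g < set_size:
                    value = w_shared * shared + w_noise * noise
                else:
                    value = noise
                sums[group][g] += value
    # Standard deviation of a difference in group means (unit variance).
    sd_diff = math.sqrt(2.0 / n_per_group)
    return [(sums[0][g] - sums[1][g]) / n_per_group / sd_diff
            for g in range(n_genes)]


def gene_sampling_p(scores, set_size):
    """Competitive z-test that treats the genes as the sampling units."""
    in_set = scores[:set_size]
    rest = scores[set_size:]
    diff = sum(in_set) / len(in_set) - sum(rest) / len(rest)
    # This standard error is valid only if gene scores are independent.
    se = math.sqrt(1.0 / len(in_set) + 1.0 / len(rest))
    return normal_two_sided_p(diff / se)


def false_positive_rate(rho, n_sim=400, alpha=0.05, seed=1):
    """Fraction of null datasets in which the gene set is called significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        scores = simulate_gene_scores(rng, n_per_group=8, n_genes=100,
                                      set_size=20, rho=rho)
        if gene_sampling_p(scores, set_size=20) < alpha:
            hits += 1
    return hits / n_sim


rate_independent = false_positive_rate(rho=0.0)
rate_correlated = false_positive_rate(rho=0.5)
print("false positive rate, independent genes:", rate_independent)
print("false positive rate, correlated gene set:", rate_correlated)
```

Under these assumptions the test should hold its nominal 5% level when genes are independent, and should reject far more often once the genes in the set are correlated, even though no gene is differentially expressed. The inflation arises because the variance of the set mean grows with the within-set correlation, which the gene-sampling standard error ignores.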
Bibliography: istex:79B785CB5B754E99BEA61404BD23BC87E97C29E5
To whom correspondence should be addressed.
Associate Editor: Trey Ideker
ark:/67375/HXZ-G4Q5W76K-M