Classification of a large microarray data set: Algorithm comparison and analysis of drug signatures

Bibliographic Details
Published in	Genome Research, Vol. 15, no. 5, pp. 724-736
Main Authors	Natsoulis, Georges; El Ghaoui, Laurent; Lanckriet, Gert R.G.; Tolley, Alexander M.; Leroy, Fabrice; Dunlea, Shane; Eynon, Barrett P.; Pearson, Cecelia I.; Tugendreich, Stuart; Jarnagin, Kurt
Format	Journal Article
Language	English
Published	United States: Cold Spring Harbor Laboratory Press, 01.05.2005
Subjects	Algorithms; Animals; Bone Marrow - metabolism; Classification - methods; Dose-Response Relationship, Drug; Gene Expression Regulation; Kidney - metabolism; Liver - metabolism; Logistic Models; Male; Methods; Myocardium - metabolism; Oligonucleotide Array Sequence Analysis - methods; Oligonucleotide Array Sequence Analysis - standards; Pharmaceutical Preparations - metabolism; Principal Component Analysis; Rats; Rats, Sprague-Dawley; Reproducibility of Results; RNA, Messenger - isolation & purification
ISSN	1088-9051, 1549-5469
DOI	10.1101/gr.2807605

More Information
Summary:	A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists which upon analysis reveal that some of the signature genes have a positive contribution (act as “rewards” for the class-of-interest) while others have a negative contribution (act as “penalties”) to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class.
Bibliography:	Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2807605. Corresponding author. E-mail gnatsoulis@iconixpharm.com; fax (650) 567-5540. Supplemental material is available online at www.genome.org.
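
The summary describes drug signatures as short, weighted gene lists in which some genes act as "rewards" and others as "penalties" in a linear classification decision. The following is a minimal sketch of how such a signature could be scored, not the authors' implementation: the gene identifiers, weights, bias, expression values, and function names are all hypothetical and chosen only to illustrate the idea.

```python
from typing import Dict, List, Tuple

# Hypothetical signature: gene names and weights are invented for
# illustration and are NOT taken from the paper.
SIGNATURE: Dict[str, float] = {
    "gene_A": 1.2,   # positive weight: a "reward" gene
    "gene_B": 0.7,   # positive weight: a "reward" gene
    "gene_C": -0.9,  # negative weight: a "penalty" gene
}
BIAS = -0.5  # hypothetical offset of the linear decision function


def split_signature(weights: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Separate signature genes into rewards (w > 0) and penalties (w < 0)."""
    rewards = sorted(g for g, w in weights.items() if w > 0)
    penalties = sorted(g for g, w in weights.items() if w < 0)
    return rewards, penalties


def decision_score(expression: Dict[str, float],
                   weights: Dict[str, float],
                   bias: float) -> float:
    """Linear decision value: weighted sum of expression values plus bias.

    Genes absent from the expression profile contribute zero.
    """
    return sum(w * expression.get(g, 0.0) for g, w in weights.items()) + bias


def in_class(expression: Dict[str, float],
             weights: Dict[str, float],
             bias: float) -> bool:
    """Assign the treatment to the class of interest if the score is positive."""
    return decision_score(expression, weights, bias) > 0.0


# Two hypothetical expression profiles (e.g. log-ratios versus control).
# Both induce the reward genes equally; only the second induces the penalty gene.
treated = {"gene_A": 1.0, "gene_B": 0.5, "gene_C": 0.0}
off_target = {"gene_A": 1.0, "gene_B": 0.5, "gene_C": 1.5}

if __name__ == "__main__":
    print(split_signature(SIGNATURE))
    print(in_class(treated, SIGNATURE, BIAS))
    print(in_class(off_target, SIGNATURE, BIAS))
```

In this toy case both profiles show the same reward-gene response, but the penalty gene pushes the second score below zero, which is one way to picture how combining reward and penalty genes can keep false positives low.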