Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery




Bibliographic Details
Published in	PLoS ONE Vol. 6; no. 9; p. e24233
Main Authors	Liu, Jiangang, Jolly, Robert A., Smith, Aaron T., Searfoss, George H., Goldstein, Keith M., Uversky, Vladimir N., Dunker, Keith, Li, Shuyu, Thomas, Craig E., Wei, Tao
Format	Journal Article
Language	English
Published	Public Library of Science (PLoS), United States, 15 September 2011
Subjects	Algorithms; Animals; Bile Ducts - pathology; Bioinformatics; Biological effects; Biology; Biomarkers; Biomarkers - metabolism; Breast cancer; Cancer therapies; Case studies; Classification; Computational Biology - methods; Computer Science; Databases, Factual; Gangrene; Gene expression; Genes; Genomes; Genomics; Genomics - methods; Homeostasis; Humans; Hyperplasia; Inflammation; Informatics; Iterative methods; Kinases; Laboratories; Mathematical models; Medicine; Metabolism; Metabolites; Modelling; Models, Statistical; Multiplexing; Necrosis; Oligonucleotide Array Sequence Analysis; Pain; Pharmacology; Predictive Value of Tests; Proportional Hazards Models; Rats; Regulatory approval; Side effects; Statistics as Topic; Studies; Technology, Pharmaceutical; Toxicity; Toxicology; Transcription; Indiana; United States; Indianapolis, Indiana
ISSN	1932-6203
DOI	10.1371/journal.pone.0024233

Summary:	Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to overfitting. Models constructed from such data sets often include a large number of genes with no obvious functional relevance to the biological effect they are intended to predict, which can make the modeling results difficult to interpret. To address these issues, we developed a novel algorithm, the Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By enforcing, in every iteration of modeling and testing, that the number of samples exceeds the number of transcripts, PPEA reduces the risk of overfitting. We show with three case studies that: (1) PPEA can quickly derive a reliable rank order of the predictive power of individual transcripts in a relatively small number of iterations; (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict; (3) using only the most predictive top-ranked transcripts greatly facilitates the development of multiplex assays, such as qRT-PCR, as biomarkers; and (4) most importantly, a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype: their expression changes distinguished adverse from non-adverse effects of compounds in completely independent tests. Thus, we believe that PPEA effectively addresses the overfitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
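The summary describes PPEA only at a high level: resample in two directions, keep each model's transcript count below its sample count, and accumulate a per-transcript predictive-power score into a rank order. The sketch below is one plausible reading of that idea, not the authors' implementation. Everything concrete in it is an assumption for illustration: the nearest-centroid classifier, the rule of crediting each transcript with the out-of-bag accuracy of every model it took part in, the hypothetical names `centroid_accuracy`, `estimate_predictive_power` and `max_features`, and the synthetic demo data. It also draws the transcript subset without replacement, which only approximates a "bootstrap" over transcripts.

```python
import random
from statistics import mean


def centroid_accuracy(train, test, features):
    """Hypothetical helper: fit a nearest-centroid classifier on `features`
    and return its accuracy on `test`.

    `train` and `test` are lists of (expression_vector, label) pairs.
    """
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]

    def predict(x):
        # Assign x to the class whose centroid is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda lab: sum((x[j] - c) ** 2 for j, c in zip(features, centroids[lab])),
        )

    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def estimate_predictive_power(data, n_transcripts, n_iter=300, max_features=3, seed=0):
    """Hypothetical PPEA-style sketch: score each transcript by the mean
    out-of-bag accuracy of the small random models it appeared in."""
    rng = random.Random(seed)
    n = len(data)
    scores = [[] for _ in range(n_transcripts)]
    for _ in range(n_iter):
        # Direction 1: bootstrap the samples; samples never drawn form the test set.
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        oob = [data[i] for i in range(n) if i not in in_bag]
        # Direction 2: draw a random transcript subset kept strictly smaller than
        # the number of distinct training samples (samples > transcripts).
        k = min(max_features, n_transcripts, len(in_bag) - 1)
        if k < 1 or not oob:
            continue
        features = rng.sample(range(n_transcripts), k)
        acc = centroid_accuracy([data[i] for i in boot], oob, features)
        for j in features:
            scores[j].append(acc)
    # Transcripts never drawn get a score of 0.0.
    return [mean(s) if s else 0.0 for s in scores]


# Synthetic demo (made-up data): transcript 0 tracks the label, transcripts 1-9 are noise.
gen = random.Random(42)
data = []
for i in range(40):
    y = i % 2
    x = [5.0 * y + gen.gauss(0, 0.5)] + [gen.gauss(0, 1) for _ in range(9)]
    data.append((x, y))

power = estimate_predictive_power(data, n_transcripts=10)
ranking = sorted(range(10), key=lambda j: power[j], reverse=True)
print(ranking[0])
```

In this toy setting the informative transcript 0 should come out on top. Noise transcripts still earn some credit whenever they share a model with transcript 0, which is why averaging over many small, randomly composed models is needed for the ranking to separate signal from noise.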
Author contributions:	Conceived and designed the experiments: JL TW SL CET VNU KD. Performed the experiments: JL TW RAJ ATS GHS KMG. Analyzed the data: JL TW SL CET VNU KD RAJ ATS. Wrote the paper: JL TW CET.