Proteogenomics: concepts, applications and computational strategies


Bibliographic Details
Published in	Nature Methods, Vol. 11, no. 11, pp. 1114-1125
Main Author	Nesvizhskii, Alexey I
Format	Journal Article
Language	English
Published	New York: Nature Publishing Group US, 01.11.2014
Subjects	631/114; 631/61/212; 631/61/475; Annotations; Bioinformatics; Biological Microscopy; Biological Techniques; Biomedical Engineering/Biotechnology; Chromatography; Databases, Nucleic Acid; Databases, Protein; Genetic screening; Genetic Variation; Genomes; Genomics; Genomics - methods; High-Throughput Nucleotide Sequencing; Innovations; Life Sciences; Mass Spectrometry; Methods; Mutation; Peptides; Protein Isoforms - genetics; Proteins; Proteome - genetics; Proteomics; Proteomics - methods; review-article; Ribonucleic acid; RNA; Scientific imaging; Sequence Analysis, Protein - methods; Ann Arbor; Michigan; United States
ISSN	1548-7091, 1548-7105
DOI	10.1038/nmeth.3144

Summary:	A proteogenomic approach to analyzing mass spectrometry–based proteomic data enables the discovery of novel peptides, provides peptide-level evidence of gene expression, and assists in refining gene models. Strategies for building custom sequence databases, applications benefitting from a proteogenomic approach, and challenges in interpreting data are discussed in this Review. Also in this issue, Alfaro et al. discuss the use of proteogenomic approaches for studying cancer biology. Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry–based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of new sequencing technologies such as RNA-seq and dramatic improvements in the depth and throughput of mass spectrometry–based proteomics, the pace of proteogenomic research has greatly accelerated. Here I review the current state of proteogenomic methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false-positive identifications in proteogenomics and provide guidelines for analyzing the data and reporting the results of proteogenomic studies.