Proteome-wide copy-number estimation from transcriptomics

Bibliographic Details
Published in	Molecular systems biology Vol. 20; no. 11; pp. 1230 - 1256
Main Authors	Sweatt, Andrew J, Griffiths, Cameron D, Groves, Sarah M, Paudel, B Bishal, Wang, Lixin, Kashatus, David F, Janes, Kevin A
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK (Springer Nature), 04.11.2024
Subjects	Biomedical and Life Sciences; Breast Neoplasms - genetics; Breast Neoplasms - metabolism; CCLE; Cell Line, Tumor; CVB3; EMBO10; EMBO56; Female; Gene Dosage; Gene Expression Profiling - methods; Gene Regulatory Networks; Humans; Life Sciences; Method; Pinferna; Proteome - genetics; Proteome - metabolism; Proteomics - methods; RNA, Messenger - genetics; RNA, Messenger - metabolism; SWATH; Systems Biology; TMT; Transcriptome
ISSN	1744-4292
DOI	10.1038/s44320-024-00064-3

Summary:	Protein copy numbers constrain systems-level properties of regulatory networks, but proportional proteomic data remain scarce compared to RNA-seq. We related mRNA to protein statistically using best-available data from quantitative proteomics and transcriptomics for 4366 genes in 369 cell lines. The approach starts with a protein’s median copy number and hierarchically appends mRNA–protein and mRNA–mRNA dependencies to define an optimal gene-specific model linking mRNAs to protein. For dozens of cell lines and primary samples, these protein inferences from mRNA outmatch stringent null models, a count-based protein-abundance repository, empirical mRNA-to-protein ratios, and a proteogenomic DREAM challenge winner. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein complexes, suggesting mechanistic relationships. We use the method to identify a viral-receptor abundance threshold for coxsackievirus B3 susceptibility from 1489 systems-biology infection models parameterized by protein inference. When applied to 796 RNA-seq profiles of breast cancer, inferred copy-number estimates collectively re-classify 26–29% of luminal tumors. By adopting a gene-centered perspective of mRNA–protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility of contemporary proteomics.

Synopsis:	A simple, data-driven method links transcript abundance from RNA-seq to per-cell protein abundance from mass spectrometry. Pinferna outperforms existing approaches, accurately parameterizes systems biology models, and reclassifies tumor subtypes.
Pinferna uses three quantitative formalisms that capture measured RNA-protein relationships for 4366 human genes in 369 cancer cell lines.
Pinferna consistently yields a more accurate inferred proteome than random sampling of existing proteomes.
Using Pinferna-derived initial conditions, a systems-biology model correctly predicts a viral-receptor abundance threshold for infectability.
Pinferna reclusters canonical subtypes of breast cancer and predicts new abundance dependencies for a cyclin-dependent kinase that are experimentally validated.
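
The summary describes a hierarchy that starts from a protein's median copy number and adds mRNA dependence only when the data support it. The following is a minimal illustrative sketch of that general idea, not the published Pinferna implementation: it compares just two nested models per gene (a median-only model and a log-log linear fit) and picks between them with the Bayesian information criterion. The function names, the two-level hierarchy, the use of BIC, and the log10 units are all assumptions made for illustration.

```python
import numpy as np


def bic(residuals, n_params):
    """Bayesian information criterion for a Gaussian least-squares fit."""
    n = residuals.size
    # Guard against a perfect fit, where log(0) would be undefined.
    rss = max(float(np.sum(residuals ** 2)), 1e-12)
    return n * np.log(rss / n) + n_params * np.log(n)


def fit_gene_model(log_mrna, log_protein):
    """Pick the simpler or richer model for one gene (hypothetical helper).

    Level 0: protein is constant at its median across samples.
    Level 1: protein depends linearly on mRNA in log-log space.
    The richer model is kept only if it lowers the BIC.
    """
    log_mrna = np.asarray(log_mrna, dtype=float)
    log_protein = np.asarray(log_protein, dtype=float)

    median = float(np.median(log_protein))
    bic_median = bic(log_protein - median, n_params=1)

    slope, intercept = np.polyfit(log_mrna, log_protein, deg=1)
    fitted = slope * log_mrna + intercept
    bic_linear = bic(log_protein - fitted, n_params=2)

    if bic_linear < bic_median:
        return {"kind": "linear", "slope": float(slope),
                "intercept": float(intercept)}
    return {"kind": "median", "value": median}


def predict(model, log_mrna):
    """Infer log protein abundance for new samples from the chosen model."""
    log_mrna = np.asarray(log_mrna, dtype=float)
    if model["kind"] == "linear":
        return model["slope"] * log_mrna + model["intercept"]
    return np.full_like(log_mrna, model["value"])


# Synthetic, deterministic data (assumed units: log10 transcript abundance
# and log10 protein copies per cell across 40 hypothetical cell lines).
x = np.linspace(0.0, 5.0, 40)
wiggle = np.where(np.arange(40) % 2 == 0, 1.0, -1.0)

coupled = fit_gene_model(x, 3.0 + 0.8 * x + 0.1 * wiggle)
uncoupled = fit_gene_model(x, 4.0 + 0.2 * wiggle)
print(coupled["kind"], uncoupled["kind"])
```

In this sketch a gene whose protein level tracks its transcript is assigned the linear model, while a gene with no such dependence falls back to its median. Extending the hierarchy with further terms, such as dependence on other genes' mRNAs, would follow the same compare-and-keep pattern.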