OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs

Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 35, no. 17, pp. 2974-2981
Main Authors: Sethna, Zachary; Elhanati, Yuval; Callan, Curtis G; Walczak, Aleksandra M; Mora, Thierry
Format: Journal Article
Language: English
Published: England, Oxford University Press (OUP), 01.09.2019
ISSN: 1367-4803; 1367-4811
DOI: 10.1093/bioinformatics/btz035

Summary: High-throughput sequencing of large immune repertoires has enabled the development of methods to predict the probability of generation by V(D)J recombination of T- and B-cell receptors of any specific nucleotide sequence. These generation probabilities are very non-homogeneous, ranging over 20 orders of magnitude in real repertoires. Since the function of a receptor really depends on its protein sequence, it is important to be able to predict this probability of generation at the amino acid level. However, brute-force summation over all the nucleotide sequences with the correct amino acid translation is computationally intractable. The purpose of this paper is to present a solution to this problem.

We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 amino acid sequence or motif, with or without V/J restriction, as a result of V(D)J recombination in B or T cells. We apply it to databases of epitope-specific T-cell receptors to evaluate the probability that a typical human subject will possess T cells responsive to specific disease-associated epitopes. The model prediction shows an excellent agreement with published data. We suggest that OLGA may be a useful tool to guide vaccine design.

Source code is available at https://github.com/zsethna/OLGA. Supplementary data are available at Bioinformatics online.
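The abstract's central point is that summing over every nucleotide sequence with the right translation is intractable, whereas dynamic programming makes the amino-acid-level sum cheap. The sketch below illustrates that idea on a deliberately simplified, hypothetical model. Nucleotides are drawn from a first-order Markov chain with made-up parameters (`P0`, `TRANS`), and the helper names (`n_nucleotide_seqs`, `pgen_brute_force`, `pgen_dp`) are illustrative only. This is not OLGA's algorithm or API; the real model also covers V/D/J gene choice, deletions and insertions. The forward recursion keeps only the last emitted base as its state, so the cost grows linearly with sequence length rather than with the product of codon degeneracies.

```python
"""Toy sketch: amino-acid-level generation probability by dynamic programming.

Assumption (illustration only): nucleotides come from a first-order Markov
chain with hypothetical parameters. This is NOT OLGA's V(D)J model.
"""
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the codons that encode it.
CODONS = {}
for (a, b, c), aa in zip(itertools.product(BASES, repeat=3), AA_TABLE):
    CODONS.setdefault(aa, []).append(a + b + c)

# Hypothetical Markov-chain parameters (each row sums to 1).
P0 = {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2}
TRANS = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.3, "G": 0.4, "T": 0.2},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.2, "C": 0.3, "G": 0.1, "T": 0.4},
}


def n_nucleotide_seqs(aa_seq):
    """Number of nucleotide sequences that translate to aa_seq."""
    return math.prod(len(CODONS[aa]) for aa in aa_seq)


def markov_prob(nt_seq, p0, trans):
    """Probability of a nucleotide string under the toy Markov chain."""
    p = p0[nt_seq[0]]
    for prev, nxt in zip(nt_seq, nt_seq[1:]):
        p *= trans[prev][nxt]
    return p


def pgen_brute_force(aa_seq, p0, trans):
    """Sum over every synonymous nucleotide sequence (exponential cost)."""
    return sum(
        markov_prob("".join(codons), p0, trans)
        for codons in itertools.product(*(CODONS[aa] for aa in aa_seq))
    )


def pgen_dp(aa_seq, p0, trans):
    """Forward recursion whose state is the last emitted base."""
    # f[n] = P(first k codons translate to aa_seq[:k] and the last base is n)
    f = {n: 0.0 for n in BASES}
    for codon in CODONS[aa_seq[0]]:
        f[codon[2]] += markov_prob(codon, p0, trans)
    for aa in aa_seq[1:]:
        g = {n: 0.0 for n in BASES}
        for prev, weight in f.items():
            for codon in CODONS[aa]:
                # Extend by one codon, conditioning its first base on `prev`.
                step = (trans[prev][codon[0]]
                        * trans[codon[0]][codon[1]]
                        * trans[codon[1]][codon[2]])
                g[codon[2]] += weight * step
        f = g
    return sum(f.values())


if __name__ == "__main__":
    seq = "CASSF"  # hypothetical short motif, not taken from the paper
    print(n_nucleotide_seqs(seq))  # 576
    print(pgen_brute_force(seq, P0, TRANS), pgen_dp(seq, P0, TRANS))
```

For the hypothetical motif `CASSF`, brute force enumerates 576 codon combinations (2 × 4 × 6 × 6 × 2), and both functions return the same probability up to floating-point rounding. For longer CDR3s the enumeration grows multiplicatively, while the recursion does a fixed amount of work per residue (at most 4 × 6 codon extensions).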
The authors wish it to be known that, in their opinion, Zachary Sethna and Yuval Elhanati should be regarded as Joint Last Authors.