Statistical classifiers for diagnosing disease from immune repertoires: a case study using multiple sclerosis

Bibliographic Details
Published in	BMC Bioinformatics Vol. 18; no. 1; pp. 401-10
Main Authors	Ostmeyer, Jared; Christley, Scott; Rounds, William H.; Toby, Inimary; Greenberg, Benjamin M.; Monson, Nancy L.; Cowell, Lindsay G.
Format	Journal Article
Language	English
Published	London; BioMed Central; 07.09.2017; BioMed Central Ltd; Springer Nature B.V; BMC
Subjects	Algorithms; Amino Acid Sequence; Antibodies; Antibody; Area Under Curve; Autoimmune diseases; B-Lymphocytes - metabolism; Bioinformatics; Biomedical and Life Sciences; Case depth; CDR3; Cerebrospinal fluid; Classifiers; Coding; Codons; Complementarity; Complementarity Determining Regions - chemistry; Complementarity Determining Regions - metabolism; Complementarity-determining region 3; Computational biology; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Datasets; Diagnosis; Diagnostic systems; Feasibility studies; High-Throughput Nucleotide Sequencing; Humans; Immune repertoire; Learning algorithms; Life Sciences; Lymphocytes; Lymphocytes B; Machine learning; Microarrays; Models, Statistical; Molecular diagnostic techniques; Multiple sclerosis; Multiple Sclerosis, Relapsing-Remitting - classification; Multiple Sclerosis, Relapsing-Remitting - diagnosis; Multiple Sclerosis, Relapsing-Remitting - immunology; Nervous System Diseases - classification; Nervous System Diseases - diagnosis; Nervous System Diseases - immunology; Neurological diseases; Ovarian cancer; Patients; Receptors; Research Article; ROC Curve; Sequence analysis (applications); Statistical classifier; Statistical methods; Statistics; T cell receptors
ISSN	1471-2105
DOI	10.1186/s12859-017-1814-6

Summary:	Background: Deep sequencing of lymphocyte receptor repertoires has made it possible to comprehensively profile the clonal composition of lymphocyte populations. This opens the door for novel approaches to diagnose and prognosticate diseases with a driving immune component by identifying repertoire sequence patterns associated with clinical phenotypes. Indeed, recent studies support the feasibility of this, demonstrating an association between repertoire-level summary statistics (e.g., diversity) and patient outcomes for several diseases. In our own prior work, we have shown that six codons in VH4-containing genes in B cells from the cerebrospinal fluid of patients with relapsing-remitting multiple sclerosis (RRMS) have higher replacement mutation frequencies than observed in healthy controls or patients with other neurological diseases. However, methods to date have been limited to repertoire-level summary statistics, ignoring the vast amounts of information in the millions of individual immune receptors comprising a repertoire. We have developed a novel method that addresses this limitation by combining innovative approaches for accommodating the extraordinary sequence diversity of immune receptors with widely used machine learning approaches. We applied our method to RRMS, an autoimmune disease that is notoriously difficult to diagnose.
Results: We use the biochemical features encoded by the complementarity determining region 3 (CDR3) of each B cell receptor heavy chain in every patient repertoire as input to a detector function, which is fit to give the correct diagnosis for each patient using maximum likelihood optimization methods. The resulting statistical classifier assigns patients to one of two diagnosis categories, RRMS or other neurological disease, with 87% accuracy by leave-one-out cross-validation on training data (N = 23) and 72% accuracy on unused data from a separate study (N = 102).
Conclusions: Our method is the first to apply statistical learning to immune repertoires to aid disease diagnosis, learning repertoire-level labels from the set of individual immune repertoire sequences. This method produced a repertoire-based statistical classifier for diagnosing RRMS that provides a high degree of diagnostic capability, rivaling the accuracy of diagnosis by a clinical expert. Additionally, this method points to a diagnostic biochemical motif in the antibodies of RRMS patients, which may offer insight into the disease process.
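Illustrative sketch (not part of the catalog record or the article): the summary describes a detector function that is applied to the features of every sequence in a repertoire, fit by maximum likelihood to patient-level diagnoses, and evaluated by leave-one-out cross-validation. The summary does not give the form of the detector, so the Python sketch below rests on stated assumptions: a logistic detector, a repertoire score taken as the maximum over its sequences, plain gradient ascent as the optimizer, and synthetic data with five made-up numeric features per sequence. All function names are hypothetical.

```python
import numpy as np

def make_toy_data(n_patients=12, n_seqs=30, dim=5, seed=0):
    """Synthetic repertoires (assumption): one (n_seqs x dim) feature matrix per patient."""
    rng = np.random.default_rng(seed)
    repertoires, labels = [], []
    for i in range(n_patients):
        rep = rng.normal(size=(n_seqs, dim))
        label = i % 2
        if label == 1:
            rep[:3, 0] += 3.0  # plant a made-up "motif" in three sequences
        repertoires.append(rep)
        labels.append(label)
    return repertoires, labels

def repertoire_score(w, b, rep):
    """Logistic detector per sequence; the repertoire takes its top-scoring sequence.
    Max aggregation is an assumption, not something the summary states."""
    logits = np.clip(rep @ w + b, -30.0, 30.0)
    top = int(np.argmax(logits))
    return float(1.0 / (1.0 + np.exp(-logits[top]))), top

def fit(repertoires, labels, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the Bernoulli log-likelihood of the patient labels."""
    rng = np.random.default_rng(seed)
    dim = repertoires[0].shape[1]
    w = rng.normal(scale=0.01, size=dim)  # small random start to break ties
    b = 0.0
    for _ in range(steps):
        grad_w, grad_b = np.zeros(dim), 0.0
        for rep, y in zip(repertoires, labels):
            p, top = repertoire_score(w, b, rep)
            grad_w += (y - p) * rep[top]  # gradient flows through the top sequence only
            grad_b += (y - p)
        w = w + lr * grad_w / len(labels)
        b = b + lr * grad_b / len(labels)
    return w, b

def loo_accuracy(repertoires, labels):
    """Leave-one-out cross-validation: refit without patient i, then predict patient i."""
    correct = 0
    for i in range(len(labels)):
        train = [r for j, r in enumerate(repertoires) if j != i]
        y = [l for j, l in enumerate(labels) if j != i]
        w, b = fit(train, y)
        p, _ = repertoire_score(w, b, repertoires[i])
        correct += int((p >= 0.5) == bool(labels[i]))
    return correct / len(labels)

if __name__ == "__main__":
    reps, labels = make_toy_data()
    print("leave-one-out accuracy on toy data:", loo_accuracy(reps, labels))
```

Scoring a repertoire by its single best sequence means one motif-bearing receptor is enough to flag a patient, which is one plausible way to learn a repertoire-level label from individual sequences. The accuracy printed for this toy data says nothing about the figures reported in the summary.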