An introduction to deep learning on biological sequence data: examples and solutions


Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 33; no. 22; pp. 3685-3690
Main Authors Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae
Format Journal Article
Language English
Published England Oxford University Press 15.11.2017
ISSN 1367-4803
1367-4811
DOI 10.1093/bioinformatics/btx531


More Information
Summary: Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular machine learning tools in recent years. This development is driven by the availability of greater computational resources, more data, new algorithms for training deep models and easy-to-use libraries for implementing and training neural networks. Deep learning has been especially successful in image recognition, and the tools, applications and code examples developed so far are in most cases centered on that field rather than on biology. Here, we aim to further the development of deep learning methods within biology by providing application examples and code templates that are ready to apply and adapt. Using these examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. Contact: skaaesonderby@gmail.com. Supplementary data are available at Bioinformatics online.
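Convolutional and long short-term memory networks both expect biological sequences as fixed-size numeric input. A common first step, sketched below, is to one-hot encode each protein sequence, pad it to a shared length and keep a mask that marks the real residues. This is an illustration only and is not taken from the lasagne4bio repository: the function name `one_hot_encode`, the ordering of the amino acid alphabet and the handling of unknown residues are all assumptions made for this example.

```python
# Illustrative sketch only; not code from the lasagne4bio repository.
# The alphabet ordering below is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequences, max_len=None):
    """Encode protein sequences as a padded one-hot batch plus a mask.

    Returns (batch, mask):
      batch[n][t] is a list of 20 floats (one-hot vector for position t),
      mask[n][t] is 1.0 for a real residue and 0.0 for padding.
    Unknown residues (for example 'X') become all-zero vectors but keep
    mask 1.0. Sequences longer than max_len are truncated.
    """
    if max_len is None:
        max_len = max(len(s) for s in sequences)
    batch, mask = [], []
    for seq in sequences:
        seq = seq.upper()[:max_len]
        rows, row_mask = [], []
        for t in range(max_len):
            vec = [0.0] * len(AMINO_ACIDS)
            if t < len(seq):
                idx = INDEX.get(seq[t])
                if idx is not None:
                    vec[idx] = 1.0
                row_mask.append(1.0)
            else:
                row_mask.append(0.0)  # padding position
            rows.append(vec)
        batch.append(rows)
        mask.append(row_mask)
    return batch, mask


if __name__ == "__main__":
    batch, mask = one_hot_encode(["MKV", "ACDXY"])
    # Shape is (number of sequences, padded length, alphabet size).
    print(len(batch), len(batch[0]), len(batch[0][0]))
    print(mask[0])
```

The mask lets a recurrent layer ignore padded positions, which matters when sequences in one batch differ in length. A sequence profile or substitution-matrix encoding could replace the one-hot vectors without changing the padding logic.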