A deep learning genome-mining strategy for biosynthetic gene cluster prediction

Bibliographic Details
Published in Nucleic Acids Research Vol. 47; no. 18; p. e110
Main Authors Hannigan, Geoffrey D; Prihoda, David; Palicka, Andrej; Soukup, Jindrich; Klempir, Ondrej; Rampula, Lena; Durcak, Jindrich; Wurst, Michael; Kotowski, Jakub; Chang, Dan; Wang, Rurun; Piizzi, Grazia; Temesi, Gergely; Hazuda, Daria J; Woelk, Christopher H; Bitton, Danny A
Format Journal Article
Language English
Published England Oxford University Press 10.10.2019
ISSN 0305-1048
1362-4962
DOI 10.1093/nar/gkz654

Summary: Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.
Equal senior author contribution.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.