Multilingual translation for zero-shot biomedical classification using BioTranslator

Bibliographic Details
Published in Nature Communications Vol. 14; no. 1; pp. 738 - 13
Main Authors Xu, Hanwen; Woicik, Addie; Poon, Hoifung; Altman, Russ B.; Wang, Sheng
Format Journal Article
Language English
Published London Nature Publishing Group UK 10.02.2023
Nature Publishing Group
Nature Portfolio
ISSN 2041-1723
DOI 10.1038/s41467-023-36476-2

Summary: Existing annotation paradigms rely on controlled vocabularies, where each data instance is assigned one term from a predefined controlled vocabulary. This paradigm restricts analysis to concepts that are known and well characterized. Here, we present BioTranslator, a novel multilingual translation method that addresses this problem. BioTranslator takes a user-written textual description of a new concept and translates this description to a non-text biological data instance. The key idea of BioTranslator is a multilingual translation framework in which multiple modalities of biological data are all translated to text. We demonstrate how BioTranslator enables the identification of novel cell types using only a textual description, and how it can be further generalized to protein function prediction and drug target identification. Our tool frees scientists from restricting their analyses to predefined controlled vocabularies, enabling them to interact with biological data using free text. Here, the authors develop the cross-modal translation method BioTranslator to translate textual descriptions to non-text biological data. This approach frees scientists from restricting their analyses to predefined controlled vocabularies.