Concept placement using BERT trained by transforming and summarizing biomedical ontology structure

Bibliographic Details
Published in	Journal of Biomedical Informatics, Vol. 112, p. 103607
Main Authors	Liu, Hao; Perl, Yehoshua; Geller, James
Format	Journal Article
Language	English
Published	United States: Elsevier Inc, 01.12.2020
Subjects	BERT; Biological Ontologies; Language; Machine learning; Natural Language Processing; Ontology placement; Ontology summarization; SNOMED CT; Systematized Nomenclature of Medicine
ISSN	1532-0464, 1532-0480
DOI	10.1016/j.jbi.2020.103607

Summary:
•Model a new concept’s hierarchical position by identifying its IS-A relationships.
•Transform the immediate neighborhood network of a concept into text triples.
•Predict IS-A relationships between concepts based on BERT.
•Refine the training data by employing an ontology summarization technique.

The comprehensive modeling and hierarchical positioning of a new concept in an ontology relies heavily on its set of proper subsumption relationships (IS-As) to other concepts. Identifying a concept’s IS-A relationships is a laborious task that requires curators to have both domain knowledge and terminology skills. In this work, we propose a method to automatically predict the presence of IS-A relationships between a new concept and pre-existing concepts based on the language representation model BERT. This method converts the neighborhood network of a concept into “sentences” and harnesses BERT’s Next Sentence Prediction (NSP) capability of predicting the adjacency of two sentences. To augment our method’s performance, we refined the training data by employing an ontology summarization technique. We trained our model with the two largest hierarchies of the SNOMED CT 2017 July release and applied it to predicting the parents of new concepts added in the SNOMED CT 2018 January release. The results showed that our method achieved an average F1 score of 0.88, and that the average Recall score improved slightly, from 0.94 to 0.96, when the ontology summarization technique was used.
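
The summary states that a concept's neighborhood is converted into "sentences" that are then paired for Next Sentence Prediction, but it does not give the exact sentence format. The sketch below is only one hypothetical way such sentence pairs could be prepared. The toy concept names, the "is a" wording, the " ; " separator, and the helper names `neighborhood_sentence` and `build_pairs` are all assumptions for illustration and are not taken from the article. The BERT model itself is not invoked here.

```python
from typing import Dict, List, Tuple

# Toy IS-A map (child -> parents). Concept names are illustrative only,
# not actual SNOMED CT content.
IS_A: Dict[str, List[str]] = {
    "Viral pneumonia": ["Pneumonia", "Viral infection"],
    "Pneumonia": ["Lung disease"],
    "Viral infection": ["Infection"],
}


def neighborhood_sentence(concept: str, is_a: Dict[str, List[str]]) -> str:
    """Render a concept's immediate parents as text triples in one 'sentence'.

    Assumed format: "<concept> is a <parent>" joined by " ; ".
    A concept with no recorded parents is rendered as its bare name.
    """
    triples = [f"{concept} is a {parent}" for parent in is_a.get(concept, [])]
    return " ; ".join(triples) if triples else concept


def build_pairs(
    concept: str, candidates: List[str], is_a: Dict[str, List[str]]
) -> List[Tuple[str, str, bool]]:
    """Pair the concept name with each candidate's neighborhood sentence.

    The boolean label marks whether the candidate is a recorded parent,
    which is the role an NSP-style "adjacent or not" label would play.
    """
    parents = set(is_a.get(concept, []))
    return [
        (concept, neighborhood_sentence(cand, is_a), cand in parents)
        for cand in candidates
    ]


pairs = build_pairs("Viral pneumonia", ["Pneumonia", "Infection"], IS_A)
for sentence_a, sentence_b, label in pairs:
    print(f"A={sentence_a!r} | B={sentence_b!r} | is_parent={label}")
```

In a setup like this, each (A, B) pair would be passed to a sentence-pair classifier, with pairs labeled true standing in for adjacent sentences and pairs labeled false for non-adjacent ones.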