Dual Triggered Correspondence Topic (DTCT) model for MeSH annotation

Bibliographic Details
Published in	IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 19, no. 2, pp. 899-911
Main Authors	Kim, Seonho; Yoon, Juntae
Format	Journal Article
Language	English
Published	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2022
Subjects	Annotations; Biological system modeling; Biomedical imaging; Documents; dual triggered correspondence topic model; Finite element method; Indexing; Information processing; Information retrieval; Information Storage and Retrieval; LDA; Medical research; Medical Subject Headings; Medical Subject Headings-MeSH; MEDLINE; MeSH annotation; Mesh generation; multi-label classification; phi-coefficient; Probabilistic logic; Probability; Probability theory; Semantics; Statistical analysis; Unified modeling language; Vocabulary
ISSN	1545-5963, 1557-9964
DOI	10.1109/TCBB.2020.3016355

Summary:	Accurate Medical Subject Headings (MeSH) annotation is important for effective information retrieval and knowledge discovery in the biomedical literature. We have developed a dual triggered correspondence topic (DTCT) model for MeSH-annotated articles. In our model, two types of data are assumed to be generated by the same latent topic factors, and the words in abstracts and titles serve as descriptions of the other type, MeSH terms. The model allows the generation of MeSH terms for abstracts to be triggered probabilistically either by general document topics or by document-specific "special" word distributions, allowing for a trade-off between the benefits of topic-based abstraction and specific word matching. To reduce the topic influence of non-topical or domain-frequent words in the text description, we integrated the discriminative feature of Okapi BM25 into the word sampling probability. This allows the model to choose keywords that stand out from others when generating MeSH terms. We further incorporate prior knowledge about relations between words and MeSH terms into DTCT using the phi-coefficient to improve topic coherence. We demonstrated the model's usefulness in automatic MeSH annotation. Our model obtained an F-score of 0.62 on a 150,00 MEDLINE test set and showed strength in recall. In particular, it yielded competitive performance in an integrated probabilistic environment without additional post-processing to filter MeSH terms.
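The summary names two standard statistics, the phi-coefficient and Okapi BM25, without giving formulas. As a rough illustration only, the Python sketch below computes the textbook versions of both. The function names, the 2x2 count layout, the BM25 defaults (k1 = 1.2, b = 0.75), and the non-negative IDF variant are all assumptions made for this example; how the authors actually fold these quantities into the DTCT sampling probabilities is not stated in this record.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 word/MeSH co-occurrence table.

    Hypothetical count layout (an assumption of this sketch):
      n11: documents containing both the word and the MeSH term
      n10: documents containing the word but not the MeSH term
      n01: documents containing the MeSH term but not the word
      n00: documents containing neither
    """
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    # Phi is undefined when a row or column total is zero;
    # this sketch reports "no association" in that case.
    return numerator / denominator if denominator else 0.0


def bm25_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term in one document.

    tf:          term frequency in the document
    doc_len:     document length in tokens
    avg_doc_len: average document length in the collection
    n_docs:      number of documents in the collection
    doc_freq:    number of documents containing the term
    k1, b:       common default parameters (assumed, not from the source)
    """
    # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average documents are penalised through b.
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
```

In this sketch a word that almost always co-occurs with a given MeSH term gets a phi value near 1, and a word that is frequent across the whole collection gets a lower BM25 weight than a rare word with the same in-document frequency, which matches the summary's stated goal of down-weighting domain-frequent words.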