Chunk-BERT: Boosted keyword extraction for long scientific literature via BERT with chunking capabilities

Bibliographic Details
Published in: 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML), pp. 385-392
Main Authors: Zheng, Yuan; Cai, Rihui; Maimaiti, Maihemuti; Abiderexiti, Kahaerjiang
Format: Conference Proceeding
Language: English
Published: IEEE, 04.08.2023
DOI: 10.1109/PRML59573.2023.10348182


More Information
Summary: Accurately obtaining domain knowledge from scientific research literature is crucial in light of the rapid growth of academic literature, so high hopes have been placed on keyword extraction technology. Keyword extraction (KE), a fundamental textual information processing task, aims to extract words, keywords, or phrases from a text that summarize its subject matter. Bidirectional Encoder Representations from Transformers (BERT) is widely used for unsupervised keyword extraction. However, when dealing with lengthy scientific literature, BERT's inherent input length limitation inevitably leads to missing semantic information, and its local feature extraction ability is insufficient. To solve these two problems, we propose the Chunk-BERT model, which extracts features from each part of the scientific literature as chunk embeddings and adds graph-based algorithms to strengthen local information. We carried out textual information-processing experiments on the SemEval2010, NUS, and ACM datasets, where F1 scores increased by up to 6.87%, 1.03%, and 1.39%, respectively, indicating the effectiveness of the proposed Chunk-BERT.
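
The chunk-then-embed idea described in the summary can be illustrated with a minimal sketch: split a long document into token chunks that fit within BERT's 512-token input limit, embed each chunk separately, pool the chunk embeddings into a document representation, and rank candidate keyphrases by similarity to it. This is a hypothetical illustration of the general approach, not the authors' implementation; the backbone model (bert-base-uncased), the chunk size, mean pooling, and cosine-similarity ranking are all assumptions, and the paper's graph-based local-information component is omitted here.

```python
# Hypothetical sketch of chunked BERT embedding for long documents.
# Assumptions: bert-base-uncased backbone, mean pooling, cosine ranking.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-uncased"  # assumed backbone, not necessarily the paper's
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pooled BERT embedding of a short piece of text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden)
    return hidden.mean(dim=1).squeeze(0)            # (hidden,)

def chunk_embeddings(document: str, chunk_tokens: int = 400) -> torch.Tensor:
    """Split the document into fixed-size token chunks and embed each chunk."""
    ids = tokenizer(document, add_special_tokens=False)["input_ids"]
    chunks = [ids[i:i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]
    texts = [tokenizer.decode(chunk) for chunk in chunks]
    return torch.stack([embed(t) for t in texts])   # (n_chunks, hidden)

def rank_keyphrases(document: str, candidates: list[str], top_k: int = 5):
    """Rank candidate phrases by cosine similarity to the averaged chunk embedding."""
    doc_vec = chunk_embeddings(document).mean(dim=0)
    scored = []
    for phrase in candidates:
        sim = torch.nn.functional.cosine_similarity(doc_vec, embed(phrase), dim=0)
        scored.append((phrase, sim.item()))
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
```

Averaging chunk embeddings is only one possible aggregation; the point of the sketch is that every part of a long document contributes to the representation instead of being truncated away at 512 tokens.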