Chunk-BERT: Boosted keyword extraction for long scientific literature via BERT with chunking capabilities

Bibliographic Details
Published in 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML), pp. 385-392
Main Authors Zheng, Yuan; Cai, Rihui; Maimaiti, Maihemuti; Abiderexiti, Kahaerjiang
Format Conference Proceeding
Language English
Published IEEE, 4 August 2023
DOI 10.1109/PRML59573.2023.10348182

Abstract Accurately capturing domain knowledge from scientific research literature is crucial given the rapid growth of academic publications, and high hopes have therefore been placed on keyword extraction technology. Keyword extraction (KE), a fundamental text-processing task, aims to extract from a text the words or phrases that summarize its subject matter. Bidirectional Encoder Representations from Transformers (BERT) is widely used for unsupervised keyword extraction. However, when dealing with long scientific literature, BERT's inherent input-length limit inevitably causes semantic information to be lost, and its ability to extract local features is insufficient. To solve these two problems, we propose the Chunk-BERT model, which extracts the features of each position of the scientific literature as a block embedding and adds graph-based algorithms to strengthen local information. We carried out text-processing experiments on the SemEval2010, NUS, and ACM datasets; F1 values increased by up to 6.87%, 1.03%, and 1.39%, respectively, indicating the effectiveness of the proposed Chunk-BERT.
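
The abstract does not say how Chunk-BERT splits a document into blocks, so the following is only a minimal sketch of generic sliding-window chunking, not the authors' method. It assumes a standard BERT input limit of 512 positions, leaving 510 content tokens once [CLS] and [SEP] are added. The function name `chunk_tokens`, the `stride` value, and the overlap scheme are all illustrative assumptions.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_len: int = 510, stride: int = 255) -> List[List[str]]:
    """Split a token list into overlapping windows of at most max_len tokens.

    Assumption: max_len=510 leaves room for [CLS] and [SEP] in a
    512-position BERT input. stride < max_len makes consecutive
    windows overlap, so text near a boundary appears in two chunks.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if not tokens:
        return []

    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        # Stop once the current window reaches the end of the document.
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks


if __name__ == "__main__":
    # Hypothetical 1000-token document, for illustration only.
    document = [f"tok{i}" for i in range(1000)]
    for i, chunk in enumerate(chunk_tokens(document)):
        print(i, len(chunk), chunk[0], chunk[-1])
```

Each chunk could then be encoded separately and the per-chunk embeddings combined. How the paper combines them, and how its graph-based step uses them, is not stated in this record.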
Author_xml – sequence: 1
  givenname: Yuan
  surname: Zheng
  fullname: Zheng, Yuan
  email: zy_zy@stu.xju.edu.cn
  organization: Xinjiang University, School of Information Science and Engineering, Urumqi, China
– sequence: 2
  givenname: Rihui
  surname: Cai
  fullname: Cai, Rihui
  email: 1837300689@qq.com
  organization: Xinjiang University, School of Information Science and Engineering, Urumqi, China
– sequence: 3
  givenname: Maihemuti
  surname: Maimaiti
  fullname: Maimaiti, Maihemuti
  email: mahmutjan@xju.edu.cn
  organization: Xinjiang University, Xinjiang Key Laboratory of Multilingual Information Technology, Urumqi, China
– sequence: 4
  givenname: Kahaerjiang
  surname: Abiderexiti
  fullname: Abiderexiti, Kahaerjiang
  email: kaharjan@aliyun.com
  organization: Xinjiang University, Xinjiang Key Laboratory of Multilingual Information Technology, Urumqi, China
EISBN 9798350324303
SubjectTerms BERT
Bidirectional control
Feature extraction
Information processing
long scientific literature
Machine learning
Machine learning algorithms
Semantics
textual information-processing
Transformers
unsupervised methods
URI https://ieeexplore.ieee.org/document/10348182