Chunk-BERT: Boosted keyword extraction for long scientific literature via BERT with chunking capabilities
Published in | 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML) pp. 385 - 392 |
---|---|
Main Authors | Zheng, Yuan; Cai, Rihui; Maimaiti, Maihemuti; Abiderexiti, Kahaerjiang |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 04.08.2023 |
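Subjects | BERT; Bidirectional control; Feature extraction; Information processing; long scientific literature; Machine learning; Machine learning algorithms; Semantics; textual information-processing; Transformers; unsupervised methods |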
Online Access | https://ieeexplore.ieee.org/document/10348182 |
DOI | 10.1109/PRML59573.2023.10348182 |
Abstract | Accurately capturing domain knowledge from scientific research literature is crucial given the rapid growth of that literature, and keyword extraction technology is therefore expected to play a major role. Keyword extraction (KE), a fundamental textual information processing task, aims to extract from a text the words or phrases that summarize its subject matter. Bidirectional Encoder Representations from Transformers (BERT) is widely used for unsupervised keyword extraction. However, when BERT is applied to lengthy scientific literature, its inherent input-length limit inevitably causes semantic information to be lost, and its local feature extraction is insufficient. To solve these two problems, we propose the Chunk-BERT model, which extracts the features at each position of the scientific literature as a block embedding and adds graph-based algorithms to strengthen local information. We carried out textual information-processing experiments on SemEval2010, NUS, and the ACM database; F1 values increased by up to 6.87%, 1.03%, and 1.39%, respectively, indicating the effectiveness of the proposed Chunk-BERT. |
---|---|
Author | Cai, Rihui; Abiderexiti, Kahaerjiang; Maimaiti, Maihemuti; Zheng, Yuan |
Author_xml | – sequence: 1 givenname: Yuan surname: Zheng fullname: Zheng, Yuan email: zy_zy@stu.xju.edu.cn organization: Xinjiang University,School of information Science and Engineering,Urumqi,China – sequence: 2 givenname: Rihui surname: Cai fullname: Cai, Rihui email: 1837300689@qq.com organization: Xinjiang University,School of information Science and Engineering,Urumqi,China – sequence: 3 givenname: Maihemuti surname: Maimaiti fullname: Maimaiti, Maihemuti email: mahmutjan@xju.edu.cn organization: Xinjiang University,Xinjiang Key Laboratory of Multilingual Information Technology,Urumqi,China – sequence: 4 givenname: Kahaerjiang surname: Abiderexiti fullname: Abiderexiti, Kahaerjiang email: kaharjan@aliyun.com organization: Xinjiang University,Xinjiang Key Laboratory of Multilingual Information Technology,Urumqi,China |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/PRML59573.2023.10348182 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
EISBN | 9798350324303 |
EndPage | 392 |
ExternalDocumentID | 10348182 |
Genre | orig-research |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
ID | FETCH-LOGICAL-i119t-95f669e1beee99ba29192cb1da404b0414699b88241e78eeb415c15a03f32c553 |
IEDL.DBID | RIE |
IngestDate | Wed Jan 10 09:27:53 EST 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i119t-95f669e1beee99ba29192cb1da404b0414699b88241e78eeb415c15a03f32c553 |
PageCount | 8 |
ParticipantIDs | ieee_primary_10348182 |
PublicationCentury | 2000 |
PublicationDate | 2023-Aug.-4 |
PublicationDateYYYYMMDD | 2023-08-04 |
PublicationDate_xml | – month: 08 year: 2023 text: 2023-Aug.-4 day: 04 |
PublicationDecade | 2020 |
PublicationTitle | 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML) |
PublicationTitleAbbrev | PRML |
PublicationYear | 2023 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
Score | 1.841991 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 385 |
SubjectTerms | BERT; Bidirectional control; Feature extraction; Information processing; long scientific literature; Machine learning; Machine learning algorithms; Semantics; textual information-processing; Transformers; unsupervised methods |
Title | Chunk-BERT: Boosted keyword extraction for long scientific literature via BERT with chunking capabilities |
URI | https://ieeexplore.ieee.org/document/10348182 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3NS8MwFA-6kycVJ36Tg9fUZE3axuPGxhA3xthgt5GkqY6NVkar4F_vS9pNFARvJS1JkzTvq7_fewjdx1RRkfKQhEmUEc6yhCRRpIgOqYpiKSz3oYvROBrO-dNCLBqyuufCWGs9-MwG7tL_y08LU7lQGZxwRxtNQOIewndWk7UazBaj8mEyHT0LKeIwcDXBg93TP-qmeLUxOEbj3YA1WmQdVKUOzOevXIz_fqMT1P5m6OHJXvecogObn6FV77XK16Tbn84ecbfw9A0Mh_QDHEwMQnhbkxgw2Kl4U-QvuGZDOrAQ3uzTK-P3lcKuD-xitNi4PmEMbECteiQt-NZtNB_0Z70haUopkBVjsiRSZFEkLdMwAym16kiw7IxmqeKUa8pBXkIzWNuc2TixVoNeN0woGmZhxwgRnqNWXuT2AuFUsETHPI3B0eFwK6GWmtiYFERnlgp1idpunZZvdbaM5W6Jrv5ov0ZHbrs8qI7foFa5rewtKPpS3_kN_gJG0qmd |
linkProvider | IEEE |
linkToHtml | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3PS8MwFA4yD3pSceJvc_CamqxJ23jc2Ji6jTE22G0kaapjo5WxKvjX-5JuioLgraQlaRPyvrzX73sPoduYKipSHpIwiTLCWZaQJIoU0SFVUSyF5T500R9E3Ql_nIrpRqzutTDWWk8-s4G79P_y08KULlQGO9zJRhOwuLsA_JxXcq0Na4tReTcc9XtCijgMXFXwYPv8j8opHjg6B2iwHbLiiyyCcq0D8_ErG-O_3-kQ1b81enj4hT5HaMfmx2jeeinzBWm2R-N73Cy8gAPDNn0HFxODGV5VMgYMJ1W8LPJnXOkhHV0IL78SLOO3ucKuD-yitNi4PmEMbABYPZcWvOs6mnTa41aXbIopkDljck2kyKJIWqbhC6TUqiHhbGc0SxWnXFMOFhOa4bzNmY0TazUgu2FC0TALG0aI8ATV8iK3pwingiU65mkMrg6HWwm11MTGpGA8s1SoM1R38zR7rfJlzLZTdP5H-w3a6477vVnvYfB0gfbd0nmKHb9EtfWqtFcA-2t97Rf7E1MjrOk |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2023+IEEE+4th+International+Conference+on+Pattern+Recognition+and+Machine+Learning+%28PRML%29&rft.atitle=Chunk-BERT%3A+Boosted+keyword+extraction+for+long+scientific+literature+via+BERT+with+chunking+capabilities&rft.au=Zheng%2C+Yuan&rft.au=Cai%2C+Rihui&rft.au=Maimaiti%2C+Maihemuti&rft.au=Abiderexiti%2C+Kahaerjiang&rft.date=2023-08-04&rft.pub=IEEE&rft.spage=385&rft.epage=392&rft_id=info:doi/10.1109%2FPRML59573.2023.10348182&rft.externalDocID=10348182 |