BatteryDataExtractor: battery-aware text-mining software embedded with BERT models

Bibliographic Details
Published in Chemical science (Cambridge) Vol. 13; no. 39; pp. 11487-11495
Main Authors Huang, Shu; Cole, Jacqueline M
Format Journal Article
Language English
Published Cambridge Royal Society of Chemistry 12.10.2022
ISSN 2041-6520
EISSN 2041-6539
DOI 10.1039/d2sc04322j

Abstract Due to the massive growth of scientific publications, literature mining is becoming increasingly popular among researchers who wish to explore scientific text thoroughly and extract data from it to create new databases or augment existing ones. Efforts in literature-mining software design and implementation have improved text-mining productivity, but most text-mining toolkits are based on traditional machine-learning algorithms, which hinder the performance of downstream text-mining tasks. Natural-language processing (NLP) and text-mining technologies have developed rapidly since the release of transformer models, such as bidirectional encoder representations from transformers (BERT). Upgrading rule-based or machine-learning-based literature-mining toolkits by embedding transformer models into the software is therefore likely to improve their text-mining performance. To this end, we release a Python-based literature-mining toolkit for the field of battery materials, BatteryDataExtractor, which embeds BatteryBERT models in its automated data-extraction pipeline. This pipeline employs BERT models for token-classification tasks, such as abbreviation detection, part-of-speech tagging, and chemical-named-entity recognition, as well as new double-turn question-answering data-extraction models for auto-generating repositories of inter-related material and property data as well as general information. We demonstrate that BatteryDataExtractor exhibits state-of-the-art performance on the evaluation data sets for both token classification and automated data extraction. To aid the use of BatteryDataExtractor, its code is provided as open-source software, with associated documentation to serve as a user guide. BatteryDataExtractor is the first property-specific text-mining tool for auto-generating databases of materials and their property, device, and associated characteristics. The software has been constructed by embedding the BatteryBERT model.
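The abstract mentions double-turn question answering for linking materials to property data. The sketch below illustrates the general idea only and is not the BatteryDataExtractor API: the function names, the question templates, and the regex-based `toy_qa` stand-in for a fine-tuned BERT question-answering model are all assumptions made for illustration. The first turn asks for a property value, and the second turn reuses that answer to ask which material it belongs to.

```python
import re
from typing import Callable, Optional

# A QA model is treated as any callable (question, context) -> answer or None.
QAModel = Callable[[str, str], Optional[str]]


def double_turn_extract(context: str, prop: str, qa: QAModel) -> Optional[dict]:
    """Ask two dependent questions to link a property value to a material."""
    # Turn 1: ask for the value of the property (hypothetical question template).
    value = qa(f"What is the value of the {prop}?", context)
    if value is None:
        return None
    # Turn 2: reuse the first answer to ask which material it describes.
    material = qa(f"Which material has a {prop} of {value}?", context)
    if material is None:
        return None
    return {"material": material, "property": prop, "value": value}


def toy_qa(question: str, context: str) -> Optional[str]:
    """Hypothetical stand-in for a BERT QA model; it only uses regular expressions."""
    if question.startswith("What is the value"):
        # Look for a number followed by the unit mAh/g.
        match = re.search(r"\d+(?:\.\d+)?\s*mAh/g", context)
    else:
        # Look for a formula-like token such as LiFePO4.
        match = re.search(r"\b(?:[A-Z][a-z]?\d*){2,}\b", context)
    return match.group(0) if match else None


text = "The LiFePO4 cathode delivers a specific capacity of 160 mAh/g."
record = double_turn_extract(text, "specific capacity", toy_qa)
print(record)
```

In a real setting, `toy_qa` would be replaced by a call to a trained extractive question-answering model; passing the model in as a callable keeps the two-turn logic independent of any particular library.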
Author Cole, Jacqueline M
Huang, Shu
AuthorAffiliation Harwell Science and Innovation Campus
Rutherford Appleton Laboratory
Cavendish Laboratory
University of Cambridge
Department of Physics
ISIS Neutron and Muon Source
BackLink https://www.osti.gov/biblio/1889317 (View this record in OSTI.GOV)
ContentType Journal Article
Copyright Copyright Royal Society of Chemistry 2022
CorporateAuthor Argonne National Laboratory (ANL), Argonne, IL (United States). Argonne Leadership Computing Facility (ALCF)
Discipline Chemistry
EISSN 2041-6539
EndPage 11495
ExternalDocumentID 10.1039/d2sc04322j
PMC9627715
1889317
10_1039_D2SC04322J
d2sc04322j
GrantInformation grantid: DE-AC02-06CH11357
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 39
Language English
License cc-by
Notes Electronic supplementary information (ESI) available: evaluation details of the part-of-speech (POS) tagging, chemical-named-entity recognition, and abbreviation-detection datasets. See https://doi.org/10.1039/d2sc04322j
AC02-06CH11357
USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF)
USDOE
ORCID 0000-0002-1901-8361
0000-0002-1552-8743
OpenAccessLink https://proxy.k.utb.cz/login?url=https://pubs.rsc.org/en/content/articlepdf/2022/sc/d2sc04322j
PageCount 9
PublicationDate 2022-10-12
PublicationPlace Cambridge
PublicationTitle Chemical science (Cambridge)
PublicationYear 2022
Publisher Royal Society of Chemistry
StartPage 11487
SubjectTerms Algorithms
Automation
Chemistry
Classification
Coders
Data mining
Embedding
Energy storage
Machine learning
Natural language processing
Open source software
Performance evaluation
Scientific papers
Source code
Speech recognition
Toolkits
Transformers
Title BatteryDataExtractor: battery-aware text-mining software embedded with BERT models
URI https://www.proquest.com/docview/2723843805
https://www.proquest.com/docview/2734617455
https://www.osti.gov/biblio/1889317
https://pubmed.ncbi.nlm.nih.gov/PMC9627715
https://pubs.rsc.org/en/content/articlepdf/2022/sc/d2sc04322j
UnpaywallVersion publishedVersion
Volume 13