BatteryDataExtractor: battery-aware text-mining software embedded with BERT models
    
          | Published in | Chemical science (Cambridge) Vol. 13; no. 39; pp. 11487 - 11495 | 
|---|---|
| Main Authors | Huang, Shu; Cole, Jacqueline M | 
| Format | Journal Article | 
| Language | English | 
| Published | Cambridge: Royal Society of Chemistry, 12.10.2022 | 
| Subjects | |
| ISSN | 2041-6520 2041-6539 | 
| DOI | 10.1039/d2sc04322j | 
| Abstract | Due to the massive growth of scientific publications, literature mining is becoming increasingly popular for researchers to thoroughly explore scientific text and extract such data to create new databases or augment existing databases. Efforts in literature-mining software design and implementation have improved text-mining productivity, but most of the toolkits that mine text are based on traditional machine-learning-algorithms which hinder the performance of downstream text-mining tasks. Natural-language processing (NLP) and text-mining technologies have seen a rapid development since the release of transformer models, such as bidirectional encoder representations from transformers (BERT). Upgrading rule-based or machine-learning-based literature-mining toolkits by embedding transformer models into the software is therefore likely to improve their text-mining performance. To this end, we release a Python-based literature-mining toolkit for the field of battery materials, BatteryDataExtractor, which involves the embedding of BatteryBERT models in its automated data-extraction pipeline. This pipeline employs BERT models for token-classification tasks, such as abbreviation detection, part-of-speech tagging, and chemical-named-entity recognition, as well as new double-turn question-answering data-extraction models for auto-generating repositories of inter-related material and property data as well as general information. We demonstrate that BatteryDataExtractor exhibits state-of-the-art performance on the evaluation data sets for both token classification and automated data extraction. To aid the use of BatteryDataExtractor, its code is provided as open-source software, with associated documentation to serve as a user guide.
BatteryDataExtractor is the first property-specific text-mining tool for auto-generating databases of materials and their property, device, and associated characteristics. The software has been constructed by embedding the BatteryBERT model. | 
    
|---|---|
    
| Author | Cole, Jacqueline M Huang, Shu  | 
    
| AuthorAffiliation | Harwell Science and Innovation Campus Rutherford Appleton Laboratory Cavendish Laboratory University of Cambridge Department of Physics ISIS Neutron and Muon Source  | 
    
| AuthorAffiliation_xml | – name: Department of Physics – name: University of Cambridge – name: Harwell Science and Innovation Campus – name: Rutherford Appleton Laboratory – name: Cavendish Laboratory – name: ISIS Neutron and Muon Source  | 
    
| Author_xml | – sequence: 1 givenname: Shu surname: Huang fullname: Huang, Shu – sequence: 2 givenname: Jacqueline M surname: Cole fullname: Cole, Jacqueline M  | 
    
| BackLink | https://www.osti.gov/biblio/1889317$$D View this record in Osti.gov | 
    
| CitedBy_id | crossref_primary_10_1039_D3DD00159H crossref_primary_10_1021_acs_jcim_4c00063 crossref_primary_10_1063_5_0251325 crossref_primary_10_1039_D4CS00913D crossref_primary_10_1016_j_jece_2023_111384 crossref_primary_10_1038_s43246_024_00449_9 crossref_primary_10_1016_j_ijinfomgt_2023_102725 crossref_primary_10_1039_D3DD00099K crossref_primary_10_1021_acs_jcim_3c00422 crossref_primary_10_1038_s41467_024_45394_w crossref_primary_10_1016_j_patter_2024_100955 crossref_primary_10_1039_D4DD00307A crossref_primary_10_1021_acs_jcim_4c00816 crossref_primary_10_1002_adfm_202302630 crossref_primary_10_1016_j_ces_2024_120916 crossref_primary_10_1021_acs_chemmater_3c00788 crossref_primary_10_1038_s41524_025_01554_0  | 
    
| Cites_doi | 10.1021/acs.accounts.9b00470 10.1016/j.cossms.2021.100975 10.1038/s41586-019-1335-8 10.1021/cm400893e 10.1021/acs.jcim.2c00035 10.3390/cryst9010054 10.1038/s41597-020-00602-2 10.1002/aic.16198 10.1039/D1DD00034A 10.1016/j.trechm.2020.12.003 10.1016/j.comptc.2021.113443 10.1038/s41597-019-0306-0 10.1038/s41524-019-0173-4 10.1016/j.rser.2019.03.036 10.1021/acs.jcim.1c00446 10.1021/acs.jcim.1c01198 10.3115/1119176.1119195 10.1088/2515-7639/ab3611 10.1186/1758-2946-7-S1-S1 10.1186/s12859-017-1776-8 10.1038/nature14539 10.1038/s41524-021-00695-2 10.1038/s41597-018-0005-2 10.1063/5.0064875 10.1016/j.patter.2022.100488 10.18653/v1/2020.coling-main.292 10.1002/aenm.201802820 10.1039/D1CP02963K 10.1162/neco.1997.9.8.1735 10.1021/acs.jcim.9b00470 10.1063/5.0021106 10.1002/batt.202000288 10.1021/acs.jcim.1c01199 10.1002/adfm.202201437 10.1039/C5EE00685F 10.1007/978-3-030-32381-3_16 10.1021/acs.jcim.6b00207 10.1021/acscatal.9b04952 10.12943/CNR.2018.00004 10.1088/1361-6463/aad926 10.1002/advs.201900808 10.1038/s41597-022-01295-5 10.1038/s41597-022-01294-6 10.1038/s41597-022-01321-6 10.1002/er.6776  | 
    
| ContentType | Journal Article | 
    
| Copyright | This journal is © The Royal Society of Chemistry 2022 | 
    
| Copyright_xml | – notice: Copyright Royal Society of Chemistry 2022 – notice: This journal is © The Royal Society of Chemistry. – notice: This journal is © The Royal Society of Chemistry 2022 The Royal Society of Chemistry  | 
    
| CorporateAuthor | Argonne National Laboratory (ANL), Argonne, IL (United States). Argonne Leadership Computing Facility (ALCF) | 
    
| CorporateAuthor_xml | – sequence: 0 name: Argonne National Laboratory (ANL), Argonne, IL (United States). Argonne Leadership Computing Facility (ALCF)  | 
    
| DBID | AAYXX CITATION 7SR 8BQ 8FD JG9 7X8 OTOTI 5PM ADTOC UNPAY  | 
    
| DOI | 10.1039/d2sc04322j | 
    
| DatabaseName | CrossRef Engineered Materials Abstracts METADEX Technology Research Database Materials Research Database MEDLINE - Academic OSTI.GOV PubMed Central (Full Participant titles) Unpaywall for CDI: Periodical Content Unpaywall  | 
    
| DatabaseTitle | CrossRef Materials Research Database Engineered Materials Abstracts Technology Research Database METADEX MEDLINE - Academic  | 
    
| DatabaseTitleList | MEDLINE - Academic CrossRef Materials Research Database  | 
    
| Database_xml | – sequence: 1 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository  | 
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Chemistry | 
    
| EISSN | 2041-6539 | 
    
| EndPage | 11495 | 
    
| ExternalDocumentID | 10.1039/d2sc04322j PMC9627715 1889317 10_1039_D2SC04322J d2sc04322j  | 
    
| GrantInformation_xml | – fundername: ; grantid: DE-AC02-06CH11357 – fundername: ; grantid: Unassigned  | 
    
| ID | FETCH-LOGICAL-c432t-d20aa75b54a866ae3a3df28a4e5e696ce17fca229a3194b8e4e4422857486a653 | 
    
| IEDL.DBID | UNPAY | 
    
| ISSN | 2041-6520 2041-6539  | 
    
| IngestDate | Sun Oct 26 03:20:17 EDT 2025 Tue Sep 30 17:18:54 EDT 2025 Mon Dec 16 02:26:20 EST 2024 Fri Sep 05 09:10:02 EDT 2025 Sun Jul 13 04:44:07 EDT 2025 Thu Apr 24 23:06:18 EDT 2025 Wed Oct 01 02:14:32 EDT 2025 Thu Oct 13 15:33:45 EDT 2022  | 
    
| IsDoiOpenAccess | true | 
    
| IsOpenAccess | true | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 39 | 
    
| Language | English | 
    
| License | cc-by | 
    
| LinkModel | DirectLink | 
    
| MergedId | FETCHMERGED-LOGICAL-c432t-d20aa75b54a866ae3a3df28a4e5e696ce17fca229a3194b8e4e4422857486a653 | 
    
| Notes | Electronic supplementary information (ESI) available: Evaluation details of the part-of-speech (POS) tagging, chemical-named-entity recognition, and abbreviation-detection datasets. See https://doi.org/10.1039/d2sc04322j. AC02-06CH11357; USDOE Office of Science (SC), Basic Energy Sciences (BES), Scientific User Facilities (SUF). | 
    
| ORCID | 0000-0002-1901-8361 0000-0002-1552-8743 | 
    
| OpenAccessLink | https://proxy.k.utb.cz/login?url=https://pubs.rsc.org/en/content/articlepdf/2022/sc/d2sc04322j | 
    
| PQID | 2723843805 | 
    
| PQPubID | 2047492 | 
    
| PageCount | 9 | 
    
| ParticipantIDs | proquest_journals_2723843805 crossref_primary_10_1039_D2SC04322J unpaywall_primary_10_1039_d2sc04322j osti_scitechconnect_1889317 crossref_citationtrail_10_1039_D2SC04322J pubmedcentral_primary_oai_pubmedcentral_nih_gov_9627715 proquest_miscellaneous_2734617455 rsc_primary_d2sc04322j  | 
    
| ProviderPackageCode | CITATION AAYXX  | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2022-10-12 | 
    
| PublicationDateYYYYMMDD | 2022-10-12 | 
    
| PublicationDate_xml | – month: 10 year: 2022 text: 2022-10-12 day: 12  | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | Cambridge | 
    
| PublicationPlace_xml | – name: Cambridge – name: United States  | 
    
| PublicationTitle | Chemical science (Cambridge) | 
    
| PublicationYear | 2022 | 
    
| Publisher | Royal Society of Chemistry The Royal Society of Chemistry  | 
    
| Publisher_xml | – sequence: 0 name: Royal Society of Chemistry – name: Royal Society of Chemistry – name: The Royal Society of Chemistry  | 
    
| StartPage | 11487 | 
    
| SubjectTerms | Algorithms; Automation; Chemistry; Classification; Coders; Data mining; Embedding; ENERGY STORAGE; Machine learning; Natural language processing; Open source software; Performance evaluation; Scientific papers; Source code; Speech recognition; Toolkits; Transformers | 
    
| Title | BatteryDataExtractor: battery-aware text-mining software embedded with BERT models | 
    
| URI | https://www.proquest.com/docview/2723843805 https://www.proquest.com/docview/2734617455 https://www.osti.gov/biblio/1889317 https://pubmed.ncbi.nlm.nih.gov/PMC9627715 https://pubs.rsc.org/en/content/articlepdf/2022/sc/d2sc04322j  | 
    
| Volume | 13 | 
    