Natural language processing algorithms for domain-specific data extraction in material science: Reseractor
| Published in | Journal of materials science Vol. 59; no. 30; pp. 13856 - 13872 |
|---|---|
| Main Authors | Gupta, Antrakrate; Mittal, Divyansh; Goel, Ojsi; Jha, Shikhar Krishn |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.08.2024 (Springer; Springer Nature B.V) |
| ISSN | 0022-2461 (print), 1573-4803 (electronic) |
| DOI | 10.1007/s10853-024-09980-z |
| Abstract | Many tools and web search engines are trained to find journal articles among billions of research papers on millions of topics across different databases. Their high degree of generalizability often comes at the cost of specificity. Scientific work needs a tool that extracts data from selected resources for domain-specific tasks. Current algorithms and generalized tools lack this specificity and are prone to errors when analysing a bundle of specific documents selected eclectically. This work addresses that need with a tool that uses the user's input keywords and phrases to find relevant information in bundles of articles from the web. Reseractor is based on a customized algorithm, Whitespace, working together with the output of open-access tools for document image analysis and focused domain data extraction using NLP. The current tool is designed for the material science domain and can adopt various generalized and scientific corpora as layers. Tested on two sets of different bundles of papers, it gives an accuracy of 81.12%, a recall of 78.38%, and a precision of 84.06%. Because the algorithms are simple and directly applicable, users from other domains can plug in their own corpora and remodel the tool for their purpose. The work fulfills the need for domain-specific extraction of experimental data, stored in organized and structured databases, for upcoming computational researchers. |
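
The accuracy, recall, and precision figures quoted in the abstract are standard classification metrics. The following is a minimal sketch of how such metrics are usually computed from confusion-matrix counts, assuming the textbook definitions. The function names and the sample counts are hypothetical and are not taken from the Reseractor code or the paper. As a side observation, the harmonic mean (F1) of the reported precision and recall comes to about 81.12%, which coincides numerically with the reported accuracy; this record does not say whether the paper defines its accuracy figure that way.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)            # share of extracted items that are relevant
    recall = tp / (tp + fn)               # share of relevant items that were extracted
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not data from the paper).
p, r, a = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# Precision and recall as reported in the abstract.
f1 = f1_score(0.8406, 0.7838)
print(f"{f1 * 100:.2f}")  # 81.12
```

Working with fractions rather than percentages keeps the harmonic-mean formula free of scaling factors; the result is multiplied by 100 only for display.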
    
    
| Audience | Academic | 
    
| Author | Mittal, Divyansh; Jha, Shikhar Krishn; Goel, Ojsi; Gupta, Antrakrate |
    
| Author_xml | 1. Gupta, Antrakrate (ORCID: 0009-0002-9371-7408), Department of Materials Science and Engineering, Indian Institute of Technology; 2. Mittal, Divyansh, Department of Materials Science and Engineering, Indian Institute of Technology; 3. Goel, Ojsi, Department of Materials Science and Engineering, Indian Institute of Technology; 4. Jha, Shikhar Krishn (ORCID: 0000-0003-1197-8795; email: skjha@iitk.ac.in), Department of Materials Science and Engineering, Indian Institute of Technology |
    
    
| CitedBy_id | crossref_primary_10_1007_s10853_025_10772_2 | 
    
| Cites_doi | 10.1038/s41586-019-1335-8 10.1038/s41524-022-00784-w 10.18653/v1/D19-1371 10.1021/acs.jpcc.3c03106 10.1186/1751-0473-7-7 10.1007/s10660-022-09560-w 10.1186/1471-2105-4-20 10.1007/978-3-319-78503-5_6 10.1063/5.0021106 10.1088/1757-899X/768/7/072094 10.1109/ICDAR.2007.4376991 10.1016/j.mtla.2023.101803 10.1007/978-3-030-86549-8_9  | 
    
| ContentType | Journal Article | 
    
| Copyright | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. COPYRIGHT 2024 Springer  | 
    
    
    
| Discipline | Engineering | 
    
| EISSN | 1573-4803 | 
    
| EndPage | 13872 | 
    
| ExternalDocumentID | A803657963 10_1007_s10853_024_09980_z  | 
    
| GrantInformation_xml | Funder: Ministry of Education, India; grant ID: PMRF Fellowship; funder ID: http://dx.doi.org/10.13039/501100004541 |
    
    
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 30 | 
    
| Language | English | 
    
    
| ORCID | 0000-0003-1197-8795 0009-0002-9371-7408  | 
    
    
| PageCount | 17 | 
    
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 20240800 2024-08-00 20240801  | 
    
| PublicationDateYYYYMMDD | 2024-08-01 | 
    
| PublicationDate_xml | – month: 8 year: 2024 text: 20240800  | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | New York | 
    
| PublicationPlace_xml | – name: New York | 
    
| PublicationTitle | Journal of materials science | 
    
| PublicationTitleAbbrev | J Mater Sci | 
    
| PublicationYear | 2024 | 
    
| Publisher | Springer US Springer Springer Nature B.V  | 
    
| Publisher_xml | – name: Springer US – name: Springer – name: Springer Nature B.V  | 
    
    
    
| StartPage | 13856 | 
    
| SubjectTerms | Algorithms; Characterization and Evaluation of Materials; Chemistry and Materials Science; Classical Mechanics; Computation & Theory; Computational linguistics; Crystallography and Scattering Methods; Data analysis; Documents; domain; Equipment and supplies; Image analysis; Image processing; Language processing; Materials Science; Natural language interfaces; Natural language processing; Polymer Sciences; Software; Solid Mechanics; Webs |
    
    
| Title | Natural language processing algorithms for domain-specific data extraction in material science: Reseractor | 
    
| URI | https://link.springer.com/article/10.1007/s10853-024-09980-z https://www.proquest.com/docview/3087443219 https://www.proquest.com/docview/3153731503  | 
    
| Volume | 59 | 
    