Natural language processing algorithms for domain-specific data extraction in material science: Reseractor

Bibliographic Details
Published in Journal of materials science Vol. 59; no. 30; pp. 13856 - 13872
Main Authors Gupta, Antrakrate; Mittal, Divyansh; Goel, Ojsi; Jha, Shikhar Krishn
Format Journal Article
Language English
Published New York Springer US 01.08.2024
Springer
Springer Nature B.V
ISSN 0022-2461
EISSN 1573-4803
DOI 10.1007/s10853-024-09980-z

Abstract Many tools and web engines are trained to find journal articles among billions of research papers on millions of topics across different databases. Their high degree of generalizability often comes at the cost of specificity. Scientific pursuits need a tool that extracts data from selected resources to perform domain-specific tasks. Current algorithms and generalized tools lack specificity and are prone to errors when analysing data from a bundle of specific documents selected eclectically. The current work addresses the need for such a tool, one that focuses on specificity based on users' input keywords and phrases to find relevant information in bundles of articles from the web. Reseractor is based on a customized algorithm, Whitespace, working in synergy with output from open-access tools for document image analysis and focused domain data extraction using NLP. The current tool is designed for the material science domain and can adopt various generalized and scientific corpora as layers. Tested on two different bundles of papers, it gives an accuracy of 81.12%, a recall of 78.38%, and a precision of 84.06%. Because the algorithms are simple and directly applicable, users from other domains can plug their own corpora into the algorithms and remodel the tool for their purpose. The current work fulfills the need for domain-specific experimental data extraction, with results stored in organized and structured databases for upcoming computational researchers.
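
The accuracy, recall, and precision quoted in the abstract are standard classification metrics. The sketch below shows how they are usually defined from confusion counts; it is not the evaluation code of the article. The `Counts` class and the `demo` numbers are hypothetical and chosen only for illustration, since this record does not give the underlying counts. The only values taken from the abstract are the reported precision (84.06%) and recall (78.38%), which are used to compute an F1 score.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for a binary 'relevant / not relevant' extraction task."""
    tp: int  # relevant items correctly extracted
    fp: int  # irrelevant items wrongly extracted
    fn: int  # relevant items that were missed
    tn: int  # irrelevant items correctly ignored


def precision(c: Counts) -> float:
    # Share of extracted items that are actually relevant.
    return c.tp / (c.tp + c.fp)


def recall(c: Counts) -> float:
    # Share of relevant items that were actually extracted.
    return c.tp / (c.tp + c.fn)


def accuracy(c: Counts) -> float:
    # Share of all decisions (extract or ignore) that were correct.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)


# Hypothetical counts, NOT taken from the article.
demo = Counts(tp=8, fp=2, fn=2, tn=8)
demo_metrics = (precision(demo), recall(demo), accuracy(demo))

# F1 computed from the precision and recall reported in the abstract.
reported_f1 = f1(0.8406, 0.7838)

if __name__ == "__main__":
    print("demo precision, recall, accuracy:", demo_metrics)
    print(f"F1 from reported precision and recall: {reported_f1:.4f}")
```

The F1 value computed from the reported precision and recall comes to about 0.8112, the same figure the abstract gives for accuracy to two decimal places. This record does not state how the accuracy figure was computed, so the match is noted here only as an arithmetic observation.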
Audience Academic
Author Gupta, Antrakrate (Department of Materials Science and Engineering, Indian Institute of Technology; ORCID 0009-0002-9371-7408)
Mittal, Divyansh (Department of Materials Science and Engineering, Indian Institute of Technology)
Goel, Ojsi (Department of Materials Science and Engineering, Indian Institute of Technology)
Jha, Shikhar Krishn (Department of Materials Science and Engineering, Indian Institute of Technology; ORCID 0000-0003-1197-8795; email skjha@iitk.ac.in)
Copyright The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
COPYRIGHT 2024 Springer
Discipline Engineering
Funding Ministry of Education, India (grant: PMRF Fellowship; funder ID: http://dx.doi.org/10.13039/501100004541)
IsPeerReviewed true
IsScholarly true
SubjectTerms Algorithms
Characterization and Evaluation of Materials
Chemistry and Materials Science
Classical Mechanics
Computation & Theory
Computational linguistics
Crystallography and Scattering Methods
Data analysis
Documents
domain
Equipment and supplies
Image analysis
Image processing
Language processing
Materials Science
Natural language interfaces
Natural language processing
Polymer Sciences
Software
Solid Mechanics
Webs