Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction


Bibliographic Details
Published in Scientific Data Vol. 5; no. 1; p. 180111
Main Authors Court, Callum J., Cole, Jacqueline M.
Format Journal Article
Language English
Published London: Nature Publishing Group UK, 19.06.2018
ISSN 2052-4463
DOI 10.1038/sdata.2018.111

More Information
Summary: Large auto-generated databases of magnetic materials properties have the potential for great utility in materials science research. This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Records processed with the text-mining toolkit ChemDataExtractor were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. As a result, its machine learning component can now train with ≤ 500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for big-data materials science initiatives and provides a basis for magnetic materials discovery.
Design Type(s): data integration objective • database creation objective
Measurement Type(s): physicochemical characterization
Technology Type(s): data item extraction from journal article
Factor Type(s)
Machine-accessible metadata file describing the reported data (ISA-Tab format)
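The summary notes that records are distributed as JSON (among other formats) and can be read from Python. The following is a minimal sketch of what that might look like, not the authors' code. It assumes each record is a four-part relationship with the hypothetical field names `compound`, `specifier`, `value` and `unit`; the actual schema of the published files may differ. The three sample entries are textbook-style placeholder values written for this illustration, not rows taken from the database.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One quaternary relationship (field names are hypothetical)."""
    compound: str   # chemical compound name or formula
    specifier: str  # which transition: "Curie" or "Néel"
    value: float    # reported temperature
    unit: str       # unit of the reported temperature


# Placeholder records for illustration only (not drawn from the database).
SAMPLE_JSON = """[
  {"compound": "Fe", "specifier": "Curie", "value": 1043, "unit": "K"},
  {"compound": "Cr", "specifier": "Néel", "value": 311, "unit": "K"},
  {"compound": "Ni", "specifier": "Curie", "value": 354, "unit": "°C"}
]"""


def load_records(text):
    """Parse a JSON array of record objects into Record instances."""
    return [
        Record(d["compound"], d["specifier"], float(d["value"]), d["unit"])
        for d in json.loads(text)
    ]


def to_kelvin(record):
    """Return the record's temperature in kelvin (handles K and °C only)."""
    if record.unit == "K":
        return record.value
    if record.unit == "°C":
        return record.value + 273.15
    raise ValueError(f"unsupported unit: {record.unit!r}")


def by_specifier(records, specifier):
    """Keep only the records for one transition type."""
    return [r for r in records if r.specifier == specifier]


records = load_records(SAMPLE_JSON)
for r in by_specifier(records, "Curie"):
    print(r.compound, round(to_kelvin(r), 2), "K")
```

Normalising units before comparing values is a design choice for this sketch: text-mined temperatures may be reported in different units, so filtering or sorting on the raw `value` field alone could be misleading.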
Funding: AC02-06CH11357
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22), Materials Sciences & Engineering Division
Engineering and Physical Sciences Research Council (EPSRC)