GENERALIZED DATASET OF GEOLOGICAL AND GEOPHYSICAL INFORMATION ON THE EASTERN SECTOR OF THE RUSSIAN ARCTIC FOR MACHINE LEARNING-BASED ANALYSIS

Bibliographic Details
Published in Russian geology and geophysics Vol. 66; no. 2; pp. 210-223
Main Authors Lisenkov, I.A., Soloviev, A.A., Kuznetsov, V.A., Nikolova, Yu.I.
Format Journal Article
Language English
Published 01.02.2025
ISSN 1068-7971
DOI 10.2113/RGG20244747

Abstract The article presents a practical approach to collecting and preprocessing geological and geophysical spatial data for use in machine learning models for geophysical applications. According to established principles for estimating effort in data analysis, which are confirmed by surveys of specialists, this stage is regarded as a major consumer of time and resources, accounting for up to 80% of the total volume of data analysis in a hypothesis-testing project. The paper focuses on creating a consistent dataset that integrates geological and geophysical information on a given region. We consider the problems that arise from differences among geodata sources: their format (vector/raster), scale, type of attribute information (quantitative/qualitative), and availability. A critical aspect is the formalization and synthesis of an algorithm for combining geospatial data and converting them into quantitative vectors. The combination of heterogeneous data draws on the concept of a neighborhood, which fits in with the data selection techniques and the data consolidation strategy. The paper presents the general architecture of the software and hardware complex, which includes a module for data collection and transformation written in Python using the Pandas library and a data storage system based on the PostgreSQL DBMS (database management system) with the PostGIS extension. It is shown that, for the considered class of problems in geophysics, a relational DBMS is sufficient for storing and processing the data. If the problem dimension increases, it is proposed to scale the system with Big Data technology based on Apache Hadoop. The practical application of the proposed approach is demonstrated by the results of data collection for the Caucasus region and the eastern sector of the Russian Arctic.
Based on the prepared data, experiments were carried out using machine learning models for recognizing the locations of potential strong earthquakes and for estimating the sensitivity of several geophysical features of these regions. The article presents the experimental results and an evaluation of their efficiency.
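The neighborhood-based conversion of mixed geodata into quantitative vectors described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the record fields (`lat`, `lon`, `value`, `rock`), the fixed search radius, and the sample numbers are all hypothetical, and plain Python is used instead of Pandas and PostGIS only to keep the example self-contained. Each grid node collects the samples within a given radius, summarizes the quantitative attribute with mean and maximum, and turns the qualitative attribute into per-category shares.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def neighborhood_features(node, samples, radius_km, categories):
    """Build one quantitative feature vector (as a dict) for a grid node.

    node       -- (lat, lon) of the grid node, in degrees
    samples    -- dicts with hypothetical keys: lat, lon, value, rock
    radius_km  -- neighborhood radius (an assumed, fixed parameter)
    categories -- qualitative classes to encode as shares
    """
    near = [s for s in samples
            if haversine_km(node[0], node[1], s["lat"], s["lon"]) <= radius_km]
    values = [s["value"] for s in near]
    counts = Counter(s["rock"] for s in near)

    features = {
        "n": len(near),
        "value_mean": sum(values) / len(values) if values else float("nan"),
        "value_max": max(values) if values else float("nan"),
    }
    # Qualitative attribute -> quantitative share of each category.
    for cat in categories:
        features[f"rock_{cat}_share"] = counts[cat] / len(near) if near else 0.0
    return features


# Made-up sample data for illustration only.
samples = [
    {"lat": 70.0, "lon": 140.0, "value": 10.0, "rock": "granite"},
    {"lat": 70.1, "lon": 140.0, "value": 20.0, "rock": "basalt"},
    {"lat": 75.0, "lon": 150.0, "value": 99.0, "rock": "basalt"},
]
vector = neighborhood_features((70.0, 140.0), samples, 50.0, ["granite", "basalt"])
```

In a setup like the one the abstract outlines, the radius search would more plausibly be delegated to a spatial query in the database and the aggregation to a grouped operation on a data frame; the loop above only makes the idea explicit.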
Author Nikolova, Yu.I.
Soloviev, A.A.
Kuznetsov, V.A.
Lisenkov, I.A.
Author_xml – sequence: 1
  givenname: I.A.
  surname: Lisenkov
  fullname: Lisenkov, I.A.
– sequence: 2
  givenname: A.A.
  surname: Soloviev
  fullname: Soloviev, A.A.
– sequence: 3
  givenname: V.A.
  surname: Kuznetsov
  fullname: Kuznetsov, V.A.
– sequence: 4
  givenname: Yu.I.
  surname: Nikolova
  fullname: Nikolova, Yu.I.
ContentType Journal Article
Discipline Geology
EndPage 223
ISSN 1068-7971
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
PageCount 14
PublicationCentury 2000
PublicationDate 2025-02-01
PublicationDateYYYYMMDD 2025-02-01
PublicationDate_xml – month: 02
  year: 2025
  text: 2025-02-01
  day: 01
PublicationDecade 2020
PublicationTitle Russian geology and geophysics
PublicationYear 2025
StartPage 210
Title GENERALIZED DATASET OF GEOLOGICAL AND GEOPHYSICAL INFORMATION ON THE EASTERN SECTOR OF THE RUSSIAN ARCTIC FOR MACHINE LEARNING-BASED ANALYSIS
Volume 66