GENERALIZED DATASET OF GEOLOGICAL AND GEOPHYSICAL INFORMATION ON THE EASTERN SECTOR OF THE RUSSIAN ARCTIC FOR MACHINE LEARNING-BASED ANALYSIS
| Published in | Russian Geology and Geophysics, Vol. 66, no. 2, pp. 210-223 |
|---|---|
| Main Authors | Lisenkov, I.A.; Soloviev, A.A.; Kuznetsov, V.A.; Nikolova, Yu.I. |
| Format | Journal Article |
| Language | English |
| Published | 01.02.2025 |
| ISSN | 1068-7971 |
| DOI | 10.2113/RGG20244747 |
| Abstract | The article presents a practical approach to collecting and preprocessing geological and geophysical spatial data for use in machine learning models for geophysical applications. According to established principles for estimating effort in data analysis, which are confirmed by surveys of specialists, this stage is the most time- and resource-consuming one, accounting for up to 80% of the total effort in a hypothesis-testing data analysis project. The paper focuses on creating a consistent dataset that integrates geological and geophysical information on a given region. We consider the problems that arise from differences among geodata sources in format (vector/raster), scale, type of attribute information (quantitative/qualitative), and availability. A critical aspect is the formalization and synthesis of an algorithm for combining geospatial data and converting them into quantitative vectors. Combining heterogeneous data draws on the concept of neighborhood, which fits in with the data selection techniques and the data consolidation strategy. The paper presents the general architecture of the software and hardware complex, which includes a module for data collection and transformation written in Python using the Pandas library and a data storage system based on the PostgreSQL database management system (DBMS) with the PostGIS extension. It is shown that, for the considered class of problems in geophysics, a relational DBMS is sufficient for data storage and processing. If the problem dimension increases, Big Data technology based on Apache Hadoop is proposed for scaling the system. A practical application of the proposed approach is demonstrated by the results of data collection for the Caucasus region and the eastern sector of the Russian Arctic. Based on the prepared data, experiments were carried out using machine learning models to recognize locations of potential strong earthquakes and to estimate the sensitivity of several geophysical features of these regions. The article presents the experimental results and an evaluation of their efficiency. |
| Author | Nikolova, Yu.I. Soloviev, A.A. Kuznetsov, V.A. Lisenkov, I.A. |
| Author_xml | – sequence: 1 givenname: I.A. surname: Lisenkov fullname: Lisenkov, I.A. – sequence: 2 givenname: A.A. surname: Soloviev fullname: Soloviev, A.A. – sequence: 3 givenname: V.A. surname: Kuznetsov fullname: Kuznetsov, V.A. – sequence: 4 givenname: Yu.I. surname: Nikolova fullname: Nikolova, Yu.I. |
| ContentType | Journal Article |
| Discipline | Geology |
| EndPage | 223 |
| ISSN | 1068-7971 |
| IngestDate | Wed Oct 01 02:25:22 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| PageCount | 14 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-02-01 |
| PublicationDecade | 2020 |
| PublicationTitle | Russian Geology and Geophysics |
| PublicationYear | 2025 |
| StartPage | 210 |
| Title | GENERALIZED DATASET OF GEOLOGICAL AND GEOPHYSICAL INFORMATION ON THE EASTERN SECTOR OF THE RUSSIAN ARCTIC FOR MACHINE LEARNING-BASED ANALYSIS |
| Volume | 66 |
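
The abstract describes combining geospatial data through a neighborhood concept and converting quantitative and qualitative attributes into quantitative vectors. The record does not give the authors' algorithm, so the following is only a minimal sketch of that general idea under stated assumptions: the field names (`lat`, `lon`, `value`, `rock`), the function names, the 50 km radius, and the choice of mean, count, and one-hot presence flags are all hypothetical. It uses plain Python so that it is self-contained; the article itself reports using Pandas with PostgreSQL/PostGIS.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def neighborhood_vector(node, samples, categories, radius_km):
    """Build one quantitative feature vector for a grid node.

    node       -- (lat, lon) of the point being described
    samples    -- list of dicts with hypothetical keys lat, lon, value, rock
    categories -- fixed, ordered list of qualitative classes to encode
    radius_km  -- neighborhood radius (an illustrative parameter)
    """
    near = [s for s in samples
            if haversine_km(node[0], node[1], s["lat"], s["lon"]) <= radius_km]
    values = [s["value"] for s in near]
    # Quantitative attribute: mean over the neighborhood (NaN if it is empty).
    mean_value = sum(values) / len(values) if values else float("nan")
    # Qualitative attribute: 1.0 if the class occurs in the neighborhood.
    present = {s["rock"] for s in near}
    one_hot = [1.0 if c in present else 0.0 for c in categories]
    return [mean_value, float(len(near))] + one_hot


# Made-up illustrative samples, not data from the article.
samples = [
    {"lat": 70.0, "lon": 140.0, "value": 10.0, "rock": "granite"},
    {"lat": 70.1, "lon": 140.0, "value": 20.0, "rock": "basalt"},
    {"lat": 60.0, "lon": 100.0, "value": 99.0, "rock": "shale"},
]
categories = ["basalt", "granite", "shale"]
vec = neighborhood_vector((70.0, 140.0), samples, categories, radius_km=50.0)
print(vec)
```

In a relational setup such as the one the abstract mentions, the radius filter would more likely be delegated to a spatial query in the database; the in-memory loop here only illustrates the shape of the resulting vector.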