In silico machine learning–enabled detection of polycyclic aromatic hydrocarbons from contaminated soil

Bibliographic Details
Published in Proceedings of the National Academy of Sciences - PNAS Vol. 122; no. 19; p. e2427069122
Main Authors Ju, Yilong; Neumann, Oara; Denison, Sara B.; Jin, Peixuan; Sanchez-Alvarado, Andres B.; Nordlander, Peter; Senftle, Thomas P.; Alvarez, Pedro J. J.; Patel, Ankit; Halas, Naomi J.
Format Journal Article
Language English
Published United States National Academy of Sciences 13.05.2025
ISSN 0027-8424
1091-6490
DOI 10.1073/pnas.2427069122

Summary:
Significance: Soil contamination by environmental pollutants, particularly polycyclic aromatic hydrocarbons (PAHs), can significantly affect human health because these compounds are carcinogenic and mutagenic. In this work, we present an approach that integrates theoretical spectral calculations with machine learning to identify PAHs, an approach that can be extended straightforwardly to the thousands of lesser-known and virtually unstudied environmental pollutants that also pose public health risks. By extracting characteristic spectral features and training a detection model to differentiate between contaminated and as-collected/reference soil samples, this work offers a scalable solution to address widespread environmental health issues.

The detection and identification of polycyclic aromatic hydrocarbons (PAHs) and their modified derivatives in contaminated soil is challenging due to the chemical and microbial complexity of soil organic matter. To address these challenges, we developed an analytical approach that combines surface-enhanced Raman spectroscopy with a Raman spectral library constructed in silico from density functional theory (DFT)-calculated spectra. This method overcomes several limitations of traditional experimental libraries, including spectral background interference, solvent effects, and compounds that are commercially unavailable or challenging to synthesize. Our methodology employs a physics-informed machine learning pipeline that operates in two stages: the characteristic peak extraction (CaPE) algorithm, which isolates distinctive spectral features, and the characteristic peak similarity (CaPSim) algorithm, which identifies analytes with high robustness to spectral shifts and amplitude variations. Validation of this approach showed strong similarity values (>0.6) between DFT-calculated and experimental surface-enhanced Raman spectra for multiple PAHs, confirming its accuracy and discriminative capability.
This study establishes the viability of DFT-calculated spectra as reliable references for identifying analytes that lack experimental reference spectra, including those formed through environmental modification of PAHs. This advancement addresses a critical gap in environmental monitoring, providing a valuable tool for assessing public health risks associated with these contaminants.
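
The two-stage idea described in the summary (extract characteristic peaks, then compare peak sets in a way that tolerates small spectral shifts and amplitude changes) can be illustrated with a minimal sketch. This is not the CaPE or CaPSim algorithm from the article, whose details are not given in this record. The function names `extract_peaks`, `peak_similarity`, and `synthetic_spectrum`, the 10% relative height threshold, the Gaussian matching kernel, and the 10 cm^-1 tolerance are all assumptions chosen for illustration.

```python
import numpy as np


def extract_peaks(wavenumbers, intensities, min_rel_height=0.1):
    """Return (positions, heights) of local maxima above a relative threshold.

    Illustrative stand-in for a peak-extraction stage; the threshold is an
    assumed parameter, not a value from the article.
    """
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(intensities, dtype=float)
    y = y - y.min()
    if y.max() > 0:
        y = y / y.max()  # normalize so the overall amplitude scale does not matter
    mid = y[1:-1]
    is_peak = (mid > y[:-2]) & (mid >= y[2:]) & (mid >= min_rel_height)
    idx = np.where(is_peak)[0] + 1
    return x[idx], y[idx]


def peak_similarity(pos_a, pos_b, tolerance=10.0):
    """Symmetric similarity in [0, 1] between two sets of peak positions.

    Each peak is matched to its nearest counterpart through a Gaussian kernel
    of width `tolerance` (in cm^-1, an assumed value), so small shifts cost
    little and peak heights are ignored.
    """
    a = np.asarray(pos_a, dtype=float)
    b = np.asarray(pos_b, dtype=float)
    if a.size == 0 or b.size == 0:
        return 0.0
    dist = np.abs(a[:, None] - b[None, :])
    kernel = np.exp(-0.5 * (dist / tolerance) ** 2)
    return float(0.5 * (kernel.max(axis=1).mean() + kernel.max(axis=0).mean()))


def synthetic_spectrum(x, centers, heights, width=8.0):
    """Sum of Gaussian bands; a hypothetical stand-in for a real spectrum."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for c, h in zip(centers, heights):
        y += h * np.exp(-0.5 * ((x - c) / width) ** 2)
    return y


# Toy comparison: a "calculated" reference, a shifted and rescaled
# "measurement" of the same compound, and an unrelated compound.
grid = np.arange(400.0, 1801.0, 1.0)
reference = synthetic_spectrum(grid, [600, 1000, 1400], [1.0, 0.6, 0.8])
measured = synthetic_spectrum(grid, [604, 1004, 1404], [1.5, 3.0, 2.1])
unrelated = synthetic_spectrum(grid, [500, 800, 1200], [1.0, 0.9, 0.7])

ref_pos, _ = extract_peaks(grid, reference)
meas_pos, _ = extract_peaks(grid, measured)
other_pos, _ = extract_peaks(grid, unrelated)

same_score = peak_similarity(ref_pos, meas_pos)
other_score = peak_similarity(ref_pos, other_pos)
print(f"same compound: {same_score:.3f}, different compound: {other_score:.3f}")
```

In this toy setup, the shifted and rescaled spectrum scores far higher against the reference than the unrelated one does, which is the qualitative behavior the summary attributes to a shift- and amplitude-robust similarity measure. Real spectra would additionally need baseline removal and noise handling, which this sketch omits.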