Choosing the best data mining algorithm in two different aquatic systems: data mining in aquatic systems

Bibliographic Details
Published in International journal of environmental science and technology (Tehran) Vol. 19; no. 9; pp. 8783-8796
Main Authors Ghaemi, Elham; Tabesh, Massoud; Krampe, Joerg; Nazif, Sara
Format Journal Article
Language English
Published Berlin/Heidelberg: Springer Berlin Heidelberg, 01.09.2022
ISSN 1735-1472
1735-2630
DOI 10.1007/s13762-022-04098-8

Summary: Reliable hardware and software are crucial for maintaining reasonable control over the production, transmission, storage, and distribution of water. Cyber sensors can extract water quality information, and the collected data must be mined to gain awareness of water status. Outlier mining finds patterns in large datasets that differ from the rest. In water quality datasets, anomalies may be caused by technical problems or by actual events. Proper identification of actual events requires reliable data from which technical outliers have been eliminated. Most studies in the water field focus on identifying real events; few have focused on pre-processing water quality datasets and identifying anomalies caused by technical errors that are not directly related to water pollution. This paper presents the use of four approaches for monitoring water quality indicators in two water systems: a river and a water distribution network. The applied outlier identification methods are K-Nearest Neighbor (KNN), Local Outlier Factor (LOF), Isolation Forest (iForest), and Anomaly Detection (an R package). The findings demonstrate that the KNN algorithm is suitable for finding outliers in datasets with global anomalies. LOF does not perform adequately in finding local outliers, whereas the iForest method detects them properly. The Anomaly Detection package is a good choice for identifying anomalies in datasets where the outliers are not merely local or global.
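
To make the idea of KNN-based detection of global outliers concrete, the following is a minimal sketch, not the authors' implementation. It scores each reading by its distance to its k-th nearest neighbour and flags readings whose score is far above the typical score. The function names, the pH readings, and the "three times the median score" threshold are all assumptions chosen for illustration; the record does not state the thresholds or parameters used in the paper.

```python
import statistics


def knn_outlier_scores(values, k=2):
    """Return, for each value, the distance to its k-th nearest neighbour."""
    if k < 1 or k >= len(values):
        raise ValueError("k must satisfy 1 <= k < len(values)")
    scores = []
    for i, v in enumerate(values):
        # Distances from this reading to every other reading, smallest first.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(values, k=2, factor=3.0):
    """Return indices whose KNN score exceeds factor * median score.

    The threshold rule is a hypothetical choice for this sketch. If the
    median score is zero (many duplicate readings), every reading with a
    non-zero score is flagged.
    """
    scores = knn_outlier_scores(values, k)
    threshold = factor * statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical pH readings with one sensor glitch at index 5.
readings = [7.1, 7.2, 7.0, 7.3, 7.1, 12.5, 7.2, 7.0]
print(flag_outliers(readings, k=2))  # prints [5]
```

A single isolated spike is far from all other readings, so its k-th neighbour distance is large and it stands out; this is the global-anomaly case. A reading that is unusual only relative to its immediate neighbourhood would not necessarily receive a large score under this scheme.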