Choosing the best data mining algorithm in two different aquatic systems: data mining in aquatic systems

Bibliographic Details
Published in	International journal of environmental science and technology (Tehran) Vol. 19; no. 9; pp. 8783 - 8796
Main Authors	Ghaemi, Elham; Tabesh, Massoud; Krampe, Joerg; Nazif, Sara
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.09.2022
Subjects	algorithms; Aquatic Pollution; computer software; data collection; Earth and Environmental Science; Ecotoxicology; Environment; Environmental Chemistry; Environmental Science and Engineering; forests; Original Paper; rivers; Soil Science & Conservation; Waste Water Technology; water distribution; Water Management; water pollution; Water Pollution Control; water quality; Isolation forest; Local outlier factor; Anomaly detection; K-nearest neighbor
ISSN	1735-1472 1735-2630
DOI	10.1007/s13762-022-04098-8

Summary:	Reliable hardware and software are crucial for maintaining reasonable control over the production, transmission, storage, and distribution of water. Cyber sensors can extract water quality information, and the collected data must be mined to obtain awareness of water status. Outlier mining finds patterns that differ from the rest of a large dataset. In a water quality dataset, anomalies may be caused by technical problems or by actual events. Proper identification of actual events requires reliable data from which technical outliers have been eliminated. Most studies in the water field focus on identifying real events; few have addressed pre-processing water quality datasets and identifying anomalies caused by technical errors that are not directly related to water pollution. This paper presents the use of four approaches for monitoring water quality indicators in two water systems: a river and a water distribution network. The applied outlier identification methods are K-Nearest Neighbor (KNN), Local Outlier Factor (LOF), Isolation Forest (iForest), and Anomaly Detection (an R package). The findings demonstrate that the KNN algorithm is suitable for finding outliers in datasets with global anomalies. The LOF does not perform adequately in finding local outliers, whereas the iForest method detects them properly. The Anomaly Detection package is a good choice for identifying anomalies in datasets where the outliers are not merely local or global.
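
As an illustration of the KNN idea named in the summary, the following is a minimal sketch of distance-based outlier scoring, not the authors' implementation. It assumes one-dimensional sensor readings and scores each reading by the distance to its k-th nearest neighbor. The function names, the sample pH values, and the cutoff rule (a multiple of the median score) are hypothetical choices made for this example only.

```python
from statistics import median


def knn_outlier_scores(values, k=3):
    """Score each reading by the distance to its k-th nearest neighbor."""
    scores = []
    for i, v in enumerate(values):
        # Absolute differences from reading i to every other reading, ascending.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(values, k=3, factor=5.0):
    """Return indices whose score exceeds `factor` times the median score.

    The cutoff rule is an assumption for this sketch, not taken from the paper.
    """
    scores = knn_outlier_scores(values, k)
    cutoff = factor * median(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]


# Hypothetical pH readings; the value at index 5 mimics a sensor glitch.
ph = [7.1, 7.2, 7.0, 7.3, 7.1, 12.9, 7.2, 7.0, 7.1, 7.2]
print(flag_outliers(ph))  # prints [5]
```

A reading far from all others gets a large k-th neighbor distance, which is why this kind of score tends to suit global anomalies. A reading that is only unusual relative to its immediate neighborhood may not stand out under a single global cutoff.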