Choosing the best data mining algorithm in two different aquatic systems data mining in aquatic systems
| Published in | International journal of environmental science and technology (Tehran) Vol. 19; no. 9; pp. 8783 - 8796 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.09.2022 |
| Subjects | |
| ISSN | 1735-1472 1735-2630 |
| DOI | 10.1007/s13762-022-04098-8 |
| Summary | Reliable hardware and software are crucial for maintaining reasonable control over the production, transmission, storage, and distribution of water. Cyber sensors can extract water quality information, and the collected data must be mined to gain awareness of water status. Outlier mining finds patterns that differ from the rest of a large dataset. In water quality datasets, anomalies may be caused by technical problems or by actual events, so technical outliers must be eliminated before actual events can be properly identified. Most studies in the water field focus on identifying real events; few have addressed pre-processing water quality datasets and identifying anomalies caused by technical errors that are not directly related to water pollution. This paper presents four approaches for monitoring water quality indicators in two water systems: a river and a water distribution network. The outlier identification methods applied are K-Nearest Neighbor (KNN), Local Outlier Factor (LOF), Isolation Forest (iForest), and Anomaly Detection (an R package). The findings indicate that KNN is suitable for finding outliers in datasets with global anomalies. LOF does not perform adequately in finding local outliers, whereas iForest detects them properly. The Anomaly Detection package is a good choice for datasets in which the outliers are not merely local or global. |
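
The summary names KNN as suited to global anomalies. The record does not include the authors' implementation, so the following is only a minimal sketch of one common KNN-style scoring rule (distance to the k-th nearest neighbour) on a single water quality indicator. The function names, the pH readings, the value of `k`, and the threshold are all hypothetical choices made for illustration, not values taken from the paper.

```python
from typing import List, Sequence


def knn_outlier_scores(values: Sequence[float], k: int = 3) -> List[float]:
    """Score each reading by its distance to its k-th nearest neighbour.

    A large score means the reading sits far from the bulk of the data,
    which is the typical signature of a global outlier.
    """
    if not 1 <= k < len(values):
        raise ValueError("k must be between 1 and len(values) - 1")
    scores = []
    for i, v in enumerate(values):
        # Distances from reading i to every other reading, smallest first.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(values: Sequence[float], k: int = 3,
                  threshold: float = 1.0) -> List[int]:
    """Return the indices whose k-th neighbour distance exceeds the threshold."""
    return [i for i, s in enumerate(knn_outlier_scores(values, k)) if s > threshold]


# Hypothetical pH readings with one sensor glitch at index 5.
ph = [7.1, 7.2, 7.0, 7.3, 7.1, 12.5, 7.2, 7.0]
print(flag_outliers(ph, k=3, threshold=1.0))  # prints [5]
```

A score of this kind compares each reading against the whole dataset, which is consistent with the summary's remark that KNN suits global anomalies; local outliers, which are unusual only relative to their neighbourhood, would need a density-based or isolation-based method instead.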