Improving the classification of phishing websites using a hybrid algorithm
Published in | Computational Intelligence, Vol. 38, no. 2, pp. 667-689 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Hoboken: Blackwell Publishing Ltd, 01.04.2022 |
ISSN | 0824-7935 1467-8640 |
DOI | 10.1111/coin.12494 |
Summary: | In this article, a hybrid algorithm is proposed to distinguish phishing websites from legitimate ones. Because the dataset may have an imbalanced class distribution and may contain irrelevant features, the adaptive synthetic sampling (ADASYN) approach is first applied during preprocessing to balance the data. Irrelevant or redundant features are then removed from the balanced data using the proposed binary versions of the Rao algorithms, in which S-shaped and V-shaped transfer functions map the continuous search space to a discrete one; the effect of each transfer-function family on the proposed algorithms is also analyzed. Performance is further improved by optimizing the value of the parameter k in the kNN classifier. The dataset is taken from the UCI Machine Learning Repository, and performance is evaluated using the polygon area metric. The proposed approach achieves a classification accuracy of 97.044%. For validation, the hybrid algorithm is compared with other state-of-the-art techniques, with seven metaheuristic feature selection algorithms, and with six filter methods. Finally, the proposed approach is applied to URLs registered on the PhishTank website. |
---|---|
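The summary says class imbalance is handled with adaptive synthetic sampling (ADASYN). The sketch below follows the usual ADASYN scheme: it computes how many synthetic samples G are needed, weights each minority point by the share of majority points among its k nearest neighbours, and interpolates toward random minority neighbours. It is a minimal pure-Python illustration, not the authors' code; the function names, the defaults (`k=5`, `beta=1.0`) and the uniform fallback when no minority point has majority neighbours are assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples following the ADASYN scheme.

    Hypothetical helper: X is a list of feature vectors, y a list of binary
    labels; beta=1.0 asks for a fully balanced dataset.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_count = len(y) - len(minority_idx)
    # G: total number of synthetic samples to generate.
    G = int((majority_count - len(minority_idx)) * beta)
    if G <= 0 or not minority_idx:
        return []

    # r_i: fraction of majority points among the k nearest neighbours of x_i.
    ratios = []
    for i in minority_idx:
        neighbours = sorted((j for j in range(len(X)) if j != i),
                            key=lambda j: euclidean(X[i], X[j]))[:k]
        ratios.append(sum(1 for j in neighbours if y[j] != minority) / k)

    total = sum(ratios)
    if total == 0:
        # Assumption: no "hard" minority point, so spread samples uniformly.
        ratios = [1.0] * len(minority_idx)
        total = float(len(minority_idx))

    synthetic = []
    for i, r in zip(minority_idx, ratios):
        g_i = round(r / total * G)  # samples to create around x_i
        # Interpolation partners: the k nearest *minority* neighbours of x_i.
        partners = sorted((j for j in minority_idx if j != i),
                          key=lambda j: euclidean(X[i], X[j]))[:k]
        for _ in range(g_i):
            if not partners:
                break
            z = X[rng.choice(partners)]
            lam = rng.random()  # position along the segment x_i -> z
            synthetic.append([a + (b - a) * lam for a, b in zip(X[i], z)])
    return synthetic
```

Minority points surrounded by majority points receive more synthetic samples, which is what distinguishes ADASYN from plain SMOTE-style oversampling.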
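The summary mentions S-shaped and V-shaped transfer functions that map a continuous search space to a discrete one. A common convention in binary metaheuristics, assumed here rather than taken from the article, uses the sigmoid as the S-shaped function (set a bit to 1 with probability S(x)) and |tanh(x)| as the V-shaped function (flip the current bit with probability V(x)). The specific members of each family used by the authors may differ.

```python
import math
import random


def s_shaped(x):
    """S-shaped (sigmoid) transfer function, computed in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe for large negative x (underflows to 0.0)
    return z / (1.0 + z)


def v_shaped(x):
    """V-shaped transfer function |tanh(x)|, with values in [0, 1]."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: each bit becomes 1 with probability S(x_d)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: each bit is flipped with probability V(x_d), else kept."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]
```

The design difference matters: an S-shaped rule sets a bit from scratch, while a V-shaped rule only decides whether to change the current bit, so a position component near zero leaves the feature's selection state unchanged.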
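The summary states that features are selected with binary versions of the Rao algorithms. As a hedged illustration only, the sketch below binarizes Rao-1, whose update is X' = X + r1 (X_best - X_worst) with greedy acceptance, by passing the continuous position through a sigmoid (S-shaped) transfer function. The function names are hypothetical, and the weighted wrapper objective `wrapper_fitness` (with `alpha=0.99`) is a formulation common in the feature-selection literature, not one confirmed for this article.

```python
import math
import random


def sigmoid(x):
    # Numerically stable S-shaped transfer function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Assumed objective: weighted classifier error plus relative subset size."""
    return alpha * error_rate + (1 - alpha) * n_selected / n_total


def binary_rao1(fitness, n_features, pop_size=10, iters=50, seed=0):
    """Minimise fitness(bits) with a binary Rao-1 sketch (hypothetical helper)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(pop_size)]
    bits = [[1 if rng.random() < sigmoid(x) else 0 for x in p] for p in pos]
    fit = [fitness(b) for b in bits]

    for _ in range(iters):
        best = min(range(pop_size), key=fit.__getitem__)
        worst = max(range(pop_size), key=fit.__getitem__)
        for i in range(pop_size):
            # Rao-1 move: step along (best - worst) with a random factor per dimension.
            new_pos = [x + rng.random() * (pos[best][d] - pos[worst][d])
                       for d, x in enumerate(pos[i])]
            new_bits = [1 if rng.random() < sigmoid(x) else 0 for x in new_pos]
            new_fit = fitness(new_bits)
            # Greedy acceptance: keep the candidate only if it is not worse.
            if new_fit <= fit[i]:
                pos[i], bits[i], fit[i] = new_pos, new_bits, new_fit

    best = min(range(pop_size), key=fit.__getitem__)
    return bits[best], fit[best]
```

In a real wrapper setting, `fitness` would train a classifier such as kNN on the selected columns and feed its error rate and subset size into `wrapper_fitness`; Rao-1 itself has no algorithm-specific parameters beyond population size and iteration count.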
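The summary notes that performance improves when the value of k in the kNN classifier is optimized. The summary does not say how k is chosen, so the sketch below swaps in a plain grid search with leave-one-out accuracy to show the idea. The helper names and the candidate list are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def select_k(X, y, candidates=(1, 3, 5, 7)):
    """Pick k by leave-one-out accuracy; ties keep the smaller k."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        correct = 0
        for i in range(len(X)):
            # Hold out point i and classify it with the remaining points.
            train_X = X[:i] + X[i + 1:]
            train_y = y[:i] + y[i + 1:]
            correct += knn_predict(train_X, train_y, X[i], k) == y[i]
        acc = correct / len(X)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```

Odd values of k avoid voting ties in a two-class problem such as phishing versus legitimate.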
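The summary reports evaluation with the polygon area metric. As we understand it, the metric places several classification measures (six in the original proposal) on evenly spaced axes of a unit radius polygon and normalizes the enclosed area by the area of the full regular polygon. We have not checked the exact set of measures or their axis order against the metric's original definition, so this sketch takes a generic list of values in [0, 1]. The function name is hypothetical.

```python
import math


def polygon_area_metric(values):
    """Normalised area of the polygon whose radii are the given metric values.

    Assumes every value lies in [0, 1] and the axes are evenly spaced.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three metrics to form a polygon")
    angle = 2 * math.pi / n
    # The polygon splits into n triangles, each spanned by two adjacent radii:
    # area_i = 0.5 * r_i * r_{i+1} * sin(angle).
    area = 0.5 * math.sin(angle) * sum(values[i] * values[(i + 1) % n] for i in range(n))
    # Reference area: the regular polygon with every radius equal to 1.
    max_area = 0.5 * math.sin(angle) * n
    return area / max_area
```

Because the area depends on products of adjacent values, a classifier that is strong on one measure but weak on its neighbours scores lower than its simple average would suggest.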