Improving the classification of phishing websites using a hybrid algorithm

Bibliographic Details
Published in	Computational Intelligence, Vol. 38, no. 2, pp. 667-689
Main Authors	Sharma, Suvita Rani; Singh, Birmohan; Kaur, Manpreet
Format	Journal Article
Language	English
Published	Hoboken: Blackwell Publishing Ltd, 01.04.2022
Subjects	Adaptive sampling; Algorithms; Classification; Cybercrime; data balancing; Datasets; features selection; Heuristic methods; Phishing; phishing websites; Rao algorithms; Transfer functions; Websites
ISSN	0824-7935, 1467-8640
DOI	10.1111/coin.12494

Summary:	This article proposes a hybrid algorithm for distinguishing phishing websites from legitimate ones. Because the dataset may have an imbalanced class distribution and contain irrelevant features, the adaptive synthetic sampling approach is used during preprocessing to balance the data. Irrelevant or redundant features are then removed from the balanced data using the proposed binary version of the Rao algorithms. S‐shaped and V‐shaped transfer functions map the continuous search space to a discrete one, and their results are analyzed for the proposed algorithms. Performance is further improved by optimizing the k parameter of the kNN classifier. The dataset is taken from the UCI machine‐learning repository, and performance is evaluated using the polygon area metric. The obtained classification accuracy is 97.044%. For validation, the proposed hybrid algorithm is compared with other state‐of‐the‐art techniques and, for performance analysis, with seven metaheuristic feature selection algorithms and six filter methods. The approach is also applied to URLs registered on the PhishTank website.
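
The summary mentions S‐shaped and V‐shaped transfer functions for mapping a continuous search space to a discrete one. As a rough illustration only, the sketch below uses the sigmoid and |tanh| forms that are common in the binary metaheuristic literature; the record does not say which variants the authors used, so the function names, the update rules, and the sample values are all assumptions rather than the paper's method.

```python
import math
import random


def s_shaped(x):
    """Sigmoid written via tanh so that large |x| cannot overflow.

    Mathematically identical to 1 / (1 + exp(-x)).
    """
    return 0.5 * (1.0 + math.tanh(x / 2.0))


def v_shaped(x):
    """|tanh(x)|: zero at x = 0 and approaching 1 as |x| grows."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """Assumed S-shaped rule: set each bit to 1 with probability S(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """Assumed V-shaped rule: flip each current bit with probability V(x)."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]


if __name__ == "__main__":
    rng = random.Random(0)                  # fixed seed for repeatability
    position = [-2.0, -0.5, 0.0, 0.5, 2.0]  # hypothetical continuous candidate
    mask = binarize_s(position, rng)        # 1 = keep feature, 0 = drop it
    print(mask)
    print(binarize_v(position, mask, rng))
```

In a feature-selection setting, the resulting bit vector would act as a mask over the feature columns: the S‐shaped rule sets each bit directly from the continuous value, whereas the V‐shaped rule decides whether to flip the bit the candidate already has.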