Improving the classification of phishing websites using a hybrid algorithm

Bibliographic Details
Published in: Computational Intelligence, Vol. 38, no. 2, pp. 667-689
Main Authors: Sharma, Suvita Rani; Singh, Birmohan; Kaur, Manpreet
Format: Journal Article
Language: English
Published: Hoboken, Blackwell Publishing Ltd, 01.04.2022
ISSN: 0824-7935, 1467-8640
DOI: 10.1111/coin.12494

Summary: This article proposes a hybrid algorithm for distinguishing phishing websites from legitimate ones. Because the dataset may have an imbalanced class distribution and may contain irrelevant features, the adaptive synthetic sampling approach is used during data preprocessing to handle the imbalance. Irrelevant or redundant features are then removed from the balanced data using the proposed binary version of the Rao algorithms. S‐shaped and V‐shaped transfer functions map the continuous search space to a discrete search space, and the results obtained with these transfer functions are analyzed for the proposed algorithms. Performance is further improved by optimizing the value of the k parameter in the kNN classifier. The dataset is taken from the UCI machine‐learning repository. The proposed approach is evaluated using the polygon area metric, and the obtained classification accuracy is 97.044%. For validation, the proposed hybrid algorithm is compared with other state‐of‐the‐art techniques; for performance analysis, it is also compared with seven metaheuristic feature selection algorithms and six filter methods. Additionally, the approach has been applied to URLs registered on the PhishTank website.
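
The summary does not say which S‐shaped and V‐shaped transfer functions the authors use. As a rough illustration only, the sketch below uses the two forms most commonly seen in binary metaheuristics: a sigmoid for the S‐shaped case and |tanh(x)| for the V‐shaped case. The function names, the update rules, and the choice of these two particular functions are assumptions, not the paper's method.

```python
import math
import random


def s_shaped(x):
    """Sigmoid transfer function (assumed form): maps a real value into [0, 1]."""
    # Numerically stable evaluation for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def v_shaped(x):
    """|tanh(x)| transfer function (assumed form): maps a real value into [0, 1]."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: each feature bit is set to 1 with probability S(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: each feature bit is flipped with probability V(x)."""
    return [
        1 - bit if rng.random() < v_shaped(x) else bit
        for x, bit in zip(position, current_bits)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    continuous = [-2.0, -0.5, 0.0, 0.5, 2.0]  # hypothetical candidate solution
    mask = binarize_s(continuous, rng)         # 1 = keep feature, 0 = drop it
    print(mask)
    print(binarize_v(continuous, mask, rng))
```

In a feature selection setting, the resulting 0/1 vector would serve as a mask over the dataset columns. The S‐shaped rule sets each bit directly from the continuous value, while the V‐shaped rule only decides whether to flip the current bit, so values near zero tend to leave the existing selection unchanged.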