Comparison of Classification Algorithms for Detection of Phishing Websites

Bibliographic Details
Published in	Informatica (Vilnius, Lithuania) Vol. 31; no. 1; pp. 143–160
Main Authors	Vaitkevicius, Paulius; Marcinkevicius, Virginijus
Format	Journal Article
Language	English
Published	London, England: SAGE Publications; IOS Press BV, 01.01.2020
Subjects	Algorithms, Classification, Crime, Cybercrime, Datasets, Feature extraction, Machine learning, Neural networks, Phishing, Websites, classification algorithms, phishing detection, phishing datasets
ISSN	0868-4952, 1822-8844
DOI	10.15388/20-INFOR404

More Information
Summary:	Phishing activities remain a persistent security threat, with global losses exceeding 2.7 billion USD in 2018, according to the FBI’s Internet Crime Complaint Center. The literature describes several generations of phishing website detection methods. The oldest methods rely on manual blacklisting of known phishing websites’ URLs in a centralized database, but they have not been able to detect newly launched phishing websites. More recent studies have framed phishing website detection as a supervised machine learning problem on phishing datasets built from features extracted from phishing websites’ URLs. These studies have shown some classification algorithms performing better than others on differently designed datasets, but they have not identified the best classification algorithm for the phishing website detection problem in general. The purpose of this research is to compare classic supervised machine learning algorithms on all publicly available phishing datasets with predefined features and to identify the best-performing algorithm for phishing website detection, regardless of a specific dataset design. Eight widely used classification algorithms were configured in Python using the scikit-learn library and tested for classification accuracy on all publicly available phishing datasets. The algorithms were then ranked by accuracy on the different datasets using three ranking techniques, and the results were tested for statistically significant differences using Welch’s t-test. The comparison results presented in this paper show ensembles and neural networks outperforming the other classical algorithms.
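The comparison procedure described in the summary (train several classifiers, measure accuracy, rank them, then test differences with Welch’s t-test) can be sketched in Python with scikit-learn and SciPy. This is a minimal sketch under stated assumptions, not the authors’ code: a synthetic dataset from make_classification stands in for the public phishing datasets, only three of the eight algorithms are shown, 10-fold stratified cross-validation is assumed as the evaluation protocol, and ranking by mean accuracy is one simple ranking technique chosen for illustration (the summary does not name the paper’s three techniques). The helper names compare_classifiers, rank_by_mean_accuracy and welch_test are hypothetical.

```python
# Sketch only: compare classifiers by cross-validated accuracy, rank them,
# and test the best-vs-worst difference with Welch's t-test.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB


def compare_classifiers(X, y, models, n_splits=10, seed=0):
    """Return per-fold accuracies for each model, evaluated on identical folds.

    The stratified k-fold protocol is an assumption, not taken from the paper.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
            for name, model in models.items()}


def rank_by_mean_accuracy(scores):
    """Order model names from highest to lowest mean accuracy.

    One simple ranking technique, used here for illustration only.
    """
    return sorted(scores, key=lambda name: scores[name].mean(), reverse=True)


def welch_test(a, b):
    """Welch's t-test: a two-sample t-test without the equal-variance assumption."""
    result = ttest_ind(a, b, equal_var=False)
    return result.statistic, result.pvalue


if __name__ == "__main__":
    # Synthetic stand-in for a phishing dataset (hypothetical, not from the paper).
    X, y = make_classification(n_samples=1000, n_features=30,
                               n_informative=10, random_state=42)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": GaussianNB(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = compare_classifiers(X, y, models)
    ranking = rank_by_mean_accuracy(scores)
    for name in ranking:
        print(f"{name}: mean accuracy {scores[name].mean():.3f}")
    t, p = welch_test(scores[ranking[0]], scores[ranking[-1]])
    print(f"Welch's t-test, best vs worst: t={t:.2f}, p={p:.4f}")
```

Passing equal_var=False to scipy.stats.ttest_ind is what turns Student’s t-test into Welch’s, which does not assume the two accuracy samples share a variance. Cross-validation folds share training data, so fold scores are not fully independent; the resulting p-value is best read as indicative.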