Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection

Bibliographic Details
Published in International journal of information technology (Singapore. Online) Vol. 13; no. 4; pp. 1503-1511
Main Authors Itoo, Fayaz; Meenakshi; Singh, Satwinder
Format Journal Article
Language English
Published Singapore Springer Singapore 01.08.2021
Springer Nature B.V
ISSN 2511-2104
2511-2112
DOI 10.1007/s41870-020-00430-y

Abstract Financial fraud is a threat that is growing at an increasing pace and has a severe impact on the economy, collaborative institutions and administration. Credit card transactions are increasing rapidly because advances in internet technology have led to a high dependence on the internet. With the advancement of technology and the growing use of credit cards, fraud rates have become a challenge for the economy. As new security features are added to credit card transactions, fraudsters develop new patterns and exploit new loopholes to target those transactions. As a result, the behavior of both fraudulent and normal transactions changes constantly. A further problem is that credit card data is highly skewed, which leads to inefficient prediction of fraudulent transactions. To obtain better results, the imbalanced (skewed) data is pre-processed with a re-sampling technique (over-sampling or under-sampling). Three different proportions of the dataset were used in this study, and the random under-sampling technique was applied to the skewed dataset. This work uses three machine learning algorithms: logistic regression, Naïve Bayes and K-nearest neighbour. The performance of these algorithms is recorded and compared. The work is implemented in Python, and performance is measured in terms of accuracy, sensitivity, specificity, precision, F-measure and area under the curve. On the basis of these measurements, the logistic regression based model was found to predict fraudulent transactions better than the models developed from Naïve Bayes and K-nearest neighbour. Better results were also obtained by applying under-sampling to the data before developing the prediction model.
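The two building blocks named in the abstract, random under-sampling of the majority class and confusion-matrix metrics, can be illustrated with a minimal sketch. This is not the authors' code: the function names `random_under_sample` and `binary_metrics`, the `fraud_fraction` parameter, and the toy data are all assumptions made for illustration, and the class proportions shown are not taken from the paper. Area under the curve and the fitting of the three classifiers are left out; those would normally come from a machine learning library.

```python
import random


def random_under_sample(rows, labels, fraud_fraction=0.5, seed=0):
    """Keep every fraud row (label 1) and a random subset of normal rows
    (label 0) so that frauds make up roughly `fraud_fraction` of the result.
    Hypothetical helper, not the paper's implementation."""
    if not 0 < fraud_fraction < 1:
        raise ValueError("fraud_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    normal_idx = [i for i, y in enumerate(labels) if y == 0]
    # Number of normal rows needed to reach the requested class ratio,
    # capped at the number of normal rows actually available.
    n_normal = round(len(fraud_idx) * (1 - fraud_fraction) / fraud_fraction)
    n_normal = min(n_normal, len(normal_idx))
    keep = fraud_idx + rng.sample(normal_idx, n_normal)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision and F-measure,
    treating label 1 (fraud) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy skewed data: 20 fraud rows out of 1000 (illustrative only).
    rows = [[i] for i in range(1000)]
    labels = [1 if i < 20 else 0 for i in range(1000)]
    _, balanced = random_under_sample(rows, labels, fraud_fraction=0.5)
    print(len(balanced), sum(balanced))
    print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Under-sampling discards normal transactions rather than synthesizing fraud cases, so the resampled set is small; in practice the metrics would be computed on a held-out set that keeps the original skew.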
Author_xml – sequence: 1
  givenname: Fayaz
  surname: Itoo
  fullname: Itoo, Fayaz
  organization: Central University of Punjab
– sequence: 2
  surname: Meenakshi
  fullname: Meenakshi
  organization: A.P. Department of Computer Science and Technology, Central University of Punjab
– sequence: 3
  givenname: Satwinder
  orcidid: 0000-0001-8689-9878
  surname: Singh
  fullname: Singh, Satwinder
  email: satwindercse@gmail.com
  organization: A.P. Department of Computer Science and Technology, Central University of Punjab
ContentType Journal Article
Copyright Bharati Vidyapeeth's Institute of Computer Applications and Management 2020
Discipline Engineering
Computer Science
EISSN 2511-2112
EndPage 1511
IsPeerReviewed false
IsScholarly true
Issue 4
Keywords Random under-sampling
Logistic regression
Naïve Bayes
KNN
Credit card fraud
Fraud detection
ORCID 0000-0001-8689-9878
PublicationCentury 2000
PublicationDate 20210800
PublicationDateYYYYMMDD 2021-08-01
PublicationDate_xml – month: 8
  year: 2021
  text: 20210800
PublicationDecade 2020
PublicationPlace Singapore
PublicationPlace_xml – name: Singapore
– name: Heidelberg
PublicationSubtitle An Official Journal of Bharati Vidyapeeth's Institute of Computer Applications and Management
PublicationTitle International journal of information technology (Singapore. Online)
PublicationTitleAbbrev Int. j. inf. tecnol
PublicationYear 2021
Publisher Springer Singapore
Springer Nature B.V
Publisher_xml – name: Springer Singapore
– name: Springer Nature B.V
SubjectTerms Algorithms
Artificial Intelligence
Computer Imaging
Computer Science
Credit cards
Datasets
Fraud
Image Processing and Computer Vision
Impact analysis
Internet
Machine Learning
Original Research
Pattern Recognition and Graphics
Prediction models
Regression models
Sampling methods
Software Engineering
Vision
Title Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection
URI https://link.springer.com/article/10.1007/s41870-020-00430-y
https://www.proquest.com/docview/2555127052
Volume 13