Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection
Financial fraud is a rapidly growing threat with a severe impact on the economy, collaborative institutions, and administration. Credit card transactions are increasing quickly as advances in internet technology deepen dependence on the internet....
| Published in | International journal of information technology (Singapore. Online), Vol. 13, no. 4, pp. 1503-1511 |
|---|---|
| Main Authors | Itoo, Fayaz; Meenakshi; Singh, Satwinder |
| Format | Journal Article | 
| Language | English | 
| Published | Singapore: Springer Singapore, 01.08.2021; Springer Nature B.V |
| Subjects | |
| ISSN | 2511-2104, 2511-2112 |
| DOI | 10.1007/s41870-020-00430-y | 
| Abstract | Financial fraud is a rapidly growing threat with a severe impact on the economy, collaborative institutions, and administration. Credit card transactions are increasing quickly as advances in internet technology deepen dependence on the internet. As technology improves and credit card usage rises, fraud rates have become a challenge for the economy. As new security features are added to credit card transactions, fraudsters develop new patterns and find new loopholes to target those transactions. As a result, the behavior of both fraudulent and normal transactions changes constantly. Credit card data is also highly skewed, which leads to inefficient prediction of fraudulent transactions. To obtain better results, imbalanced or skewed data is pre-processed with a re-sampling technique (over-sampling or under-sampling). This study used three different dataset proportions and applied random under-sampling to the skewed dataset. It uses three machine learning algorithms: logistic regression, Naïve Bayes, and K-nearest neighbour. The performance of these algorithms is recorded and compared. The work is implemented in Python, and performance is measured by accuracy, sensitivity, specificity, precision, F-measure, and area under the curve. On these measures, the logistic regression model predicted fraudulent transactions better than the models built with Naïve Bayes and K-nearest neighbour. Applying under-sampling to the data before building the prediction model also gave better results. |
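
The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `random_under_sample` and `binary_metrics`, the `fraud_fraction` parameter, and the toy data are all hypothetical, and the classifier-fitting step (logistic regression, Naïve Bayes, or K-nearest neighbour) is left out. Area under the curve is also omitted because it needs predicted scores rather than hard labels.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, fraud_fraction=0.5, seed=0):
    """Keep every fraud row (label 1) and a random subset of normal rows
    (label 0) so that frauds make up about `fraud_fraction` of the result."""
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    normal = [i for i, y in enumerate(labels) if y == 0]
    # Number of normal rows needed to reach the requested class proportion.
    wanted = round(len(fraud) * (1 - fraud_fraction) / fraud_fraction)
    keep = fraud + rng.sample(normal, min(len(normal), wanted))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def binary_metrics(y_true, y_pred):
    """Compute the label-based measures listed in the abstract from
    confusion-matrix counts (fraud is the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on fraud
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),  # recall on normal transactions
        "precision": precision,
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy data (assumed for illustration): 10 frauds among 1000 transactions.
toy_labels = [1] * 10 + [0] * 990
toy_rows = list(range(1000))
balanced_rows, balanced_labels = random_under_sample(
    toy_rows, toy_labels, fraud_fraction=0.25
)
print(Counter(balanced_labels))
print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Under-sampling discards normal transactions rather than synthesizing fraud rows, so the balanced set stays small. In a real evaluation the held-out test set would normally keep its original class skew, so that specificity and precision reflect the deployed setting.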
    
| Author | Singh, Satwinder; Meenakshi; Itoo, Fayaz |
    
| Author_xml | 1. Itoo, Fayaz (Central University of Punjab); 2. Meenakshi (A.P. Department of Computer Science and Technology, Central University of Punjab); 3. Singh, Satwinder (A.P. Department of Computer Science and Technology, Central University of Punjab; ORCID: 0000-0001-8689-9878; email: satwindercse@gmail.com) |
    
| Cites_doi | 10.1109/TDSC.2009.11 10.1109/ICNSC.2018.8361343 10.1109/ICCNI.2017.8123782 10.1109/SmartWorld.2018.00051 10.1109/ICSPIS.2016.7869880 10.1109/AEEICB.2017.7972424 10.1109/SCEECS.2018.8546939  | 
    
| ContentType | Journal Article | 
    
| Copyright | Bharati Vidyapeeth's Institute of Computer Applications and Management 2020 |
    
| Discipline | Engineering; Computer Science |
    
| EISSN | 2511-2112 | 
    
| EndPage | 1511 | 
    
| ISSN | 2511-2104 | 
    
| IsPeerReviewed | false | 
    
| IsScholarly | true | 
    
| Issue | 4 | 
    
| Keywords | Random under-sampling; Logistic regression; Naïve Bayes; KNN; Credit card fraud; Fraud detection |
    
| Language | English | 
    
| ORCID | 0000-0001-8689-9878 | 
    
| PageCount | 9 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 20210800 | 
    
| PublicationDateYYYYMMDD | 2021-08-01 | 
    
| PublicationDate_xml | – month: 8 year: 2021 text: 20210800  | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | Singapore | 
    
| PublicationPlace_xml | – name: Singapore – name: Heidelberg  | 
    
| PublicationSubtitle | An Official Journal of Bharati Vidyapeeth's Institute of Computer Applications and Management | 
    
| PublicationTitle | International journal of information technology (Singapore. Online) | 
    
| PublicationTitleAbbrev | Int. j. inf. tecnol | 
    
| PublicationYear | 2021 | 
    
| Publisher | Springer Singapore; Springer Nature B.V |
    
| Publisher_xml | – name: Springer Singapore – name: Springer Nature B.V  | 
    
| StartPage | 1503 | 
    
| SubjectTerms | Algorithms; Artificial Intelligence; Computer Imaging; Computer Science; Credit cards; Datasets; Fraud; Image Processing and Computer Vision; Impact analysis; Internet; Machine Learning; Original Research; Pattern Recognition and Graphics; Prediction models; Regression models; Sampling methods; Software Engineering; Vision |
    
| Title | Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection | 
    
| URI | https://link.springer.com/article/10.1007/s41870-020-00430-y https://www.proquest.com/docview/2555127052  | 
    
| Volume | 13 | 
    