Enabling Fraud Prediction on Preliminary Data Through Information Density Booster

Bibliographic Details
Published in IEEE Transactions on Information Forensics and Security Vol. 18; p. 1
Main Authors Zhu, Hangyu, Wang, Cheng
Format Journal Article
Language English
Published New York IEEE 01.01.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 1556-6013
1556-6021
DOI 10.1109/TIFS.2023.3300523

Abstract In online lending services, fraud prediction is a critical step for controlling loss risk and improving processing efficiency. It is challenging because the ex-ante prediction must be made from only the most basic applicant information. This work identifies the essential difficulty as the low information density of the data associations that carry the information useful for fraud prediction. Accordingly, we propose a novel multi-stage data representation scheme, called AI2Vec (Applicant Information Vectoring), as an information density booster. It gradually boosts the information density of associations by simultaneously decreasing the scale of information carriers and increasing the amount of useful information. The performance of AI2Vec is validated by experiments on real-life data from a prestigious online lending platform. It helps commonly used machine learning classifiers outperform state-of-the-art methods, including the pilot platform's method, which relies on manual feature engineering by subject matter experts.
Author Details
Zhu, Hangyu (ORCID 0000-0001-7221-9130), Department of Computer Science and Engineering, Tongji University, China
Wang, Cheng (ORCID 0000-0002-4752-0316), Department of Computer Science and Engineering, Tongji University, China
CODEN ITIFA6
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Discipline Engineering
Computer Science
EISSN 1556-6021
Genre orig-research
Funding National Natural Science Foundation of China, grant 61972287 (funder ID 10.13039/501100001809)
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
SubjectTerms Boosting
Data mining
Density
Fraud
Fraud prediction
History
information density
Machine learning
network representation learning
online lending services
Representation learning
Risk management
Semantics
Task analysis