DDSC-SMOTE: an imbalanced data oversampling algorithm based on data distribution and spectral clustering

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 80, no. 12, pp. 17760-17789
Main authors: Li, Xinqi; Liu, Qicheng
Format: Journal Article
Language: English
Publisher: Springer US, New York; Springer Nature B.V.
Publication date: 2024-08-01
ISSN: 0920-8542 (print); 1573-0484 (electronic)
DOI: 10.1007/s11227-024-06132-7

Abstract: Imbalanced data poses a significant challenge in machine learning: conventional classification algorithms tend to favor majority class samples, although accurately classifying minority class samples is more crucial. The synthetic minority oversampling technique (SMOTE) is one of the best-known methods for handling imbalanced data. However, SMOTE and its variants do not sufficiently account for the data distribution, which leads them to generate incorrect and unnecessary samples. This paper therefore introduces a novel oversampling algorithm, data distribution and spectral clustering-based SMOTE (DDSC-SMOTE), which addresses these shortcomings with three data distribution-based improvement strategies: adaptive allocation of the number of synthetic samples, adaptive selection of seed samples, and improved sample synthesis. First, we use the labels of each sample's k nearest neighbors and the local outlier factor (LOF) algorithm to remove noisy and outlier samples. Next, we apply spectral clustering to identify clusters within the minority class and propose a dual-weight factor, based on inter-cluster and intra-cluster distances, that allocates the number of synthetic samples effectively and addresses both interclass and intraclass imbalance. We then introduce a relative position weight coefficient that determines the probability of selecting each seed sample within its subcluster, so that important minority samples are more likely to be sampled. Finally, we modify the SMOTE synthesis formula so that new samples are generated more safely. Extensive comparisons on real datasets from the UCI repository show that DDSC-SMOTE significantly outperforms seven state-of-the-art oversampling algorithms in terms of G-mean and F1-score, offering a data distribution-focused solution to imbalanced data problems.
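
The abstract's first step, removing noisy minority samples using k-nearest-neighbor labels and removing outliers with the local outlier factor, can be sketched as below. The abstract does not give the exact rules, so everything here is an assumption: a minority sample counts as noise when all of its k nearest neighbors in the full dataset carry a majority label, LOF is computed within the minority class only, and a sample whose LOF exceeds a hypothetical threshold of 1.5 counts as an outlier. The function names are illustrative, and the LOF follows the standard definition of Breunig et al. (2000), simplified to exactly k neighbors with ties ignored.

```python
import math

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]

def lof_scores(points, k):
    """Local outlier factor of each point (simplified: exactly k neighbours, ties ignored)."""
    n = len(points)
    neigh = [knn_indices(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    kdist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def lrd(i):
        # local reachability density = 1 / mean reachability distance
        reach = [max(kdist[o], math.dist(points[i], points[o])) for o in neigh[i]]
        return len(reach) / max(sum(reach), 1e-12)

    dens = [lrd(i) for i in range(n)]
    # LOF = average density of the neighbours relative to the point's own density
    return [sum(dens[o] for o in neigh[i]) / (len(neigh[i]) * dens[i]) for i in range(n)]

def filter_minority(X, y, minority_label, k=5, lof_threshold=1.5):
    """Drop minority samples that look like label noise or density outliers.

    Hypothetical rules (not taken from the paper):
      * noise   -> all k nearest neighbours in X carry a majority label
      * outlier -> LOF within the minority class exceeds lof_threshold
    """
    idx = [i for i, label in enumerate(y) if label == minority_label]
    minority = [X[i] for i in idx]
    k_lof = min(k, len(minority) - 1)
    lof = lof_scores(minority, k_lof) if k_lof >= 1 else [1.0] * len(minority)
    kept = []
    for pos, i in enumerate(idx):
        noisy = all(y[j] != minority_label for j in knn_indices(X, i, k))
        outlier = lof[pos] > lof_threshold
        if not (noisy or outlier):
            kept.append(X[i])
    return kept
```

The surviving minority samples are what a later clustering and oversampling stage would work on; the brute-force neighbor search keeps the sketch dependency-free but is quadratic in the number of samples.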
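
The second step, spectral clustering of the minority class, might look like the following NumPy sketch. The abstract does not say which spectral clustering variant, affinity, or number of clusters DDSC-SMOTE uses, so this assumes the normalized variant of Ng, Jordan and Weiss: a Gaussian affinity with a hand-picked bandwidth `sigma`, the leading eigenvectors of D^-1/2 W D^-1/2, row normalization, and a small k-means with deterministic farthest-first initialization. The paper's dual-weight allocation, which consumes these clusters, is not reproduced here.

```python
import numpy as np

def spectral_clusters(X, n_clusters, sigma=1.0, n_iter=100):
    """Normalised spectral clustering (Ng-Jordan-Weiss style) of the rows of X."""
    X = np.asarray(X, dtype=float)
    # Gaussian affinity between every pair of samples, no self-loops
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    # Largest eigenvectors of D^-1/2 W D^-1/2 = smallest of L_sym = I - D^-1/2 W D^-1/2
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    U = eigvecs[:, -n_clusters:]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Small k-means on the embedded rows; farthest-first init keeps it deterministic
    centers = [U[0]]
    for _ in range(1, n_clusters):
        gap = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(gap)])
    centers = np.array(centers)
    labels = np.zeros(len(U), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([
            U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

The bandwidth `sigma` is the main knob: too small and the affinity graph falls apart into singletons, too large and distinct minority sub-regions merge. A data-driven choice such as the median pairwise distance is a common alternative, though again not one the abstract specifies.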
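
For reference, the classic SMOTE rule that DDSC-SMOTE modifies creates each synthetic sample as x_new = x + λ·(x_nn − x), where x is a seed minority sample, x_nn is one of its k nearest minority neighbors, and λ is drawn uniformly between 0 and 1. The sketch below implements that original rule only, not the paper's improved synthesis formula, which the abstract does not spell out. The `seed_weights` parameter is a hypothetical hook showing where non-uniform seed probabilities, such as the paper's relative position weights, would plug in; with `seed_weights=None` seeds are drawn uniformly at random, as many SMOTE implementations do (the original SMOTE instead iterates over every minority sample).

```python
import math
import random

def smote(minority, n_new, k=5, seed_weights=None, seed=None):
    """Classic SMOTE interpolation: x_new = x + lam * (x_nn - x), lam ~ U[0, 1).

    seed_weights is a hypothetical hook for non-uniform seed selection;
    None means every minority sample is equally likely to be a seed.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    indices = range(len(minority))
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(indices, weights=seed_weights)[0]  # pick a seed sample
        x = minority[i]
        # k nearest minority neighbours of the seed (excluding the seed itself)
        neighbours = sorted((j for j in indices if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        x_nn = minority[rng.choice(neighbours)]
        lam = rng.random()
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, x_nn)))
    return synthetic
```

Because every synthetic point lies on a segment between two minority samples, `smote(square, 50, k=3, seed=42)` on the four corners of the unit square only produces points inside that square; any "safer" synthesis rule would constrain where along (or around) such segments new points may fall.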

Author affiliations: Li, Xinqi and Liu, Qicheng, School of Computer and Control Engineering, Yantai University
Contact: Liu, Qicheng (ytliuqc@163.com)
Copyright: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Peer reviewed: yes
Keywords: Imbalanced data; Spectral clustering; SMOTE; Oversampling; Data preprocessing
Subject terms: Accuracy; Adaptive sampling; Algorithms; Bank fraud; Classification; Clustering; Compilers; Computer Science; Datasets; Interpreters; Machine learning; Noise; Oversampling; Processor Architectures; Programming Languages
Online access: https://link.springer.com/article/10.1007/s11227-024-06132-7; https://www.proquest.com/docview/3256593860