DDSC-SMOTE: an imbalanced data oversampling algorithm based on data distribution and spectral clustering
| Published in | The Journal of Supercomputing, Vol. 80, no. 12, pp. 17760–17789 |
|---|---|
| Main Authors | Li, Xinqi; Liu, Qicheng |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US; Springer Nature B.V, 01.08.2024 |
| ISSN | 0920-8542 (print); 1573-0484 (electronic) |
| DOI | 10.1007/s11227-024-06132-7 |
| Abstract | Imbalanced data poses a significant challenge in machine learning: conventional classification algorithms tend to favor majority-class samples, even though correctly classifying minority-class samples is often more important. The synthetic minority oversampling technique (SMOTE) is one of the best-known methods for handling imbalanced data. However, SMOTE and its variants give insufficient consideration to the data distribution, which leads them to generate incorrect and unnecessary samples. This paper therefore introduces a novel oversampling algorithm, data distribution and spectral clustering-based SMOTE (DDSC-SMOTE). It addresses the shortcomings of SMOTE with three data distribution-based improvement strategies: adaptive allocation of the number of synthetic samples, adaptive selection of seed samples, and improved sample synthesis. First, we use the labels of each sample's k nearest neighbors together with the local outlier factor algorithm to remove noisy and outlier samples. Next, we apply spectral clustering to identify clusters within the minority class and propose a dual-weight factor, based on inter-cluster and intra-cluster distances, that allocates the number of synthetic samples so as to address both interclass and intraclass imbalance. Furthermore, we introduce a relative position weight coefficient that determines the probability of selecting each seed sample within its subcluster, so that important minority samples are more likely to be sampled. Finally, we improve the SMOTE sample synthesis formula for safer generation. Extensive comparisons on real datasets from the UCI repository demonstrate that DDSC-SMOTE significantly outperforms seven state-of-the-art oversampling algorithms in terms of G-mean and F1-score, offering a data distribution-focused solution to imbalanced data challenges. |
| Author | Liu, Qicheng; Li, Xinqi |
| Affiliations | Li, Xinqi: School of Computer and Control Engineering, Yantai University; Liu, Qicheng (ytliuqc@163.com): School of Computer and Control Engineering, Yantai University |
| Copyright | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
| Discipline | Computer Science |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Imbalanced data; Spectral clustering; SMOTE; Oversampling; Data preprocessing |
| PageCount | 30 |
| PublicationSubtitle | An International Journal of High-Performance Computer Design, Analysis, and Use |
| PublicationTitleAbbrev | J Supercomput |
| SubjectTerms | Accuracy; Adaptive sampling; Algorithms; Bank fraud; Classification; Clustering; Compilers; Computer Science; Datasets; Interpreters; Machine learning; Noise; Oversampling; Processor Architectures; Programming Languages |
| URI | https://link.springer.com/article/10.1007/s11227-024-06132-7; https://www.proquest.com/docview/3256593860 |
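
The abstract's first step removes noisy and outlier minority samples using k-nearest-neighbor labels and the local outlier factor (LOF). This record does not give the paper's exact rules, so the Python sketch below is a hypothetical reading: a minority sample counts as noise when all of its k nearest neighbors in the full data set are majority samples, and as an outlier when its LOF, computed within the minority class, exceeds a threshold. The function names, `k`, and `lof_threshold=1.5` are illustrative assumptions, and LOF is written out from its standard definition rather than taken from a library.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i (ties keep index order)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def lof_scores(points, k):
    """Local outlier factor of every point; values well above 1 suggest outliers.

    Simplification: exactly k neighbours are used, ignoring ties at the k-distance.
    """
    n = len(points)
    neigh = [knn_indices(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def reach_dist(i, j):
        # reachability distance of point i with respect to neighbour j
        return max(k_dist[j], math.dist(points[i], points[j]))

    # local reachability density; the epsilon guards against duplicate points
    lrd = [len(neigh[i]) / max(sum(reach_dist(i, j) for j in neigh[i]), 1e-12)
           for i in range(n)]
    # LOF = mean neighbour density divided by the point's own density
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


def filter_minority(X, y, minority_label, k=5, lof_threshold=1.5):
    """Hypothetical filter: return indices of minority samples that are neither noise nor outliers."""
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    lof = lof_scores([X[i] for i in minority_idx], k)
    kept = []
    for pos, i in enumerate(minority_idx):
        # noise: every one of the k nearest neighbours (whole data set) is a majority sample
        is_noise = all(y[j] != minority_label for j in knn_indices(X, i, k))
        # outlier: LOF within the minority class is above the threshold
        is_outlier = lof[pos] > lof_threshold
        if not (is_noise or is_outlier):
            kept.append(i)
    return kept
```

With a tight minority core, one minority point sitting inside the majority region, and one far-away minority point, this filter keeps only the core. For real data, scikit-learn's `LocalOutlierFactor` provides a tested LOF implementation.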
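
Next, spectral clustering is used to find subclusters within the minority class. This record does not state the affinity, the Laplacian variant, or how the number of clusters is chosen, so the sketch below assumes the normalized spectral clustering of Ng, Jordan and Weiss with a Gaussian affinity. `sigma`, `n_clusters`, and the deterministic farthest-point initialization of the final k-means step are assumptions made to keep the example short and reproducible; it requires NumPy.

```python
import numpy as np


def spectral_clusters(X, n_clusters, sigma=1.0, n_iter=100):
    """Normalised spectral clustering (Ng-Jordan-Weiss style) of the rows of X."""
    X = np.asarray(X, dtype=float)
    # Gaussian (RBF) affinity matrix with zero self-similarity
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # symmetric normalised Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    U = eigvecs[:, :n_clusters]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # project rows onto the unit sphere

    # k-means on the embedding, seeded by deterministic farthest-point selection
    centers = [U[0]]
    while len(centers) < n_clusters:
        dists = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        new_centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

In practice `sklearn.cluster.SpectralClustering` is the usual choice; the hand-written version only makes the graph-Laplacian steps visible.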
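
DDSC-SMOTE distributes synthetic samples over the minority subclusters with a dual-weight factor built from inter-cluster and intra-cluster distances. The actual formula is not given in this record, so the sketch below uses a hypothetical weight: the mean pairwise distance inside a cluster (sparser clusters receive more samples) divided by the mean distance from its members to the nearest majority sample (clusters closer to the majority class receive more samples). `dual_weight`, `allocate_synthetic`, and the largest-remainder rounding are illustrative choices, not the paper's definitions.

```python
import math


def dual_weight(cluster, majority):
    """Hypothetical dual weight: intra-cluster spread divided by distance to the majority class."""
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    # intra-cluster term: mean pairwise distance (a single-point cluster gets 1.0)
    intra = sum(math.dist(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
    # inter term: mean distance from each member to its nearest majority sample
    inter = sum(min(math.dist(p, m) for m in majority) for p in cluster) / len(cluster)
    return intra / max(inter, 1e-12)


def allocate_synthetic(clusters, majority, n_total):
    """Split n_total synthetic samples across clusters in proportion to their weights."""
    weights = [dual_weight(c, majority) for c in clusters]
    raw = [n_total * w / sum(weights) for w in weights]
    counts = [math.floor(r) for r in raw]
    # largest-remainder rounding so that the counts add up to exactly n_total
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[: n_total - sum(counts)]:
        counts[i] += 1
    return counts
```

A common default, also an assumption here, is to set `n_total` to the majority count minus the minority count so that the classes end up balanced.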
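
Within each subcluster, seed samples are drawn with probabilities set by a relative position weight coefficient, and new points come from a modified SMOTE synthesis formula. Neither is specified in this record. The sketch below therefore assumes a hypothetical weight inversely proportional to each sample's distance from the nearest majority sample, and uses the classic SMOTE interpolation `x_new = x + gap * (x_nn - x)` with an optional cap `max_gap` on the gap as a stand-in for "safer" generation; it is not the paper's formula.

```python
import math
import random


def seed_probabilities(cluster, majority):
    """Hypothetical relative-position weight: members nearer the majority class are likelier seeds."""
    closeness = [1.0 / max(min(math.dist(p, m) for m in majority), 1e-12) for p in cluster]
    total = sum(closeness)
    return [c / total for c in closeness]


def synthesize(cluster, majority, n_new, k=3, max_gap=1.0, rng=None):
    """SMOTE-style interpolation inside one subcluster (needs at least 2 members)."""
    rng = rng if rng is not None else random.Random(0)
    probs = seed_probabilities(cluster, majority)
    synthetic = []
    for _ in range(n_new):
        # draw a seed index according to the (hypothetical) weights
        i = rng.choices(range(len(cluster)), weights=probs, k=1)[0]
        seed = cluster[i]
        # k nearest neighbours of the seed within the same subcluster
        neighbours = sorted((p for j, p in enumerate(cluster) if j != i),
                            key=lambda p: math.dist(seed, p))[:k]
        partner = rng.choice(neighbours)
        # classic SMOTE draws gap ~ U(0, 1); max_gap < 1 keeps new points nearer the seed
        gap = rng.uniform(0.0, max_gap)
        synthetic.append(tuple(s + gap * (q - s) for s, q in zip(seed, partner)))
    return synthetic
```

Because each synthetic point lies on a segment between two members of the same subcluster, it never leaves that subcluster's convex hull.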
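
The comparisons are reported in G-mean and F1-score. The sketch below uses the standard binary definitions with the minority class as the positive class: G-mean is the geometric mean of recall (sensitivity) and specificity, and F1 is the harmonic mean of precision and recall. How the paper averages these across folds or datasets is not stated in this record.

```python
import math


def gmean_f1(y_true, y_pred, positive=1):
    """G-mean and F1-score for a binary task, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return g_mean, f1
```

A classifier that assigns every sample to the majority class scores 0 on both metrics regardless of its accuracy, which is why they suit imbalanced data.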