SMOTified-GAN for class imbalanced pattern classification problems
Published in | arXiv.org |
---|---|
Main Authors | Sharma, Anuraganand; Singh, Prabhat Kumar; Chandra, Rohitash |
Format | Paper Journal Article |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 27.03.2022 |
ISSN | 2331-8422 |
DOI | 10.48550/arxiv.2108.03235 |
Abstract | Class imbalance in a dataset is a major problem for classifiers: for a training dataset with a majority positive class, it results in poor prediction, with a high true positive rate (TPR) but a low true negative rate (TNR). Generally, the pre-processing technique of oversampling the minority class(es) is used to overcome this deficiency. Our focus is on hybridizing the Generative Adversarial Network (GAN) and the Synthetic Minority Over-Sampling Technique (SMOTE) to address class-imbalanced problems. We propose a novel two-phase oversampling approach involving knowledge transfer that has the synergy of SMOTE and GAN. The unrealistic or overgeneralized samples of SMOTE are transformed into a realistic distribution of data by the GAN in cases where there is not enough minority-class data available for the GAN to process effectively by itself. We named it SMOTified-GAN because the GAN works on pre-sampled minority data produced by SMOTE rather than randomly generating the samples itself. The experimental results show that the sample quality of the minority class(es) is improved on a variety of tested benchmark datasets. Its performance is improved by up to 9% over the next best algorithm tested, as measured by F1-score. Its time complexity is also reasonable, at around \(O(N^2d^2T)\) for a sequential algorithm. |
---|---|
Published version | IEEE Access, vol. 10, pp. 30655-30665, 2022 |
Author | Sharma, Anuraganand; Singh, Prabhat Kumar; Chandra, Rohitash |
BackLink | Published paper (access to full text may be restricted): https://doi.org/10.1109/ACCESS.2022.3158977; paper in arXiv: https://doi.org/10.48550/arXiv.2108.03235 |
Copyright | 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. http://creativecommons.org/licenses/by/4.0 |
Discipline | Physics |
Genre | Working Paper/Pre-Print |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
SubjectTerms | Algorithms; Computer Science - Artificial Intelligence; Computer Science - Learning; Datasets; Generative adversarial networks; Oversampling; Pattern classification; Sampling methods |
URI | https://www.proquest.com/docview/2559946858 https://arxiv.org/abs/2108.03235 |
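The abstract describes a two-phase approach in which SMOTE first pre-samples the minority class and a GAN then refines those samples. The following is a minimal sketch of the first phase only (SMOTE-style interpolation between minority neighbours), written in plain Python. It is not the authors' implementation: the function name `smote_oversample`, the parameters `k` and `seed`, the toy data, and the variable `seeds_for_gan` are all hypothetical, and the GAN phase is not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    (Illustrative sketch; names and defaults are assumptions.)"""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-feature minority class (made-up values for illustration).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]

# In the approach the abstract describes, samples like these, rather than
# random noise, would be what the GAN works on in the second phase.
seeds_for_gan = smote_oversample(minority, n_new=6)
print(len(seeds_for_gan))
```

Because each synthetic point lies on a line segment between two real minority samples, it stays inside the region spanned by the minority class. The abstract calls such samples potentially "unrealistic or overgeneralized", and the second phase is the part intended to address that.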