Python code smells detection using conventional machine learning models

Bibliographic Details
Published in	PeerJ. Computer science Vol. 9; p. e1370
Main Authors	Sandouka, Rana; Aljamaan, Hamoud
Format	Journal Article
Language	English
Published	United States: PeerJ. Ltd / PeerJ Inc, 29.05.2023
Subjects	Algorithms; Analysis; Artificial Intelligence; Code smell; Data mining; Data Mining and Machine Learning; Detection; Java (Computer program language); Large class; Long method; Machine learning; Python; Software Engineering
Online Access	Get full text
ISSN	2376-5992
DOI	10.7717/peerj-cs.1370

Abstract	Code smells are poor design or implementation choices that hinder code maintenance and reduce software quality, so detecting them is an important part of software development. Recent studies have used machine learning algorithms for code smell detection, but most of them relied on code smell datasets built from Java code. This article proposes a Python code smell dataset covering the Large Class and Long Method code smells. The dataset contains 1,000 samples for each code smell, with 18 features extracted from the source code. We also investigated the detection performance of six machine learning models as baselines for Python code smell detection, evaluating them with the Accuracy and Matthews correlation coefficient (MCC) measures. The results indicate that the Random Forest ensemble performed best on the Python Large Class code smell, with the highest MCC of 0.77, while the decision tree performed best on the Python Long Method code smell, with the highest MCC of 0.89.
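The abstract names Accuracy and the Matthews correlation coefficient (MCC) as the evaluation measures. As a rough illustration of how the two relate, the sketch below computes both from the four cells of a binary confusion matrix using the standard textbook definitions. The function names and the example counts are hypothetical and are not taken from the article or its dataset.

```python
import math


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of samples classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when a row or column of the matrix is empty, a common
    convention for the otherwise undefined 0/0 case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts for a smelly / non-smelly classifier (illustration only).
    tp, fp, tn, fn = 90, 10, 85, 15
    print(f"Accuracy: {accuracy(tp, fp, tn, fn):.3f}")
    print(f"MCC:      {mcc(tp, fp, tn, fn):.3f}")
```

Because MCC uses all four cells of the confusion matrix, it can be noticeably lower than Accuracy when one kind of error dominates, which may be why the record reports model rankings in terms of MCC.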
ArticleNumber	e1370
Audience	Academic
Author	Sandouka, Rana; Aljamaan, Hamoud
Affiliation	Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia (both authors)
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/37346528 (MEDLINE/PubMed record)
ContentType	Journal Article
Copyright	2023 Sandouka and Aljamaan; COPYRIGHT 2023 PeerJ. Ltd.
Discipline	Computer Science
EISSN	2376-5992
GrantInformation_xml	– fundername: King Fahd University of Petroleum and Minerals (KFUPM)
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Keywords	Large class; Code smell; Detection; Long method; Machine learning; Python
Language	English
License	https://creativecommons.org/licenses/by/4.0 (CC BY). Copyright 2023 Sandouka and Aljamaan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
OpenAccessLink	https://doaj.org/article/918c787e9a4d407ba38fafaca4018ddb
PMID	37346528
PublicationDate	2023-05-29
PublicationPlace	San Diego, USA (United States)
PublicationTitle	PeerJ. Computer science
PublicationTitleAlternate	PeerJ Comput Sci
PublicationYear	2023
Publisher	PeerJ. Ltd; PeerJ Inc