Python code smells detection using conventional machine learning models

Bibliographic Details
Published in	PeerJ. Computer science Vol. 9; p. e1370
Main Authors	Sandouka, Rana; Aljamaan, Hamoud
Format	Journal Article
Language	English
Published	United States: PeerJ Ltd / PeerJ Inc, 29.05.2023
Subjects	Algorithms; Analysis; Artificial Intelligence; Code smell; Data mining; Data Mining and Machine Learning; Detection; Java (Computer program language); Large class; Long method; Machine learning; Python; Software Engineering
ISSN	2376-5992
DOI	10.7717/peerj-cs.1370

More Information
Summary:	Code smells are symptoms of poor code design or implementation that hinder code maintenance and reduce software quality. Code smell detection is therefore important in software development. Recent studies have used machine learning algorithms for code smell detection; however, most of them relied on Java code smell datasets. This article proposes a Python code smell dataset for the Large Class and Long Method code smells. The dataset contains 1,000 samples for each code smell, with 18 features extracted from the source code. We also investigated the detection performance of six machine learning models as baselines for Python code smell detection. The baselines were evaluated using Accuracy and the Matthews correlation coefficient (MCC). Results indicate the superiority of the Random Forest ensemble in Python Large Class detection, which achieved the highest MCC of 0.77, while Decision Tree was the best-performing model in Python Long Method detection, achieving the highest MCC of 0.89.
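The summary reports baseline results as Accuracy and MCC. The sketch below shows how both measures can be computed from binary predictions. It is not the authors' evaluation code. It assumes that label 1 marks a smelly sample and label 0 a clean one, and the function names are hypothetical. Unlike accuracy, MCC uses all four cells of the confusion matrix. When any marginal sum is zero, the sketch returns 0.0, a common convention (scikit-learn's `matthews_corrcoef` behaves the same way).

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN); assumes 1 = smelly, 0 = clean (hypothetical labelling)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1, p == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient, in [-1, 1]; 0.0 when undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention for a degenerate confusion matrix
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the article.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    print(accuracy(truth, preds))  # 0.75
    print(mcc(truth, preds))       # 0.5
```

In the toy example, TP = 3, TN = 3, FP = 1 and FN = 1. Accuracy is 6/8 and MCC is (9 - 1)/16.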