Towards understanding bugs in Python interpreters

Bibliographic Details
Published in	Empirical software engineering : an international journal Vol. 28; no. 1; p. 19
Main Authors	Liu, Di; Feng, Yang; Yan, Yanyan; Xu, Baowen
Format	Journal Article
Language	English
Published	New York: Springer US, 01.01.2023; Springer Nature B.V.
Subjects	Algorithms; Applications programs; Artificial intelligence; Cloud computing; Compilers; Complexity; Computer networks; Computer Science; Debugging; Interpreters; Programming Languages; Quality assurance; Root cause analysis; Software; Software Engineering/Programming and Operating Systems; Bugs; Python interpreter; Empirical software engineering
ISSN	1382-3256; 1573-7616
DOI	10.1007/s10664-022-10239-x

Summary:	Python has been widely used to develop large-scale software systems such as distributed systems, cloud computing, artificial intelligence, and Web platforms due to its flexibility and versatility. As complex software in their own right, Python interpreters can also suffer from bugs, which fundamentally threaten the quality of every Python application built on them. Since the first release of Python, more than 30,000 bugs have been discovered. Modern interpreters often consist of many modules, built-in libraries, extensions, etc., and can reach millions of lines of code. The large size and high complexity of interpreters pose substantial challenges to their quality assurance. To characterize interpreter bugs and provide empirical support, this paper conducts a large-scale empirical study on the two most popular Python interpreters, CPython and PyPy. We comprehensively investigated the maintenance log information and collected 30,069 fixed bugs and 20,334 confirmed revisions. We further manually characterized and taxonomized 1,200 bugs to investigate their representative symptoms and root causes in depth. Finally, we identified nine findings by investigating bug locations, symptoms, root causes, and bug-revealing and bug-fixing time. The key findings include (for both interpreters): (1) the library, object model, and interpreter back-end are the most buggy components; (2) unexpected behavior, crash, and performance are the most common symptoms; (3) incorrect algorithm logic, configuration, and internal call are the most common general root causes, while incorrect object design is the most common Python-specific root cause; (4) some bug-triggering test programs are tiny (fewer than ten lines), and most bug fixes involve only slight modifications. Based on these findings, we discuss the lessons learned and practical implications that can support research on interpreter testing, debugging, and improvement.