To automatically map source code entities to architectural modules with Naive Bayes

Bibliographic Details
Published in	The Journal of systems and software Vol. 183; p. 111095
Main Authors	Olsson, Tobias, Ericsson, Morgan, Wingkvist, Anna
Format	Journal Article
Language	English
Published	Elsevier Inc 01.01.2022
Subjects	Incremental clustering; Machine learning; Naive Bayes; Orphan adoption; Programvaruteknik (Software Engineering); Software architecture; Software Technology
ISSN	0164-1212, 1873-1228
DOI	10.1016/j.jss.2021.111095

Summary:	The process of mapping a source code entity onto an architectural module is to a large degree a manual task. Automating this process could increase the use of static architecture conformance checking methods, such as reflexion modeling, in industry. Current techniques rely on user parameterization and a highly cohesive design. A machine learning approach would potentially require fewer parameters and make better use of the available information to aid in automatic mapping. We investigate how a classifier can be trained to map from source code to architecture modules automatically. This classifier is trained with semantic and syntactic dependency information extracted from the source code and from architecture descriptions. The classifier is implemented using multinomial naive Bayes and evaluated. We perform experiments and compare the classifier with three state-of-the-art mapping functions in eight open-source Java systems with known ground-truth mappings. We find that the classifier outperforms the state-of-the-art in all cases and that it provides a useful baseline for further research in the area of semi-automatic incremental clustering. We conclude that machine learning is a useful approach that performs better and with less need for parameterization compared to other approaches. Future work includes investigating problematic mappings and a more diverse set of subject systems.
• Machine learning can improve automatic mapping from source code to architecture.
• Dependencies are useful in machine learning from code to architecture mapping.
• Representing dependencies as text can improve other text-based mapping algorithms.
• Dependency weights and other parameters should reflect the design goals.
• Good dependency weights are difficult for a human expert to set but can be learned from data.
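
The summary describes a multinomial naive Bayes classifier trained on semantic information and on dependencies represented as text. The following is a minimal sketch of that general idea, not the authors' implementation: the `describe` helper, the token format (`calls_data`, `calledby_gui`), the module names, and the toy training set are all hypothetical and chosen only for illustration. Entities that are already mapped serve as training data, and an unmapped ("orphan") entity is assigned to the module with the highest posterior score.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial naive Bayes over token counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value, not from the paper)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def log_scores(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            total = sum(self.token_counts[c].values())
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for t in tokens:
                if t not in self.vocab:
                    continue  # skip tokens never seen during training
                likelihood = (self.token_counts[c][t] + self.alpha) / (
                    total + self.alpha * vocab_size
                )
                score += math.log(likelihood)
            scores[c] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


def describe(name_tokens, dependencies):
    """Hypothetical feature extraction: name words plus dependencies as text."""
    return list(name_tokens) + [f"{kind}_{target}" for kind, target in dependencies]


# Toy initial mapping: (entity description, architectural module).
training = [
    (describe(["order", "view"], [("calls", "logic")]), "gui"),
    (describe(["main", "window"], [("calls", "logic"), ("extends", "gui")]), "gui"),
    (describe(["order", "service"], [("calls", "data")]), "logic"),
    (describe(["price", "calculator"], [("calls", "data"), ("calledby", "gui")]), "logic"),
    (describe(["order", "repository"], [("calledby", "logic")]), "data"),
    (describe(["sql", "connection"], [("calledby", "data")]), "data"),
]
docs, labels = zip(*training)
model = MultinomialNB().fit(docs, labels)

# An orphan entity whose name and outgoing dependency resemble the "logic" module.
orphan = describe(["invoice", "service"], [("calls", "data")])
predicted = model.predict(orphan)
print(predicted)  # prints: logic
```

In this sketch, encoding a dependency as a token such as `calls_data` lets the classifier learn from the training counts how strongly each kind of dependency indicates a module, instead of relying on weights set by hand.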