Source Code Modularization: Theory and Techniques

This book presents source code modularization as a key activity in reverse engineering to extract the software architecture from the existing source code. To this end, it provides detailed techniques for source code modularization and discusses their effects on different software quality attributes....

Bibliographic Details
Main Author Isazadeh, Ayaz
Format eBook
Language English
Published Cham Springer Nature 2017
Springer International Publishing AG
Springer International Publishing
Edition 1
ISBN 3319633465
9783319633466
9783319633442
3319633449
DOI 10.1007/978-3-319-63346-6

Table of Contents:
  • Intro -- Preface -- Who Should Read This Book? -- How Should This Book Be Read? -- Acknowledgments -- Contents -- 1 Introduction -- 1.1 The Basic Setup -- 1.2 Challenges of Software Modularization -- 1.3 Overview of the Software Modularization Process -- 1.4 Source Code Analysis -- 1.5 Types of Artefact Dependency Graph -- 1.5.1 Call Dependency Graph -- 1.5.2 Artefact-Feature Dependency Graph -- 1.6 Software Modularization Methods -- 1.7 Reverse Engineering Tools -- 1.7.1 Understand Tool-Set -- 1.7.2 NDepend Tool-Set -- 1.8 Exercises and Discussion Topics -- 2 Proximity of Software Artefacts -- 2.1 Preliminaries -- 2.1.1 Software Artefacts and Features -- 2.1.2 Types of Proximity Measures -- 2.1.2.1 Similarity Coefficient/Measure -- 2.1.2.2 Dissimilarity Coefficient/Measure -- 2.1.2.3 Proximity Matrix -- 2.2 Similarity Coefficients/Measures -- 2.2.1 Measures for Binary Feature Vectors -- 2.2.1.1 Jaccard Similarity Measure -- 2.2.1.2 List of Proximity Measures -- 2.2.1.3 Symmetrical vs. Asymmetrical Proximity Measures -- 2.2.2 Measures for Nonbinary Feature Vectors -- 2.2.2.1 Ellenberg Measure -- 2.2.3 A General Similarity Coefficient -- 2.2.4 The Cosine Similarity Measure -- 2.3 Distance Coefficients/Measures -- 2.3.1 L2 Distance Measures -- 2.3.1.1 Euclidean Distance (L2 Distance) -- 2.3.1.2 Average Distance -- 2.3.1.3 Chord Distance -- 2.3.2 L1 Distance Measures -- 2.3.2.1 Manhattan Distance (L1 Distance, City Block Distance, Taxi Cab Distance) -- 2.3.2.2 Chebyshev Distance (Total Variational Distance) -- 2.3.2.3 Avg(L1, L∞) Distance -- 2.3.3 Intersection-Based Measures -- 2.3.4 General Distance Measures -- 2.3.4.1 Minkowski Distance -- 2.3.4.2 A General Distance Coefficient -- 2.4 Correlation Coefficients/Measures -- 2.4.1 Covariance Similarity Measure -- 2.4.2 Pearson's Correlation Measures -- 2.5 Categorical Data Measures
  • 2.6 Proximity of Modules -- 2.6.1 Mean-Based Proximity -- 2.6.2 Neighbor-Based Proximity Measures -- 2.6.2.1 Nearest-Neighbor Proximity -- 2.6.2.2 Farthest-Neighbor Distance -- 2.6.2.3 Average-Neighbor Distance -- 2.6.3 Lance-Williams Formula -- 2.7 Modularization Quality -- 2.7.1 BasicMQ -- 2.7.2 TurboMQ -- 2.8 Information Loss Measure -- 2.8.1 Basics of Information Theory -- 2.8.1.1 Similarity Between Two Probability Distributions -- 2.9 Exercises and Discussion Topics -- 3 Hierarchical and Partitional Modularization Algorithms -- 3.1 Preliminaries -- 3.1.1 Overview of Dendrograms -- 3.2 Graphically-Based Hierarchical Agglomerative Modularization -- 3.2.1 The Single-Linkage Method -- 3.2.2 The Complete Linkage Method -- 3.2.3 The Group Average Method -- 3.2.4 The Weighted Group Average Method -- 3.3 Hierarchical Agglomerative Modularization of Binary Features -- 3.3.1 Combined Algorithm -- 3.3.2 Weighted Combined Algorithm -- 3.4 Geometrically-Based Hierarchical Agglomerative Modularization -- 3.4.1 The Centroid Method -- 3.4.2 The Median Method -- 3.5 Entropy-Based Hierarchical Agglomerative Modularization -- 3.5.1 AIB Method -- 3.5.2 LIMBO Method -- 3.6 Nonhierarchical/Partitional Modularization -- 3.6.1 The k-Means Algorithm -- 3.6.2 Variations of the k-Means Algorithm -- 3.6.2.1 X-Means Method -- 3.6.2.2 K-Medoids Method -- 3.6.2.3 The Compare-Means Method -- 3.6.2.4 The Sort-Means Method -- 3.7 Exercises and Discussion Topics -- 4 Search-Based Software Modularization -- 4.1 Hill-Climbing Modularization Approaches -- 4.1.1 A Generic Hill-Climbing Approach -- 4.1.2 Simulated Annealing Approaches -- 4.2 Genetic Modularization Approaches -- 4.2.1 The BUNCH Approach -- 4.2.1.1 BUNCH Objective Functions -- 4.2.1.2 BUNCH Data Encoding -- 4.2.1.3 BUNCH Genetic Operators -- 4.2.1.4 Consolidated Model in BUNCH -- 4.2.2 BUNCH Running Example
  • 4.2.2.1 Generation of Initial Population and Fitness Evaluation -- 4.2.2.2 Chromosome Selection -- 4.2.2.3 Chromosome Crossover -- 4.2.2.4 Chromosome Mutation -- 4.2.3 The DAGC Encoding Approach -- 4.2.3.1 Crossover Operator -- 4.2.3.2 Mutation Operator -- 4.3 A Combined Genetic and Hill-Climbing Modularization Approach -- 4.4 Modularization Approach Based on Learning Automata -- 4.4.1 Reward and Penalty Operators -- 4.5 A Genetic k-Means Modularization Approach -- 4.6 Exercises and Discussion Topics -- 5 Algebraically-Based Software Modularization -- 5.1 Modularization Using a Concept Analysis Approach -- 5.1.1 Basic Definitions -- 5.1.2 Lattice Construction -- 5.1.3 Partitioning of Concepts -- 5.1.3.1 Partitioning of Atomic Concepts -- 5.2 Modularization Using Spectral Graph Theory -- 5.2.1 Fiedler Theory -- 5.3 Modularization Using Text-Mining Techniques -- 5.3.1 Vector Space Model -- 5.3.1.1 Term Weighting -- 5.3.1.2 Term Frequency -- 5.3.1.3 Inverse Document Frequency -- 5.3.1.4 Similarity Calculation -- 5.3.1.5 Disadvantages of VSM -- 5.3.2 Latent Semantic Analysis -- 5.3.2.1 Definition of Singular Value Decomposition -- 5.3.2.2 Interpretation of Singular Value Decomposition -- 5.3.2.3 Dimensionality Reduction -- 5.3.2.4 How Many Singular Values Should We Retain? -- 5.3.3 Modularization Using Different Types of Features -- 5.4 Exercises and Discussion Topics -- 6 Techniques for the Evaluation of Software Modularizations -- 6.1 Preliminaries -- 6.2 Evaluation by External Criteria -- 6.2.1 Metrics Based on Coverage of Artefacts -- 6.2.1.1 The Classical MoJo Metric -- 6.2.1.2 The Extended MoJo Metric -- 6.2.1.3 The Precision and Recall Metrics -- 6.2.2 Call-Dependency-Based Metrics -- 6.2.2.1 The EdgeSim Metric -- 6.2.2.2 The MeCl Metric -- 6.2.2.3 The EdgeMoJo Metric -- 6.2.2.4 Other Evaluation Techniques
  • 6.2.3 Information-Theory-Based Metrics -- 6.3 Evaluation by Internal Criteria -- 6.3.1 The Cophenetic Distance -- 6.3.2 The Silhouette Index -- 6.3.3 The RS Index -- 6.3.4 The Compactness Metric -- 6.3.5 The Dunn Index -- 6.4 Exercises and Discussion Topics -- 7 Software Quality Attributes and Modularization -- 7.1 Preliminaries -- 7.1.1 Modeling of Software System Behavior -- 7.1.2 Overview of Discrete-Time Markov Chains -- 7.1.3 Correlation Analysis Versus Regression Analysis -- 7.1.4 Components Failure Models -- 7.2 Security Analysis -- 7.2.1 Security Overview of Mozilla Firefox -- 7.2.2 Vulnerability Quantification -- 7.2.3 Vulnerability Prediction -- 7.2.4 Sensitivity Analysis -- 7.3 Reliability Analysis -- 7.3.1 Absorbing DTMC-Based Models -- 7.3.1.1 Reliability Assessment Using Composite Methods -- 7.3.1.2 Reliability Assessment Using Hierarchical Methods -- 7.3.2 Absorbing CTMC-Based Models -- 7.3.2.1 Composite-Method-Based Reliability Assessment -- 7.3.2.2 Hierarchical-Method-Based Reliability Assessment -- 7.3.3 Reliability Modeling for Different Software Architecture Styles -- 7.3.3.1 Reliability Modeling for Batch-Sequential/Pipeline Style -- 7.3.3.2 Reliability Modeling for Parallel/Pipeline and Filter Style -- 7.3.3.3 Reliability Modeling for Fault-Tolerant Style -- 7.3.3.4 Reliability Modeling for Call-and-Return Style -- 7.3.3.5 Reliability Modeling for Polymorphic Style -- 7.3.4 Path-Based Model -- 7.4 Performance Analysis -- 7.5 Exercises and Discussion Topics -- References