Assessing the Capability of Code Smells to Support Software Maintainability Assessments: Empirical Inquiry and Methodological Approach
Appears in the following collection: Institutt for informatikk
Abstract

Code smells are indicators of software design shortcomings that can decrease software maintainability. An advantage of code smells over traditional software measures is that the former are associated with an explicit set of refactoring strategies to improve the existing design. As such, code smell analysis is a promising approach to address both the assessment and the improvement of maintainability. An important challenge in code smell analysis is understanding the interplay between code smells and different aspects of maintenance. Research on code smells conducted in the past decade has emphasized the formalization and automated detection of code smells. Much less research has been conducted to empirically investigate how comprehensive and informative code smells are for the assessment of software maintainability. If we are to use code smells to assess maintainability, we need to understand their potential in explaining and predicting different maintenance outcomes and their usefulness in industrial settings. Relevant questions in using code smells as maintainability indicators include: “What and how much can code smells tell me about the maintainability of a system as a whole?” and “How suitable are code smells in identifying code segments (i.e., files) with low software maintainability?” The main goal of this thesis is to empirically investigate, from different perspectives and in a realistic setting, the strengths and limitations of code smells in supporting software maintainability assessments. The secondary goal is to suggest approaches to address the limitations of code smells and the assessments based on them. Both goals are reflected in our attempt to answer the following research questions:
Research Question 1: How good are code smells as indicators of system-level maintainability of software, and how well do code-smell-based assessments perform compared with other assessment approaches?
Research Question 2: How good are code smells in distinguishing source code files that are likely to require more maintenance effort than others?
Research Question 3: How good are code smells in discriminating between source code files that are likely to be problematic and those that are not likely to be so during maintenance?
Research Question 4: How much of the total set of problems that occur during maintenance can be explained by the presence of design problems related to code smells?
Research Question 5: How well do current code smell definitions correspond with maintainability characteristics deemed critical by software developers?
Research Question 6: How should code smell analysis be combined with expert judgment to achieve better maintainability assessments?
To answer these questions, we conducted a multiple-case study in which a maintenance project (involving four Java web systems, six software professionals, and two software companies) was observed over several weeks. The four systems had almost identical functionalities, which gave us the opportunity to observe how developers performed the same maintenance tasks on systems with different code designs. We used code smells to discriminate between the systems and to characterize the maintainability of each. Information about different maintenance outcomes (e.g., effort and defects) was collected and used to compare the ability of code smells to explain or predict maintenance outcomes, i.e., to determine how, and to what extent, differences in the presence of code smells were related to differences in the maintenance outcomes. Qualitative data were collected and analyzed to supplement and triangulate the analyses of the relation between code smells and maintenance outcomes.
A main observation derived from our analyses is that the usefulness of code smells depends on the maintainability perspective involved and the particular operationalization of maintainability. Although results of one analysis may appear contradictory to the results of another analysis, this may just indicate different perspectives and/or operationalizations of maintainability. These perspectives and operationalizations are therefore emphasized as the interpretation contexts of our results. Some of the results, which we consider our main research contributions, are listed below.
From a system-analysis perspective of maintainability, this thesis contributes the following findings.
When maintainability was operationalized through maintenance effort and defects, the number of code smells present in a system was not a better indicator of maintainability than the simpler measure of system size, measured as lines of code (LOC). When the systems differed greatly in size, the use of code smell density (i.e., the number of code smells divided by LOC) yielded system maintainability assessments that were inconsistent with those derived from a comparison of the systems’ maintenance effort and defects. Code smell density was a better measure of maintainability than the number of code smells only when comparing the maintainability of systems similar in size.

Expert-judgment-based assessment was a more accurate and flexible approach for system-level maintainability assessments than code-smell-based and C&K-metric-based assessment approaches. An advantage of expert-judgment-based assessments was that they could adjust for differences in system size and complexity and consider the effect of different maintenance scenarios. In spite of this advantage, we found that code smells can complement the expert-judgment-based assessment approach because code smells identified critical code that the experts overlooked.
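The density measure discussed above can be made concrete with a small sketch. The class name and the numbers below are illustrative only, not taken from the studied systems; the point is that two systems with the same smell count but very different sizes yield very different densities, which is why density-based comparisons were only consistent for similarly sized systems.

```java
// Minimal sketch (hypothetical values): "code smell density" here is
// the number of detected code smells divided by lines of code (LOC).
public class SmellDensity {
    static double density(int smellCount, int loc) {
        return (double) smellCount / loc;
    }

    public static void main(String[] args) {
        // Same smell count, tenfold difference in size:
        System.out.printf("small system: %.4f smells/LOC%n", density(40, 10_000));
        System.out.printf("large system: %.5f smells/LOC%n", density(40, 100_000));
    }
}
```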
When maintainability was operationalized through measures of the occurrence of problems during maintenance, the role of code smells in overall system maintainability was relatively small. Of the total set of maintenance problems, only about 30% were related to files containing code smells. The majority of maintenance problems were not directly related to the source code at all, but instead stemmed from factors such as a lack of adequate technical infrastructure and issues with external services.
When maintainability was operationalized through a set of system-level characteristics deemed important by software developers (e.g., infrastructure, architecture, and external services), many of these characteristics did not directly correspond to current definitions of code smells. Consequently, many maintainability characteristics must be evaluated using other approaches, such as expert judgment and semantic analysis techniques. However, some important system-level maintainability characteristics displayed better correspondence with the definitions of code smells. “Design consistency,” for example, was considered highly important by software developers and at the same time showed high correspondence with the definition of several code smells, including some for which detection strategies are not yet available.
From a file-level-analysis perspective of maintainability, this thesis contributes the following findings:
When maintainability was operationalized through maintenance effort, none of the 12 investigated code smells significantly indicated an increase in the maintenance effort of files.
When maintainability was operationalized through the incidence of maintenance problems, a violation of the interface segregation principle (ISP) within a file indicated a significantly higher likelihood of problems with that file during maintenance.
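To illustrate the ISP finding above, the sketch below shows what an ISP violation typically looks like in Java code. All names are hypothetical and not drawn from the studied systems: a single “fat” interface mixes several client concerns, forcing implementers to stub methods their clients never call, whereas the segregated alternative exposes only what each client needs.

```java
// Illustrative ISP violation: one interface serves three unrelated
// client concerns, so every implementer must provide all of them.
interface ReportHandler {
    String render();
    void print();
    void email();
}

// A read-only client is still forced to stub print() and email():
class ScreenReport implements ReportHandler {
    public String render() { return "on-screen report"; }
    public void print() { throw new UnsupportedOperationException(); }
    public void email() { throw new UnsupportedOperationException(); }
}

// Segregated alternative: one small interface per client need.
interface Renderable { String render(); }

class LeanScreenReport implements Renderable {
    public String render() { return "on-screen report"; }
}

public class IspExample {
    public static void main(String[] args) {
        // The client now depends only on the capability it uses.
        Renderable r = new LeanScreenReport();
        System.out.println(r.render());
    }
}
```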
A methodological contribution of this thesis is a report on our experiences, insights, and recommendations from using the concept mapping technique in a software maintainability assessment context. This method was adopted from social research, and we used it to better incorporate input from expert judgment in the selection, analysis, and interpretation of code smells during maintainability assessments. The main conclusion is that, despite some limitations, concept mapping is a promising approach for maintainability assessments that need to combine different sources of information, such as code smell analysis and expert judgment.
List of papers. Papers 1, 2, and 4 are removed from the thesis due to copyright restrictions.
Paper 1: A. Yamashita and S. Counsell. Code Smells as System-Level Indicators of Maintainability: An Empirical Study. Journal of Systems and Software, Volume 86, Issue 10, October 2013, Pages 2639–2653. doi:10.1016/j.jss.2013.05.007
Paper 2: D.I.K. Sjøberg, A. Yamashita, B. Anda, A. Mockus, and T. Dybå. Quantifying the Effect of Code Smells on Maintenance Effort. IEEE Transactions on Software Engineering, Volume 39, Issue 8, Pages 1144–1156. doi:10.1109/TSE.2012.89
Paper 3: A. Yamashita. Assessing the Capability of Code Smells to Explain Maintenance Problems: An Empirical Study Combining Quantitative and Qualitative Data. Submitted version. Empirical Software Engineering, March 2013. doi:10.1007/s10664-013-9250-3. The final publication is available at link.springer.com
Paper 4: A. Yamashita and L. Moonen. To What Extent Can Code Smell Detection Predict Maintenance Problems? An Empirical Study. Information and Software Technology, Volume 55, Issue 12, December 2013, Pages 2223–2242. doi:10.1016/j.infsof.2013.08.002
Paper 5: A. Yamashita and L. Moonen. Do Code Smells Reflect Important Maintainability Aspects? In Proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM), 2012, pp. 306–315. doi:10.1109/ICSM.2012.6405287
Copyright 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Paper 6: A. Yamashita, B. Anda, D.I.K. Sjøberg, H.C. Benestad, P.E. Arnstad, and L. Moonen. Using Concept Mapping for Maintainability Assessments. In Proceedings of the Third International Symposium on Empirical Software Engineering and Measurement (ESEM 2009), Orlando, Florida, October 15, 2009, IEEE Computer Society, pp. 378–389. doi:10.1109/ESEM.2009.5314234
Copyright 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.