Code Anomalies Flock Together: Exploring Code Anomaly Agglomerations for Locating Design Problems


Design problems affect every software system. Diverse software systems have been discontinued or reengineered due to design problems. As design documentation is often informal or nonexistent, design problems need to be located in the source code. The main difficulty to identify a design problem in the implementation stems from the fact that such problem is often scattered through several program elements. Previous work assumed that code anomalies – popularly known as code smells – may provide sufficient hints about the location of a design problem. However, each code anomaly alone may represent only a partial embodiment of a design problem. In this paper, we hypothesize that code anomalies tend to “flock together” to realize a design problem. We analyze to what extent groups of inter-related code anomalies, named agglomerations, suffice to locate design problems. We analyze more than 2200 agglomerations found in seven software systems of different sizes and from different domains. Our analysis indicates that certain forms of agglomerations are consistent indicators of both congenital and evolutionary design problems, with accuracy often higher than 80%.

Proceedings of the 38th International Conference on Software Engineering (ICSE), acceptance rate: 19.1% = 101/530