An Empirical Approach to Studying Software Evolution
July/August 1999 (vol. 25 no. 4)
pp. 493-509

Abstract—With the approach of the new millennium, a primary focus in software engineering involves issues relating to upgrading, migrating, and evolving existing software systems. In this environment, the role of careful empirical studies as the basis for improving software maintenance processes, methods, and tools is highlighted. One of the most important processes that merits empirical evaluation is software evolution. Software evolution refers to the dynamic behavior of software systems as they are maintained and enhanced over their lifetimes. Software evolution is particularly important as systems in organizations become longer-lived. However, evolution is challenging to study due to the longitudinal nature of the phenomenon in addition to the usual difficulties in collecting empirical data. In this paper, we describe a set of methods and techniques that we have developed and adapted to empirically study software evolution. Our longitudinal empirical study involves collecting, coding, and analyzing more than 25,000 change events to 23 commercial software systems over a 20-year period. Using data from two of the systems, we illustrate the efficacy of flexible phase mapping and gamma sequence analytic methods originally developed in social psychology to examine group problem solving processes. We have adapted these techniques in the context of our study to identify and understand the phases through which a software system travels as it evolves over time. We contrast this approach with time series analysis, the more traditional way of studying longitudinal data. Our work demonstrates the advantages of applying methods and techniques from other domains to software engineering and illustrates how, despite difficulties, software evolution can be empirically studied.
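
As an illustration of the gamma sequence analysis mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation in which a pairwise gamma for two event types A and B is (P - Q) / (P + Q), where P counts event pairs with A before B and Q counts pairs with B before A, and a type's precedence score is its mean gamma against all other types. The function names and the change-event categories in the usage example are hypothetical.

```python
def pairwise_gamma(sequence, a, b):
    """Gamma for event types a and b: (P - Q) / (P + Q).

    P = number of (a, b) event pairs in which a occurs before b.
    Q = number of (a, b) event pairs in which b occurs before a.
    Returns 0.0 when the two types never co-occur (an assumption of this sketch).
    """
    p = q = 0
    for i, earlier in enumerate(sequence):
        for later in sequence[i + 1:]:
            if earlier == a and later == b:
                p += 1
            elif earlier == b and later == a:
                q += 1
    if p + q == 0:
        return 0.0
    return (p - q) / (p + q)


def precedence_scores(sequence):
    """Mean gamma of each event type against every other type.

    Scores near +1 suggest the type tends to come early in the sequence;
    scores near -1 suggest it tends to come late.
    """
    types = sorted(set(sequence))
    scores = {}
    for a in types:
        others = [b for b in types if b != a]
        if not others:
            scores[a] = 0.0
            continue
        total = sum(pairwise_gamma(sequence, a, b) for b in others)
        scores[a] = total / len(others)
    return scores


# Hypothetical, time-ordered change events for one system.
events = [
    "corrective", "corrective", "adaptive",
    "enhancement", "adaptive", "enhancement",
]
scores = precedence_scores(events)
# Order the event types from earliest-tending to latest-tending.
phase_order = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(phase_order)
```

Sorting the types by precedence score gives a rough ordering of phases. Unlike a time series model, this ordering uses only the relative order of events, not the length of the intervals between them.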

Index Terms:
Empirical methods, software evolution, software maintenance, longitudinal studies, time series, sequence analysis, gamma analysis, phasic analysis.
Chris F. Kemerer, Sandra Slaughter, "An Empirical Approach to Studying Software Evolution," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493-509, July-Aug. 1999, doi:10.1109/32.799945