Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System
May/June 2009 (vol. 35 no. 3)
pp. 407-429
Bente C.D. Anda, University of Oslo, Oslo
Dag I.K. Sjøberg, Simula Research Laboratory, Lysaker
Audris Mockus, Avaya Labs Research, Basking Ridge
The scientific study of a phenomenon requires it to be reproducible. Mature engineering industries are recognized by projects and products that are, to some extent, reproducible. Yet, reproducibility in software engineering (SE) has not been investigated thoroughly, despite the fact that lack of reproducibility has both practical and scientific consequences. We report a longitudinal multiple-case study of variations and reproducibility in software development, from bidding to deployment, on the basis of the same requirement specification. In a call for tender to 81 companies, 35 responded. Four of them developed the system independently. The firm price, planned schedule, and planned development process had, respectively, “low,” “low,” and “medium” reproducibility. The contractor's costs, actual lead time, and schedule overrun of the projects had, respectively, “medium,” “high,” and “low” reproducibility. The quality dimensions of the delivered products (reliability, usability, and maintainability) had, respectively, “low,” “high,” and “low” reproducibility. Moreover, the notion of reproducibility also encompasses variability that occurs for predictable reasons. We found that the observed outcome of the four development projects matched our expectations, which were formulated partially on the basis of SE folklore. Nevertheless, achieving more reproducibility in SE remains a great challenge for SE research, education, and industry.
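Variability across the four projects can be quantified with the coefficient of variation (CV), the dispersion measure discussed in references [55] and [77] below: the sample standard deviation divided by the mean, so that outcomes on different scales (price, lead time, defect counts) become comparable. The sketch below illustrates the computation on hypothetical bid prices; the figures are invented for illustration and are not the study's data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean.

    A small CV indicates that the outcome is similar across projects
    (more reproducible); a large CV indicates high variability.
    """
    return stdev(values) / mean(values)


# Hypothetical firm-price bids from four companies (illustrative only).
bids = [25_000, 40_000, 55_000, 70_000]

cv = coefficient_of_variation(bids)
print(f"CV of firm price: {cv:.2f}")  # prints "CV of firm price: 0.41"
```

A scale-free measure such as CV is what makes statements like "firm price had low reproducibility while lead time had high reproducibility" comparable across attributes measured in different units.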

[1] M. Agrawal and K. Chari, “Software Effort, Quality, and Cycle Time: A Study of CMM Level 5 Projects,” IEEE Trans. Software Eng., vol. 33, no. 3, pp. 145-156, Mar. 2007.
[2] B. Anda, “Assessing Software System Maintainability Using Structural Measures and Expert Assessments,” Proc. 23rd Int'l Conf. Software Maintenance, pp. 204-213, 2007.
[3] E. Arisholm and D.I.K. Sjøberg, “Evaluating the Effect of a Delegated versus Centralized Control Style on the Maintainability of Object-Oriented Software,” IEEE Trans. Software Eng., vol. 30, no. 8, pp. 521-534, Aug. 2004.
[4] E. Arisholm and D.I.K. Sjøberg, “Towards a Framework for Empirical Assessment of Changeability Decay,” J. Systems and Software, vol. 53, no. 1, pp. 3-14, 2000.
[5] H. Atmanspacher and R.G. Jahn, “Problems of Reproducibility in Complex Mind-Matter Systems,” J. Scientific Exploration, vol. 17, no. 2, pp. 243-270, 2003.
[6] A.A. Avižienis and L. Chen, “On the Implementation of N-Version Programming for Software Fault Tolerance during Execution,” Proc. IEEE Int'l Computer Software and Applications Conf., pp. 149-155, Nov. 1977.
[7] A.A. Avižienis, “The Methodology of N-Version Programming,” Software Fault Tolerance, M. Lyu, ed., John Wiley & Sons, 1995.
[8] R.K. Bandi, V.K. Vaishnavi, and D.E. Turk, “Predicting Maintenance Performance Using Object-Oriented Design Complexity Metrics,” IEEE Trans. Software Eng., vol. 29, no. 1, pp. 77-87, Jan. 2003.
[9] H.C. Benestad, B. Anda, and E. Arisholm, “Assessing Software Product Maintainability Based on Class-Level Structural Measures,” Proc. Seventh Int'l Conf. Product-Focused Software Process Improvement, pp. 94-111, 2006.
[10] J.D. Blackburn, G.D. Scudder, and L.N. Van Wassenhove, “Improving Speed and Productivity of Software Development: A Global Survey of Software Developers,” IEEE Trans. Software Eng., vol. 22, no. 12, pp. 875-885, Dec. 1996.
[11] J.M. Bland and D.G. Altman, “Statistical Methods for Assessing Agreement between Two Methods of Clinical Measurement,” The Lancet, vol. 327, no. 8476, pp. 307-310, 1986.
[12] B.W. Boehm, Software Engineering Economics. Prentice Hall, 1981.
[13] B.W. Boehm, B. Clark, E. Horowitz, C. Westland, R. Madachy, and R. Selby, “Cost Models for Future Software Life Cycle Processes: COCOMO 2.0,” Annals of Software Eng., vol. 1, no. 1, pp. 1-24, 1995.
[14] S.S. Brilliant, J.C. Knight, and N.G. Leveson, “Analysis of Faults in an N-Version Software Experiment,” IEEE Trans. Software Eng., vol. 16, no. 2, pp. 238-247, Feb. 1990.
[15] L.C. Briand, J. Daly, and J. Wuest, “A Unified Framework for Coupling Measurement in Object-Oriented Systems,” IEEE Trans. Software Eng., vol. 25, no. 1, pp. 91-121, Jan./Feb. 1999.
[16] R.E. Brooks, “Studying Programmer Behavior Experimentally: The Problems of Proper Methodology,” Human Aspects of Computing, vol. 23, no. 4, pp. 207-213, 1980.
[17] S.R. Chidamber and C.F. Kemerer, “A Metrics Suite for Object Oriented Design,” IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, June 1994.
[18] R. Chillarege, I. Bhandari, J. Chaar, M. Halliday, D. Moebus, B. Ray, and M.Y. Wong, “Orthogonal Defect Classification—A Concept for In-Process Measurement,” IEEE Trans. Software Eng., vol. 18, no. 11, pp. 943-956, Nov. 1992.
[19] B.K. Clark, “Quantifying the Effects of Process Improvement on Effort,” IEEE Software, vol. 17, no. 6, pp. 65-70, Nov./Dec. 2000.
[20] CMMI Maturity Profile September 2006, Software Eng. Inst., 2006.
[21] A. Cockburn, “Selecting a Project's Methodology,” IEEE Software, vol. 17, no. 4, pp. 64-71, July/Aug. 2000.
[22] B.P. Cohen, Developing Sociological Knowledge: Theory and Method, second ed., 1989.
[23] K. Cox and K. Phalp, “Replicating the CREWS Use Case Authoring Guidelines Experiment,” Empirical Software Eng., vol. 5, no. 3, pp. 245-267, 2000.
[24] L.J. Cronbach, Designing Evaluations of Educational and Social Programs. Jossey-Bass, 1983.
[25] B. Curtis, “Substantiating Programmer Variability,” Proc. IEEE, vol. 69, no. 7, p. 846, 1981.
[26] B. Curtis, “By the Way, Did Anyone Study Any Real Programmers,” Empirical Studies of Programmers, E. Soloway and S. Iyengar, eds., pp. 256-262, 1986.
[27] D. Darcy and C.F. Kemerer, “OO Metrics in Practice,” IEEE Software, vol. 22, no. 6, pp. 17-19, Nov./Dec. 2005.
[28] A. De Lucia, E. Pompella, and S. Stefanucci, “Assessing the Maintenance Processes of a Software Organization: An Empirical Analysis of a Large Industrial Project,” J. Systems and Software, vol. 65, no. 2, pp. 87-103, 2003.
[29] T. DeMarco and T. Lister, Peopleware: Productive Projects and Teams. Dorset House, 1987.
[30] H.F. Dingman, “Scientific Method and Reproducibility of Results,” Multivariate Behavioral Research, vol. 4, no. 4, pp. 517-522, 1969.
[31] T. Dybå, E. Arisholm, D.I.K. Sjøberg, J.E. Hannay, and F. Shull, “Are Two Heads Better than One? On the Effectiveness of Pair Programming,” IEEE Software, vol. 24, no. 6, pp. 10-13, Nov./Dec. 2007.
[32] E.F. Easton and D.R. Moodie, “Pricing and Lead-Time Decisions for Make-to-Order Firms with Contingent Orders,” European J. Operational Research, vol. 11, no. 3, pp. 57-67, 1997.
[33] N.E. Fenton and S.L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, second ed. PWS, 1998.
[34] A. Følstad and J. Heim, “Usability Evaluation of Four Functionally Identical Versions of DES (Database of Empirical Studies),” SINTEF Report SINTEF_A309, 2006.
[35] J.E. Hannay, D.I.K. Sjøberg, and T. Dybå, “A Systematic Review of Theory Use in Software Engineering Experiments,” IEEE Trans. Software Eng., vol. 33, no. 2, pp. 87-107, Feb. 2007.
[36] D.E. Harter, M.S. Krishnan, and S.A. Slaughter, “Effects of Process Maturity on Quality, Cycle Time, and Effort in Software Product Development,” Management Science, vol. 46, no. 4, pp. 451-466, 2000.
[37] D.E. Harter and S.A. Slaughter, “Quality Improvement and Infrastructure Activity Costs in Software Development: A Longitudinal Analysis,” Management Science, vol. 49, no. 6, pp. 784-800, 2003.
[38] L.V. Hedges and I. Olkin, Statistical Methods for Meta-Analysis. Academic Press, 1985.
[39] J.D. Herbsleb and A. Mockus, “Formulation and Preliminary Test of an Empirical Theory of Coordination in Software Engineering,” ACM SIGSOFT Software Eng. Notes, vol. 28, no. 5, pp. 138-147, 2003.
[40] J.D. Herbsleb, D. Zubrow, D. Goldenson, W. Hayes, and M. Paulk, “Software Quality and the Capability Maturity Model,” Comm. ACM, vol. 40, no. 6, pp. 30-40, 1997.
[41] M. Holcombe, T. Cowling, and F. Macias, “Towards an Agile Approach to Empirical Software Engineering,” Proc. Workshop Empirical Studies in Software Eng., pp. 33-48, 2003.
[42] N.E. Holt, “A Systematic Review of Case Studies in Software Engineering,” MSc thesis, Univ. of Oslo, 2006.
[43] J.S. Hunter, “The National System of Scientific Measurement,” Science, vol. 210, no. 4472, pp. 869-874, 1980.
[44] IBM Software Support Handbook. IBM, Version 4.0.1, May 2008.
[45] IEEE Standard Glossary of Software Engineering Terminology. IEEE, 1990.
[46] ISO 5725-1, Accuracy (Trueness and Precision) of Measurement Methods and Results, first ed., 1994-12-15, 1994.
[47] IUPAC Compendium of Chemical Terminology, second ed., A.D. McNaught and A. Wilkinson, eds., Royal Soc. Chemistry, Cambridge, U.K., 1997.
[48] A.M. Jenkins, J.D. Naumann, and J.C. Wetherbe, “Empirical Investigation of Systems Development Practices and Results,” Information and Management, vol. 7, no. 2, pp. 73-82, 1984.
[49] C.M. Judd, E.R. Smith, and L.H. Kidder, Research Methods in Social Relations, sixth ed. Harcourt Brace Jovanovich, 1991.
[50] M. Jørgensen, “The Effects of the Format of Software Project Bidding Processes,” Int'l J. Project Management, vol. 24, no. 6, pp. 522-528, 2006.
[51] M. Jørgensen, S.S. Bygdås, and T. Lunde, “Efficiency Evaluation of CASE Tools—Methods and Results,” TF R 38/95, Telenor FoU, 1995.
[52] M. Jørgensen and G.J. Carelius, “An Empirical Study of Software Project Bidding,” IEEE Trans. Software Eng., vol. 30, no. 12, pp. 953-969, Dec. 2004.
[53] M. Jørgensen and D.I.K. Sjøberg, “Impact of Experience on Maintenance Skills,” Software Maintenance: Research and Practice, vol. 14, no. 2, pp. 123-146, 2002.
[54] V.B. Kampenes, T. Dybå, J.E. Hannay, and D.I.K. Sjøberg, “A Systematic Review of Effect Size in Software Engineering Experiments,” Information and Software Technology, vol. 49, nos. 11/12, pp. 1073-1086, 2007.
[55] K. Kelley, “Sample Size Planning for the Coefficient of Variation from the Accuracy in Parameter Estimation Approach,” Behavior Research Methods, vol. 39, pp. 755-766, 2007.
[56] L. Kidder and C.M. Judd, Research Methods in Social Relations, fifth ed. Holt, Rinehart, and Winston, 1986.
[57] J.C. Knight and N.G. Leveson, “An Experimental Evaluation of the Assumption of Independence in Multiversion Programming,” IEEE Trans. Software Eng., vol. 12, no. 1, pp. 96-109, Jan. 1986.
[58] D.R. Krathwohl, Social and Behavioral Science Research. Jossey-Bass, 1985.
[59] M.S. Krishnan, C.H. Kriebel, S. Kekre, and T. Mukhopadhyay, “An Empirical Analysis of Productivity and Quality in Software Products,” Management Science, vol. 46, no. 6, pp. 745-759, 2000.
[60] C. Larman, Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process, second ed. Prentice Hall, 2001.
[61] A.L. Lederer and J. Prasad, “Software Management and Cost Estimating Error,” J. Systems and Software, vol. 50, no. 19, pp. 33-42, 2000.
[62] E. Marshall, “Getting the Noise Out of Gene Arrays,” Science, vol. 306, pp. 630-631, 2004.
[63] K.D. Maxwell, L. Van Wassenhove, and S. Dutta, “Software Development Productivity of European Space, Military, and Industrial Applications,” IEEE Trans. Software Eng., vol. 22, no. 10, pp. 706-718, Oct. 1996.
[64] S. McConnell, “I Know What I Know,” IEEE Software, vol. 19, no. 3, pp. 5-7, May/June 2002.
[65] K. Milis and R. Mercken, “Success Factors Regarding the Implementation of ICT Investment Projects,” Int'l J. Production Economics, vol. 80, no. 1, pp. 105-117, 2002.
[66] K.J. Moløkken-Østvold, “Effort and Schedule Estimation of Software Development Projects,” PhD thesis, Univ. of Oslo, 2004.
[67] K.J. Moløkken-Østvold and M. Jørgensen, “A Comparison of Software Project Overruns—Flexible versus Sequential Development Models,” IEEE Trans. Software Eng., vol. 31, no. 9, pp. 754-766, Sept. 2005.
[68] M.C. Paulk, B. Curtis, M.B. Chrissis, and C.V. Weber, “Capability Maturity Model for Software, Version 1.1,” Report CMU/SEI-93-TR-24, Software Eng. Inst., Pittsburgh, 1993.
[69] P.W. Oman and J.R. Hagemeister, “Construction and Testing of Polynomials Predicting Software Maintainability,” J. Systems and Software, vol. 24, no. 3, pp. 251-266, 1994.
[70] H. Pedersen, “Tender Prices: Bridge, Tunnel, Electro and Road Building and Maintenance 1998-2006,” Technology Report 2468, Norwegian Public Roads Administration, 2006.
[71] L. Prechelt, “The 28:1 Grant/Sackman Legend Is Misleading, or: How Large Is Interpersonal Variation Really?” Technical Report 1999-18, Universität Karlsruhe, Fakultät für Informatik, 1999.
[72] L. Prechelt, “An Empirical Comparison of C, C++, Java, Perl, Python, Rexx, and Tcl for a Search/String-Processing Program,” Technical Report 2000-5, Universität Karlsruhe, Fakultät für Informatik, 2000.
[73] L. Prechelt, “An Empirical Comparison of Seven Programming Languages,” Computer, vol. 33, no. 10, pp. 23-29, Oct. 2000.
[74] L. Prechelt, “Plat_Forms 2007: The Web Development Platform Comparison—Evaluation and Results,” Technical Report B-07-10, Freie Universität Berlin, Institut für Informatik, 2007.
[75] J.D. Procaccino, J.M. Verner, K.M. Shelfer, and D. Gefen, “What Do Software Practitioners Really Think about Project Success: An Exploratory Study,” J. Systems and Software, vol. 78, no. 2, pp. 194-203, 2005.
[76] C.C. Ragin, “Case-Oriented Research,” Encyclopaedia of the Social and Behavioural Sciences, vol. 3, pp. 1519-1525, 2001.
[77] G.F. Reed, F. Lynn, and B.D. Meade, “Use of Coefficient of Variation in Assessing Variability of Quantitative Assays,” Clinical and Diagnostic Laboratory Immunology, vol. 9, no. 6, pp. 1235-1239, 2002.
[78] R.H. Reussner, H.W. Schmidt, and I.H. Poernomo, “Reliability Prediction for Component-Based Software Architecture,” The J. Systems and Software, vol. 66, no. 3, pp. 241-252, 2003.
[79] L. Richards, Handling Qualitative Data—A Practical Guide. Sage Publications, 2005.
[80] R. Rosenthal and M.R. DiMatteo, “Meta-Analysis: Recent Development in Quantitative Methods for Literature Reviews,” Ann. Rev. Psychology, vol. 52, pp. 59-82, 2001.
[81] T.P. Rout, K. El Emam, M. Fusani, D. Goldenson, and H.-W. Jung, “SPICE in Retrospect: Developing a Standard for Process Assessment,” J. Systems and Software, vol. 80, no. 9, pp. 1483-1493, 2007.
[82] K. Schwaber, Agile Project Management with Scrum. Microsoft Press, 2004.
[83] W.R. Shadish, T.D. Cook, and D.T. Campbell, Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, 2002.
[84] M. Shepperd, C. Schofield, and B. Kitchenham, “Effort Estimation Using Analogy,” Proc. 18th Int'l Conf. Software Eng., pp. 170-178, 1996.
[85] M. Sidman, Scientific Research. Basic, 1960.
[86] H.P. Siy, A. Mockus, J.D. Herbsleb, M. Krishnan, and G.T. Tucker, “Making the Software Factory Work: Lessons from a Decade of Experience,” Proc. Seventh Int'l Symp. Software Metrics, pp. 317-327, 2001.
[87] D.I.K. Sjøberg, T. Dybå, and M. Jørgensen, “The Future of Empirical Methods in Software Engineering Research,” Future of Software Eng., pp. 358-378, IEEE-CS Press, 2007.
[88] D.I.K. Sjøberg, J.E. Hannay, O. Hansen, V.B. Kampenes, A. Karahasanovic, N.-K. Liborg, and A.C. Rekdal, “A Survey of Controlled Experiments in Software Engineering,” IEEE Trans. Software Eng., vol. 31, no. 9, pp. 733-753, Sept. 2005.
[89] H. Tetens, “Reproducibility,” Enzyklopädie Philosophie und Wissenschaftstheorie, J. Mittelstrass et al., eds., pp. 593-594, Metzlersche J.B., 2004.
[90] W. Trochim, “Outcome Pattern Matching and Program Theory,” Evaluation and Program Planning, vol. 12, no. 4, pp. 355-366, 1989.
[91] D.G. Wagner, “The Growth of Theories,” Group Processes: Sociological Analyses, M. Foschi and E.J. Lawler, eds., pp. 25-42, Nelson-Hall, 1994.
[92] F. Walkerden and R. Jeffery, “An Empirical Study of Analogy-Based Software Effort Estimation,” Empirical Software Eng., vol. 4, no. 2, pp. 135-158, 1999.
[93] R.K. Yin, Case Study Research: Design and Methods, third ed. Sage Publications, 2003.
[94] R.K. Yin, P.G. Bateman, and G.B. Moore, “Case Studies and Organizational Innovation: Strengthening the Connection,” Science Comm., vol. 6, no. 3, pp. 249-260, 1985.

Index Terms:
Software engineering life cycle, software quality, software project success, software process, multiple-case study.
Bente C.D. Anda, Dag I.K. Sjøberg, Audris Mockus, "Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System," IEEE Transactions on Software Engineering, vol. 35, no. 3, pp. 407-429, May-June 2009, doi:10.1109/TSE.2008.89