Issue No.06 - November/December (2011 vol.37)
pp: 872-877
Michele Marchesi , University of Cagliari, Cagliari
Alessandro Murgia , University of Cagliari, Cagliari
Roberto Tonelli , University of Cagliari, Cagliari
Giulio Concas , University of Cagliari, Cagliari
The distribution of bugs in software systems has been shown to satisfy the Pareto principle, and typically shows a power-law tail when analyzed as a rank-frequency plot. In a recent paper, Zhang showed that the Weibull cumulative distribution is a very good fit for the Alberg diagram of bugs built with experimental data. In this paper, we further discuss the subject from a statistical perspective, using as case studies five versions of Eclipse, to show how log-normal, Double-Pareto, and Yule-Simon distributions may fit the bug distribution at least as well as the Weibull distribution. In particular, we show how some of these alternative distributions provide both a superior fit to empirical data and a theoretical motivation to be used for modeling the bug generation process. While our results have been obtained on Eclipse, we believe that these models, in particular the Yule-Simon one, can generalize to other software systems.
Software bug distribution, empirical research, object-oriented systems.
Michele Marchesi, Alessandro Murgia, Roberto Tonelli, Giulio Concas, "On the Distribution of Bugs in the Eclipse System", IEEE Transactions on Software Engineering, vol.37, no. 6, pp. 872-877, November/December 2011, doi:10.1109/TSE.2011.54
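The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the bug counts are made up, the function names are hypothetical, and the Weibull curve is assumed to take the common two-parameter form F(x) = 1 - exp(-(x/gamma)^beta). The sketch builds the points of an Alberg diagram (cumulative share of bugs against share of modules, most buggy first) and evaluates a Weibull CDF and a Yule-Simon probability mass function that could be compared against such data.

```python
import math

def alberg_points(bug_counts):
    """Return (fraction of modules, cumulative fraction of bugs) pairs,
    with modules ordered from most to least buggy."""
    ordered = sorted(bug_counts, reverse=True)
    total = sum(ordered)
    points, running = [], 0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((i / len(ordered), running / total))
    return points

def weibull_cdf(x, gamma, beta):
    """Two-parameter Weibull CDF (assumed form): 1 - exp(-(x/gamma)^beta)."""
    return 1.0 - math.exp(-((x / gamma) ** beta))

def yule_simon_pmf(k, rho):
    """Yule-Simon pmf p(k) = rho * B(k, rho + 1) for k = 1, 2, ...
    The beta function is computed through log-gamma for stability."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)

# Hypothetical bug counts per module (illustrative only, not Eclipse data).
bugs = [40, 25, 10, 8, 5, 4, 3, 2, 2, 1]
points = alberg_points(bugs)

for module_share, bug_share in points[:3]:
    print(f"top {module_share:.0%} of modules hold {bug_share:.0%} of bugs")
```

Fitting would then amount to choosing the parameters (here `gamma`, `beta`, or `rho`) that bring the model curve closest to the empirical points, for example by least squares or maximum likelihood; that step is left out to keep the sketch short.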
[1] L.A. Adamic, "Zipf, Power-Law, Pareto - a Ranking Tutorial," Technical Report CA 94304, Information Dynamics Lab, Oct. 2000.
[2] C. Andersson and P. Runeson, "A Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems," IEEE Trans. Software Eng., vol. 33, no. 5, pp. 273-286, May 2007.
[3] G. Baxter, M. Fean, J. Noble, M. Rickerby, H. Smith, M. Visser, H. Melton, and E. Tempero, "Understanding the Shape of Java Software," Proc. 21st Ann. ACM SIGPLAN Conf. Object-Oriented Programming Systems, Languages, and Applications, pp. 397-412, 2006.
[4] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-Laws in a Large Object-Oriented Software System," IEEE Trans. Software Eng., vol. 33, no. 10, pp. 687-708, Oct. 2007.
[5] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "On the Suitability of Yule Process to Stochastically Model Some Properties of Object-Oriented Systems," Physica A: Statistical Mechanics and Its Applications, vol. 370, pp. 817-831, Oct. 2006.
[6] N. Fenton and N. Ohlsson, "Quantitative Analysis of Faults and Failures in a Complex Software System," IEEE Trans. Software Eng., vol. 26, no. 8, pp. 797-814, Aug. 2000.
[7] N. Ganesh, K. Gopinath, and V. Sridhar, "Structure and Interpretation of Computer Programs," Proc. Second IFIP/IEEE Int'l Symp. Theoretical Aspects of Software Eng., pp. 73-80, 2008.
[8] L. Hatton, "Power-Law Distributions of Component Size in General Software Systems," IEEE Trans. Software Eng., vol. 35, no. 4, pp. 566-572, July/Aug. 2009.
[9] P. Liggesmeyer, G. Engels, J. Münch, J. Dörr, and N. Riegel, Proc. Software Eng., pp. 151-162, Mar. 2009.
[10] P. Louridas, D. Spinellis, and V. Vlachos, "Power Laws in Software," ACM Trans. Software Eng. and Methodology, vol. 18, no. 1, pp. 1-26, Sept. 2008.
[11] M. Mitzenmacher, "Dynamic Models for File Sizes and Double Pareto Distributions," Internet Math., vol. 1, no. 3, pp. 305-333, 2003.
[12] M. Newman, "Power Laws, Pareto Distributions and Zipf's Law," Contemporary Physics, vol. 46, pp. 323-351, 2005.
[13] W.J. Reed, "The Pareto Law of Incomes—An Explanation and an Extension," Physica A: Statistical Mechanics and Its Applications, vol. 319, pp. 469-485, 2003.
[14] W.J. Reed and B.D. Hughes, "From Gene Families and Genera to Incomes and Internet File Sizes: Why Power-Laws Are So Common in Nature," Physical Rev. E, vol. 66, pp. 67-103, 2002.
[15] W.J. Reed and M. Jorgensen, "The Double Pareto-Lognormal Distribution—A New Parametric Model for Size Distributions," Comm. in Statistics—Theory and Methods, vol. 33, pp. 1733-1753, 2004.
[16] H.A. Simon, "On a Class of Skew Distribution Functions," Biometrika, vol. 42, pp. 425-440, 1955.
[17] C.P. Stark and N. Hovius, "The Characterization of Landslide Size Distributions," Geophysical Research Letters, vol. 28, no. 6, pp. 1091-1094, Mar. 2001.
[18] R. Wheeldon and S. Counsell, "Power Law Distributions in Class Relationships," Proc. IEEE Third Int'l Workshop Source Code Analysis and Manipulation, pp. 45-57, Sept. 2003.
[19] W.K. Wiener-Ehrich, J.R. Hamrick, and V.F. Rupolo, "Modeling Software Behavior in Terms of a Formal Life Cycle Curve: Implications for Software Maintenance," IEEE Trans. Software Eng., vol. 10, no. 4, pp. 376-383, July 1984.
[20] G.U. Yule, "A Mathematical Theory of Evolution Based on the Conclusions of Dr. J.C. Willis," Philosophical Trans. Royal Soc. of London. Series B, vol. 213, pp. 21-87, 1925.
[21] H. Zhang, "On the Distribution of Software Faults," IEEE Trans. Software Eng., vol. 34, no. 2, pp. 301-302, Mar./Apr. 2008.
[22] H. Zhang and H.B.K. Tan, "An Empirical Study of Class Sizes for Large Java Systems," Proc. 14th Asia-Pacific Software Eng. Conf., pp. 230-237, Dec. 2007.