On the Distribution of Bugs in the Eclipse System
November/December 2011 (vol. 37 no. 6)
pp. 872-877
Giulio Concas, University of Cagliari, Cagliari
Michele Marchesi, University of Cagliari, Cagliari
Alessandro Murgia, University of Cagliari, Cagliari
Roberto Tonelli, University of Cagliari, Cagliari
Ivana Turnu, University of Cagliari, Cagliari
The distribution of bugs in software systems has been shown to satisfy the Pareto principle, and typically shows a power-law tail when analyzed as a rank-frequency plot. In a recent paper, Zhang showed that the Weibull cumulative distribution is a very good fit for the Alberg diagram of bugs built with experimental data. In this paper, we further discuss the subject from a statistical perspective, using as case studies five versions of Eclipse, to show how log-normal, Double-Pareto, and Yule-Simon distributions may fit the bug distribution at least as well as the Weibull distribution. In particular, we show how some of these alternative distributions provide both a superior fit to empirical data and a theoretical motivation to be used for modeling the bug generation process. While our results have been obtained on Eclipse, we believe that these models, in particular the Yule-Simon one, can generalize to other software systems.
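As an illustration of the quantities named in the abstract, the following minimal sketch builds an Alberg diagram (cumulative fraction of bugs against fraction of modules, with modules sorted by decreasing bug count) and evaluates two of the candidate models. It is not the authors' code. The function names, the parameter names (gamma, beta, rho), and the sample bug counts are assumptions made for this example. The Weibull form used is the standard two-parameter cumulative distribution, and the Yule-Simon form is the standard probability mass function rho * B(k, rho + 1).

```python
import math

def alberg_curve(bug_counts):
    """Return (fraction of modules, cumulative fraction of bugs) points,
    with modules sorted by decreasing bug count."""
    ordered = sorted(bug_counts, reverse=True)
    total = sum(ordered)
    points, running = [], 0
    for i, bugs in enumerate(ordered, start=1):
        running += bugs
        points.append((i / len(ordered), running / total))
    return points

def weibull_cdf(x, gamma, beta):
    """Two-parameter Weibull CDF: 1 - exp(-(x / gamma) ** beta)."""
    return 1.0 - math.exp(-((x / gamma) ** beta))

def yule_simon_pmf(k, rho):
    """Yule-Simon probability mass rho * B(k, rho + 1), for k = 1, 2, ..."""
    # Compute the Beta function through log-gamma for numerical stability.
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)

# Hypothetical bug counts per module (illustrative only, not data from the paper).
counts = [40, 25, 10, 8, 5, 4, 3, 2, 2, 1]
curve = alberg_curve(counts)
```

In this hypothetical sample, the second point of the curve shows the Pareto-like concentration the abstract refers to: the top 20 percent of modules hold 65 percent of the bugs. Fitting any of the models would then amount to choosing the parameters that bring the model curve closest to these empirical points.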

[1] L.A. Adamic, "Zipf, Power-Law, Pareto—a Ranking Tutorial," Technical Report CA 94304, Information Dynamics Lab, ranking/, Oct. 2000.
[2] C. Andersson and P. Runeson, "A Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems," IEEE Trans. Software Eng., vol. 33, no. 5, pp. 273-286, May 2007.
[3] G. Baxter, M. Fean, J. Noble, M. Rickerby, H. Smith, M. Visser, H. Melton, and E. Tempero, "Understanding the Shape of Java Software," Proc. 21st Ann. ACM SIGPLAN Conf. Object-Oriented Programming Systems, Languages, and Applications, pp. 397-412, 2006.
[4] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-Laws in a Large Object-Oriented Software System," IEEE Trans. Software Eng., vol. 33, no. 10, pp. 687-708, Oct. 2007.
[5] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "On the Suitability of Yule Process to Stochastically Model Some Properties of Object-Oriented Systems," Physica A: Statistical Mechanics and Its Applications, vol. 370, pp. 817-831, Oct. 2006.
[6] N. Fenton and N. Ohlsson, "Quantitative Analysis of Faults and Failures in a Complex Software System," IEEE Trans. Software Eng., vol. 26, no. 8, pp. 797-814, Aug. 2000.
[7] N. Ganesh, K. Gopinath, and V. Sridhar, "Structure and Interpretation of Computer Programs," Proc. Second IFIP/IEEE Int'l Symp. Theoretical Aspects of Software Eng., pp. 73-80, 2008.
[8] L. Hatton, "Power-Law Distributions of Component Size in General Software Systems," IEEE Trans. Software Eng., vol. 35, no. 4, pp. 566-572, July/Aug. 2009.
[9] P. Liggesmeyer, G. Engels, J. Münch, J. Dörr, and N. Riegel, Proc. Software Eng., pp. 151-162, Mar. 2009.
[10] P. Louridas, D. Spinellis, and V. Vlachos, "Power Laws in Software," ACM Trans. Software Eng. and Methodology, vol. 18, no. 1, pp. 1-26, Sept. 2008.
[11] M. Mitzenmacher, "Dynamic Models for File Sizes and Double Pareto Distributions," Internet Math., vol. 1, no. 3, pp. 305-333, 2003.
[12] M. Newman, "Power Laws, Pareto Distributions and Zipf's Law," Contemporary Physics, vol. 46, pp. 323-351, 2005.
[13] W.J. Reed, "The Pareto Law of Incomes—An Explanation and an Extension," Physica A: Statistical Mechanics and Its Applications, vol. 319, pp. 469-485, 2003.
[14] W.J. Reed and B.D. Hughes, "From Gene Families and Genera to Incomes and Internet File Sizes: Why Power-Laws Are So Common in Nature," Physical Rev. E, vol. 66, pp. 67-103, 2002.
[15] W.J. Reed and M. Jorgensen, "The Double Pareto-Lognormal Distribution—A New Parametric Model for Size Distributions," Comm. in Statistics—Theory and Methods, vol. 33, pp. 1733-1753, 2004.
[16] H.A. Simon, "On a Class of Skew Distribution Functions," Biometrika, vol. 42, pp. 425-440, 1955.
[17] C.P. Stark and N. Hovius, "The Characterization of Landslide Size Distributions," Geophysical Research Letters, vol. 28, no. 6, pp. 1091-1094, Mar. 2001.
[18] R. Wheeldon and S. Counsell, "Power Law Distributions in Class Relationships," Proc. IEEE Third Int'l Workshop Source Code Analysis and Manipulation, pp. 45-57, Sept. 2003.
[19] W.K. Wiener-Ehrich, J.R. Hamrick, and V.F. Rupolo, "Modeling Software Behavior in Terms of a Formal Life Cycle Curve: Implications for Software Maintenance," IEEE Trans. Software Eng., vol. 10, no. 4, pp. 376-383, July 1984.
[20] G.U. Yule, "A Mathematical Theory of Evolution Based on the Conclusions of Dr. J.C. Willis," Philosophical Trans. Royal Soc. of London. Series B, vol. 213, pp. 21-87, 1925.
[21] H. Zhang, "On the Distribution of Software Faults," IEEE Trans. Software Eng., vol. 34, no. 2, pp. 301-302, Mar./Apr. 2008.
[22] H. Zhang and H.B.K. Tan, "An Empirical Study of Class Sizes for Large Java Systems," Proc. 14th Asia-Pacific Software Eng. Conf., pp. 230-237, Dec. 2007.

Index Terms:
Software bug distribution, empirical research, object-oriented systems.
Giulio Concas, Michele Marchesi, Alessandro Murgia, Roberto Tonelli, Ivana Turnu, "On the Distribution of Bugs in the Eclipse System," IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 872-877, Nov.-Dec. 2011, doi:10.1109/TSE.2011.54