
Issue No. 06, November/December 2011 (vol. 37)

pp. 872-877

Giulio Concas, University of Cagliari, Cagliari

Michele Marchesi, University of Cagliari, Cagliari

Alessandro Murgia, University of Cagliari, Cagliari

Roberto Tonelli, University of Cagliari, Cagliari

Ivana Turnu, University of Cagliari, Cagliari

DOI: http://doi.ieeecomputersociety.org/10.1109/TSE.2011.54

ABSTRACT

The distribution of bugs in software systems has been shown to satisfy the Pareto principle, and it typically shows a power-law tail when analyzed as a rank-frequency plot. In a recent paper, Zhang showed that the Weibull cumulative distribution is a very good fit for the Alberg diagram of bugs built with experimental data. In this paper, we further discuss the subject from a statistical perspective, using five versions of Eclipse as case studies, to show that log-normal, Double-Pareto, and Yule-Simon distributions may fit the bug distribution at least as well as the Weibull distribution. In particular, we show that some of these alternative distributions provide both a superior fit to the empirical data and a theoretical motivation for their use in modeling the bug generation process. Although our results were obtained on Eclipse, we believe that these models, in particular the Yule-Simon one, can generalize to other software systems.
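The quantities named above can be made concrete with a small, stdlib-only Python sketch. It is not the authors' method, and every name in it (`alberg_diagram`, `share_in_top`, `fit_weibull_grid`, `simulate_simon_process`) is hypothetical. It assumes the Alberg diagram plots the fraction of modules (sorted from most to least buggy) against the cumulative fraction of bugs, and that the Weibull CDF 1 - exp(-(x/scale)^shape) is compared against that curve; Zhang's exact parametrization may differ. The grid search is a crude stand-in for a proper nonlinear least-squares or maximum-likelihood fit. The generator is an assumed Simon-style process: a new bug opens a new module with probability alpha, otherwise it lands in an existing module in proportion to the bugs that module already has. For that process, Simon's classic result is that the bugs-per-module distribution approaches the Yule-Simon law f(k; rho) = rho * B(k, rho + 1) with rho = 1/(1 - alpha).

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def alberg_diagram(bugs_per_module: Sequence[int]) -> List[Point]:
    """Hypothetical helper: (fraction of modules, cumulative fraction of bugs),
    with modules sorted from most to least buggy."""
    counts = sorted(bugs_per_module, reverse=True)
    total, n = sum(counts), len(counts)
    points: List[Point] = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        points.append((i / n, running / total))
    return points


def share_in_top(bugs_per_module: Sequence[int], top_fraction: float) -> float:
    """Fraction of all bugs held by the top `top_fraction` of modules
    (module count rounded to the nearest integer, at least one)."""
    counts = sorted(bugs_per_module, reverse=True)
    k = max(1, round(top_fraction * len(counts)))
    return sum(counts[:k]) / sum(counts)


def weibull_cdf(x: float, scale: float, shape: float) -> float:
    """Weibull CDF 1 - exp(-(x/scale)^shape) for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def fit_weibull_grid(points: Sequence[Point]) -> Tuple[float, float]:
    """Crude least-squares fit of weibull_cdf to an Alberg curve by grid search
    (assumption: scale in 0.01..1.00, shape in 0.05..3.00)."""
    best, best_sse = (0.0, 0.0), math.inf
    for i in range(1, 101):
        for j in range(1, 61):
            scale, shape = i / 100, j / 20
            sse = sum((y - weibull_cdf(x, scale, shape)) ** 2 for x, y in points)
            if sse < best_sse:
                best, best_sse = (scale, shape), sse
    return best


def yule_simon_pmf(k: int, rho: float) -> float:
    """Yule-Simon pmf f(k; rho) = rho * B(k, rho + 1) for k >= 1,
    with the Beta function computed through log-gamma."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def simulate_simon_process(n_bugs: int, alpha: float, seed: int = 0) -> List[int]:
    """Hypothetical Simon-style bug generator: each new bug opens a new module
    with probability alpha; otherwise it joins an existing module chosen with
    probability proportional to that module's current bug count."""
    rng = random.Random(seed)
    counts = [1]  # bugs per module
    owners = [0]  # one entry per bug: index of the module holding it
    for _ in range(n_bugs - 1):
        if rng.random() < alpha:
            counts.append(1)
            owners.append(len(counts) - 1)
        else:
            m = rng.choice(owners)  # uniform over bugs == proportional to count
            counts[m] += 1
            owners.append(m)
    return counts


if __name__ == "__main__":
    # Demo on simulated data only; these are not Eclipse measurements.
    bugs = simulate_simon_process(20_000, alpha=0.5, seed=42)
    print("share of bugs in top 20% of modules:", share_in_top(bugs, 0.2))
    curve = alberg_diagram(bugs)
    step = max(1, len(curve) // 50)
    print("grid-fitted Weibull (scale, shape):", fit_weibull_grid(curve[::step]))
```

Choosing the Simon-style generator keeps the link between a plausible growth mechanism and the resulting Yule-Simon shape explicit, which is the kind of theoretical motivation the abstract contrasts with a purely empirical Weibull fit. Replacing the simulated counts with real per-module bug counts would require mining a project's bug tracker and version history, which this sketch does not attempt.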

INDEX TERMS

Software bug distribution, empirical research, object-oriented systems.

CITATION

Giulio Concas, Michele Marchesi, Alessandro Murgia, Roberto Tonelli, Ivana Turnu, "On the Distribution of Bugs in the Eclipse System," *IEEE Transactions on Software Engineering*, vol. 37, no. 6, pp. 872-877, November/December 2011, doi:10.1109/TSE.2011.54

REFERENCES

- [1] L.A. Adamic, "Zipf, Power-Law, Pareto—a Ranking Tutorial," Technical Report CA 94304, Information Dynamics Lab, http://www.hpl.hp.com/research/idl/papers/ranking/, Oct. 2000.
- [2] C. Andersson and P. Runeson, "A Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems," IEEE Trans. Software Eng., vol. 33, no. 5, pp. 273-286, May 2007.
- [3] G. Baxter, M. Frean, J. Noble, M. Rickerby, H. Smith, M. Visser, H. Melton, and E. Tempero, "Understanding the Shape of Java Software," Proc. 21st Ann. ACM SIGPLAN Conf. Object-Oriented Programming Systems, Languages, and Applications, pp. 397-412, 2006.
- [4] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "Power-Laws in a Large Object-Oriented Software System," IEEE Trans. Software Eng., vol. 33, no. 10, pp. 687-708, Oct. 2007.
- [5] G. Concas, M. Marchesi, S. Pinna, and N. Serra, "On the Suitability of Yule Process to Stochastically Model Some Properties of Object-Oriented Systems," Physica A: Statistical Mechanics and Its Applications, vol. 370, pp. 817-831, Oct. 2006.
- [6] N. Fenton and N. Ohlsson, "Quantitative Analysis of Faults and Failures in a Complex Software System," IEEE Trans. Software Eng., vol. 26, no. 8, pp. 797-814, Aug. 2000.
- [7] N. Ganesh, K. Gopinath, and V. Sridhar, "Structure and Interpretation of Computer Programs," Proc. Second IFIP/IEEE Int'l Symp. Theoretical Aspects of Software Eng., pp. 73-80, 2008.
- [8] L. Hatton, "Power-Law Distributions of Component Size in General Software Systems," IEEE Trans. Software Eng., vol. 35, no. 4, pp. 566-572, July/Aug. 2009.
- [9] P. Liggesmeyer, G. Engels, J. Münch, J. Dörr, and N. Riegel, Proc. Software Eng., pp. 151-162, Mar. 2009.
- [10] P. Louridas, D. Spinellis, and V. Vlachos, "Power Laws in Software," ACM Trans. Software Eng. and Methodology, vol. 18, no. 1, pp. 1-26, Sept. 2008.
- [11] M. Mitzenmacher, "Dynamic Models for File Sizes and Double Pareto Distributions," Internet Math., vol. 1, no. 3, pp. 305-333, 2003.
- [12] M. Newman, "Power Laws, Pareto Distributions and Zipf's Law," Contemporary Physics, vol. 46, pp. 323-351, 2005.
- [13] W.J. Reed, "The Pareto Law of Incomes—An Explanation and an Extension," Physica A: Statistical Mechanics and Its Applications, vol. 319, pp. 469-485, 2003.
- [14] W.J. Reed and B.D. Hughes, "From Gene Families and Genera to Incomes and Internet File Sizes: Why Power-Laws Are So Common in Nature," Physical Rev. E, vol. 66, pp. 67-103, 2002.
- [15] W.J. Reed and M. Jorgensen, "The Double Pareto-Lognormal Distribution—A New Parametric Model for Size Distributions," Comm. in Statistics—Theory and Methods, vol. 33, pp. 1733-1753, 2004.
- [16] H.A. Simon, "On a Class of Skew Distribution Functions," Biometrika, vol. 42, pp. 425-440, 1955.
- [17] C.P. Stark and N. Hovius, "The Characterization of Landslide Size Distributions," Geophysical Research Letters, vol. 28, no. 6, pp. 1091-1094, Mar. 2001.
- [18] R. Wheeldon and S. Counsell, "Power Law Distributions in Class Relationships," Proc. IEEE Third Int'l Workshop Source Code Analysis and Manipulation, pp. 45-57, Sept. 2003.
- [19] W.K. Wiener-Ehrich, J.R. Hamrick, and V.F. Rupolo, "Modeling Software Behavior in Terms of a Formal Life Cycle Curve: Implications for Software Maintenance," IEEE Trans. Software Eng., vol. 10, no. 4, pp. 376-383, July 1984.
- [20] G.U. Yule, "A Mathematical Theory of Evolution Based on the Conclusions of Dr. J.C. Willis," Philosophical Trans. Royal Soc. of London, Series B, vol. 213, pp. 21-87, 1925.
- [21] H. Zhang, "On the Distribution of Software Faults," IEEE Trans. Software Eng., vol. 34, no. 2, pp. 301-302, Mar./Apr. 2008.
- [22] H. Zhang and H.B.K. Tan, "An Empirical Study of Class Sizes for Large Java Systems," Proc. 14th Asia-Pacific Software Eng. Conf., pp. 230-237, Dec. 2007.