Power-Laws in a Large Object-Oriented Software System
October 2007 (vol. 33 no. 10)
pp. 687-708
We present a comprehensive study of an implementation of the Smalltalk object-oriented system, one of the first and purest object-oriented programming environments, searching for scaling laws in its properties. We study ten system properties, including the distributions of variable and method names, inheritance hierarchies, class and method sizes, and the system architecture graph. We systematically found Pareto (or sometimes log-normal) distributions in these properties. This indicates that the programming activity, even when modeled from a statistical perspective, cannot be modeled simply as a random addition of independent increments with finite variance; instead, it exhibits strong organic dependencies on what has already been developed. We compare our results with similar ones obtained for large Java systems, either reported in the literature or computed by ourselves for properties never studied before, showing that the behavior found is similar in all the object-oriented systems studied. We show how the Yule process is able to stochastically model the generation of several of the power laws found, identifying the process parameters and comparing theoretical and empirical tail indexes. Lastly, we discuss how the distributions found are related to existing object-oriented metrics, such as Chidamber and Kemerer's, and how they could provide a starting point for measuring the quality of a whole system, as opposed to that of single classes. In fact, the usual evaluation of systems based on the mean and standard deviation of metrics can be misleading. It is more informative to measure differences in the shape and coefficients of the data's statistical distributions.
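To make the Yule-process argument concrete, the sketch below simulates Simon's variant of the Yule process [26]. It is an illustration, not the authors' fitted model. At each step, with probability α a new entity (think of a class) is created holding one unit (think of a method); otherwise one unit is added to an existing entity chosen with probability proportional to its current size. Simon's analysis predicts a tail P(size ≥ k) ∝ k^(-1/(1-α)). The function names, the value α = 0.2, the run length, and the use of the Hill estimator to measure the empirical tail index are illustrative assumptions, not values or procedures taken from the paper.

```python
import math
import random


def simon_yule_process(n_steps: int, alpha: float, seed: int = 0) -> list[int]:
    """Grow entity sizes with Simon's variant of the Yule process (illustrative).

    At each step, with probability `alpha` a new entity of size 1 is created;
    otherwise one unit is added to an existing entity chosen with probability
    proportional to its current size (preferential attachment).
    """
    rng = random.Random(seed)
    sizes = [1]   # start with a single entity holding one unit
    owners = [0]  # owners[j] = index of the entity that owns unit j
    for _ in range(n_steps):
        if rng.random() < alpha:
            # Create a new entity with one unit.
            sizes.append(1)
            owners.append(len(sizes) - 1)
        else:
            # Picking a unit uniformly at random picks its owner
            # with probability proportional to the owner's size.
            e = owners[rng.randrange(len(owners))]
            sizes[e] += 1
            owners.append(e)
    return sizes


def hill_tail_index(data: list[int], k: int) -> float:
    """Hill estimate of the index a in P(X >= x) ~ x^(-a), from the k largest values."""
    xs = sorted(data, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value acts as the tail cutoff
    return k / sum(math.log(x / threshold) for x in xs[:k])


# Hypothetical parameters chosen only for illustration.
alpha = 0.2
sizes = simon_yule_process(n_steps=50_000, alpha=alpha, seed=42)
print("entities:", len(sizes), "max size:", max(sizes))
print("theoretical tail index 1/(1-alpha):", 1 / (1 - alpha))
print("Hill estimate (top 5%):", hill_tail_index(sizes, k=len(sizes) // 20))
```

Re-running with other seeds or values of α gives a rough feel for how the empirical tail index tracks 1/(1-α). The Hill estimate depends on the choice of k and is biased on discrete data, so it should be read as a coarse check rather than a formal fit.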

[1] M. Newman, “The Structure and Function of Complex Networks,” SIAM Rev., vol. 45, pp. 167-256, 2003.
[2] A.-L. Barabási and R. Albert, “Emergence of Scaling in Random Networks,” Science, vol. 286, pp. 509-512, 1999.
[3] M. Newman, “Power Laws, Pareto Distributions and Zipf's Law,” Contemporary Physics, vol. 46, pp. 323-351, 2005.
[4] G. Zipf, Selective Studies and the Principle of Relative Frequency in Language. Harvard Univ. Press, 1932.
[5] V. Pareto, Cours d'Économie Politique. Macmillan, 1897.
[6] R. Axtell, “Zipf Distribution of U.S. Firm Sizes,” Science, vol. 293, pp. 1818-1820, 2001.
[7] A. Pagan, “The Econometrics of Financial Markets,” J. Empirical Finance, vol. 3, pp. 15-102, 1996.
[8] X. Gabaix, “Zipf's Law for Cities: An Explanation,” Quarterly J. Economics, vol. 114, pp. 739-767, 1999.
[9] S. Focardi, M. Marchesi, and G. Succi, “A Stochastic Model of Software Maintenance and Its Implications on Extreme Programming Processes,” Extreme Programming Examined, XP Series, pp. 191-206, G. Succi and M. Marchesi, eds., Addison-Wesley, 2000.
[10] A. Potanin, J. Noble, M. Frean, and R. Biddle, “Scale-Free Geometry in Object-Oriented Programs,” Comm. ACM, vol. 48, pp. 99-103, 2005.
[11] S. Valverde, R. Ferrer-Cancho, and R. Solé, “Scale-Free Networks from Optimal Design,” Europhysics Letters, vol. 60, pp. 512-517, 2002.
[12] S. Valverde, R. Ferrer-Cancho, and R. Solé, “Hierarchical Small Worlds in Software Architecture,” Santa Fe Inst. Working Paper SFI/03-07-044, 2003.
[13] C. Myers, “Software Systems as Complex Networks: Structure, Function, and Evolvability of Software Collaboration Graphs,” Physical Rev. E, vol. 68, 2003.
[14] R. Wheeldon and S. Counsell, “Power Law Distributions in Class Relationships,” Proc. Third IEEE Int'l Workshop Source Code Analysis and Manipulation, 2003.
[15] A. Gorshenev and Y. Pis'mak, “Punctuated Equilibrium in Software Evolution,” Physical Rev. E, vol. 70, 2004.
[16] A. de Moura, Y.-C. Lai, and A. Motter, “Signatures of Small-World and Scale-Free Properties in Large Computer Programs,” Physical Rev. E, vol. 68, 2003.
[17] G. Concas, M. Marchesi, S. Pinna, and N. Serra, “On the Suitability of Yule Process to Stochastically Model Some Properties of Object-Oriented Systems,” Physica A, vol. 370, pp. 817-831, 2006.
[18] T. Tamai and T. Nakatani, “Analysis of Software Evolution Processes Using Statistical Distribution Models,” Proc. Int'l Workshop Principles of Software Evolution (IWPSE), pp. 120-123, 2002.
[19] A. Goldberg and D. Robson, Smalltalk-80: The Language. Addison-Wesley, 1989.
[20] VisualWorks Application Developer's Guide, 2004.
[21] S. Chidamber and C. Kemerer, “A Metrics Suite for Object-Oriented Design,” IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, June 1994.
[22] B. Meyer, Object-Oriented Software Construction, second ed. Prentice-Hall, 1997.
[23] V. Basili, L. Briand, and W. Melo, “A Validation of Object-Oriented Design Metrics as Quality Indicators,” IEEE Trans. Software Eng., vol. 22, no. 10, pp. 751-761, Oct. 1996.
[24] R. Subramanyam and M. Krishnan, “Empirical Analysis of CK Metrics for Object-Oriented Design Complexity: Implications for Software Defects,” IEEE Trans. Software Eng., vol. 29, no. 4, pp. 297-310, Apr. 2003.
[25] T. Gyimóthy, R. Ferenc, and I. Siket, “Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction,” IEEE Trans. Software Eng., vol. 31, no. 10, pp. 897-910, Oct. 2005.
[26] H. Simon, “On a Class of Skew Distribution Functions,” Biometrika, vol. 42, pp. 425-440, 1955.
[27] K. Yamasaki, K. Matia, S. Buldyrev, D. Fu, F. Pammolli, M. Riccaboni, and H. Stanley, “Preferential Attachment and Growth Dynamics in Complex Systems,” Physical Rev. E, vol. 74, 2006.
[28] W. Willinger, D. Alderson, and L. Li, “A Pragmatic Approach to Dealing with High Variability in Network Measurements,” Proc. Internet Measurement Conf. (IMC '04), pp. 88-100, 2004.
[29] X. Gabaix and Y.M. Ioannides, “The Evolution of City Size Distributions,” Handbook of Regional and Urban Economics, vol. 4, J.V. Henderson and J.-F. Thisse, eds., pp. 2341-2378, North-Holland, 2004.
[30] S. Chidamber, D. Darcy, and C. Kemerer, “Managerial Use of Metrics for Object-Oriented Software: An Exploratory Analysis,” IEEE Trans. Software Eng., vol. 24, pp. 629-639, 1998.
[31] Java 2 Platform, Standard Edition, v 1.4.2, http://java.sun.com/j2se/, 2005.
[32] E. Gamma and K. Beck, Contributing to Eclipse: Principles, Patterns, and Plug-Ins. Addison-Wesley, 2003.
[33] W. Shadish, T. Cook, and D. Campbell, Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin, 2002.

Index Terms:
D.2.3.a Object-oriented programming, D.2.4.h Statistical methods, D.2.8.a Complexity measures, D.2.8.d Product metrics, D.2.8.e Software science, D.3.2.p Object-oriented languages, G.3.p Stochastic processes
Giulio Concas, Michele Marchesi, Sandro Pinna, Nicola Serra, "Power-Laws in a Large Object-Oriented Software System," IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 687-708, Oct. 2007, doi:10.1109/TSE.2007.1019