 Bibliographic References 
Developing Interpretable Models with Optimized Set Reduction for Identifying High-Risk Software Components
November 1993 (vol. 19, no. 11)
pp. 1028-1044

Applying equal testing and verification effort to all parts of a software system is not very efficient, especially when resources are tight. Therefore, one needs to identify low/high fault frequency components so that testing/verification effort can be concentrated where needed. Such a strategy is expected to detect more faults and thus improve the resulting reliability of the overall system. The authors present the Optimized Set Reduction approach for constructing such models, which is intended to fulfill specific software engineering needs. The approach to classification is to measure the software system and build multivariate stochastic models for predicting high-risk system components. Experimental results obtained by classifying Ada components into two classes (is, or is not, likely to generate faults during system and acceptance test) are presented. The accuracy of the model and the insights it provides into the error-making process are evaluated.

[1] A. Agresti, Categorical Data Analysis. New York: Wiley, 1990.
[2] W. Agresti, W. Evanco, and M. Smith, "Early experiences building a software quality prediction model," in Proc. Fifteenth Ann. Software Eng. Workshop, Nov. 1990.
[3] W. Agresti and W. Evanco, "Projecting software defects from analyzing Ada design," IEEE Trans. Software Eng., vol. 18, no. 11, Nov. 1992.
[4] W. Agresti, W. Evanco, D. Murphy, W. Thomas, and B. Ulery, "Statistical models for Ada design quality," in Proc. Fourth Software Quality Workshop (Alexandria Bay, New York), Aug. 1992.
[5] V. R. Basili and B. T. Perricone, "Software errors and complexity: an empirical investigation," Commun. ACM, vol. 27, no. 1, pp. 42-52, Jan. 1984.
[6] V. Basili, "Quantitative evaluation of software methodology," in Proc. First Pan Pacific Comput. Conf. (Australia), July 1985.
[7] V. R. Basili and H. D. Rombach, "The TAME project: towards improvement-oriented software environments," IEEE Trans. Software Eng., vol. SE-14, no. 6, pp. 758-773, June 1988.
[8] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Monterey, CA: Wadsworth & Brooks/Cole, 1984.
[9] L. Briand and A. Porter, "An alternative modeling approach for predicting error profiles in Ada systems," in EUROMETRICS '92 (European Conference on Quantitative Evaluation of Software and Systems, Brussels, Belgium), Apr. 1992.
[10] L. Briand, V. Basili, and C. Hetmanski, "Providing an empirical basis for optimizing the verification and testing phases of software development," presented at the IEEE Int. Symp. Software Reliability Engineering, Oct. 1992.
[11] L. Briand, W. Thomas, and C. Hetmanski, "Modeling and managing risk early in software development," in Int. Conf. Software Eng. (Maryland), May 1993.
[12] L. Briand, V. Basili, and W. Thomas, "A pattern recognition approach for software engineering data analysis," IEEE Trans. Software Eng., vol. 18, no. 11, Nov. 1992.
[13] D. N. Card and W. W. Agresti, "Measuring software design complexity," J. Syst. Software, vol. 8, pp. 185-197, Mar. 1988.
[14] J. Capon, Statistics for the Social Sciences. Wadsworth Publishing Co., 1988.
[15] J. Cendrowska, "PRISM: an algorithm for inducing modular rules," Int. J. Man-Machine Studies, vol. 27, p. 349, 1987.
[16] W. Dillon and M. Goldstein, Multivariate Analysis: Methods and Applications. New York: Wiley, 1984.
[17] D. Doubleday, "ASAP: an Ada static source code analyzer program," Tech. Rep. TR-1895, Dept. Comput. Sci., Univ. of Maryland, Aug. 1987.
[18] W. Evanco and W. Agresti, "Statistical representations and analyses of software," in Proc. 24th Symp. Interface of Computing Sci. Statistics (College Station, Texas), Mar. 1992.
[19] J. Gannon, E. Katz, and V. Basili, "Measures for Ada packages: an initial study," Commun. ACM, vol. 29, no. 7, July 1986.
[20] S. Henry and D. Kafura, "Software structure metrics based on information flow," IEEE Trans. Software Eng., vol. SE-7, no. 5, Sept. 1981.
[21] D. Hosmer and S. Lemeshow, Applied Logistic Regression. New York: Wiley, 1989.
[22] R. Michalski, "Theory and methodology of inductive learning," in Machine Learning (vol. 1), R. Michalski, J. Carbonell, and T. Mitchell, Eds. Los Altos, CA: Morgan Kaufmann.
[23] J. Munson and T. Khoshgoftaar, "The detection of fault-prone programs," IEEE Trans. Software Eng., vol. 18, no. 5, May 1992.
[24] D. Potier, J. Albin, V. Ferreol, and A. Bilodeau, "Experiments with computer software complexity and reliability," in Proc. 6th Int. Conf. on Software Eng., 1982, pp. 94-101.
[25] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[26] H. D. Rombach, "A controlled experiment on the impact of software structure on maintainability," IEEE Trans. Software Eng., vol. SE-13, no. 3, Mar. 1987.
[27] J. Chambers and T. Hastie, Statistical Models in S. Pacific Grove, CA: Wadsworth & Brooks/Cole, 1992.
[28] R. Selby and A. Porter, "Learning from examples: generation and evaluation of decision trees for software resource analysis," IEEE Trans. Software Eng., vol. 14, no. 12, Dec. 1988.
[29] Software Engineering Laboratory, NASA Goddard Space Flight Center, "Data collection procedures for the Software Engineering Laboratory database," Tech. Rep. SEL-92-002, Mar. 1992.

Index Terms:
high-risk software components; testing effort; verification effort; optimized set reduction approach; multivariate stochastic model; classifying Ada components; error-making process; program testing; program verification; software reliability
L. C. Briand, V. R. Basili, and C. J. Hetmanski, "Developing Interpretable Models with Optimized Set Reduction for Identifying High-Risk Software Components," IEEE Transactions on Software Engineering, vol. 19, no. 11, pp. 1028-1044, Nov. 1993, doi:10.1109/32.256851