Problems with Precision: A Response to "Comments on 'Data Mining Static Code Attributes to Learn Defect Predictors'"
September 2007 (vol. 33 no. 9)
pp. 637-640
Zhang and Zhang (hereafter, the Zhangs) argue that the low-precision detectors seen in Menzies, Greenwald, and Frank's paper "Data Mining Static Code Attributes to Learn Defect Predictors" [13] (hereafter, DMP) are "not satisfactory for practical purposes". They demand that "a good prediction model should achieve both high Recall and high Precision" (which we will denote as "high precision & recall"). All other detectors, they argue, "may lead to impractical prediction models". We take a different view, and this short note explains why. Although we disagree with the Zhangs' conclusions, we find that their derived equation is an important result. The insightful feature of the Zhangs' equation is that it uses information about the problem at hand to characterize the preconditions for detectors with both high precision and high recall. To the best of our knowledge, no such characterization has been previously reported (at least, not in the software engineering literature).
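
The Zhangs' equation is not reproduced on this page. As a rough illustration of the kind of relationship the abstract describes, the sketch below computes precision from recall, false-alarm rate, and the fraction of defective modules, using only the standard confusion-matrix definitions. The function name, parameter names, and the numeric values are assumptions chosen for illustration; they are not taken from DMP or from the Zhangs' comment.

```python
def precision_from_rates(recall: float, pf: float, pos_fraction: float) -> float:
    """Precision implied by recall, false-alarm rate (pf), and class balance.

    Follows from the standard definitions (per unit of data):
      TP = recall * pos_fraction
      FP = pf * (1 - pos_fraction)
      precision = TP / (TP + FP)
    """
    tp = recall * pos_fraction          # true positives
    fp = pf * (1.0 - pos_fraction)      # false positives
    if tp + fp == 0:
        raise ValueError("nothing predicted defective; precision is undefined")
    return tp / (tp + fp)


# Hypothetical detector: same recall and false-alarm rate in both cases,
# only the share of defective modules differs.
balanced = precision_from_rates(recall=0.7, pf=0.25, pos_fraction=0.5)
rare = precision_from_rates(recall=0.7, pf=0.25, pos_fraction=0.05)
print(f"balanced classes: precision = {balanced:.2f}")
print(f"rare defects:     precision = {rare:.2f}")
```

Under these assumed values, the same detector yields much lower precision when defective modules are rare, which suggests why class distribution matters when judging whether high precision and high recall are jointly achievable.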

[1] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, “Recovering Traceability Links between Code and Documentation,” IEEE Trans. Software Eng., vol. 28, no. 10, pp. 970-983, Oct. 2002.
[2] G. Antoniol and Y.-G. Gueheneuc, “Feature Identification: A Novel Approach and a Case Study,” Proc. Int'l Conf. Software Maintenance (ICSM '05), pp. 357-366, 2005.
[3] A. Yun chung Liu, “The Effect of Oversampling and Undersampling on Classifying Imbalanced Text Datasets,” master's thesis, http://www.lans. aliu_ masters_thesis.pdf, 2004.
[4] J. Cleland-Huang, R. Settimi, X. Zou, and P. Solc, “The Detection and Classification of Non-Functional Requirements with Application to Early Aspects,” Proc. Requirements Eng. Conf. (RE '06), pp. 36-45, 2006.
[5] A. Dekhtyar, J.H. Hayes, and J. Larsen, “Make the Most of Your Time: How Should the Analyst Work with Automated Traceability Tools?” Proc. Third Int'l Workshop Predictive Modeling in Software Eng. (PROMISE '07), 2007.
[6] C. Drummond and R.C. Holte, “C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling Beats Over-Sampling,” Proc. Workshop Learning from Imbalanced Datasets II, 2003.
[7] Y. Freund and R.E. Schapire, “A Decision-Theoretic Generalization of Online Learning and an Application to Boosting,” J. Computer and System Sciences, vol. 55, 1997.
[8] S.R. Gaddam, V.V. Phoha, and K.S. Balagani, “K-means+id3: A Novel Method for Supervised Anomaly Detection by Cascading k-Means Clustering and id3 Decision Tree Learning Methods,” IEEE Trans. Knowledge and Data Eng., vol. 19, no. 3, Mar. 2007.
[9] J.H. Hayes, A. Dekhtyar, and S.K. Sundaram, “Advancing Candidate Link Generation for Requirements Tracing: The Study of Methods,” IEEE Trans. Software Eng., vol. 32, no. 1, Jan. 2006.
[10] T.M. Khoshgoftaar, E. Geleyn, L. Nguyen, and L. Bullard, “Cost-Sensitive Boosting in Software Quality Modeling,” Proc. Symp. High Assurance Software Eng., p. 51, 2002.
[11] Y. Ma, “An Empirical Investigation of Tree Ensembles in Biometrics and Bioinformatics,” PhD thesis, Jan. 2007.
[12] A. Marcus and J. Maletic, “Recovering Documentation-to-Source Code Traceability Links Using Latent Semantic Indexing,” Proc. 25th Int'l Conf. Software Eng., 2003.
[13] T. Menzies, J. Greenwald, and A. Frank, “Data Mining Static Code Attributes to Learn Defect Predictors,” IEEE Trans. Software Eng., vol. 33, no. 1, pp. 2-13, Jan. 2007.
[14] T. Menzies and J.S. Di Stefano, “How Good Is Your Blind Spot Sampling Policy?” Proc. IEEE Conf. High Assurance Software Eng., 2003.
[15] H. Zhang and X. Zhang, “Comments on ‘Data Mining Static Code Attributes to Learn Defect Predictors,’” IEEE Trans. Software Eng., Sept. 2007.

Tim Menzies, Alex Dekhtyar, Justin Distefano, Jeremy Greenwald, "Problems with Precision: A Response to "Comments on 'Data Mining Static Code Attributes to Learn Defect Predictors'"," IEEE Transactions on Software Engineering, vol. 33, no. 9, pp. 637-640, Sept. 2007, doi:10.1109/TSE.2007.70721