Issue No. 01 - Jan. 2013 (vol. 39)
pp. 18-44
Davide Falessi, Simula Research Laboratory, Lysaker, and University of Rome "Tor Vergata", Rome
Giovanni Cantone, University of Rome "Tor Vergata", Rome
Gerardo Canfora, University of Sannio, Benevento
ABSTRACT
Linking artifacts of the same type (clone detection) or of different types (traceability recovery) is very important in software engineering, yet it is extremely tedious, error-prone, and effort-intensive. Past research has focused on supporting analysts with techniques based on Natural Language Processing (NLP) to identify candidate links. Because many NLP techniques exist and their performance varies according to context, it is crucial to define and use reliable evaluation procedures. This paper proposes a set of seven principles for evaluating the performance of NLP techniques in identifying equivalent requirements. We conjecture, and verify, that the performance of NLP techniques on a given dataset depends on both their ability and the odds of identifying equivalent requirements correctly. For instance, when the odds of identifying equivalent requirements are very high, it is reasonable to expect that NLP techniques will perform well. Our key idea is to measure this random factor of the specific dataset(s) in use and then adjust the observed performance accordingly. To support the application of the principles, we report a case study that applies them to evaluate a large number of NLP techniques for identifying equivalent requirements in the context of an Italian company in the defense and aerospace domain. The current application context is the evaluation of NLP techniques to identify equivalent requirements. However, most of the proposed principles seem applicable to evaluating any estimation technique aimed at supporting a binary decision (e.g., equivalent/nonequivalent), with the estimate in the range [0,1] (e.g., the similarity provided by the NLP technique), when the dataset(s) is used as a benchmark (i.e., testbed), independently of the type of estimator (i.e., requirements text) and of the estimation method (e.g., NLP).
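The idea of separating a technique's ability from the odds offered by the dataset can be illustrated with a minimal sketch. It is not the paper's procedure: the token-overlap similarity, the threshold, the four sample requirements, and the "lift" ratio (observed precision divided by the precision a random guess would achieve) are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a stand-in for any NLP similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def evaluate(requirements: dict, equivalent_pairs: set, threshold: float) -> dict:
    """Compare observed precision with the precision expected from random guessing."""
    pairs = list(combinations(sorted(requirements), 2))
    tp = fp = fn = 0
    for i, j in pairs:
        predicted = jaccard(requirements[i], requirements[j]) >= threshold
        actual = (i, j) in equivalent_pairs
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Odds of a randomly chosen pair being equivalent (assumed chance baseline).
    chance = len(equivalent_pairs) / len(pairs)
    lift = precision / chance if chance else float("inf")
    return {"precision": precision, "recall": recall,
            "chance_precision": chance, "lift": lift}


# Hypothetical requirements and gold standard, invented for this sketch.
reqs = {
    "R1": "the system shall encrypt all stored data",
    "R2": "the system shall encrypt all stored data at rest",
    "R3": "the operator shall be able to export reports",
    "R4": "the radar shall track up to fifty targets",
}
gold = {("R1", "R2")}

result = evaluate(reqs, gold, threshold=0.5)
print(result)
```

A dataset in which most pairs are equivalent would yield a high chance precision, so a high observed precision there says little about the technique itself; the ratio makes that dependence visible.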
INDEX TERMS
Natural language processing, Context, Semantics, Measurement, Matrix decomposition, Monitoring, Thesauri, Metrics and measurement, Empirical software engineering, Traceability recovery, Equivalent requirements
CITATION
Davide Falessi, Giovanni Cantone, Gerardo Canfora, "Empirical Principles and an Industrial Case Study in Retrieving Equivalent Requirements via Natural Language Processing Techniques", IEEE Transactions on Software Engineering, vol. 39, no. 1, pp. 18-44, Jan. 2013, doi:10.1109/TSE.2011.122