Issue No.02 - February (2011 vol.23)
pp: 282-296
David Lo , Singapore Management University, Singapore
Jinyan Li , Nanyang Technological University, Singapore
Limsoon Wong , National University of Singapore, Singapore
Siau-Cheng Khoo , National University of Singapore, Singapore
ABSTRACT
Billions of dollars are spent annually on software-related costs. It is estimated that up to 45 percent of software cost is due to the difficulty of understanding existing systems when performing maintenance tasks (e.g., adding features or removing bugs). One of the root causes is that software products often come with poor or incomplete documented specifications, or with none at all. In an effort to improve program understanding, Lo et al. have proposed iterative pattern mining, which outputs patterns that are repeated frequently within a program trace, across multiple traces, or both. Frequent iterative patterns reflect frequent program behaviors that likely correspond to software specifications. To reduce the number of patterns and improve the efficiency of the algorithm, Lo et al. have also introduced the mining of closed iterative patterns, i.e., maximal patterns without any superpattern having the same support. In this paper, to deepen the technical research on iterative pattern mining, we introduce the mining of iterative generators, i.e., minimal patterns without any subpattern having the same support. Iterative generators can be paired with closed patterns to produce a set of rules expressing forward, backward, and in-between temporal constraints among events in one general representation. We refer to these rules as representative rules. A comprehensive performance study shows the efficiency of our approach. A case study on traces of an industrial system shows how iterative generators and closed iterative patterns can be merged to form useful rules that shed light on software design.
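The definitions of closed patterns and generators above can be illustrated with a minimal brute-force sketch. It is not the paper's algorithm: it assumes a simplified notion of support (the number of traces that contain a pattern as a subsequence) rather than the iterative-pattern instance semantics, and the toy traces and all function names are hypothetical.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    return all(event in it for event in pattern)

def support(pattern, database):
    """Simplified support (an assumption): number of traces containing the pattern."""
    return sum(is_subsequence(pattern, trace) for trace in database)

def frequent_patterns(database, min_support):
    """Brute-force enumeration of all frequent patterns with their support."""
    candidates = set()
    for trace in database:
        for k in range(1, len(trace) + 1):
            candidates.update(combinations(trace, k))
    supports = {p: support(p, database) for p in candidates}
    return {p: s for p, s in supports.items() if s >= min_support}

def closed_and_generators(freq):
    """Closed: no proper superpattern with equal support.
    Generator: no proper (non-empty) subpattern with equal support."""
    closed, generators = set(), set()
    for p, s in freq.items():
        same = [q for q, t in freq.items() if t == s and q != p]
        if not any(len(q) > len(p) and is_subsequence(p, q) for q in same):
            closed.add(p)
        if not any(len(q) < len(p) and is_subsequence(q, p) for q in same):
            generators.add(p)
    return closed, generators

def representative_pairs(freq):
    """Pair each generator with a closed superpattern of equal support."""
    closed, generators = closed_and_generators(freq)
    return sorted(
        (g, c) for g in generators for c in closed
        if freq[g] == freq[c] and is_subsequence(g, c)
    )

# Hypothetical toy traces, for illustration only.
traces = [
    ("lock", "use", "unlock"),
    ("lock", "use", "unlock"),
    ("lock", "unlock"),
    ("init",),
]
freq = frequent_patterns(traces, min_support=2)
for generator, closed_pattern in representative_pairs(freq):
    print(generator, "=>", closed_pattern, "support", freq[generator])
```

In this toy setting the generator `("use",)` is paired with the closed pattern `("lock", "use", "unlock")`, which reads as a rule with both a backward and a forward constraint: whenever `use` appears, `lock` precedes it and `unlock` follows it.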
INDEX TERMS
Frequent pattern mining, sequence database, iterative patterns, generators, representative rules, software engineering, reverse engineering, program comprehension.
CITATION
David Lo, Jinyan Li, Limsoon Wong, Siau-Cheng Khoo, "Mining Iterative Generators and Representative Rules for Software Specification Discovery", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 2, pp. 282-296, February 2011, doi:10.1109/TKDE.2010.24