Issue No. 02 - February 2008 (vol. 20)
pp. 156-171
ABSTRACT
Associative classification is a promising technique for building accurate classifiers. However, on large or correlated datasets, association rule mining may yield huge rule sets. Hence, several pruning techniques have been proposed to select a small subset of high-quality rules. We argue that rule pruning should be reduced to a minimum, since the availability of a "rich" rule set may improve the accuracy of the classifier. The L^3 associative classifier is built by means of a lazy pruning technique that discards only the rules that exclusively misclassify training data. Classification of unlabeled data is performed in two steps. A small subset of high-quality rules is considered first. When this set is unable to classify the data, a larger rule set is exploited. This second set includes rules usually discarded by previous approaches. To cope with the need to mine large rule sets and to use them efficiently for classification, a compact form is proposed that represents a complete rule set in a space-efficient way and without information loss. An extensive experimental evaluation on real and synthetic datasets shows that L^3 improves classification accuracy with respect to previous approaches.
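The two-step classification described above can be illustrated with a minimal sketch. The Rule class, the confidence/support ordering, and the matches/classify helpers below are illustrative assumptions, not the paper's actual L^3 implementation or its compact rule-set representation; they only show the idea of trying a small high-quality rule set first and falling back to a larger one.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A class-association rule: antecedent items -> class label (hypothetical structure)."""
    antecedent: frozenset
    label: str
    confidence: float
    support: float

def matches(rule: Rule, instance: set) -> bool:
    """A rule matches an instance when all of its antecedent items occur in the instance."""
    return rule.antecedent <= instance

def classify(instance, level1, level2, default=None):
    """Two-step classification in the spirit of the abstract:
    first try the small set of high-quality rules (level1);
    only if none of them matches, fall back to the larger rule set (level2),
    which contains the rules usually discarded by eager pruning approaches."""
    for rule_set in (level1, level2):
        # Rules are assumed to be ranked by decreasing confidence, then support;
        # the first matching rule assigns the label.
        for rule in sorted(rule_set, key=lambda r: (-r.confidence, -r.support)):
            if matches(rule, instance):
                return rule.label
    return default

# Example: the level-1 rule does not match, so the level-2 rule classifies the instance.
level1 = [Rule(frozenset({"a", "b"}), "yes", 0.95, 0.10)]
level2 = [Rule(frozenset({"c"}), "no", 0.60, 0.02)]
print(classify({"a", "c"}, level1, level2))  # -> "no"
```

In this sketch the second rule set is consulted only when the first yields no match, which mirrors the two-level use of rules described in the abstract while leaving the mining and compact storage of the rule sets out of scope.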
INDEX TERMS
Clustering, classification, and association rules, Data mining
CITATION
Elena Baralis, Silvia Chiusano, Paolo Garza, "A Lazy Approach to Associative Classification," IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 2, pp. 156-171, Feb. 2008, doi:10.1109/TKDE.2007.190677