On Mining Instance-Centric Classification Rules
November 2006 (vol. 18 no. 11)
pp. 1497-1511
Jianyong Wang, IEEE Computer Society
George Karypis, IEEE Computer Society
Many studies have shown that rule-based classifiers perform well in classifying categorical and sparse high-dimensional databases. However, a fundamental limitation with many rule-based classifiers is that they find the rules by employing various heuristic methods to prune the search space and select the rules based on the sequential database covering paradigm. As a result, the final set of rules that they use may not be the globally best rules for some instances in the training database. To make matters worse, these algorithms fail to exploit more effective search-space pruning methods that would allow them to scale to large databases. In this paper, we present a new classifier, HARMONY, which directly mines the final set of classification rules. HARMONY uses an instance-centric rule-generation approach and guarantees that, for each training instance, one of the highest-confidence rules covering this instance is included in the final rule set, which helps in improving the overall accuracy of the classifier. By introducing several novel search strategies and pruning methods into the rule discovery process, HARMONY also achieves high efficiency and good scalability. Our thorough performance study with some large text and categorical databases has shown that HARMONY outperforms many well-known classifiers in terms of both accuracy and computational efficiency and scales well with regard to the database size.
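The instance-centric guarantee described in the abstract can be illustrated with a minimal sketch. This is not the authors' HARMONY implementation (which mines rules directly with pruning during search); it only shows the selection criterion the abstract states: for each training instance, one highest-confidence rule covering that instance is kept, and the final rule set is the union over all instances. Rules are modeled here as hypothetical (itemset, class label, confidence) triples.

```python
from collections import namedtuple

# A rule: antecedent itemset, predicted class label, and confidence.
Rule = namedtuple("Rule", ["items", "label", "confidence"])

def covers(rule, instance_items):
    """A rule covers an instance if its antecedent is a subset
    of the instance's items."""
    return rule.items <= instance_items

def instance_centric_select(instances, candidate_rules):
    """For each training instance, keep one highest-confidence rule
    that covers it; the final rule set is the union of these picks."""
    selected = set()
    for items, _label in instances:
        covering = [r for r in candidate_rules if covers(r, items)]
        if covering:
            selected.add(max(covering, key=lambda r: r.confidence))
    return selected

# Toy example: two training instances and three candidate rules.
instances = [
    (frozenset({"a", "b"}), "pos"),
    (frozenset({"c"}), "neg"),
]
rules = [
    Rule(frozenset({"a"}), "pos", 0.70),
    Rule(frozenset({"a", "b"}), "pos", 0.95),
    Rule(frozenset({"c"}), "neg", 0.80),
]
final = instance_centric_select(instances, rules)
# The 0.70-confidence rule is dropped: every instance it covers is
# already covered by a higher-confidence rule.
```

In contrast, a sequential-covering learner might commit to the first rule found for the "pos" instances and never revisit it; the instance-centric criterion instead evaluates coverage per instance, which is the property the abstract credits for improved accuracy.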

[1] R. Agarwal, C. Aggarwal, and V. Prasad, “A Tree Projection Algorithm for Generation of Frequent Item Sets,” J. Parallel and Distributed Computing, vol. 61, no. 3, 2001.
[2] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM SIGMOD '93, 1993.
[3] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int'l Conf. Very Large Data Bases, pp. 487-499, 1994.
[4] K. Ali, S. Manganaris, and R. Srikant, “Partial Classification Using Association Rules,” Proc. Third Int'l Conf. Knowledge Discovery and Data Mining (KDD '97), pp. 115-118, 1997.
[5] M. Antonie and O. Zaiane, “Text Document Categorization by Term Association,” Proc. Int'l Conf. Data Mining (ICDM '02), pp. 19-26, 2002.
[6] C. Apte, F. Damerau, and S.M. Weiss, “Towards Language Independent Automated Learning of Text Categorization Models,” Proc. ACM SIGIR '94, pp. 23-30, 1994.
[7] R.J. Bayardo, “Brute-Force Mining of High-Confidence Classification Rules,” Proc. Third Int'l Conf. Knowledge Discovery and Data Mining, pp. 123-126, 1997.
[8] R.J. Bayardo and R. Agrawal, “Mining the Most Interesting Rules,” Proc. Fifth Int'l Conf. Knowledge Discovery and Data Mining (KDD '99), 1999.
[9] R. Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter, “On Feature Distributional Clustering for Text Categorization,” Proc. 24th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 146-153, 2001.
[10] S. Bergsma, “The Reuters-21578 (ModApte) Dataset,” Dept. of Computer Science, Univ. of Alberta, 2004.
[11] S. Bergsma and D. Lin, “Title Similarity-Based Feature Weighting for Text Categorization,” CMPUT 650 Research Project Report, Dept. of Computer Science, Univ. of Alberta, 2004.
[12] F. Coenen, “The LUCS-KDD Implementations of the FOIL, PRM, and CPAR Algorithms,” Computer Science Dept., Univ. of Liverpool, U.K., 2004.
[13] W. Cohen, “Fast Effective Rule Induction,” Proc. Int'l Conf. Machine Learning (ICML '95), pp. 115-123, 1995.
[14] G. Cong, K. Tan, A. Tung, and X. Xin, “Mining Top-k Covering Rule Groups for Gene Expression Data,” Proc. 2005 ACM SIGMOD Int'l Conf. Management of Data, pp. 670-681, 2005.
[15] G. Cong, X. Xu, F. Pan, A. Tung, and J. Yang, “FARMER: Finding Interesting Rule Groups in Microarray Datasets,” Proc. 2004 ACM SIGMOD Int'l Conf. Management of Data, pp. 143-154, 2004.
[16] M. Deshpande and G. Karypis, “Using Conjunction of Attribute Values for Classification,” Proc. 11th Int'l Conf. Information and Knowledge Management, pp. 356-364, 2002.
[17] G. Dong, X. Zhang, L. Wong, and J. Li, “CAEP: Classification by Aggregating Emerging Patterns,” Proc. Second Int'l Conf. Discovery Science, pp. 30-42, 1999.
[18] S. Dumais, J. Platt, D. Heckerman, and M. Sahami, “Inductive Learning Algorithms and Representations for Text Categorization,” Proc. Seventh Int'l Conf. Information and Knowledge Management, pp. 148-155, 1998.
[19] T. Fukuda, Y. Morimoto, and S. Morishita, “Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules,” Proc. 22nd Int'l Conf. Very Large Data Bases, pp. 146-155, 1996.
[20] K. Gade, J. Wang, and G. Karypis, “Efficient Closed Pattern Mining in the Presence of Tough Block Constraints,” Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 138-147, 2004.
[21] G. Grahne and J. Zhu, “Efficiently Using Prefix-Trees in Mining Frequent Itemsets,” Proc. Workshop Frequent Itemset Mining Implementations (FIMI '03), 2003.
[22] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. 2000 ACM SIGMOD Int'l Conf. Management of Data, pp. 1-12, 2000.
[23] T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” Proc. 10th European Conf. Machine Learning (ECML '98), pp. 137-142, 1998.
[24] B. Lent, A. Swami, and J. Widom, “Clustering Association Rules,” Proc. 1997 Int'l Conf. Data Eng. (ICDE '97), pp. 220-231, 1997.
[25] N. Lesh, M. Zaki, and M. Ogihara, “Mining Features for Sequence Classification,” Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining (KDD '99), pp. 342-346, 1999.
[26] J. Li, G. Dong, K. Ramamohanarao, and L. Wong, “DeEPs: A New Instance-Based Discovery and Classification System,” Machine Learning, vol. 54, no. 2, 2004.
[27] W. Li, J. Han, and J. Pei, “CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules,” Proc. Int'l Conf. Data Mining (ICDM '01), pp. 369-376, 2001.
[28] B. Liu, W. Hsu, and Y. Ma, “Integrating Classification and Association Rule Mining,” Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining, pp. 80-86, 1998.
[29] J. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[30] J. Quinlan and R. Cameron-Jones, “FOIL: A Midterm Report,” Proc. European Conf. Machine Learning, pp. 3-20, 1993.
[31] J. Wang and G. Karypis, “BAMBOO: Accelerating Closed Itemset Mining by Deeply Pushing the Length-Decreasing Support Constraint,” Proc. Fourth SIAM Int'l Conf. Data Mining, 2004.
[32] Y. Yang, “An Evaluation of Statistical Approaches to Text Categorization,” Information Retrieval, vol. 1, nos. 1-2, 1999.
[33] X. Yin and J. Han, “CPAR: Classification Based on Predictive Association Rules,” Proc. Third SIAM Int'l Conf. Data Mining, 2003.
[34] M. Zaki and C. Aggarwal, “XRULES: An Effective Structural Classifier for XML Data,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 316-325, 2003.

Index Terms:
Data mining, classification rule, instance-centric, classifier.
Jianyong Wang, George Karypis, "On Mining Instance-Centric Classification Rules," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 11, pp. 1497-1511, Nov. 2006, doi:10.1109/TKDE.2006.179