Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning
September 2005 (vol. 17 no. 9)
pp. 1263-1273
Tie-Yan Liu, IEEE Computer Society
Wei-Ying Ma, IEEE Computer Society
Multiclass classification has been investigated for many years in the literature. Recently, real-world multiclass classification applications have grown steadily in scale. For example, hundreds of thousands of categories are employed in the Open Directory Project (ODP) and the Yahoo! directory. In such cases, the scalability of classification methods becomes a major concern. To tackle this problem, hierarchical classification has been proposed and widely adopted to achieve a better trade-off between effectiveness and efficiency. Unfortunately, many data sets are not explicitly organized in hierarchical form, so hierarchical classification cannot be applied to them directly. In this paper, we propose a novel algorithm that automatically mines a hierarchical structure from the flat taxonomy of a data corpus, as preparation for hierarchical classification. In particular, we first compute matrices that represent the relations among categories, documents, and terms. We then cocluster these three types of objects at different scales through consistent bipartite spectral graph copartitioning, which is formulated as a generalized singular value decomposition problem. Finally, a hierarchical taxonomy is constructed from the category clusters. Our experiments show that the proposed algorithm discovers a reasonable taxonomy hierarchy and helps improve classification accuracy.
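As a rough illustration of the spectral coclustering idea mentioned in the abstract, the sketch below bipartitions the rows and columns of a single relation matrix. It is a simplification under stated assumptions, not the paper's algorithm: it handles only two object types (here, categories and terms) using an ordinary SVD of the degree-normalized matrix, whereas the paper copartitions three types consistently through a generalized SVD. The function name `bipartite_spectral_bipartition`, the sign-threshold split, and the toy matrix are all hypothetical choices made for this example.

```python
import numpy as np


def bipartite_spectral_bipartition(A):
    """Split the rows and columns of a nonnegative matrix into two coclusters.

    Simplified two-type spectral coclustering: normalize by row and column
    degrees, take the second singular vector pair, and split by sign.
    Assumes every row and column of A has a positive sum.
    """
    A = np.asarray(A, dtype=float)
    row_deg = A.sum(axis=1)
    col_deg = A.sum(axis=0)

    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}.
    A_norm = A / np.sqrt(row_deg)[:, None] / np.sqrt(col_deg)[None, :]

    # Singular values are returned in descending order; the first pair is
    # trivial, so the second pair carries the partition information.
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)
    row_embed = U[:, 1] / np.sqrt(row_deg)
    col_embed = Vt[1, :] / np.sqrt(col_deg)

    # Splitting at zero is an assumption of this sketch; clustering the
    # embedding (for example with k-means) is a common alternative.
    row_labels = (row_embed > 0).astype(int)
    col_labels = (col_embed > 0).astype(int)
    return row_labels, col_labels


# Toy category-by-term weight matrix (made-up numbers): categories 0-1 mostly
# use terms 0-2, categories 2-3 mostly use terms 3-5.
A = np.array([
    [5, 4, 3, 0, 0, 1],
    [4, 5, 4, 1, 0, 0],
    [0, 1, 0, 5, 4, 4],
    [1, 0, 0, 4, 5, 3],
])
row_labels, col_labels = bipartite_spectral_bipartition(A)
print("category clusters:", row_labels)
print("term clusters:", col_labels)
```

The label values themselves are arbitrary, because the sign of a singular vector is not determined; only the grouping is meaningful. Applying such a split recursively to each resulting group is one plausible way to obtain clusters at several scales, though the paper's own construction of the hierarchy may differ.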


Index Terms: Clustering, data mining, singular value decomposition, text processing.
Bin Gao, Tie-Yan Liu, Guang Feng, Tao Qin, Qian-Sheng Cheng, Wei-Ying Ma, "Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1263-1273, Sept. 2005, doi:10.1109/TKDE.2005.147