Blocking Reduction Strategies in Hierarchical Text Classification
October 2004 (vol. 16 no. 10)
pp. 1305-1308
Aixin Sun; Ee-Peng Lim; Wee-Keong Ng, IEEE Computer Society; Jaideep Srivastava
One common approach to hierarchical text classification associates a classifier with each node in the category tree and classifies text documents in a top-down manner. Methods that use this top-down approach can scale well and cope with changes to the category tree. However, all of them suffer from blocking: documents that are wrongly rejected by classifiers at higher levels can never be passed to the classifiers at lower levels. In this paper, we propose a classifier-centric performance measure, the blocking factor, to determine the extent of blocking. We propose three methods to address the blocking problem: Threshold Reduction, Restricted Voting, and Extended Multiplicative. Our experiments using Support Vector Machine (SVM) classifiers on the Reuters collection show that all three methods reduce blocking and improve classification accuracy, and that the Restricted Voting method delivers the best performance.
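
To make the blocking problem concrete, the following is a minimal Python sketch of top-down classification on a toy category tree. Everything in it is an illustrative assumption rather than the paper's method: the category names, the scores, the thresholds, and the helper `blocked_fraction`, which is only a simple proxy and not the blocking factor defined in the paper (the abstract does not give that definition). Lowering the threshold of the higher-level classifier is meant only to convey the general idea behind a threshold-reduction strategy.

```python
# Toy category tree: "grain" is an internal node, "corn" and "wheat" are leaves.
# All names, scores, and thresholds are hypothetical.
PARENT = {"grain": None, "corn": "grain", "wheat": "grain"}


def ancestors(cat):
    """Return the proper ancestors of a category, nearest first."""
    out = []
    p = PARENT[cat]
    while p is not None:
        out.append(p)
        p = PARENT[p]
    return out


def accepted(cat, scores, thresholds):
    """A node's classifier accepts a document if its score meets the threshold."""
    return scores.get(cat, 0.0) >= thresholds[cat]


def is_blocked(true_cat, scores, thresholds):
    """A document is blocked if some ancestor of its true category rejects it,
    so the classifier at the true category never gets to see it."""
    return any(not accepted(a, scores, thresholds) for a in ancestors(true_cat))


def assigned(cat, scores, thresholds):
    """Top-down assignment: every ancestor and the category itself must accept."""
    return not is_blocked(cat, scores, thresholds) and accepted(cat, scores, thresholds)


def blocked_fraction(docs, thresholds):
    """Illustrative proxy only: share of documents blocked above their true category."""
    return sum(is_blocked(t, s, thresholds) for t, s in docs) / len(docs)


def correct_count(docs, thresholds):
    """Number of documents assigned to their true category."""
    return sum(assigned(t, s, thresholds) for t, s in docs)


# Each document: (true category, per-node classifier scores).
DOCS = [
    ("corn", {"grain": 0.80, "corn": 0.9}),
    ("wheat", {"grain": 0.40, "wheat": 0.7}),  # rejected at "grain" under BASE
    ("corn", {"grain": 0.45, "corn": 0.6}),    # rejected at "grain" under BASE
    ("wheat", {"grain": 0.90, "wheat": 0.3}),  # reaches "wheat" but is rejected there
]

BASE = {"grain": 0.5, "corn": 0.5, "wheat": 0.5}
REDUCED = {"grain": 0.4, "corn": 0.5, "wheat": 0.5}  # lower threshold at the internal node

for name, thresholds in (("base", BASE), ("reduced", REDUCED)):
    print(name, blocked_fraction(DOCS, thresholds), correct_count(DOCS, thresholds))
```

In this toy setup, two of the four documents never reach their leaf classifier under the base thresholds, even though that classifier would have accepted them. The fourth document shows that an ordinary rejection at the leaf itself is a different kind of error and is not counted as blocking.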

Index Terms:
Data mining, text mining, classification.
Aixin Sun, Ee-Peng Lim, Wee-Keong Ng, Jaideep Srivastava, "Blocking Reduction Strategies in Hierarchical Text Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 10, pp. 1305-1308, Oct. 2004, doi:10.1109/TKDE.2004.50