Issue No.03 - March (2014 vol.26)
pp: 500-513
Xiao-Lin Wang, Shanghai Jiao Tong University, Shanghai
Hai Zhao, Shanghai Jiao Tong University, Shanghai
Bao-Liang Lu, Shanghai Jiao Tong University, Shanghai
ABSTRACT
Recent large-scale hierarchical classification tasks typically have tens of thousands of classes, on which the most widely used approach to multiclass classification (one-versus-rest) becomes intractable due to its computational complexity. Top-down methods are usually adopted instead, but they are less accurate because of the so-called error-propagation problem in their classifying phase. To address this problem, this paper proposes a meta-top-down method that employs metaclassification to enhance the normal top-down classifying procedure. The proposed method is first analyzed theoretically in terms of complexity and accuracy, and then applied to five real-world large-scale data sets. The experimental results indicate that classification accuracy is substantially improved, while the added time cost is smaller than that of most existing approaches.
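The error-propagation problem mentioned in the abstract can be illustrated with a small sketch. The hierarchy, the probabilities, and the function names below are all hypothetical, and the path rescoring by product of local probabilities is only an illustrative stand-in for a second-stage decision; it is not the metaclassifier the paper proposes.

```python
from math import prod

# Hypothetical toy hierarchy: root -> {A, B}, A -> {A1, A2}, B -> {B1, B2}.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1", "B2"],
}

# Hypothetical local classifier outputs P(child | parent, document) for one
# document whose true class is B1. The top-level classifier slightly prefers A.
LOCAL_PROB = {
    "A": 0.55, "B": 0.45,
    "A1": 0.60, "A2": 0.40,
    "B1": 0.95, "B2": 0.05,
}


def greedy_top_down(root="root"):
    """Follow the single best child at every level until a leaf is reached."""
    node, path = root, []
    while node in CHILDREN:
        node = max(CHILDREN[node], key=LOCAL_PROB.__getitem__)
        path.append(node)
    return path


def candidate_paths(root="root", beam=2):
    """Keep the `beam` best children per node and list all root-to-leaf paths."""
    frontier, finished = [(root, [])], []
    while frontier:
        node, path = frontier.pop()
        if node not in CHILDREN:
            finished.append(path)
            continue
        best = sorted(CHILDREN[node], key=LOCAL_PROB.__getitem__, reverse=True)[:beam]
        frontier.extend((child, path + [child]) for child in best)
    return finished


def rescore(paths):
    """Stand-in second stage: pick the path with the largest product of local scores."""
    return max(paths, key=lambda p: prod(LOCAL_PROB[n] for n in p))


if __name__ == "__main__":
    print("greedy top-down:", greedy_top_down())
    print("rescored paths: ", rescore(candidate_paths()))
```

In this toy setting the greedy descent commits to branch A at the first level and can never reach B1, whereas keeping several candidate paths and deciding among them afterwards recovers it. The extra cost is bounded by the beam width rather than by the total number of classes.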
INDEX TERMS
Text classification, large-scale hierarchical classification, metalearning, ensemble learning, metaclassification, top-down method
CITATION
Xiao-Lin Wang, Hai Zhao, Bao-Liang Lu, "A Meta-Top-Down Method for Large-Scale Hierarchical Classification", IEEE Transactions on Knowledge & Data Engineering, vol. 26, no. 3, pp. 500-513, March 2014, doi:10.1109/TKDE.2013.30