Automatic Textual Document Categorization Based on Generalized Instance Sets and a Metamodel
May 2003 (vol. 25 no. 5)
pp. 628-633

Abstract: We propose a new approach to text categorization, called the generalized instance set (GIS) algorithm, under the framework of generalized instance patterns. Our GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to the characteristics of text categorization problems. It focuses on refining the original instances and constructs a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase that discovers a relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
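To make the idea of "generalized instances" more concrete, here is a minimal Python sketch under explicit assumptions. It is not the authors' GIS algorithm: the abstract gives no procedural details, so the merging rule (a unit-length centroid of a seed document and its nearest positive neighbours), the acceptance test (the fraction of positive documents among the k nearest training documents must not drop), and the scoring rule (maximum cosine similarity to any generalized instance) are all hypothetical choices. The function names `build_generalized_instances` and `classify`, the sparse dictionary representation, and the 0.5 threshold are likewise illustrative only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a sparse term -> weight map (e.g. TF-IDF weights).
Vector = Dict[str, float]


def normalize(v: Vector) -> Vector:
    """Scale a sparse vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm > 0 else dict(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def merge(vectors: List[Vector]) -> Vector:
    """Assumed generalization step: unit-length centroid of the normalized members."""
    total: Vector = {}
    for v in vectors:
        for t, w in normalize(v).items():
            total[t] = total.get(t, 0.0) + w
    return normalize(total)


def positive_ratio(center: Vector, data: List[Tuple[Vector, bool]], k: int) -> float:
    """Fraction of positive documents among the k training documents nearest to center."""
    ranked = sorted(data, key=lambda item: cosine(center, item[0]), reverse=True)[:k]
    return sum(1 for _, label in ranked if label) / len(ranked)


def build_generalized_instances(data: List[Tuple[Vector, bool]], k: int = 3) -> List[Vector]:
    """Hypothetical GIS-style loop: greedily grow one generalized instance per uncovered positive seed."""
    covered = set()
    generalized: List[Vector] = []
    for i, (seed, label) in enumerate(data):
        if not label or i in covered:
            continue
        members = [i]
        center = normalize(seed)
        score = positive_ratio(center, data, k)
        # Other positive documents, ordered from nearest to farthest from the seed.
        candidates = sorted(
            (j for j, (_, lab) in enumerate(data) if lab and j != i),
            key=lambda j: cosine(center, data[j][0]),
            reverse=True,
        )
        for j in candidates:
            trial = merge([data[m][0] for m in members + [j]])
            trial_score = positive_ratio(trial, data, k)
            if trial_score < score:
                break  # stop once the neighbourhood would become less purely positive
            members.append(j)
            center, score = trial, trial_score
        covered.update(members)
        generalized.append(center)
    return generalized


def classify(doc: Vector, generalized: List[Vector], threshold: float = 0.5) -> Tuple[float, bool]:
    """Score a document by its best-matching generalized instance; threshold is a placeholder."""
    score = max((cosine(doc, g) for g in generalized), default=0.0)
    return score, score >= threshold
```

In this sketch each generalized instance is a centroid-style prototype, which resembles a linear (Rocchio-like) classifier, while scoring a new document by its nearest generalized instance keeps the local, k-NN-like behaviour; that is one plausible reading of how the two strengths could be combined, not a description of the paper's method. In practice the threshold would be tuned per category rather than fixed at 0.5.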


Index Terms:
Text classification, instance-based learning, metamodel learning.
Citation:
Wai Lam, Yiqiu Han, "Automatic Textual Document Categorization Based on Generalized Instance Sets and a Metamodel," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 628-633, May 2003, doi:10.1109/TPAMI.2003.1195997