Issue No.09 - September (2006 vol.18)
pp: 1156-1165
ABSTRACT
Text categorization remains one of the most researched NLP problems because of the ever-increasing volume of electronic documents and digital libraries. In this paper, we present a new text categorization method that combines distributional clustering of words with a learning logic technique, called Lsquare, for constructing text classifiers. The high dimensionality of text documents hinders categorization, and feature clustering has proven to be an effective alternative to feature selection for reducing that dimensionality. We therefore use a distributional clustering method (IB) to generate an efficient representation of documents and apply Lsquare to train text classifiers. The method was extensively tested and evaluated. Under identical experimental settings with a small number of training documents, the proposed method achieves classification accuracy and $F_1$ results higher than or comparable to those of SVM on three benchmark data sets: WebKB, 20Newsgroup, and Reuters-21578. The results show that the method is a good choice for applications with a limited amount of labeled training data. We also demonstrate the effect of changing the training set size on the classification performance of the learners.
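As a rough illustration of why word clustering reduces dimensionality, the sketch below maps a tokenized document onto per-cluster counts, so the feature vector has one entry per word cluster rather than one per vocabulary word. The function name `cluster_features` and the hand-made `word_to_cluster` table are assumptions for illustration only; the paper derives its clusters with distributional clustering (IB), which this sketch does not implement, and it does not implement Lsquare either.

```python
from collections import Counter

def cluster_features(tokens, word_to_cluster, n_clusters):
    """Return a list of length n_clusters with per-cluster token counts.

    Tokens that have no cluster assignment are ignored.
    """
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    return [counts.get(c, 0) for c in range(n_clusters)]

# Hypothetical clustering: 7 vocabulary words collapsed into 3 clusters.
# A real system would learn this mapping from labeled training data.
word_to_cluster = {
    "course": 0, "lecture": 0, "homework": 0,
    "professor": 1, "faculty": 1,
    "research": 2, "publication": 2,
}

doc = "the professor posted homework for the course lecture".split()
vec = cluster_features(doc, word_to_cluster, n_clusters=3)
print(vec)  # [3, 1, 0]
```

The resulting low-dimensional vectors would then be passed to a classifier; any learner that accepts numeric feature vectors could stand in for that step in an experiment.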
INDEX TERMS
Text categorization, feature selection, machine learning.
CITATION
Hisham Al-Mubaid, Syed A. Umair, "A New Text Categorization Technique Using Distributional Clustering and Learning Logic", IEEE Transactions on Knowledge & Data Engineering, vol. 18, no. 9, pp. 1156-1165, September 2006, doi:10.1109/TKDE.2006.135