Issue No. 08 - August (2009 vol. 21)
ISSN: 1041-4347
pp: 1118-1132
Chiara Cumbo, Exeura S.r.l., Rende
Veronica Lucia Policicchio, University of Calabria, Rende
Pasquale Rullo, University of Calabria, Rende
Salvatore Iiritano, Exeura S.r.l., Rende
This paper describes Olex, a novel method for the automatic induction of rule-based text classifiers. Olex supports a hypothesis language of the form "if $T_1$ or $\cdots$ or $T_n$ occurs in document $d$, and none of $T_{n+1}, \ldots, T_{n+m}$ occurs in $d$, then classify $d$ under category $c$," where each $T_i$ is a conjunction of terms. The proposed method is simple and elegant. Despite this, the results of a systematic experimentation performed on the Reuters-21578, the Ohsumed, and the ODP data collections show that Olex provides classifiers that are accurate, compact, and comprehensible. A comparative analysis conducted against some of the most well-known learning algorithms (namely, Naive Bayes, Ripper, C4.5, SVM, and Linear Logistic Regression) demonstrates that it is more than competitive in terms of both predictive accuracy and efficiency.
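To make the hypothesis language in the abstract concrete, the following is a minimal sketch of how a rule of that form could be evaluated against a single document. It illustrates only the rule semantics stated above (at least one positive conjunction occurs, and no negative conjunction occurs); it is not Olex's induction algorithm, which the abstract does not detail. The function names, the bag-of-terms document representation, and the example terms are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, Set

# A conjunction of terms T_i, modeled here as a set of terms that must
# all appear in the document (an assumed representation, not from the paper).
Conjunction = FrozenSet[str]


def occurs(conjunction: Conjunction, doc_terms: Set[str]) -> bool:
    """A conjunction occurs in a document if every one of its terms is present."""
    return conjunction <= doc_terms


def classify(
    doc_terms: Set[str],
    positive: Iterable[Conjunction],  # T_1 ... T_n
    negative: Iterable[Conjunction],  # T_{n+1} ... T_{n+m}
) -> bool:
    """Return True if the document should be classified under the category.

    The rule fires when at least one positive conjunction occurs in the
    document and none of the negative conjunctions occurs in it.
    """
    has_positive = any(occurs(t, doc_terms) for t in positive)
    has_negative = any(occurs(t, doc_terms) for t in negative)
    return has_positive and not has_negative


# Hypothetical rule for an illustrative category; the terms are made up.
POSITIVE = [frozenset({"wheat"}), frozenset({"grain", "export"})]
NEGATIVE = [frozenset({"stock", "market"})]

if __name__ == "__main__":
    print(classify({"wheat", "price"}, POSITIVE, NEGATIVE))
    print(classify({"grain", "export", "stock", "market"}, POSITIVE, NEGATIVE))
```

In this sketch, the first document matches the positive conjunction {"wheat"} and no negative one, so the rule fires; the second matches a positive conjunction but also the negative conjunction {"stock", "market"}, so it is rejected.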
Data mining, text mining, clustering, classification, and association rules, mining methods and algorithms.
Chiara Cumbo, Veronica Lucia Policicchio, Pasquale Rullo, Salvatore Iiritano, "Olex: Effective Rule Learning for Text Categorization", IEEE Transactions on Knowledge & Data Engineering, vol. 21, no. 8, pp. 1118-1132, August 2009, doi:10.1109/TKDE.2008.206