Issue No. 12 - December (2007 vol. 19)
Text categorization systems often induce document classifiers from pre-classified examples by the use of machine learning techniques. The circumstance that each example-document can belong to many different classes often leads to impractically high computational costs that sometimes grow exponentially in the number of features. Looking for ways to reduce these costs, we explored the possibility of running a "baseline induction algorithm" separately for subsets of features, obtaining a set of classifiers to be combined. For the specific case of classifiers that return not only class labels but also confidences in these labels, we investigate here a few alternative fusion techniques, including our own mechanism that was inspired by the Dempster-Shafer Theory. The paper describes the algorithm and, in our specific case study, compares its performance to that of more traditional mechanisms.
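As an illustration of the kind of fusion the abstract refers to, the following minimal sketch applies the standard Dempster rule of combination to the outputs of several subclassifiers for a single class label. It is not the authors' mechanism, which the abstract describes only as inspired by the Dempster-Shafer Theory. The two-element frame, the names `to_mass` and `fuse`, and the mapping from a (label, confidence) pair to a mass function are all assumptions made for this example.

```python
from itertools import product

# Assumed frame of discernment for one class label: the document either
# belongs to the class ("pos") or does not ("neg").
POS = frozenset({"pos"})
NEG = frozenset({"neg"})
THETA = POS | NEG  # the whole frame, i.e. "undecided"


def combine(m1, m2):
    """Standard Dempster rule of combination for two mass functions.

    Each mass function is a dict mapping a frozenset of hypotheses
    to the mass assigned to it.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def to_mass(label, confidence):
    """Hypothetical mapping: the confidence goes to the predicted label,
    the remainder to the whole frame."""
    focal = POS if label else NEG
    return {focal: confidence, THETA: 1.0 - confidence}


def fuse(outputs):
    """Fuse a list of (label, confidence) pairs, one per subclassifier."""
    masses = [to_mass(label, conf) for label, conf in outputs]
    result = masses[0]
    for m in masses[1:]:
        result = combine(result, m)
    return result


if __name__ == "__main__":
    # Two subclassifiers, each trained on a different feature subset,
    # both voting for the class with moderate confidence.
    print(fuse([(True, 0.6), (True, 0.5)]))
```

In this sketch, two agreeing subclassifiers reinforce each other, while disagreeing ones produce conflict mass that is normalized away. In a multi-label setting, one such fusion would be run per class.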
Machine Learning, text categorization, multi-label examples, data fusion, Dempster-Shafer Theory.
M. Kubat and K. Sarinnapakorn, "Combining Subclassifiers in Text Categorization: A DST-Based Solution and a Case Study," in IEEE Transactions on Knowledge & Data Engineering, vol. 19, no. 12, pp. 1638-1651, 2007.