Random k-Labelsets for Multilabel Classification
July 2011 (vol. 23 no. 7)
pp. 1079-1089
Grigorios Tsoumakas, Aristotle University of Thessaloniki, Thessaloniki
Ioannis Katakis, Aristotle University of Thessaloniki, Thessaloniki
Ioannis Vlahavas, Aristotle University of Thessaloniki, Thessaloniki
A simple yet effective multilabel learning method, called label powerset (LP), considers each distinct combination of labels that exists in the training set as a different class value of a single-label classification task. The computational efficiency and predictive performance of LP are challenged by application domains with a large number of labels and training examples. In these cases, the number of classes may become very large, and at the same time many classes are associated with very few training examples. To deal with these problems, this paper proposes breaking the initial set of labels into a number of small random subsets, called labelsets, and employing LP to train a corresponding classifier for each labelset. The labelsets can be either disjoint or overlapping, depending on which of two strategies is used to construct them. The proposed method is called RAkEL (RAndom k labELsets), where k is a parameter that specifies the size of the subsets. Empirical evidence indicates that RAkEL manages to improve substantially over LP, especially in domains with a large number of labels, and exhibits competitive performance against other high-performing multilabel learning methods.
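
The two labelset construction strategies and the per-labelset LP transformation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (disjoint_labelsets, overlapping_labelsets, lp_classes), the parameter m for the number of overlapping labelsets, and the toy data are assumptions made for illustration, and the abstract does not say how the per-labelset predictions are combined, so that step is left out.

```python
import random
from itertools import combinations

def disjoint_labelsets(labels, k, rng):
    """Shuffle the labels and cut them into consecutive chunks of size k.
    The last chunk is smaller when len(labels) is not a multiple of k."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [tuple(sorted(shuffled[i:i + k]))
            for i in range(0, len(shuffled), k)]

def overlapping_labelsets(labels, k, m, rng):
    """Draw m distinct k-subsets at random; different subsets may share labels.
    Enumerating every k-subset is only practical for small label sets
    (an assumption that keeps this sketch short)."""
    all_subsets = list(combinations(sorted(labels), k))
    return rng.sample(all_subsets, m)

def lp_classes(Y, labelset):
    """Label powerset restricted to one labelset: the class of each example
    is the combination of labelset members that the example carries."""
    return [tuple(label for label in labelset if label in y) for y in Y]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    labels = ["beach", "field", "mountain", "sunset", "urban"]  # toy labels
    # Toy training label sets, one set of labels per example.
    Y = [{"beach", "sunset"}, {"urban"}, {"field", "mountain"}, {"beach"}]

    for labelset in disjoint_labelsets(labels, k=2, rng=rng):
        # Each labelset defines one single-label task; any ordinary
        # multiclass learner could be trained on these class values.
        print(labelset, lp_classes(Y, labelset))

    print(overlapping_labelsets(labels, k=3, m=4, rng=rng))
```

With k labels per labelset, each LP task has at most 2^k class values, compared with up to 2^|L| for LP applied to the full label set L. This is the sense in which small labelsets keep the number of classes manageable.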

Index Terms:
Categorization, multilabel, ensembles, labelset, classification.
Grigorios Tsoumakas, Ioannis Katakis, Ioannis Vlahavas, "Random k-Labelsets for Multilabel Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079-1089, July 2011, doi:10.1109/TKDE.2010.164