Issue No. 1, January 2010 (vol. 22), pp. 46-58
Ke Zhou , Shanghai Jiao-Tong University, Shanghai
Gui-Rong Xue , Shanghai Jiao-Tong University, Shanghai
Qiang Yang , Hong Kong University of Science and Technology, Hong Kong
Yong Yu , Shanghai Jiao-Tong University, Shanghai
In many applications, such as information retrieval, it is often difficult and time-consuming to provide a large number of positive and negative examples for training a classification system. Instead, users often find it easier to indicate just a few positive examples of what they like, so these are the only labeled examples available to the learning system. A large amount of unlabeled data, by contrast, is easy to obtain. How to make use of positive and unlabeled data for learning is therefore a critical problem in machine learning and information retrieval. Several approaches to this problem have been proposed in the past, but most of them do not work well when only a small amount of labeled positive data is available. In this paper, we propose a novel algorithm called Topic-Sensitive pLSA to solve this problem. The algorithm extends the original probabilistic latent semantic analysis (pLSA), a purely unsupervised framework, by injecting a small amount of supervision information from the user. The supervision takes the form of users indicating which documents fit their interests, and it is encoded as a set of constraints. By introducing penalty terms for these constraints, we obtain an objective function that trades off the likelihood of the observed data against the enforcement of the constraints. We develop an iterative algorithm that finds a local optimum of this objective function. Experimental evaluation on three data corpora shows that the proposed method improves performance, especially when only a small amount of labeled positive data is available.
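As background for the algorithm described above: Topic-Sensitive pLSA builds on the standard pLSA model of Hofmann, which is fit by expectation-maximization (EM) on a document-word count matrix. The sketch below implements only plain, unsupervised pLSA in NumPy to make the starting point concrete; the function name and array shapes are illustrative, and the paper's constraint penalty terms are deliberately not implemented here.

```python
import numpy as np

def plsa(n_dw, K, iters=50, seed=0):
    """Fit standard (unsupervised) pLSA by EM on a D x W count matrix n_dw.

    Returns P(z), P(d|z), P(w|z). This is the baseline model that
    Topic-Sensitive pLSA extends with constraint penalty terms.
    """
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    # Random normalized initialization of the model parameters.
    p_z = np.full(K, 1.0 / K)
    p_d_z = rng.random((K, D)); p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # E-step: posterior P(z|d,w) proportional to P(z) P(d|z) P(w|z).
        joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]  # K x D x W
        post = joint / np.maximum(joint.sum(axis=0, keepdims=True), 1e-12)
        # M-step: re-estimate parameters from count-weighted posteriors.
        weighted = post * n_dw[None, :, :]
        p_d_z = weighted.sum(axis=2)
        p_d_z /= np.maximum(p_d_z.sum(axis=1, keepdims=True), 1e-12)
        p_w_z = weighted.sum(axis=1)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z = weighted.sum(axis=(1, 2)); p_z /= p_z.sum()
    return p_z, p_d_z, p_w_z
```

The paper's extension replaces the pure log-likelihood objective maximized by this EM loop with a penalized objective, so the M-step above would be modified to account for the user-supplied constraints.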
Index Terms: Semisupervised learning, topic-sensitive probabilistic latent semantic analysis, document classification.
Ke Zhou, Gui-Rong Xue, Qiang Yang, Yong Yu, "Learning with Positive and Unlabeled Examples Using Topic-Sensitive PLSA," IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 1, pp. 46-58, January 2010, doi:10.1109/TKDE.2009.56
[1] T. Joachims, “Transductive Inference for Text Classification Using Support Vector Machines,” Proc. 16th Int'l Conf. Machine Learning (ICML '99), I. Bratko and S. Dzeroski, eds., pp. 200-209, 1999.
[2] T. Joachims, “Transductive Learning via Spectral Graph Partitioning,” Proc. 20th Int'l Conf. Machine Learning (ICML '03), 2003.
[3] K.P. Bennett and A. Demiriz, “Semi-Supervised Support Vector Machines,” Proc. 1998 Conf. Advances in Neural Information Processing Systems II, pp. 368-374, 1999.
[4] R. Ghani, “Combining Labeled and Unlabeled Data for Multiclass Text Categorization,” Proc. 19th Int'l Conf. Machine Learning (ICML '02), pp. 187-194, 2002.
[5] K. Nigam, A.K. McCallum, S. Thrun, and T.M. Mitchell, “Text Classification from Labeled and Unlabeled Documents Using EM,” Machine Learning, vol. 39, nos. 2/3, pp. 103-134, 2000.
[6] B. Liu, Y. Dai, X. Li, W.S. Lee, and P.S. Yu, “Building Text Classifiers Using Positive and Unlabeled Examples,” Proc. Third IEEE Int'l Conf. Data Mining (ICDM '03), pp. 179-188, 2003.
[7] B. Liu, W.S. Lee, P.S. Yu, and X. Li, “Partially Supervised Classification of Text Documents,” Proc. 19th Int'l Conf. Machine Learning (ICML '02), pp. 387-394, 2002.
[8] X. Li and B. Liu, “Learning to Classify Texts Using Positive and Unlabeled Data,” Proc. 18th Int'l Joint Conf. Artificial Intelligence (IJCAI '03), pp. 587-594, Aug. 2003.
[9] H. Yu, J. Han, and K.C.-C. Chang, “PEBL: Positive Example Based Learning for Web Page Classification Using SVM,” Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '02), pp. 239-248, 2002.
[10] W.S. Lee and B. Liu, “Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression,” Proc. 20th Int'l Conf. Machine Learning (ICML '03), pp. 448-455, 2003.
[11] T. Hofmann, “Probabilistic Latent Semantic Analysis,” Proc. Conf. Uncertainty in Artificial Intelligence (UAI '99), 1999.
[12] B. Ribeiro-Neto and R. Baeza-Yates, Modern Information Retrieval. Addison-Wesley, 1999.
[13] F. Denis, “PAC Learning from Positive Statistical Queries,” Proc. Ninth Int'l Conf. Algorithmic Learning Theory (ALT '98), vol. 1501, 1998.
[14] G.P.C. Fung and H. Lu, “Text Classification without Negative Examples Revisit,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 1, pp. 6-20, Jan. 2006.
[15] B. Scholkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson, “Estimating the Support of a High-Dimensional Distribution,” Neural Computation, vol. 13, no. 7, pp. 1443-1471, 2001.
[16] K. Wagstaff, C. Cardie, S. Rogers, and S. Schroedl, “Constrained k-Means Clustering with Background Knowledge,” Proc. 18th Int'l Conf. Machine Learning (ICML '01), pp. 577-584, 2001.
[17] J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
[18] I. Davidson and S.S. Ravi, “Clustering with Constraints: Feasibility Issues and the k-Means Algorithm,” Proc. SIAM Int'l Conf. Data Mining, 2005.
[19] D. Pelleg and D. Baras, “K-Means with Large and Noisy Constraint Sets,” Proc. European Conf. Machine Learning (ECML '07), pp. 674-682, 2007.
[20] X. Ji and W. Xu, “Document Clustering with Prior Knowledge,” Proc. 29th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '06), 2006.
[21] S. Basu, M. Bilenko, and R.J. Mooney, “A Probabilistic Framework for Semi-Supervised Clustering,” Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '04), pp. 59-68, 2004.
[22] T. Hofmann, “Probabilistic Latent Semantic Indexing,” Proc. 22nd Ann. ACM Conf. Research and Development in Information Retrieval, 1999.
[23] A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc., Series B, vol. 39, no. 1, 1977.
[24] D.R. Hunter and K. Lange, “A Tutorial on MM Algorithms,” The Am. Statistician, vol. 58, no. 1, pp. 30-37, 2004.
[25] J.D. Leeuw and W.J. Heiser, “Convergence of Correction-Matrix Algorithms for Multidimensional Scaling,” Geometric Representations of Relational Data, Mathesis Press, 1977.
[26] D.R. Hunter and R. Li, “Variable Selection Using MM Algorithms,” Annals of Statistics, vol. 33, pp. 1617-1642, 2005.
[27] Z. Zhang, J.T. Kwok, and D.-Y. Yeung, “Surrogate Maximization/Minimization Algorithms for AdaBoost and the Logistic Regression Model,” Proc. 21st Int'l Conf. Machine Learning (ICML '04), p. 117, 2004.
[28] H. Yu, J. Han, and K.C.-C. Chang, “PEBL: Web Page Classification without Negative Examples,” IEEE Trans. Knowledge and Data Eng., vol. 16, no. 1, Jan. 2004.
[29] V.N. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
[30] G.P.C. Fung, J.X. Yu, H. Lu, and P.S. Yu, “Text Classification without Labeled Negative Documents,” Proc. 21st Int'l Conf. Data Eng. (ICDE '05), pp. 594-605, Apr. 2005.
[31] T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Kluwer Academic Publishers, 2002.
[32] F. Letouzey, F. Denis, and R. Gilleron, “Learning from Positive and Unlabeled Examples,” Proc. 11th Int'l Conf. Algorithmic Learning Theory (ALT '00), pp. 71-85, 2000.
[33] F.D. Comité, F. Denis, R. Gilleron, and F. Letouzey, “Positive and Unlabeled Examples Help Learning,” Proc. 10th Int'l Conf. Algorithmic Learning Theory (ALT '99), pp. 219-230, 1999.