Manifold Adaptive Experimental Design for Text Categorization
April 2012 (vol. 24 no. 4)
pp. 707-719
Deng Cai, Zhejiang University, Hangzhou
Xiaofei He, Zhejiang University, Hangzhou
In many information processing tasks, labels are expensive while unlabeled data points are abundant. To reduce the cost of collecting labels, it is crucial to predict which unlabeled examples are the most informative, i.e., which would improve the classifier the most if they were labeled. Many active learning techniques have been proposed for text categorization, such as SVM-Active and Transductive Experimental Design. However, most previous approaches try to discover the discriminant structure of the data space, whereas the geometrical structure is not well respected. In this paper, we propose a novel active learning algorithm that is performed in the data manifold adaptive kernel space. The manifold structure is incorporated into the kernel space by using the graph Laplacian, so that the manifold adaptive kernel space reflects the underlying geometry of the data. By minimizing the expected error with respect to the optimal classifier, we can select the most representative and discriminative data points for labeling. Experimental results on text categorization demonstrate the effectiveness of the proposed approach.
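The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the kernel deformation follows the manifold-adaptive construction of Sindhwani et al. [40], and the selection step follows greedy sequential Transductive Experimental Design [45]; all function names and parameters (gamma, k, beta, mu) are my own illustrative choices.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # base (ambient) RBF kernel over the point cloud
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def knn_graph_laplacian(X, k=5):
    # unnormalized Laplacian L = D - W of a symmetrized 0/1 k-NN graph
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[1:k + 1]] = 1.0  # skip self at index 0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W

def manifold_adaptive_kernel(X, gamma=1.0, k=5, beta=1.0):
    # deform the base kernel with the graph Laplacian (Gram-matrix form):
    #   K~ = K - K (I + M K)^{-1} M K,  with M = beta * L
    # so the deformed kernel respects the manifold geometry of the data
    n = len(X)
    K = rbf_kernel(X, gamma=gamma)
    M = beta * knn_graph_laplacian(X, k=k)
    return K - K @ np.linalg.solve(np.eye(n) + M @ K, M @ K)

def greedy_design(K, num_points, mu=0.1):
    # sequential greedy experimental design on the (deformed) kernel:
    # repeatedly pick the point with the largest normalized column energy,
    # then deflate the kernel so redundant neighbors are not re-selected
    selected = []
    C = K.copy()
    for _ in range(num_points):
        scores = (C ** 2).sum(0) / (np.diag(C) + mu)
        scores[selected] = -np.inf
        i = int(np.argmax(scores))
        selected.append(i)
        ci = C[:, i:i + 1]
        C = C - ci @ ci.T / (C[i, i] + mu)
    return selected
```

Under this sketch, `greedy_design(manifold_adaptive_kernel(X), b)` returns the indices of `b` points to send to the labeler; with `beta=0` the deformation vanishes and the procedure reduces to ordinary kernel experimental design.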

[1] R. Angelova and G. Weikum, "Graph-Based Text Classification: Learning from Your Neighbors," Proc. 29th Int'l Conf. Research and Development in Information Retrieval, 2006.
[2] A.C. Atkinson and A.N. Donev, Optimum Experimental Designs, with SAS. Oxford Univ. Press, 2007.
[3] M. Belkin and P. Niyogi, "Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering," Advances in Neural Information Processing Systems, vol. 14, pp. 585-591, 2001.
[4] M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold Regularization: A Geometric Framework for Learning from Examples," J. Machine Learning Research, vol. 7, pp. 2399-2434, 2006.
[5] D. Cai, "Spectral Regression: A Regression Framework for Efficient Regularized Subspace Learning," PhD thesis, Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign, May 2009.
[6] D. Cai, X. He, X. Wu, and J. Han, "Non-Negative Matrix Factorization on Manifold," Proc. Int'l Conf. Data Mining (ICDM '08), 2008.
[7] D. Cai, X. He, W.V. Zhang, and J. Han, "Regularized Locality Preserving Indexing via Spectral Regression," Proc. 16th ACM Conf. Information and Knowledge Management (CIKM '07), pp. 741-750, 2007.
[8] D. Cai, Q. Mei, J. Han, and C. Zhai, "Modeling Hidden Topics on Document Manifold," Proc. 17th ACM Conf. Information and Knowledge Management (CIKM '08), pp. 911-920, 2008.
[9] D. Cai, X. Wang, and X. He, "Probabilistic Dyadic Data Analysis with Local and Global Consistency," Proc. 26th Ann. Int'l Conf. Machine Learning (ICML '09), pp. 105-112, 2009.
[10] O. Chapelle, "Active Learning for Parzen Window Classifier," Proc. Tenth Int'l Workshop Artificial Intelligence and Statistics, 2005.
[11] F.R.K. Chung, "Spectral Graph Theory," Regional Conference Series in Mathematics, vol. 92, AMS, 1997.
[12] D.A. Cohn, Z. Ghahramani, and M.I. Jordan, "Active Learning with Statistical Models," J. Artificial Intelligence Research, vol. 4, pp. 129-145, 1996.
[13] S. Dasgupta and D. Hsu, "Hierarchical Sampling for Active Learning," Proc. 25th Int'l Conf. Machine Learning (ICML '08), pp. 208-215, 2008.
[14] A. Dayanik, D.D. Lewis, D. Madigan, V. Menkov, and A. Genkin, "Constructing Informative Prior Distributions from Domain Knowledge in Text Classification," Proc. 29th Int'l Conf. Research and Development in Information Retrieval, 2006.
[15] P. Flaherty, M.I. Jordan, and A.P. Arkin, "Robust Design of Biological Experiments," Proc. Advances in Neural Information Processing Systems, vol. 18, 2005.
[16] Y. Freund, H.S. Seung, E. Shamir, and N. Tishby, "Selective Sampling Using the Query by Committee Algorithm," Machine Learning, vol. 28, nos. 2/3, pp. 133-168, 1997.
[17] B. Gao, G. Feng, T. Qin, Q.-S. Cheng, T.-Y. Liu, and W.-Y. Ma, "Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 9, pp. 1263-1273, Sept. 2005.
[18] R. Hadsell, S. Chopra, and Y. LeCun, "Dimensionality Reduction by Learning an Invariant Mapping," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition (CVPR '06), pp. 1735-1742, 2006.
[19] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[20] X. He, D. Cai, H. Liu, and W.-Y. Ma, "Locality Preserving Indexing for Document Representation," Proc. 27th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '04), pp. 96-103, 2004.
[21] X. He, W. Min, D. Cai, and K. Zhou, "Laplacian Optimal Design for Image Retrieval," Proc. 30th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '07), 2007.
[22] S.C. Hoi, R. Jin, and M.R. Lyu, "Large-Scale Text Categorization by Batch Mode Active Learning," Proc. 15th Int'l Conf. World Wide Web (WWW '06), 2006.
[23] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Proc. European Conf. Machine Learning (ECML '98), pp. 137-142, 1998.
[24] T. Joachims, "Transductive Inference for Text Classification Using Support Vector Machines," Proc. Int'l Conf. Machine Learning (ICML), pp. 200-209, 1999.
[25] T. Joachims, "Transductive Learning via Spectral Graph Partitioning," Proc. Int'l Conf. Machine Learning (ICML), pp. 290-297, 2003.
[26] J.M. Lee, Introduction to Smooth Manifolds. Springer, 2002.
[27] D.D. Lewis, Y. Yang, T.G. Rose, and F. Li, "RCV1: A New Benchmark Collection for Text Categorization Research," J. Machine Learning Research, vol. 5, pp. 361-397, 2004.
[28] N. Loeff, D. Forsyth, and D. Ramachandran, "ManifoldBoost: Stagewise Function Approximation for Fully-, Semi- and Un-Supervised Learning," Proc. 25th Int'l Conf. Machine Learning (ICML '08), 2008.
[29] A. McCallum and K. Nigam, "Employing EM in Pool-Based Active Learning for Text Classification," Proc. 15th Int'l Conf. Machine Learning (ICML '98), pp. 359-367, 1998.
[30] A.Y. Ng, M. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," Advances in Neural Information Processing Systems, vol. 14, pp. 849-856, 2001.
[31] P. Niyogi, S. Smale, and S. Weinberger, "Finding the Homology of Submanifolds with High Confidence from Random Samples," Technical Report tr-2004-08, Dept. of Computer Science, Univ. of Chicago, 2004.
[32] H. Raghavan and J. Allan, "An Interactive Algorithm for Asking and Incorporating Feature Feedback into Support Vector Machines," Proc. 30th Int'l Conf. Research and Development in Information Retrieval, 2007.
[33] S. Roweis and L. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[34] N. Roy and A. McCallum, "Toward Optimal Active Learning Through Sampling Estimation of Error Reduction," Proc. 18th Int'l Conf. Machine Learning (ICML), pp. 441-448, 2001.
[35] G. Schohn and D. Cohn, "Less is More: Active Learning with Support Vector Machines," Proc. 17th Int'l Conf. Machine Learning (ICML '00), 2000.
[36] B. Schölkopf and A.J. Smola, Learning with Kernels. MIT Press, 2002.
[37] B. Settles, "Active Learning Literature Survey," Computer Sciences Technical Report 1648, Univ. of Wisconsin-Madison, 2009.
[38] B. Settles and M. Craven, "An Analysis of Active Learning Strategies for Sequence Labeling Tasks," Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP '08), pp. 1069-1078, 2008.
[39] H. Seung, M. Opper, and H. Sompolinsky, "Query by Committee," Proc. Fifth Ann. Workshop Computational Learning Theory (COLT '92), pp. 287-294, 1992.
[40] V. Sindhwani, P. Niyogi, and M. Belkin, "Beyond the Point Cloud: From Transductive to Semi-Supervised Learning," Proc. Int'l Conf. Machine Learning (ICML '05), 2005.
[41] J. Tenenbaum, V. de Silva, and J. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000.
[42] S. Tong and E. Chang, "Support Vector Machine Active Learning for Image Retrieval," Proc. Ninth ACM Int'l Conf. Multimedia (MULTIMEDIA '01), pp. 107-118, 2001.
[43] S. Tong and D. Koller, "Support Vector Machine Active Learning with Application to Text Classification," J. Machine Learning Research, vol. 2, pp. 45-66, 2001.
[44] Y. Yang, "An Evaluation of Statistical Approaches to Text Categorization," J. Information Retrieval, vol. 1, nos. 1/2, pp. 67-88, 1999.
[45] K. Yu, J. Bi, and V. Tresp, "Active Learning via Transductive Experimental Design," Proc. 23rd Int'l Conf. Machine Learning (ICML '06), 2006.
[46] K. Yu, S. Zhu, W. Xu, and Y. Gong, "Non-Greedy Active Learning for Text Categorization Using Convex Transductive Experimental Design," Proc. 31st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '08), 2008.
[47] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, "Learning with Local and Global Consistency," Advances in Neural Information Processing Systems, vol. 16, pp. 321-328, 2003.
[48] X. Zhu, J. Lafferty, and Z. Ghahramani, "Combining Active Learning and Semisupervised Learning Using Gaussian Fields and Harmonic Functions," Proc. ICML Workshop the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, pp. 58-65, 2003.

Index Terms:
Text categorization, active learning, experimental design, manifold learning, kernel method.
Citation:
Deng Cai, Xiaofei He, "Manifold Adaptive Experimental Design for Text Categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 4, pp. 707-719, April 2012, doi:10.1109/TKDE.2011.104