Issue No.09 - September (2008 vol.30)
pp: 1557-1571
ABSTRACT
There has recently been growing interest in the use of transductive inference for learning. We expand the scope of transductive inference to active learning in a stream-based setting. Toward that end, this paper proposes Query-by-Transduction (QBT), a novel active learning algorithm that queries the label of an example based on the p-values obtained using transduction. We show that QBT is closely related to Query-by-Committee (QBC) through relations among transduction, Bayesian statistical testing, Kullback-Leibler divergence, and Shannon information. The feasibility and utility of QBT are demonstrated on both binary and multi-class classification tasks using SVM as the classifier of choice. Our experimental results show that QBT compares favorably, in terms of mean generalization, against random sampling, committee-based active learning, margin-based active learning, and QBC in the stream-based setting.
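The querying idea in the abstract — compute a transductive p-value for each tentative label of a new example, and ask for the true label when no single label clearly fits — can be sketched as follows. This is a minimal illustration only: the 1-NN ratio nonconformity measure and the closeness test on the two largest p-values are assumptions chosen for brevity, whereas the paper itself derives nonconformity scores from an SVM.

```python
# Illustrative sketch of p-value-based query selection in the spirit of QBT.
# The 1-NN nonconformity measure and the top-two-p-value closeness test are
# expository assumptions, not the paper's exact SVM-based formulation.
import numpy as np

def nonconformity_scores(X, y):
    """Strangeness of each example: distance to its nearest same-label
    neighbor divided by distance to its nearest other-label neighbor."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)  # ignore self-distances
    scores = np.empty(len(X))
    for i in range(len(X)):
        d_same = D[i][y == y[i]].min()
        d_other = D[i][y != y[i]].min()
        scores[i] = d_same / d_other
    return scores

def p_values(X_train, y_train, x_new, labels):
    """Transductive p-value of x_new for each tentative label: the fraction
    of examples at least as strange as x_new when x_new is appended to the
    training set under that label."""
    ps = {}
    for lab in labels:
        alphas = nonconformity_scores(np.vstack([X_train, x_new]),
                                      np.append(y_train, lab))
        ps[lab] = np.mean(alphas >= alphas[-1])
    return ps

def should_query(ps, threshold=0.2):
    """Query the true label when the two largest p-values are close,
    i.e. two labels explain the new example almost equally well."""
    top = sorted(ps.values(), reverse=True)
    return top[0] - top[1] < threshold

# Two well-separated classes; a point midway between them triggers a query.
X_train = np.array([[0., 0.], [0., 1.], [1., 0.],
                    [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(should_query(p_values(X_train, y_train, np.array([0.1, 0.1]), [0, 1])))
print(should_query(p_values(X_train, y_train, np.array([2.5, 2.5]), [0, 1])))
```

On this toy stream, the point near the class-0 cluster yields one dominant p-value (no query), while the point between the clusters yields two nearly equal p-values and is queried.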
INDEX TERMS
Machine learning, Statistical
CITATION
Shen-Shyang Ho, Harry Wechsler, "Query by Transduction", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.30, no. 9, pp. 1557-1571, September 2008, doi:10.1109/TPAMI.2007.70811