Issue No. 05 - May 2011 (vol. 33)
pp. 958-977
Zhouyu Fu , Monash University, Victoria
Antonio Robles-Kelly , National ICT Australia, Canberra Research Laboratory, Canberra and Australian National University
Jun Zhou , National ICT Australia, Canberra Research Laboratory, Canberra and Australian National University
ABSTRACT
Multiple instance learning (MIL) is a paradigm in supervised learning that deals with the classification of collections of instances called bags. Each bag contains a number of instances from which features are extracted. The complexity of MIL is largely dependent on the number of instances in the training data set. Since we are usually confronted with a large instance space even for moderately sized real-world applications, it is important to design efficient instance selection techniques to speed up the training process without compromising performance. In this paper, we address the issue of instance selection in MIL. We propose MILIS, a novel MIL algorithm based on adaptive instance selection. We operate within an alternating optimization framework that intertwines instance selection and classifier learning in an iterative manner and is guaranteed to converge. Initial instance selection is achieved by a simple yet effective kernel density estimator on the negative instances. Experimental results demonstrate the utility and efficiency of the proposed approach as compared to the state of the art.
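The initial selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a Gaussian kernel with a fixed bandwidth and toy 2-D data, and the helper names (`kde_scores`, `select_prototypes`) are invented for this example. The idea is that an instance in a positive bag with low density under the negative-instance population is a plausible "true positive" prototype for that bag.

```python
import numpy as np

rng = np.random.default_rng(0)

def kde_scores(instances, negatives, bandwidth=1.0):
    """Gaussian kernel density of each instance under the negative population.

    A low score means the instance looks unlike the negatives, so it is a
    plausible positive prototype for its bag (hypothetical helper).
    """
    # pairwise squared distances: (n_instances, n_negatives)
    d2 = ((instances[:, None, :] - negatives[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)

def select_prototypes(pos_bags, negatives):
    """From each positive bag, pick the instance with the lowest
    estimated negative density."""
    protos = []
    for bag in pos_bags:
        scores = kde_scores(bag, negatives)
        protos.append(bag[np.argmin(scores)])
    return np.stack(protos)

# toy data: a pool of negative instances near the origin, and two positive
# bags that each mix negative-looking noise with one distant positive instance
negatives = rng.normal(0.0, 1.0, size=(50, 2))
pos_bags = [
    np.vstack([rng.normal(0.0, 1.0, size=(4, 2)),
               np.array([[5.0, 5.0]])]),
    np.vstack([rng.normal(0.0, 1.0, size=(3, 2)),
               np.array([[5.2, 4.8]])]),
]

prototypes = select_prototypes(pos_bags, negatives)
print(prototypes)  # both selected prototypes lie near (5, 5)
```

In the full algorithm this selection would only initialize the alternating loop: the selected prototypes feed a classifier-training step, and the trained classifier in turn drives the next round of instance selection until convergence.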
INDEX TERMS
Multiple instance learning, support vector machine, feature selection, alternating optimization.
CITATION
Zhouyu Fu, Antonio Robles-Kelly, Jun Zhou, "MILIS: Multiple Instance Learning with Instance Selection", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.33, no. 5, pp. 958-977, May 2011, doi:10.1109/TPAMI.2010.155