MILES: Multiple-Instance Learning via Embedded Instance Selection
December 2006 (vol. 28 no. 12)
pp. 1931-1947
Yixin Chen, IEEE
Jinbo Bi, IEEE
James Z. Wang, IEEE
Multiple-instance problems arise in situations where training class labels are attached to sets of samples (called bags) rather than to the individual samples (called instances) within each bag. Most previous multiple-instance learning (MIL) algorithms are built on the assumption that a bag is positive if and only if at least one of its instances is positive. Although this assumption works well for drug activity prediction, it is rather restrictive for other applications, especially in the computer vision area. We propose a learning method, MILES (Multiple-Instance Learning via Embedded Instance Selection), which converts the multiple-instance learning problem into a standard supervised learning problem without imposing an assumption relating instance labels to bag labels. MILES maps each bag into a feature space defined by the instances in the training bags via an instance similarity measure. This feature mapping often produces a large number of redundant or irrelevant features, so a 1-norm SVM is applied to simultaneously select important features and construct the classifier. In extensive experiments, MILES demonstrates classification accuracy competitive with other methods, high computational efficiency, and robustness to labeling uncertainty.
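As a rough illustration of the embedding described above, the Python sketch below maps each bag to a vector of similarities against a pool of candidate instances drawn from the training bags, then fits a sparse linear classifier in that space. The Gaussian similarity with a max over a bag's instances, the bandwidth sigma, and the use of scikit-learn's L1-penalized LinearSVC as a stand-in for the 1-norm SVM are illustrative assumptions, not details given in the abstract.

# Minimal sketch of a MILES-style bag embedding plus sparse linear classification.
# Assumptions: Gaussian instance similarity with max-pooling over each bag,
# and an L1-penalized linear SVM as a surrogate for the 1-norm SVM.
import numpy as np
from sklearn.svm import LinearSVC

def embed_bags(bags, concept_instances, sigma=1.0):
    """Map each bag (array of instances, shape [n_i, d]) to a vector whose
    k-th entry is the bag's maximal similarity to the k-th concept instance."""
    features = []
    for bag in bags:
        # squared Euclidean distances between every instance in the bag
        # and every candidate concept instance
        d2 = ((bag[:, None, :] - concept_instances[None, :, :]) ** 2).sum(-1)
        features.append(np.exp(-d2 / sigma ** 2).max(axis=0))
    return np.vstack(features)

# Toy usage: two positive and two negative bags of 2-D instances.
rng = np.random.default_rng(0)
pos_bags = [rng.normal(3.0, 1.0, size=(5, 2)) for _ in range(2)]
neg_bags = [rng.normal(0.0, 1.0, size=(5, 2)) for _ in range(2)]
bags, labels = pos_bags + neg_bags, np.array([1, 1, -1, -1])

# Candidate features: all instances pooled from the training bags.
concepts = np.vstack(bags)
X = embed_bags(bags, concepts, sigma=2.0)

# Sparse linear classifier in the embedded space; nonzero weights indicate
# the instance-based features that were effectively "selected".
clf = LinearSVC(penalty='l1', loss='squared_hinge', dual=False, C=1.0).fit(X, labels)
selected = np.flatnonzero(clf.coef_.ravel())
print("predicted bag labels:", clf.predict(X), "selected features:", selected)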

Index Terms:
Multiple-instance learning, feature subset selection, 1-norm support vector machine, image categorization, object recognition, drug activity prediction.
Citation:
Yixin Chen, Jinbo Bi, James Z. Wang, "MILES: Multiple-Instance Learning via Embedded Instance Selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 1931-1947, Dec. 2006, doi:10.1109/TPAMI.2006.248