Kernels for Generalized Multiple-Instance Learning
December 2008 (vol. 30 no. 12)
pp. 2084-2098
Qingping Tao, GC Image, LLC, Lincoln
Stephen D. Scott, University of Nebraska, Lincoln
N. V. Vinodchandran, University of Nebraska, Lincoln
Thomas Takeo Osugi, Sphere Communications, Lincolnshire
Brandon Mueller, Gallup, Inc., Omaha
The multiple-instance learning (MIL) model has been successful in numerous application areas. Recently, a generalization of this model and an algorithm for it were introduced, showing significant advantages over the conventional MIL model in certain application areas. Unfortunately, that algorithm is not scalable to high dimensions. We adapt that algorithm to one using a support vector machine with our new kernel $k_{\wedge}$. This reduces the time complexity from exponential in the dimension to polynomial. Computing our new kernel is equivalent to counting the number of boxes in a discrete, bounded space that contain at least one point from each of two multisets. We show that this problem is #P-complete, but then give a fully polynomial randomized approximation scheme (FPRAS) for it. We then extend $k_{\wedge}$ by enriching its representation into a new kernel $k_{\min}$, and also consider a normalized version of $k_{\wedge}$ that we call $k_{\wedge/\vee}$ (which may or may not be a kernel, but whose approximation yielded positive semidefinite Gram matrices in practice). We then empirically evaluate all three measures on data from content-based image retrieval, biological sequence analysis, and the musk data sets. We found that our kernels performed well on all data sets relative to algorithms in the conventional MIL model.
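The box-counting problem described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the grid {1, ..., s}^d, the function names, and the sampling parameters are assumptions chosen for illustration. The first function enumerates every axis-parallel box, which takes time exponential in the dimension. The second uses a standard Karp-Luby style union estimator, on the assumption that the target set can be written as the union, over all pairs (p, q), of the boxes containing both p and q; it conveys the idea of randomized approximation but makes no claim to match the paper's FPRAS or its guarantees.

```python
import random
from itertools import product
from math import prod


def inside(point, box):
    """True if `point` lies in `box`, given as one (lo, hi) interval per dimension."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))


def count_boxes_exact(P, Q, s):
    """Brute force: count boxes in {1..s}^d holding a point of P and a point of Q."""
    d = len(P[0])
    # Every interval [lo, hi] with 1 <= lo <= hi <= s.
    intervals = [(lo, hi) for lo in range(1, s + 1) for hi in range(lo, s + 1)]
    return sum(
        1
        for box in product(intervals, repeat=d)
        if any(inside(p, box) for p in P) and any(inside(q, box) for q in Q)
    )


def count_boxes_sampled(P, Q, s, n_samples=20000, seed=0):
    """Karp-Luby style estimate of the same count (illustrative assumption)."""
    rng = random.Random(seed)
    pairs = [(p, q) for p in P for q in Q]
    # Boxes containing both p and q: per dimension, lo ranges over 1..min
    # and hi ranges over max..s, so the set size is a simple product.
    sizes = [
        prod(min(a, b) * (s - max(a, b) + 1) for a, b in zip(p, q))
        for p, q in pairs
    ]
    total = sum(sizes)
    acc = 0.0
    for _ in range(n_samples):
        # Pick a pair with probability proportional to its set size,
        # then a uniformly random box containing both of its points.
        p, q = rng.choices(pairs, weights=sizes)[0]
        box = [
            (rng.randint(1, min(a, b)), rng.randint(max(a, b), s))
            for a, b in zip(p, q)
        ]
        # Number of pairs whose set also contains this box (at least 1).
        cover = sum(1 for u, v in pairs if inside(u, box) and inside(v, box))
        acc += 1.0 / cover
    return total * acc / n_samples
```

For example, with s = 3 and both bags holding only the point (2, 2), each of the two coordinates admits four intervals containing 2, so both functions give 16. The sampled version never enumerates boxes, so its cost per sample grows with the number of point pairs and the dimension rather than with the number of boxes.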

Index Terms:
Machine learning, kernels, support vector machines, generalized multiple-instance learning
Qingping Tao, Stephen D. Scott, N. V. Vinodchandran, Thomas Takeo Osugi, Brandon Mueller, "Kernels for Generalized Multiple-Instance Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 12, pp. 2084-2098, Dec. 2008, doi:10.1109/TPAMI.2007.70846