Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning
January/February 2012 (vol. 9 no. 1)
pp. 98-112
Ying-Xin Li, Nanjing University, Nanjing
Shuiwang Ji, Old Dominion University, Norfolk
Sudhir Kumar, Arizona State University, Tempe
Jieping Ye, Arizona State University, Tempe
Zhi-Hua Zhou, Nanjing University, Nanjing
In studies of Drosophila embryogenesis, a large number of two-dimensional digital images of gene expression patterns have been produced to build an atlas of spatio-temporal gene expression dynamics across developmental time. The gene expression captured in these images has been manually annotated with anatomical and developmental ontology terms from a controlled vocabulary (CV). These annotations are useful in research aimed at understanding gene functions, interactions, and networks. With the rapid accumulation of images, manual annotation has become increasingly cumbersome, and computational methods to automate the task are urgently needed. Automated annotation of embryo images is, however, challenging. The annotation terms correspond spatially to local expression patterns within images, yet they are assigned collectively to groups of images, and it is unknown which term corresponds to which region of which image in the group. In this paper, we address this problem using a new machine learning framework, Multi-Instance Multi-Label (MIML) learning. We first show that the annotation task is, by its nature, a typical MIML learning problem. We then propose two support vector machine algorithms under the MIML framework for the task. Experimental results on the FlyExpress database (a digital library of standardized Drosophila gene expression pattern images) show that exploiting the MIML framework leads to significant performance improvements over state-of-the-art approaches.
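To make the MIML view of the annotation task concrete, the sketch below shows one way the data could be represented and scored. It is an illustration under stated assumptions, not either of the two SVM algorithms proposed in the paper. Assumptions: each group of images becomes one "bag" whose instances are local feature vectors pooled from all of its images; the bag's labels are the CV terms attached to the whole group. The names `Bag`, `set_kernel`, `term_score`, and `annotate` are hypothetical. The MIML problem is decomposed into one multi-instance binary problem per term; bags are compared with an averaged set kernel (the multi-instance kernel of Gärtner et al.); and, as a deliberately simple stand-in for an SVM, a term is predicted when a bag is, on average, more similar to training bags carrying the term than to bags without it.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple

Instance = Tuple[float, ...]  # hypothetical local feature vector for one image region


@dataclass(frozen=True)
class Bag:
    """One MIML example (hypothetical structure).

    instances: local feature vectors pooled from every image in one group
    labels:    CV terms attached to the group as a whole; which instance
               (region of which image) carries which term is unknown
    """
    instances: Tuple[Instance, ...]
    labels: FrozenSet[str]


def rbf(x: Instance, y: Instance, gamma: float) -> float:
    """Gaussian (RBF) kernel between two instances."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def set_kernel(X: Sequence[Instance], Y: Sequence[Instance], gamma: float = 1.0) -> float:
    """Set kernel averaged over all instance pairs, so bag size does not
    dominate the similarity. Assumes both bags are non-empty."""
    total = sum(rbf(x, y, gamma) for x in X for y in Y)
    return total / (len(X) * len(Y))


def term_score(bag: Bag, train: Sequence[Bag], term: str, gamma: float = 1.0) -> float:
    """Treat one term as its own multi-instance binary problem: mean
    similarity to bags carrying the term minus mean similarity to bags
    without it (a simple kernel-mean rule, not an SVM)."""
    pos = [set_kernel(bag.instances, b.instances, gamma) for b in train if term in b.labels]
    neg = [set_kernel(bag.instances, b.instances, gamma) for b in train if term not in b.labels]
    if not pos or not neg:
        raise ValueError(f"term {term!r} needs positive and negative training bags")
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def annotate(bag: Bag, train: Sequence[Bag], gamma: float = 1.0) -> FrozenSet[str]:
    """Predict the set of terms whose per-term score is positive."""
    terms = sorted(set().union(*(b.labels for b in train)))
    return frozenset(t for t in terms if term_score(bag, train, t, gamma) > 0)


if __name__ == "__main__":
    # Toy data with made-up 2-D features and placeholder term names.
    train = [
        Bag(((0.0, 0.0), (0.1, 0.0)), frozenset({"term_A"})),
        Bag(((5.0, 5.0), (5.1, 4.9)), frozenset({"term_B"})),
        Bag(((0.0, 0.1), (5.0, 5.1)), frozenset({"term_A", "term_B"})),
    ]
    print(annotate(Bag(((0.05, 0.05),), frozenset()), train))
```

The third training bag carries both terms without saying which instance belongs to which term, which is exactly the ambiguity described above. In a fuller treatment, the precomputed set-kernel matrix could instead be passed to an SVM that accepts precomputed kernels (for example, LIBSVM's precomputed-kernel option).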

Index Terms:
Gene expression pattern, image annotation, machine learning, multi-instance multi-label (MIML) learning, support vector machine, Drosophila.
Citation:
Ying-Xin Li, Shuiwang Ji, Sudhir Kumar, Jieping Ye, Zhi-Hua Zhou, "Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 1, pp. 98-112, Jan.-Feb. 2012, doi:10.1109/TCBB.2011.73