Issue No.04 - July-Aug. (2013 vol.10)
pp: 1045-1057
Guoxian Yu , Coll. of Comput. & Inf. Sci., Southwest Univ., Beibei, China
Huzefa Rangwala , Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Carlotta Domeniconi , Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Guoji Zhang , Sch. of Sci., South China Univ. of Technol., Guangzhou, China
Zhiwen Yu , Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
ABSTRACT
High-throughput experimental techniques produce several kinds of heterogeneous proteomic and genomic data sets. Integrating these heterogeneous data sources is both necessary and promising for annotating proteins computationally. Some methods transform the data sources into different kernels or feature representations, combine the kernels linearly (or nonlinearly) into a composite kernel, and then use the composite kernel to build a predictive model that infers the function of proteins. Because a protein can have multiple roles and functions (or labels), multilabel learning methods have also been adapted for protein function prediction. We develop a transductive multilabel classifier (TMC) that uses unlabeled proteins to predict multiple functions of proteins. We also propose a method called transductive multilabel ensemble classifier (TMEC), which integrates the different data sources using an ensemble approach. TMEC trains a graph-based multilabel classifier on each single data source and then combines the predictions of the individual classifiers. We use a directed birelational graph to capture the relationships between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate how effectively TMC and TMEC predict the functions of proteins on three benchmarks. We show that our approaches perform better than recently proposed protein function prediction methods on composite and multiple kernels. The code, the data sets used in this paper, and supplemental material are available at https://sites.google.com/site/guoxian85/tmec.
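The ensemble idea in the abstract (one graph-based classifier per data source, predictions combined afterwards) can be illustrated with a minimal sketch. This is not the authors' TMC or TMEC: the per-source classifier below is a swapped-in stand-in, the closed-form local-and-global-consistency label propagation on an undirected similarity graph, and the combination rule is a plain average. The function names `propagate_labels` and `ensemble_predict`, the parameter `alpha`, and the toy matrices are all hypothetical and chosen only for illustration.

```python
import numpy as np


def propagate_labels(W, Y, alpha=0.5):
    """Score every protein for every function on ONE data source.

    Stand-in graph-based transductive classifier (label propagation),
    not the directed birelational graph method of the paper.
    W: (n, n) symmetric, nonnegative similarity matrix for one source.
    Y: (n, c) 0/1 label matrix; unlabeled proteins have all-zero rows.
    """
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)                 # ignore self-similarity
    degree = W.sum(axis=1)
    degree[degree == 0] = 1.0                # guard isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    n = W.shape[0]
    # Solve (I - alpha * S) F = Y instead of forming an explicit inverse.
    return np.linalg.solve(np.eye(n) - alpha * S, np.asarray(Y, dtype=float))


def ensemble_predict(sources, Y, alpha=0.5):
    """Average the per-source score matrices (assumed combination rule)."""
    per_source = [propagate_labels(W, Y, alpha) for W in sources]
    return np.mean(per_source, axis=0)


# Toy example: 4 proteins, 2 functions, 2 data sources.
# Proteins 0 and 1 are labeled; proteins 2 and 3 are unlabeled.
Y = np.array([[1, 0],
              [0, 1],
              [0, 0],
              [0, 0]])

W_a = np.array([[0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)

W_b = np.array([[0.0, 0.1, 0.9, 0.2],
                [0.1, 0.0, 0.2, 0.9],
                [0.9, 0.2, 0.0, 0.1],
                [0.2, 0.9, 0.1, 0.0]])

scores = ensemble_predict([W_a, W_b], Y)
predicted = scores.argmax(axis=1)            # top-scoring function per protein
```

In this toy setting, protein 2 is most similar to protein 0 in both sources and protein 3 to protein 1, so each unlabeled protein receives its highest score for the function of its nearest labeled neighbor. A real multilabel setting would threshold the scores per function rather than take a single `argmax`.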
INDEX TERMS
proteomics, benchmark testing, biology computing, genomics, benchmark, protein function prediction, multilabel ensemble classification, high-throughput experimental techniques, heterogeneous proteomic data sets, heterogeneous genomic data sets, computational annotation, composite kernel, transductive multilabel ensemble classifier, transductive multilabel classifier, TMEC method, TMC method, proteins, kernel, correlation, bioinformatics, vectors, IEEE transactions, computational biology, multilabel ensemble classifiers, directed birelational graph
CITATION
Guoxian Yu, Huzefa Rangwala, Carlotta Domeniconi, Guoji Zhang, Zhiwen Yu, "Protein Function Prediction Using Multilabel Ensemble Classification", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 4, pp. 1045-1057, July-Aug. 2013, doi:10.1109/TCBB.2013.111