Issue No. 04 - July-Aug. (2013 vol. 10)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.111
Guoxian Yu, Coll. of Comput. & Inf. Sci., Southwest Univ., Beibei, China
Huzefa Rangwala, Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Carlotta Domeniconi, Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Guoji Zhang, Sch. of Sci., South China Univ. of Technol., Guangzhou, China
Zhiwen Yu, Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
High-throughput experimental techniques produce several kinds of heterogeneous proteomic and genomic data sets. To annotate proteins computationally, it is both necessary and promising to integrate these heterogeneous data sources. Some methods transform the data sources into different kernels or feature representations, combine these kernels linearly (or nonlinearly) into a composite kernel, and then use the composite kernel to build a predictive model that infers the functions of proteins. Because a protein can have multiple roles and functions (or labels), multilabel learning methods have also been adapted for protein function prediction. We develop a transductive multilabel classifier (TMC) to predict multiple functions of proteins using several unlabeled proteins. We also propose a method called transductive multilabel ensemble classifier (TMEC) for integrating the different data sources using an ensemble approach. The TMEC trains a graph-based multilabel classifier on each single data source and then combines the predictions of the individual classifiers. We use a directed birelational graph to capture the relationships between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate the effectiveness of the TMC and TMEC in predicting the functions of proteins on three benchmarks. We show that our approaches perform better than recently proposed protein function prediction methods on composite and multiple kernels. The code, the data sets used in this paper, and the supplemental material are available at https://sites.google.com/site/guoxian85/tmec.
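The two integration strategies contrasted in the abstract, combining kernels into one composite kernel versus combining the predictions of per-source classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (combine_kernels, average_predictions, top_k_functions) are hypothetical, the kernel weights are assumed to be given, and plain averaging is only an assumed combination rule, since the abstract does not say how the TMEC merges the individual predictions.

```python
from typing import List

Matrix = List[List[float]]


def combine_kernels(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted linear sum of same-sized kernel matrices (a composite kernel).

    Assumption: the weights are supplied by the caller; how they are chosen
    is not covered by this sketch.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    composite = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                composite[i][j] += weight * kernel[i][j]
    return composite


def average_predictions(score_sets: List[Matrix]) -> Matrix:
    """Average per-source score matrices (rows: proteins, columns: functions).

    Assumption: simple averaging stands in for the ensemble combination step.
    """
    n_sources = len(score_sets)
    n_rows = len(score_sets[0])
    n_cols = len(score_sets[0][0])
    return [
        [sum(scores[i][j] for scores in score_sets) / n_sources
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


def top_k_functions(score_row: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring functions for one protein."""
    order = sorted(range(len(score_row)), key=lambda j: score_row[j], reverse=True)
    return order[:k]
```

In the composite-kernel route, a single classifier is trained on the output of combine_kernels. In the ensemble route, one classifier is trained per data source and only the resulting score matrices are merged, so a multilabel prediction for a protein is read off the averaged row, for example with top_k_functions.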
Proteins, Bioinformatics, Computational biology
Guoxian Yu, H. Rangwala, C. Domeniconi, Guoji Zhang and Zhiwen Yu, "Protein Function Prediction Using Multilabel Ensemble Classification," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1045-1057, 2013.