IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, July-Aug. 2013
Guoxian Yu , Coll. of Comput. & Inf. Sci., Southwest Univ., Beibei, China
Huzefa Rangwala , Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Carlotta Domeniconi , Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Guoji Zhang , Sch. of Sci., South China Univ. of Technol., Guangzhou, China
Zhiwen Yu , Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.111
High-throughput experimental techniques produce several kinds of heterogeneous proteomic and genomic data sets. To computationally annotate proteins, integrating these heterogeneous data sources is both necessary and promising. Some methods transform each data source into a kernel or feature representation and then combine these kernels linearly (or nonlinearly) into a composite kernel, which is used to train a predictive model that infers protein functions. Because a protein can have multiple roles and functions (or labels), multilabel learning methods have also been adapted for protein function prediction. We develop a transductive multilabel classifier (TMC) that predicts multiple functions of proteins while also exploiting unlabeled proteins. We also propose a transductive multilabel ensemble classifier (TMEC) that integrates the different data sources using an ensemble approach: TMEC trains a graph-based multilabel classifier on each individual data source and then combines the predictions of the individual classifiers. We use a directed birelational graph to capture the relationships between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate the effectiveness of TMC and TMEC at predicting protein functions on three benchmarks and show that our approaches outperform recently proposed protein function prediction methods based on composite and multiple kernels. The code, data sets, and supplemental material used in this paper are available at https://sites.google.com/site/guoxian85/tmec.
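The abstract contrasts two ways to integrate data sources: combining kernels into one composite kernel before training, or training one graph-based multilabel classifier per source and then combining their predictions (the ensemble route taken by TMEC). The sketch below illustrates only that contrast, under simplifying assumptions; it is not the authors' TMC or TMEC. It swaps in a standard normalized-graph label propagation on a protein-protein graph for each source, omits the function-function and protein-function relations of the directed birelational graph, assumes nonnegative kernels, and uses uniform averaging as a hypothetical combination rule. The function names (`propagate_labels`, `ensemble_predict`, `composite_predict`), the parameter `alpha`, and the toy kernels are illustrative inventions.

```python
import numpy as np


def normalize_affinity(W):
    """Return S = D^{-1/2} W D^{-1/2} for a nonnegative protein-protein affinity matrix W."""
    W = np.array(W, dtype=float)       # copy so the caller's kernel is not modified
    np.fill_diagonal(W, 0.0)           # drop self-similarity (no self-loops)
    d = W.sum(axis=1)
    d[d == 0] = 1.0                    # isolated proteins: avoid division by zero
    inv_sqrt = 1.0 / np.sqrt(d)
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def propagate_labels(W, Y, alpha=0.8):
    """Transductive multilabel propagation on a single data source (illustrative).

    Y is an n x c 0/1 matrix (n proteins, c functions); rows of unlabeled
    proteins are all zero. Returns the fixed point of F = alpha*S*F + (1-alpha)*Y,
    i.e. F = (1-alpha) * (I - alpha*S)^{-1} Y, assuming 0 < alpha < 1.
    """
    S = normalize_affinity(W)
    Y = np.asarray(Y, dtype=float)
    n = S.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)


def ensemble_predict(kernels, Y, alpha=0.8, weights=None):
    """Ensemble integration: one classifier per data source, then a weighted
    average of the per-source score matrices (uniform weights by default)."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    scores = [propagate_labels(kernel, Y, alpha) for kernel in kernels]
    return sum(w * F for w, F in zip(weights, scores))


def composite_predict(kernels, Y, alpha=0.8):
    """Composite-kernel integration: average the kernels first, then train one classifier."""
    composite = sum(np.asarray(kernel, dtype=float) for kernel in kernels) / len(kernels)
    return propagate_labels(composite, Y, alpha)


if __name__ == "__main__":
    # Toy data made up for illustration: proteins 0 and 1 are labeled,
    # proteins 2 and 3 are unlabeled; two data sources give two kernels.
    Y = np.array([[1, 0], [0, 1], [0, 0], [0, 0]])
    K1 = np.array([[1.0, 0.0, 0.9, 0.1],
                   [0.0, 1.0, 0.1, 0.9],
                   [0.9, 0.1, 1.0, 0.0],
                   [0.1, 0.9, 0.0, 1.0]])
    K2 = np.array([[1.0, 0.1, 0.8, 0.0],
                   [0.1, 1.0, 0.0, 0.8],
                   [0.8, 0.0, 1.0, 0.1],
                   [0.0, 0.8, 0.1, 1.0]])
    F = ensemble_predict([K1, K2], Y)
    print(F[2:].argmax(axis=1))  # top-scoring function per unlabeled protein -> [0 1]
```

In this sketch the ensemble keeps each source's graph separate and merges only the resulting score matrices, whereas `composite_predict` merges the sources before propagation; how TMEC actually weights or combines its base classifiers is not specified here.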
Index Terms: proteomics, benchmark testing, biology computing, genomics, benchmark, protein function prediction, multilabel ensemble classification, high throughput experimental techniques, heterogeneous proteomic data sets, heterogeneous genomic data sets, computational annotation, composite kernel, transductive multilabel ensemble classifier, transductive multilabel classifier, TMEC method, TMC method, Proteins, Kernel, Correlation, Bioinformatics, Vectors, IEEE transactions, Computational biology, Multilabel ensemble classifiers, directed birelational graph
Guoxian Yu, Huzefa Rangwala, Carlotta Domeniconi, Guoji Zhang, Zhiwen Yu, "Protein Function Prediction Using Multilabel Ensemble Classification", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1045-1057, July-Aug. 2013, doi:10.1109/TCBB.2013.111