Issue No. 04 - July-Aug. (2012 vol. 9)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.156
L. J. McQuay, Pharmacoepidemiology & Risk Manage. Div., RTI Health Solutions, Research Triangle Park, NC, USA
J. Q. Jiang, Dept. of Inf. Syst., City Univ. of Hong Kong, Kowloon, China
Assigning biological functions to uncharacterized proteins is a fundamental problem in the postgenomic era. The increasing availability of large amounts of data on protein-protein interactions (PPIs) has led to the emergence of a considerable number of computational methods for determining protein function in the context of a network. These algorithms, however, treat each functional class in isolation and therefore often suffer from the scarcity of labeled data. In reality, different functional classes are naturally dependent on one another. We propose a new algorithm, Multi-label Correlated Semi-supervised Learning (MCSL), to incorporate the intrinsic correlations among functional classes into protein function prediction by leveraging the relationships provided by the PPI network and the functional class network. The guiding intuition is that the classification function should be sufficiently smooth on subgraphs where the respective topologies of these two networks are a good match. We encode this intuition as regularized learning with intraclass and interclass consistency, which can be understood as an extension of the graph-based learning with local and global consistency (LGC) method. Cross-validation on the yeast proteome shows that MCSL consistently outperforms several state-of-the-art methods. Most notably, it effectively overcomes the problem associated with the scarcity of labeled data. The supplementary files are freely available at http://sites.google.com/site/csaijiang/MCSL.
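For readers unfamiliar with the LGC method that the abstract names as its starting point, the following is a minimal sketch of standard LGC label propagation on a toy graph. It is not the MCSL algorithm: the interclass-consistency term is omitted because the abstract does not give its form. The function names, the six-node toy network, and the parameter values (alpha = 0.9, 200 iterations) are illustrative assumptions. The sketch iterates the usual LGC update F ← αSF + (1 − α)Y, where S is the symmetrically normalized adjacency matrix and Y holds the known labels.

```python
import math


def normalize_adjacency(W):
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric adjacency matrix W."""
    n = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def lgc_propagate(W, Y, alpha=0.9, iters=200):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y (LGC update rule).

    W: n x n symmetric adjacency matrix (here a toy stand-in for a PPI network).
    Y: n x k label matrix; Y[i][c] = 1 if node i is known to have class c.
    alpha and iters are assumed values chosen for illustration only.
    """
    S = normalize_adjacency(W)
    n, k = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][m] * F[m][c] for m in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(k)]
             for i in range(n)]
    return F


# Hypothetical toy network: two triangles (nodes 0-2 and 3-5) joined by edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
# Only node 0 (class 0) and node 5 (class 1) are labeled.
Y = [[1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 1]]

F = lgc_propagate(W, Y)
# Assign each node the class with the highest propagated score.
predicted = [max(range(2), key=lambda c: row[c]) for row in F]
print(predicted)
```

In this sketch each class column of F is propagated independently, which is the per-class isolation the abstract criticizes; MCSL is described as adding a second, class-level network so that scores for correlated functional classes also inform one another.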
proteins, bioinformatics, genetics, graph theory, learning (artificial intelligence), pattern classification, yeast proteome, protein function prediction, multilabel correlated semisupervised learning, biological function, postgenomic era, protein-protein interaction, intrinsic correlation, functional class network, classification function, subgraph, topology, regularized learning, intraclass consistency, interclass consistency, graph-based learning, local consistency method, global consistency method, cross validation, Kernel, Prediction algorithms, Correlation, Computational biology, Electronic mail, functional class correlation, semi-supervised learning, multi-label learning, kernel learning
L. J. McQuay and J. Q. Jiang, "Predicting Protein Function by Multi-Label Correlated Semi-Supervised Learning," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1059-1069, 2012.