2017 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS) (2017)
Okinawa
Nov. 24, 2017 to Nov. 26, 2017
ISSN: 2189-8723
ISBN: 978-1-5090-6665-0
pp: 255-261
Yoichi Murakami, Department of Informatics, Tokyo University of Information Sciences, Chiba, Japan
Kenji Mizuguchi, Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
ABSTRACT
A better understanding of biological processes, pathways and functions requires reliable information about protein-protein interactions (PPIs). However, experimentally identifying the complete PPI network of a cell or organism remains difficult. To compensate for the limitations of current experimental techniques, we have proposed PSOPIA, a computational method to predict whether two proteins interact (http://mizuguchilab.org/PSOPIA/) [1]. The selection of datasets is a major issue in PPI prediction [2, 3]. It is generally believed that increasing the size and diversity of examples makes a dataset more representative and reduces the effects of noise; however, for many algorithms it is impractical to use a large-scale dataset at the proteome level because of the memory and CPU time requirements. In this study, PSOPIA was retrained on a highly imbalanced, large-scale dataset containing a diverse set of examples at the proteome level. The dataset consisted of 43,060 high-confidence direct physical PPIs obtained from TargetMine [4] (the positives, which make up only 0.13% of the total) and 33,098,951 negative PPIs. The new prediction model achieved an AUC of 0.89 (pAUC at FPR < 0.5% = 0.24), higher than that of the previous PSOPIA model. Furthermore, it was applied to the problem of filtering out protein pairs incorrectly determined as interacting (false positives) from a low-confidence human PPI dataset. We suggest that a diverse set of large-scale examples is key to more reliable PPI prediction, and we demonstrate the performance of PSOPIA at the proteome level.
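The class imbalance and the partial AUC quoted above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names are hypothetical, and the normalization of the partial AUC (dividing the area by the FPR cutoff) is an assumption, since the abstract does not state how pAUC was scaled.

```python
def positive_fraction(n_pos, n_neg):
    """Fraction of positive examples in an imbalanced dataset."""
    return n_pos / (n_pos + n_neg)


def roc_points(labels, scores):
    """Return ROC points (fpr, tpr) in order of decreasing score threshold.

    labels are 0/1; tied scores are grouped into a single ROC point.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every example that shares the current score.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(points, max_fpr, normalize=True):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    With normalize=True the area is divided by max_fpr (an assumed
    convention), so a perfect classifier scores 1.0.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the FPR cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area / max_fpr if normalize else area


if __name__ == "__main__":
    # Positive fraction for the dataset sizes given in the abstract.
    frac = positive_fraction(43_060, 33_098_951)
    print(f"positives: {frac:.2%} of all pairs")

    # Toy labels and scores, made up purely for illustration.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    pts = roc_points(toy_labels, toy_scores)
    print("AUC:", partial_auc(pts, 1.0))
    print("pAUC (FPR <= 0.5):", partial_auc(pts, 0.5))
```

With a cutoff as small as 0.5%, only the highest-scoring predictions contribute to the area, which is why a partial AUC is often reported alongside the full AUC when positives are this rare.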
INDEX TERMS
Proteins, Training, Reliability, Predictive models, Prediction algorithms, Computational modeling, Big Data
CITATION

Y. Murakami and K. Mizuguchi, "PSOPIA: Toward more reliable protein-protein interaction prediction from sequence information," 2017 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Okinawa, 2017, pp. 255-261.
doi:10.1109/ICIIBMS.2017.8279749