Hash Subgraph Pairwise Kernel for Protein-Protein Interaction Extraction
July-Aug. 2012 (vol. 9 no. 4)
pp. 1190-1202
Zhihao Yang, Coll. of Comput. Sci., Dalian Univ. of Technol., Dalian, China
Hongfei Lin, Coll. of Comput. Sci., Dalian Univ. of Technol., Dalian, China
Yijia Zhang, Coll. of Comput. Sci., Dalian Univ. of Technol., Dalian, China
Jian Wang, Coll. of Comput. Sci., Dalian Univ. of Technol., Dalian, China
Yanpeng Li, Coll. of Comput. Sci., Dalian Univ. of Technol., Dalian, China
Extracting protein-protein interactions (PPIs) from biomedical literature is an important task in biomedical text mining (BioTM). In this paper, we propose a hash subgraph pairwise (HSP) kernel-based approach for this task. The key idea of the kernel is to use hierarchical hash labels to express the structural information of subgraphs in linear time. We apply the graph kernel to dependency graphs that represent sentence structure, which lets PPI extraction make efficient use of the full graph structure and, in particular, capture the contiguous topological and label information that earlier approaches ignored. We evaluate the proposed approach on five publicly available PPI corpora. The experimental results show that it significantly outperforms the all-path kernel approach on all five corpora and achieves state-of-the-art performance.
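
The abstract does not spell out how the hash labels are built, so the following is only a rough illustration of the general idea of hashing node neighborhoods in a dependency graph and comparing two graphs by their shared labels. It is not the authors' HSP kernel: the function names (`hashed_label_bag`, `label_kernel`), the SHA-1 truncation, the undirected treatment of edges, the number of rounds, and the toy graphs are all assumptions made for this sketch.

```python
import hashlib
from collections import Counter


def _hash(text):
    # Short, stable label; truncating SHA-1 is an arbitrary choice for this sketch.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


def hashed_label_bag(nodes, edges, rounds=2):
    """Collect hashed neighborhood labels for one graph.

    nodes: dict mapping node id -> token label
    edges: list of (head, dependent, dependency_label) triples
    Returns a Counter of labels gathered over all refinement rounds.
    """
    # Assumption: edges are treated as undirected when gathering context.
    neighbors = {n: [] for n in nodes}
    for head, dep, rel in edges:
        neighbors[head].append((rel, dep))
        neighbors[dep].append((rel, head))

    current = {n: _hash(label) for n, label in nodes.items()}
    bag = Counter(current.values())
    for _ in range(rounds):
        refined = {}
        for n in nodes:
            # Each round folds the labels of adjacent nodes into the node's
            # own label, so a label stands for a progressively larger subgraph.
            context = sorted(rel + ":" + current[m] for rel, m in neighbors[n])
            refined[n] = _hash(current[n] + "|" + ",".join(context))
        current = refined
        bag.update(current.values())
    return bag


def label_kernel(bag_a, bag_b):
    # Dot product of label counts: larger when the graphs share more labels.
    return sum(count * bag_b[label] for label, count in bag_a.items())


# Toy dependency graphs (hypothetical sentences, not taken from any corpus).
edges = [(1, 0, "nsubj"), (1, 2, "dobj")]
g1 = hashed_label_bag({0: "PROT1", 1: "binds", 2: "PROT2"}, edges)
g2 = hashed_label_bag({0: "PROT1", 1: "inhibits", 2: "PROT2"}, edges)
print(label_kernel(g1, g1), label_kernel(g1, g2))
```

Each round visits every edge a constant number of times, so the labeling cost grows roughly with the size of the graph (apart from sorting each node's neighbors), which is the property that makes hash-based graph kernels attractive for large corpora.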

Index Terms:
proteins, data mining, graph theory, medical information systems, PPI corpora, hash subgraph pairwise kernel, protein-protein interaction extraction, biomedical literature, biomedical text mining, graph kernel, dependency graphs, sentence structure, kernel, syntactics, protein engineering, feature extraction, bioinformatics, arrays, hash, interaction extraction
Zhihao Yang, Hongfei Lin, Yijia Zhang, Jian Wang, Yanpeng Li, "Hash Subgraph Pairwise Kernel for Protein-Protein Interaction Extraction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1190-1202, July-Aug. 2012, doi:10.1109/TCBB.2012.50