Issue No. 06, Nov.-Dec. 2012 (vol. 9)
pp. 1690-1695
Richard Tzong-Han Tsai, Dept. of Comput. Sci. & Eng., Yuan Ze Univ., Chungli, Taiwan
ABSTRACT
Protein-protein interaction (PPI) database curation requires text-mining systems that can recognize and normalize interactor genes and return a ranked list of PPI pairs for each article. The order of the pairs in this list strongly affects how easily curators can work through it. Most current PPI pair ranking approaches rely on an association analysis between the two genes in a pair. We propose instead that ranking an extracted PPI pair by considering both the association between the paired genes and each gene's global association with all other genes mentioned in the paper yields a more reliable ranked list. In this work, we present a composite interaction score that combines the association score between two interactors (the pair association score) with their global association scores. We test three representative data fusion algorithms for estimating the global association score: two Borda-Fuse models and one linear combination model (LCM). The three estimation methods are evaluated on the data set of the BioCreative II.5 Interaction Pair Task (IPT) using the area under the interpolated precision/recall curve (AUC iP/R). Our experimental results indicate that using LCM to estimate the global association score boosts the AUC iP/R score from 0.0175 to 0.2396, outperforming the best BioCreative II.5 IPT system.
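The abstract does not give the scoring formulas, so the following Python sketch only illustrates the general idea of a composite score under stated assumptions. It assumes a symmetric table of pair association scores for the genes mentioned in one article and treats every gene as a "voter" that ranks all other genes by their association with it. Each gene's global association is then estimated either with Borda-Fuse (a voter's top candidate receives c points, the next c - 1, and so on) or with a linear combination of the voters' scores. The composite score mixes the min-max-normalized pair score with the mean normalized global score of the two interactors through a weight `alpha`. The voter framing, the uniform LCM weights, `alpha`, and all function names are hypothetical illustrations, not the paper's definitions.

```python
from itertools import combinations

# Hypothetical input: symmetric pair association scores for the genes
# mentioned in one article, keyed by frozenset({gene_a, gene_b}).
GENES = ["A", "B", "C", "D"]
TOY_SCORES = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.4,
    frozenset(("B", "C")): 0.3,
    frozenset(("C", "D")): 0.2,
}


def pair_assoc(scores, a, b):
    """Pair association score; pairs never seen together default to 0.0."""
    return scores.get(frozenset((a, b)), 0.0)


def borda_fuse_global(scores, genes):
    """Assumed Borda-Fuse estimate: each gene votes by ranking the other
    genes; the top candidate gets c points, the next c - 1, and so on."""
    points = {g: 0 for g in genes}
    for voter in genes:
        candidates = [g for g in genes if g != voter]
        # sorted() stays stable with reverse=True, so ties keep input order
        ranked = sorted(candidates, key=lambda g: pair_assoc(scores, voter, g), reverse=True)
        for r, g in enumerate(ranked):
            points[g] += len(candidates) - r
    return points


def lcm_global(scores, genes, weights=None):
    """Assumed linear combination model: a gene's global score is the
    weighted sum of the association scores every other gene gives it.
    Voter weights default to 1.0 (hypothetical; the paper may learn them)."""
    weights = weights or {}
    totals = {g: 0.0 for g in genes}
    for voter in genes:
        w = weights.get(voter, 1.0)
        for g in genes:
            if g != voter:
                totals[g] += w * pair_assoc(scores, voter, g)
    return totals


def min_max(values):
    """Rescale a dict of scores to [0, 1]; constant inputs map to 0.0."""
    if not values:
        return {}
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def rank_pairs(scores, genes, global_fn, alpha=0.5):
    """Hypothetical composite score, all terms min-max normalized:
    alpha * pair(a, b) + (1 - alpha) * (global(a) + global(b)) / 2."""
    glob = min_max(global_fn(scores, genes))
    pairs = list(combinations(genes, 2))
    pair_norm = min_max({p: pair_assoc(scores, *p) for p in pairs})
    ranked = [
        (a, b, alpha * pair_norm[(a, b)] + (1 - alpha) * (glob[a] + glob[b]) / 2)
        for a, b in pairs
    ]
    return sorted(ranked, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    for name, fn in [("Borda-Fuse", borda_fuse_global), ("LCM", lcm_global)]:
        print(name, [(a, b, round(s, 3)) for a, b, s in rank_pairs(TOY_SCORES, GENES, fn)])
```

On this toy input both estimators place A-B first and produce the same overall order. On real articles they can diverge, because Borda-Fuse uses only each voter's ranks while the linear combination also uses score magnitudes.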
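The evaluation metric named in the abstract, AUC iP/R, can be illustrated with a small Python function. It assumes one article's ranked pair list is given as booleans (True where the pair appears in the gold standard) together with the number of gold-standard pairs. Interpolated precision at a rank is taken as the highest precision at that rank or any later one, and the area is accumulated as a step function: interpolated precision times each increase in recall. This is a sketch for intuition only; the official BioCreative II.5 scorer may differ in details such as averaging across articles or tie handling, and the name `auc_ipr` is hypothetical.

```python
def auc_ipr(ranked_hits, n_gold):
    """Step-wise area under the interpolated precision/recall curve.

    ranked_hits: booleans in rank order, True if that pair is a gold pair.
    n_gold: number of gold-standard pairs for the article.
    """
    if n_gold == 0:
        return 0.0
    precisions, recalls = [], []
    tp = 0
    for k, hit in enumerate(ranked_hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gold)
    # Interpolate: precision at rank k becomes the max precision at any rank >= k
    interp = precisions[:]
    for k in range(len(interp) - 2, -1, -1):
        interp[k] = max(interp[k], interp[k + 1])
    # Sum interpolated precision over each recall increment
    area, prev_recall = 0.0, 0.0
    for p, r in zip(interp, recalls):
        area += p * (r - prev_recall)
        prev_recall = r
    return area
```

For the list `[True, False, True, False]` with three gold pairs, recall rises by 1/3 at ranks 1 and 3 with interpolated precisions 1 and 2/3, giving an area of 5/9 (about 0.556). The third gold pair is never retrieved, so the last third of the recall axis contributes nothing, which is one reason a poorly ordered or incomplete list scores low.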
INDEX TERMS
Text mining, Databases, Bioinformatics, Proteins, Mutual information, Protein engineering, Bioinformatics databases
CITATION
Richard Tzong-Han Tsai, "Improving Protein-Protein Interaction Pair Ranking with an Integrated Global Association Score," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1690-1695, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.99