Multistage Gene Normalization and SVM-Based Ranking for Protein Interactor Extraction in Full-Text Articles
Issue No. 03 - July-September (2010 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.45
Hong-Jie Dai, National Tsing-Hua University, Hsinchu, Taiwan
Po-Ting Lai, Yuan Ze University, Ching-Li, Taiwan
Richard Tzong-Han Tsai, Yuan Ze University, Ching-Li, Taiwan
The interactor normalization task (INT) is to identify genes that play the interactor role in protein-protein interactions (PPIs), to map these genes to unique IDs, and to rank them according to their normalized confidence. INT has two subtasks: gene normalization (GN) and interactor ranking. The main difficulties of INT GN are identifying genes across species and working with full papers instead of abstracts. To tackle these problems, we developed a multistage GN algorithm and a ranking method, both of which exploit information in different parts of a paper. Our system achieved a promising AUC of 0.43471. The multistage GN algorithm improved system performance (AUC) by 1.719 percent compared to a one-stage GN algorithm. Our experimental results also show that INT AUC performance was 22.6 percent higher with full text than with abstracts only.
Data mining, feature evaluation and selection, mining methods and algorithms, text mining, scientific databases.
H. Dai, P. Lai and R. Tzong-Han Tsai, "Multistage Gene Normalization and SVM-Based Ranking for Protein Interactor Extraction in Full-Text Articles," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, pp. 412-420, 2010.