IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, July-September 2010
Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features
Artemy Kolchinsky, Indiana University, Bloomington
Alaa Abi-Haidar, Indiana University, Bloomington
Jasleen Kaur, Indiana University, Bloomington
Ahmed Abdeen Hamed, Indiana University, Bloomington
Luis M. Rocha, Indiana University, Bloomington
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.55
We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative II for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.
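Three of the performance measures named above (Accuracy, Balanced F-Score, and Matthews Correlation Coefficient) can be computed from the confusion counts of a binary classifier. The following is a minimal sketch using the standard textbook definitions, not the challenge's official evaluation script. The function names `binary_metrics` and `rank_product` are hypothetical, and `rank_product` assumes the plain product of a submission's ranks across measures, since the abstract does not state the exact definition used.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Undefined ratios fall back
    to 0.0 (an illustrative convention, not taken from the paper).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    # Matthews Correlation Coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f_score": f_score, "mcc": mcc}


def rank_product(ranks):
    """Product of a submission's ranks across measures (assumed definition)."""
    return math.prod(ranks)


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the challenge.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
    print(rank_product([1, 2, 3, 2]))
```

A lower rank product indicates a submission that ranks consistently well across all measures, which is why it is a reasonable way to combine measures that reward different kinds of errors.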
Text mining, literature mining, binary classification, protein-protein interaction, citation network.
Artemy Kolchinsky, Alaa Abi-Haidar, Jasleen Kaur, Ahmed Abdeen Hamed, Luis M. Rocha, "Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, pp. 400-411, July-September 2010, doi:10.1109/TCBB.2010.55