IEEE Computer Society Bioinformatics Conference (CSB'03)
A Probabilistic Model for Identifying Protein Names and their Name Boundaries
Stanford, California
August 11-14, 2003
ISBN: 0-7695-2000-6
Kazuhiro Seki, Indiana University
Javed Mostafa, Indiana University
This paper proposes a method for identifying protein names in biomedical texts, with an emphasis on detecting protein name boundaries. We use a probabilistic model that exploits several surface clues characterizing protein names and incorporates word classes for generalization. In contrast to previously proposed methods, our approach does not rely on natural language processing tools such as part-of-speech taggers and syntactic parsers, which reduces processing overhead and the number of probabilistic parameters to be estimated. A notion of certainty is also proposed to improve the precision of identification. We implemented a protein name identification system based on the proposed method and evaluated it on real-world biomedical texts in comparison with previous work. The results showed that, overall, our system performs comparably to the state-of-the-art protein name identification system and achieves higher performance for compound names. In addition, we demonstrate that the system can further improve precision by restricting its output to names with high certainty.
Citation:
Kazuhiro Seki, Javed Mostafa, "A Probabilistic Model for Identifying Protein Names and their Name Boundaries," IEEE Computer Society Bioinformatics Conference (CSB'03), p. 251, 2003