2004 IEEE Computational Systems Bioinformatics Conference (CSB'04)
AZuRE, a Scalable System for Automated Term Disambiguation of Gene and Protein Names
Stanford, California
August 16-August 19
ISBN: 0-7695-2194-0
Raf M. Podowski, AstraZeneca R&D Boston and Karolinska Institutet
John G. Cleary, Reel Two, Ltd. and University of Waikato
Nicholas T. Goncharoff, Reel Two, Inc.
Gregory Amoutzias, AstraZeneca
William S. Hayes, AstraZeneca R&D Boston
Researchers, hindered by a lack of standard gene- and protein-naming conventions, endure long, sometimes fruitless, literature searches. A system is described that automatically assigns gene names to their LocusLink IDs (LLIDs) in previously unseen MEDLINE abstracts. The system is based on supervised learning and builds a model for each LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. Performance was validated for all 20,546 human genes with LLIDs. Of these, 7,344 produced good-quality models (F-measure > 0.7, nearly 60% of which were > 0.9) and 13,202 did not, mainly because too few document references were known for them. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. It is concluded that high-quality gene disambiguation is achievable using scalable automated techniques.
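The quality criterion in the abstract (F-measure > 0.7) can be illustrated with a minimal sketch. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not say which variant the authors used. The function names, gene labels, and counts below are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive
    and false-negative document counts. Assumed variant, see lead-in."""
    if tp == 0:
        # No correct assignments: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def is_good_model(score: float, threshold: float = 0.7) -> bool:
    """Apply the abstract's 'good quality' cutoff (F-measure > 0.7)."""
    return score > threshold


# Hypothetical per-gene validation counts: (tp, fp, fn).
counts = {
    "gene_A": (45, 5, 5),  # many known references, few errors
    "gene_B": (3, 4, 6),   # few known references
}

for gene, (tp, fp, fn) in counts.items():
    score = f_measure(tp, fp, fn)
    print(f"{gene}: F = {score:.3f}, good model: {is_good_model(score)}")
```

With these illustrative counts, the first gene clears the cutoff and the second does not, mirroring the abstract's observation that models built from few document references tend to fall below the quality threshold.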
Citation:
Raf M. Podowski, John G. Cleary, Nicholas T. Goncharoff, Gregory Amoutzias, William S. Hayes, "AZuRE, a Scalable System for Automated Term Disambiguation of Gene and Protein Names," csb, pp.415-424, 2004 IEEE Computational Systems Bioinformatics Conference (CSB'04), 2004