IEEE Computer Society Bioinformatics Conference (CSB'02)
Towards Automatic Clustering of Protein Sequences
Stanford, California
August 14-16, 2002
ISBN: 0-7695-1653-X
Jiong Yang, IBM T.J. Watson Research Center
Wei Wang, University of North Carolina at Chapel Hill
Analyzing protein sequence data has become increasingly important in recent years. Most previous work in this area has focused on building classification models. In this paper, we investigate the problem of automatically clustering unlabeled protein sequences. As a widely recognized technique in statistics and computer science, clustering has proven very useful for detecting unknown object categories and revealing hidden correlations among objects. One difficulty that prevents clustering from being applied directly to protein sequences is the lack of an effective similarity measure that can be computed efficiently. We therefore propose a novel model for protein sequence clustering that exploits significant statistical properties of the sequences. The concept of imprecise probabilities is introduced into the original probabilistic suffix tree to monitor the convergence of the empirical measurements and to guide the clustering process. We demonstrate that the proposed method can successfully discover meaningful families without needing to learn models of the different families from pre-labeled "training data".
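The abstract does not spell out how imprecise probabilities attach to a probabilistic suffix tree. As a rough illustration only, the sketch below counts which residue follows each short context (a flattened, unpruned stand-in for a PST) and reports an interval rather than a point estimate for each conditional probability. The interval comes from Walley's imprecise Dirichlet model, [n/(N+s), (n+s)/(N+s)], which is swapped in here as one standard formulation. The class name `ContextCounts` and the parameters `max_depth`, `s`, and `eps` are hypothetical, and the paper's actual construction may differ.

```python
from collections import defaultdict


class ContextCounts:
    """Hypothetical, simplified PST-like model (not the paper's method).

    For every context (the up-to-max_depth residues preceding a position)
    it counts which residue follows. A real probabilistic suffix tree
    prunes uninformative contexts; this sketch keeps them all.
    """

    def __init__(self, max_depth=2, s=1.0):
        self.max_depth = max_depth  # longest context length (assumed parameter)
        self.s = s                  # imprecise Dirichlet prior strength (assumed)
        # counts[context][symbol] = times `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def add_sequence(self, seq):
        for i, symbol in enumerate(seq):
            # Record every suffix of the preceding window, including "".
            for d in range(self.max_depth + 1):
                if i - d < 0:
                    break
                self.counts[seq[i - d:i]][symbol] += 1

    def interval(self, context, symbol):
        """Lower/upper probability of `symbol` after `context` under the
        imprecise Dirichlet model: [n/(N+s), (n+s)/(N+s)]."""
        row = self.counts.get(context, {})  # .get does not create entries
        n = row.get(symbol, 0)
        total = sum(row.values())
        return n / (total + self.s), (n + self.s) / (total + self.s)

    def width(self, context):
        """Interval width s/(N+s); shrinks as the context is seen more often."""
        total = sum(self.counts.get(context, {}).values())
        return self.s / (total + self.s)

    def converged(self, context, eps=0.05):
        """Treat a context's estimates as stable once the width is <= eps."""
        return self.width(context) <= eps


model = ContextCounts(max_depth=2, s=1.0)
model.add_sequence("ACDAC")
print(model.interval("A", "C"))   # context "A" was followed by C twice
print(model.interval("ZZ", "A"))  # unseen context: vacuous interval (0.0, 1.0)
```

The interval width depends only on how often a context has been observed. An unseen context yields the vacuous interval [0, 1], and the width shrinks toward zero as evidence accumulates, which is the kind of signal one could use to decide when an empirical estimate is stable enough to guide clustering decisions.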
Citation:
Jiong Yang, Wei Wang, "Towards Automatic Clustering of Protein Sequences," Proceedings of the IEEE Computer Society Bioinformatics Conference (CSB'02), pp. 175, 2002.