19th International Conference on Data Engineering (ICDE'03)
CLUSEQ: Efficient and Effective Sequence Clustering
Bangalore, India
March 05-March 08
ISBN: 0-7803-7665-X
Jiong Yang, University of Illinois at Urbana-Champaign
Wei Wang, University of North Carolina at Chapel Hill
Analyzing sequence data has become increasingly important in areas such as biological sequences, text documents, and web access logs. In this paper, we investigate the problem of clustering sequences based on their sequential features. As a widely recognized technique, clustering has proven very useful in detecting unknown object categories and revealing hidden correlations among objects. One difficulty that prevents clustering from being applied extensively to sequence data (in categorical domains) is the lack of a similarity measure that is both effective and efficient. We therefore propose a novel model (CLUSEQ) for sequence clustering that exploits significant statistical properties of the sequences. The conditional probability distribution (CPD) of the next symbol given a preceding segment is derived and used to characterize sequence behavior and to support the similarity measure. A variation of the suffix tree, namely the probabilistic suffix tree, is employed to organize (the significant portion of) the CPD in a concise way. A novel algorithm is devised to discover high-quality clusters efficiently; it automatically adjusts the number of clusters to its optimal range via a unique combination of successive new cluster generation and cluster consolidation. The performance of CLUSEQ has been demonstrated via extensive experiments on several real and synthetic sequence databases.
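To illustrate the idea of a conditional probability distribution of the next symbol given a preceding segment, here is a minimal sketch in Python. It is not the paper's implementation: the function names `build_cpd` and `next_symbol_prob`, the `max_context` parameter, and the use of a plain dictionary keyed by context (instead of a probabilistic suffix tree with pruning of insignificant contexts) are all assumptions made for illustration. Sequences are assumed to be strings over a categorical alphabet.

```python
from collections import Counter, defaultdict


def build_cpd(sequences, max_context=2):
    """Estimate P(next symbol | preceding segment) by simple counting.

    Hypothetical helper: contexts are the 0..max_context symbols that
    immediately precede each position. A flat dict stands in for the
    probabilistic suffix tree mentioned in the abstract.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(max_context + 1):
                if i - k < 0:
                    break  # not enough preceding symbols for this context length
                context = seq[i - k:i]  # the empty context is "" when k == 0
                counts[context][symbol] += 1

    # Normalize the counts for each context into a probability distribution.
    cpd = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        cpd[context] = {s: c / total for s, c in counter.items()}
    return cpd


def next_symbol_prob(cpd, history, symbol, max_context=2):
    """Return P(symbol | longest stored suffix of history).

    Falls back to shorter suffixes only when a longer one was never
    observed; returns 0.0 if the symbol was never seen after the
    matched context (no smoothing is applied in this sketch).
    """
    for k in range(min(max_context, len(history)), -1, -1):
        context = history[len(history) - k:]
        if context in cpd:
            return cpd[context].get(symbol, 0.0)
    return 0.0
```

For example, `build_cpd(["abab", "abb"], max_context=1)` yields a model in which `next_symbol_prob(cpd, "ab", "a", max_context=1)` looks up the distribution stored for the context `"b"`. A similarity measure between a sequence and a cluster could then be built from such per-symbol probabilities, though the exact measure used by CLUSEQ is not given in the abstract.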
Citation:
Jiong Yang, Wei Wang, "CLUSEQ: Efficient and Effective Sequence Clustering," icde, pp.101, 19th International Conference on Data Engineering (ICDE'03), 2003