Fifth International Conference on Computer and Information Technology (CIT'05)
Efficient Algorithms for Mining Maximal Frequent Concatenate Sequences in Biological Datasets
Shanghai, China
September 21-September 23
ISBN: 0-7695-2432-X
Jin Pan, Fudan University
Peng Wang, Fudan University
Wei Wang, Fudan University
Baile Shi, Fudan University
Genxing Yang, Shanghai Software Test Key Lab

The growth of bioinformatics has produced datasets with new characteristics. DNA sequences typically contain a large number of items, and biologists assemble the whole genome of a species from frequent concatenate sequences, which ordinarily have hundreds of items.

Such datasets pose a great challenge for existing frequent pattern discovery algorithms. Almost all of them are Apriori-like and so have an exponential dependence on the average sequence length. PrefixSpan, which introduced the projection-based sequential pattern-growth approach, is the most efficient existing algorithm. However, it grows sequential patterns by exploring length-1 frequent patterns and so is not suitable for biological datasets with long frequent concatenate sequences.

In this paper, we propose two novel algorithms, called MacosFSpan and MacosVSpan, to mine maximal frequent concatenate sequences. They are specially designed to handle datasets having long frequent concatenate sequences. Our performance study shows that MacosFSpan outperforms the traditional methods based on length-1 sequence exploration, and that MacosVSpan is more efficient than MacosFSpan.
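
To make the problem statement concrete, the following is a minimal brute-force sketch of the length-1 growth baseline that the abstract argues against. It is not MacosFSpan or MacosVSpan, whose details are not given here. It assumes that a "concatenate sequence" is a contiguous run of items (a substring), that support is the number of input sequences containing the pattern, and that a frequent pattern is maximal when no frequent pattern strictly contains it. The function names and the toy DNA strings are hypothetical.

```python
def support(pattern, sequences):
    """Number of sequences that contain `pattern` as a contiguous run."""
    return sum(1 for s in sequences if pattern in s)


def frequent_substrings(sequences, min_support):
    """Grow frequent contiguous patterns one item at a time (length-1 growth).

    Every prefix of a frequent pattern is itself frequent, so extending
    frequent patterns on the right by a single item reaches all of them.
    """
    alphabet = sorted(set("".join(sequences)))
    frequent = set()
    level = [a for a in alphabet if support(a, sequences) >= min_support]
    while level:
        frequent.update(level)
        next_level = []
        for p in level:
            for a in alphabet:
                candidate = p + a
                if support(candidate, sequences) >= min_support:
                    next_level.append(candidate)
        level = next_level
    return frequent


def maximal_frequent(sequences, min_support):
    """Keep only frequent patterns with no frequent one-item extension.

    Assumption: maximal means no frequent contiguous super-pattern exists.
    Checking one-item extensions on either side is sufficient, because any
    longer frequent super-pattern contains a frequent one-item extension.
    """
    frequent = frequent_substrings(sequences, min_support)
    alphabet = set("".join(sequences))
    return sorted(
        p for p in frequent
        if not any(p + a in frequent or a + p in frequent for a in alphabet)
    )


if __name__ == "__main__":
    toy = ["ACGTAC", "TACGTA", "GGACGT"]  # hypothetical toy data
    print(maximal_frequent(toy, min_support=3))  # prints ['ACGT']
```

Each growth step rescans the whole dataset, so the number of passes grows with the pattern length. With patterns of hundreds of items this becomes the bottleneck that motivates algorithms designed for long frequent concatenate sequences.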

Citation:
Jin Pan, Peng Wang, Wei Wang, Baile Shi, Genxing Yang, "Efficient Algorithms for Mining Maximal Frequent Concatenate Sequences in Biological Datasets," cit, pp.98-104, Fifth International Conference on Computer and Information Technology (CIT'05), 2005