2007 Data Compression Conference (DCC'07)
A Simple Statistical Algorithm for Biological Sequence Compression
Snowbird, Utah
March 27-29, 2007
ISBN: 0-7695-2791-4
Minh Duc Cao, Monash University, Australia
Trevor I. Dix, Monash University, Australia
Lloyd Allison, Monash University, Australia
Chris Mears, Monash University, Australia
This paper introduces a novel algorithm for biological sequence compression that exploits both the statistical properties of a sequence and the repetition within it. A panel of experts is maintained, each estimating the probability distribution of the next symbol to be encoded, and the experts' estimates are combined into a final distribution. Each symbol is then encoded by arithmetic coding using that distribution. The resulting information sequence, that is, the information content of each symbol under the model, provides insight for further study of the biological sequence. Experiments show that the algorithm outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.
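The idea of combining a panel of experts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the expert types or the combination rule, so the two expert classes (`UniformExpert`, `ContextExpert`), the Bayesian weight update, and the function name `information_sequence` are all assumptions made for illustration. The sketch stops short of arithmetic coding and instead reports the per-symbol information content in bits, whose sum approximates the length an ideal arithmetic coder would produce.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed DNA alphabet for this illustration


class UniformExpert:
    """Hypothetical baseline expert: every symbol is equally likely."""

    def predict(self, history):
        return {s: 1.0 / len(ALPHABET) for s in ALPHABET}

    def update(self, history, symbol):
        pass  # nothing to learn


class ContextExpert:
    """Hypothetical order-k Markov expert with add-one smoothing."""

    def __init__(self, order):
        self.order = order
        self.counts = defaultdict(lambda: {s: 1 for s in ALPHABET})

    def _context(self, history):
        # Guard order 0 explicitly, since history[-0:] is the whole string.
        return history[-self.order:] if self.order else ""

    def predict(self, history):
        c = self.counts[self._context(history)]
        total = sum(c.values())
        return {s: c[s] / total for s in ALPHABET}

    def update(self, history, symbol):
        self.counts[self._context(history)][symbol] += 1


def information_sequence(seq, experts):
    """Return the information content (bits) of each symbol in seq.

    Experts are blended by a weighted average. Weights follow an assumed
    Bayesian update: each is multiplied by the probability the expert
    gave to the symbol that actually occurred, then renormalised.
    """
    weights = [1.0 / len(experts)] * len(experts)
    info = []
    for i, symbol in enumerate(seq):
        history = seq[:i]
        preds = [e.predict(history) for e in experts]
        mixed = {s: sum(w * p[s] for w, p in zip(weights, preds))
                 for s in ALPHABET}
        info.append(-math.log2(mixed[symbol]))
        weights = [w * p[symbol] for w, p in zip(weights, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
        for e in experts:
            e.update(history, symbol)
    return info


if __name__ == "__main__":
    seq = "ACGT" * 50
    bits = information_sequence(seq, [UniformExpert(), ContextExpert(2)])
    print(f"{sum(bits) / len(seq):.3f} bits per symbol")
```

In this sketch, regions the experts predict well show low information content and unpredictable regions show values near 2 bits per base, which is one way an information sequence could point to repeated or otherwise structured regions.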
Citation:
Minh Duc Cao, Trevor I. Dix, Lloyd Allison, Chris Mears, "A Simple Statistical Algorithm for Biological Sequence Compression," 2007 Data Compression Conference (DCC'07), pp. 43-52, 2007