
ASCII Text
Shanika Kuruppu, Bryan Beresford-Smith, Thomas Conway, Justin Zobel, "Iterative Dictionary Construction for Compression of Large DNA Data Sets," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 1, pp. 137-149, January/February 2012.
BibTeX
@article{ 10.1109/TCBB.2011.82,
  author = {Shanika Kuruppu and Bryan Beresford-Smith and Thomas Conway and Justin Zobel},
  title = {Iterative Dictionary Construction for Compression of Large DNA Data Sets},
  journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  volume = {9},
  number = {1},
  issn = {1545-5963},
  year = {2012},
  pages = {137-149},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.82},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE/ACM Transactions on Computational Biology and Bioinformatics
TI  - Iterative Dictionary Construction for Compression of Large DNA Data Sets
IS  - 1
SN  - 1545-5963
SP  - 137
EP  - 149
EPD  - 137-149
A1  - Shanika Kuruppu
A1  - Bryan Beresford-Smith
A1  - Thomas Conway
A1  - Justin Zobel
PY  - 2012
KW  - Dictionary construction
KW  - compression
KW  - DNA
KW  - large data sets
VL  - 9
JA  - IEEE/ACM Transactions on Computational Biology and Bioinformatics
ER  -
[1] D. Wheeler et al., “The Complete Genome of an Individual by Massively Parallel DNA Sequencing,” Nature, vol. 452, no. 7189, pp. 872-876, 2008.
[2] D. Bentley et al., “Accurate Whole Human Genome Sequencing Using Reversible Terminator Chemistry,” Nature, vol. 456, no. 7218, pp. 53-59, 2008.
[3] J. Wang et al., “The Diploid Genome Sequence of an Asian Individual,” Nature, vol. 456, no. 7218, pp. 60-65, 2008.
[4] S. Schuster et al., “Complete Khoisan and Bantu Genomes from Southern Africa,” Nature, vol. 463, no. 7283, pp. 943-947, 2010.
[5] A. Cannane and H. Williams, “General-Purpose Compression for Efficient Retrieval,” J. Am. Soc. for Information Science and Technology, vol. 52, no. 5, pp. 430-437, 2001.
[6] B. Behzadi and F.L. Fessant, “DNA Compression Challenge Revisited: A Dynamic Programming Approach,” CPM '05: Proc. 16th Ann. Symp. Combinatorial Pattern Matching, pp. 190-200, 2005.
[7] M.D. Cao, T. Dix, L. Allison, and C. Mears, “A Simple Statistical Algorithm for Biological Sequence Compression,” DCC '07: Proc. Data Compression Conf., pp. 43-52, 2007.
[8] X. Chen, S. Kwong, and M. Li, “A Compression Algorithm for DNA Sequences and Its Applications in Genome Comparison,” RECOMB '00: Proc. Fourth Ann. Int'l Conf. Research in Computational Molecular Biology, pp. 107-117, 2000.
[9] X. Chen, M. Li, B. Ma, and J. Tromp, “DNACompress: Fast and Effective DNA Sequence Compression,” Bioinformatics, vol. 18, no. 12, pp. 1696-1698, 2002.
[10] D. Loewenstern and P. Yianilos, “Significantly Lower Entropy Estimates for Natural DNA Sequences,” DCC '97: Proc. Data Compression Conf., p. 151, 1997.
[11] T. Matsumoto, K. Sadakane, and H. Imai, “Biological Sequence Compression Algorithms,” Genome Informatics, vol. 11, pp. 43-52, 2000.
[12] J. Ziv and A. Lempel, “A Universal Algorithm for Sequential Data Compression,” IEEE Trans. Information Theory, vol. IT-23, no. 3, pp. 337-343, May 1977.
[13] J. Cleary and I. Witten, “Data Compression Using Adaptive Coding and Partial String Matching,” IEEE Trans. Comm., vol. COM-32, no. 4, pp. 396-402, Apr. 1984.
[14] P. Deutsch, “Gzip File Format Specification Version 4.3,” 1996.
[15] S. Grumbach and F. Tahi, “Compression of DNA Sequences,” DCC '93: Proc. Data Compression Conf., pp. 340-350, 1993.
[16] E. Rivals, J. Delahaye, M. Dauchet, and O. Delgrange, “A Guaranteed Compression Scheme for Repetitive DNA Sequences,” DCC '96: Proc. Data Compression Conf., p. 453, 1996.
[17] A. Apostolico and S. Lonardi, “Compression of Biological Sequences by Greedy Off-Line Textual Substitution,” DCC '00: Proc. Data Compression Conf., pp. 143-152, 2000.
[18] G. Korodi and I. Tabus, “An Efficient Normalized Maximum Likelihood Algorithm for DNA Sequence Compression,” ACM Trans. Information Systems, vol. 23, no. 1, pp. 3-34, 2005.
[19] S. Christley, Y. Lu, C. Li, and X. Xie, “Human Genomes as Email Attachments,” Bioinformatics, vol. 25, no. 2, pp. 274-275, 2009.
[20] M. Brandon, D. Wallace, and P. Baldi, “Data Structures and Compression Algorithms for Genomic Sequence Data,” Bioinformatics, vol. 25, no. 14, pp. 1731-1738, 2009.
[21] J. Sirén, N. Välimäki, V. Mäkinen, and G. Navarro, “Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections,” SPIRE '08: Proc. 15th Int'l Symp. String Processing and Information Retrieval, pp. 164-175, 2009.
[22] V. Mäkinen, G. Navarro, J. Sirén, and N. Välimäki, “Storage and Retrieval of Individual Genomes,” RECOMB '09: Proc. 13th Ann. Int'l Conf. Research in Computational Molecular Biology, pp. 121-137, 2009.
[23] V. Mäkinen, G. Navarro, J. Sirén, and N. Välimäki, “Storage and Retrieval of Highly Repetitive Sequence Collections,” J. Computational Biology, vol. 17, no. 3, pp. 281-308, 2010.
[24] F. Claude, A. Fariña, M. Martínez-Prieto, and G. Navarro, “Compressed q-Gram Indexing for Highly Repetitive Biological Sequences,” Proc. 10th IEEE Conf. Bioinformatics and Bioeng., pp. 86-91, 2010.
[25] N.J. Larsson and A. Moffat, “Offline Dictionary-Based Compression,” DCC '99: Proc. Data Compression Conf., pp. 296-305, 1999.
[26] F. Claude and G. Navarro, “Self-Indexed Text Compression Using Straight-Line Programs,” MFCS '09: Proc. 34th Int'l Symp. Math. Foundations of Computer Science, pp. 235-246, 2009.
[27] S. Kreft and G. Navarro, “LZ77-Like Compression with Fast Random Access,” DCC '10: Proc. 20th Data Compression Conf., pp. 239-248, 2010.
[28] S. Kuruppu, S.J. Puglisi, and J. Zobel, “Relative Lempel-Ziv Compression of Genomes for Large-Scale Storage and Retrieval,” SPIRE '10: Proc. 16th Int'l Symp. String Processing and Information Retrieval, E. Chavez and S. Lonardi, eds., pp. 201-206, 2010.
[29] S. Kuruppu, S.J. Puglisi, and J. Zobel, “Optimized Relative Lempel-Ziv Compression of Genomes,” ACSC '11: Proc. 34th Australasian Computer Science Conf., M. Reynolds, ed., pp. 91-98, 2011.
[30] C. Nevill-Manning and I. Witten, “Compression and Explanation Using Hierarchical Grammars,” The Computer J., vol. 40, nos. 2/3, pp. 103-116, 1997.
[31] G. Manzini and M. Rastero, “A Simple and Fast DNA Compressor,” Software: Practice and Experience, vol. 34, pp. 1397-1411, 2004.
[32] S. Hirschberg and D. Lelewer, “Efficient Decoding of Prefix Coding,” Comm. ACM, vol. 33, no. 4, pp. 449-459, 1990.
[33] M. Charikar, E. Lehman, D. Liu, R. Panigrahy, M. Prabhakaran, A. Sahai, and A. Shelat, “The Smallest Grammar Problem,” IEEE Trans. Information Theory, vol. 51, no. 7, pp. 2554-2576, July 2005.
[34] D. Okanohara and K. Sadakane, “Practical Entropy-Compressed Rank/Select Dictionary,” ALENEX '07: Proc. Workshop Algorithm Eng. and Experiments, 2007.
[35] S. Levy et al., “The Diploid Genome Sequence of an Individual Human,” PLoS Biology, vol. 5, no. 10, p. e254, 2007.
[36] S.M. Ahn et al., “The First Korean Genome Sequence and Analysis: Full Genome Sequencing for a Socio-Ethnic Group,” Genome Research, vol. 19, no. 9, pp. 1622-1629, 2009.