DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2012.160
Compression has become a critical step in storing next-generation sequencing (NGS) data sets, which keep growing in size even as the cost of generating them falls. Recent research into efficiently compressing sequence data has focused largely on improving compression ratios. Yet the throughput of current methods now lags far behind the I/O bandwidth of modern storage systems. As biologists move their analyses to high-performance systems with greater I/O bandwidth, low-throughput compression becomes a limiting factor. To address this gap, we present a new storage model called SeqDB, which offers high-throughput compression of sequence data with minimal sacrifice in compression ratio. It achieves this by combining the existing multithreaded Blosc compressor with a new data-parallel byte-packing scheme, called SeqPack, which interleaves sequence data and quality scores.
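The byte-packing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give SeqPack's actual encoding, so everything below is an assumption made for illustration: a five-letter base alphabet (`ACGTN`), quality scores limited to 51 values so that each (base, quality) pair fits in one byte, a Phred ASCII offset of 33, and the hypothetical helper names `pack_read` and `unpack_read`. The standard-library `zlib` is used only as a stand-in for the general-purpose compression stage; it is not Blosc and is not multithreaded.

```python
import zlib

# Assumptions for illustration only; not taken from the SeqDB source.
BASES = "ACGTN"      # assumed base alphabet
MAX_QUAL = 51        # assumed: 5 bases * 51 scores = 255 codes, fits in a byte
PHRED_OFFSET = 33    # assumed ASCII offset of the quality string


def pack_read(seq, qual):
    """Interleave a read's bases and quality scores, one byte per position."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    out = bytearray(len(seq))
    for i, (base, q) in enumerate(zip(seq, qual)):
        score = ord(q) - PHRED_OFFSET
        if not 0 <= score < MAX_QUAL:
            raise ValueError("quality score out of assumed range")
        # str.index raises ValueError for a base outside the assumed alphabet.
        out[i] = BASES.index(base) * MAX_QUAL + score
    return bytes(out)


def unpack_read(packed):
    """Recover the (sequence, quality) strings from packed bytes."""
    seq = "".join(BASES[code // MAX_QUAL] for code in packed)
    qual = "".join(chr(code % MAX_QUAL + PHRED_OFFSET) for code in packed)
    return seq, qual


if __name__ == "__main__":
    seq, qual = "ACGTN" * 20, "II#5?" * 20
    packed = pack_read(seq, qual)
    # Stand-in for the compressor stage (zlib here, not Blosc).
    compressed = zlib.compress(packed)
    assert unpack_read(zlib.decompress(compressed)) == (seq, qual)
    print(len(seq) + len(qual), len(packed), len(compressed))
```

Each position is encoded independently of its neighbours, which is what makes a scheme of this kind data-parallel: the loop could in principle be split across threads or vector lanes without coordination.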
Index Terms:
Throughput, Arrays, Bandwidth, Libraries, Bioinformatics, Instruction sets, Genomics, FASTQ, Compression, data storage, next-generation sequencing
Citation:
Mark Howison, "High-Throughput Compression of FASTQ Data with SeqDB," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 1, pp. 213-218, Jan.-Feb. 2013, doi:10.1109/TCBB.2012.160

