Genomic Region Operation Kit for Flexible Processing of Deep Sequencing Data
Jan.-Feb. 2013 (vol. 10 no. 1)
pp. 200-206
Kristian Ovaska, Genome-Scale Biol. & Inst. of Biomed., Univ. of Helsinki, Helsinki, Finland
Lauri Lyly, Genome-Scale Biol. & Inst. of Biomed., Univ. of Helsinki, Helsinki, Finland
Biswajyoti Sahu, Inst. of Biomed., Univ. of Helsinki, Helsinki, Finland
Olli A. Janne, Inst. of Biomed., Physiol., Biomedicum, Univ. of Helsinki, Helsinki, Finland
Sampsa Hautaniemi, Genome-Scale Biol. & Inst. of Biomed., Univ. of Helsinki, Helsinki, Finland
Computational analysis of data produced in deep sequencing (DS) experiments is challenging because of large data volumes and the need for flexible analysis approaches. Here, we present a mathematical formalism, based on set algebra, for frequently performed operations in DS data analysis; it facilitates the translation of biomedical research questions into a language amenable to computational analysis. Using this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and the command line, as well as a C++ extension API. It supports major genomic file formats and allows custom genomic regions to be stored in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/.
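To give a rough sense of what set algebra over genomic regions can look like, here is a minimal Python sketch. It does not use GROK and is not GROK's API: the function names (`normalize`, `union`, `intersection`, `difference`), the representation of a region as a `(chromosome, start, end)` tuple with half-open coordinates, and the sample peak lists `tf_a` and `tf_b` are all assumptions made for this illustration.

```python
from collections import defaultdict


def normalize(regions):
    """Sort regions and merge overlapping or touching ones per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        if start < end:  # ignore empty regions
            by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                cur_end = max(cur_end, end)  # extend the current region
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end))
    return merged


def union(a, b):
    """Positions covered by a or b."""
    return normalize(list(a) + list(b))


def intersection(a, b):
    """Positions covered by both a and b (two-pointer sweep)."""
    a, b = normalize(a), normalize(b)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca != cb:
            # Advance whichever list is behind in chromosome order.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        start, end = max(sa, sb), min(ea, eb)
        if start < end:
            result.append((ca, start, end))
        # Advance the region that ends first.
        if ea <= eb:
            i += 1
        else:
            j += 1
    return result


def difference(a, b):
    """Positions covered by a but not by b (simple quadratic scan)."""
    a, b = normalize(a), normalize(b)
    result = []
    for chrom, start, end in a:
        cursor = start
        for cb, sb, eb in b:
            if cb != chrom or eb <= cursor or sb >= end:
                continue  # no overlap with the remaining part of this region
            if sb > cursor:
                result.append((chrom, cursor, sb))
            cursor = max(cursor, eb)
        if cursor < end:
            result.append((chrom, cursor, end))
    return result


# Hypothetical binding sites of two transcription factors.
tf_a = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
tf_b = [("chr1", 250, 400), ("chr2", 10, 60)]

shared = intersection(tf_a, tf_b)   # sites bound by both factors
only_a = difference(tf_a, tf_b)     # sites bound by the first factor only
either = union(tf_a, tf_b)          # sites bound by at least one factor
```

In this sketch, a question such as "which sites are bound by both factors?" becomes a single `intersection` call. The sorted-list representation is chosen only for brevity; the abstract states that GROK itself stores regions in structures such as red-black trees and SQL databases.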
Index Terms:
SQL, algebra, bioinformatics, cancer, genomics, prostate cancer, flexible processing, deep sequencing data, computational analysis, data volume, set algebra, biomedical research questions, Genomic Region Operation Kit, GROK tool, preprocessing, filtering, file conversion, sample comparison, red-black trees, SQL database, transcription factor, Databases, Benchmark testing, Software, Complexity theory, deep sequencing, genomic data analysis, region set algebra
Citation:
Kristian Ovaska, Lauri Lyly, Biswajyoti Sahu, Olli A. Janne, Sampsa Hautaniemi, "Genomic Region Operation Kit for Flexible Processing of Deep Sequencing Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 1, pp. 200-206, Jan.-Feb. 2013, doi:10.1109/TCBB.2012.170