A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition
2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (2012)
Philadelphia, USA
Oct. 4, 2012 to Oct. 7, 2012
Motivation: In meta-genome sequencing and assembly projects, contigs from many different organisms are mixed together in a single pool, which makes assembling each organism a complex and challenging problem. It is therefore desirable to sort the contigs by origin into separate bins before working on them. We propose a framework that uses the base composition of bacterial restriction sites to generate sets of motifs that differentiate organismal groups, including the contigs derived from those groups. We introduce spectrum sets and show how to select them strategically for binning contigs from different organisms. We suggest that this framework can save time in meta-genome sequencing and assembly projects.

Results: Our method differentiates organisms and correctly associates contigs with the organism from which they were derived. In particular, by analyzing motif proportions, we show that two genera are fundamentally different. Comparing the four spectrum sets, which together encompass all known restriction sites, we show that different sets differ in their ability to distinguish sequences. We also show that selecting a spectrum set that is relevant to one organism but not the other greatly improves differentiation, even for short contigs (1,000 bp).

Conclusions: Ten trials with newly selected contigs confirm our premise. Our study provides a proof of concept for a novel and computationally effective preprocessing step in meta-genome sequencing and assembly tasks.
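The abstract describes binning contigs by the proportions of restriction-site-derived motifs they contain, but it does not give the exact construction of spectrum sets or the comparison rule. The sketch below is only an illustration under stated assumptions: a spectrum set is treated as a plain list of short motifs (`HYPOTHETICAL_SPECTRUM_SET`, four palindromic 4-mers chosen for illustration, not one of the paper's four sets), a profile is each motif's share of all motif hits in a sequence, and a contig is assigned to the reference with the nearest profile by Euclidean distance. The nearest-profile rule is a swapped-in choice, not the authors' method, and all names and sequences are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical spectrum set: illustrative palindromic 4-mers, NOT the paper's sets.
HYPOTHETICAL_SPECTRUM_SET = ["GATC", "CCGG", "GCGC", "AATT"]


def motif_proportions(seq, motifs):
    """Return each motif's fraction of all motif hits in seq.

    Overlapping occurrences are counted. A sequence with no hits
    yields an all-zero profile.
    """
    seq = seq.upper()
    counts = Counter()
    for m in motifs:
        k = len(m)
        counts[m] = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == m)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(motifs)
    return [counts[m] / total for m in motifs]


def assign_bin(contig, references, motifs):
    """Assign contig to the reference whose motif-proportion profile is
    closest in Euclidean distance (an assumed rule, for illustration)."""
    profile = motif_proportions(contig, motifs)
    return min(
        references,
        key=lambda name: dist(profile, motif_proportions(references[name], motifs)),
    )


if __name__ == "__main__":
    # Toy reference sequences standing in for two hypothetical genera.
    references = {
        "genus_A": "GATCGATCGATCCCGG",
        "genus_B": "GCGCGCGCAATTAATT",
    }
    contig = "TTGATCAGATCC"
    print(assign_bin(contig, references, HYPOTHETICAL_SPECTRUM_SET))  # prints "genus_A"
```

With real data, the reference profiles would come from sequences of known origin, and the choice of spectrum set would matter. The abstract reports that a set relevant to one organism but not the other separates them best.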
palindromes, base composition, spectrum sets, restriction sites
"A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition," 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), Philadelphia, USA, 2012, pp. 696-703.