Issue No.03 - May-June (2013 vol.10)
pp: 793-798
S. P. Garcia , Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
J. M. O. S. Rodrigues , Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
S. Santos , Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
D. Pratas , Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
V. Afreixo , Dept. of Math., Univ. of Aveiro, Aveiro, Portugal
C. A. C. Bastos , Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
P. J. S. G. Ferreira , Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
A. J. Pinho , Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
ABSTRACT
Genome assemblies are typically compared with respect to their contiguity, coverage, and accuracy. We propose a genome-wide, alignment-free genomic distance based on compressed maximal exact matches and suggest adding it to the benchmark of commonly used assembly quality metrics. Maximal exact matches are perfect repeats, without gaps or misspellings, that cannot be extended further to either the left or the right without loss of similarity. The proposed genomic distance is based on the normalized compression distance, an information-theoretic measure of the relative compressibility of two sequences estimated using multiple finite-context models. This measure exposes similarities between the sequences, as well as the nesting structure underlying the assembly of larger maximal exact matches from smaller ones. We use four human genome assemblies for illustration and discuss the impact of genome sequencing and assembly on the final content of maximal exact matches and on the proposed genomic distance.
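For readers unfamiliar with the normalized compression distance (NCD), the following is a minimal sketch of its standard form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size in bytes. It assumes a general-purpose compressor (zlib) as a stand-in; the paper estimates compressibility with multiple finite-context models, which this sketch does not implement. The function names `compressed_size` and `ncd` are illustrative only and do not come from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level.

    Assumption: zlib stands in for the finite-context models used in the paper.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte sequences.

    Values near 0 suggest the sequences share most of their content;
    values near 1 suggest little shared structure.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real assembly data.
    seq_a = b"ACGTTGCAAGCTTAGGCTAACGTTAGC" * 40
    seq_b = seq_a.replace(b"AGG", b"ACG")  # a lightly edited copy of seq_a
    print("NCD(a, a) =", ncd(seq_a, seq_a))
    print("NCD(a, b) =", ncd(seq_a, seq_b))
```

Because zlib only finds repeats within a 32 KB window, this sketch is suitable for short toy sequences, not for whole-genome data.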
INDEX TERMS
Bioinformatics, Genomics, Assembly, Sequential analysis, Computational biology, Materials, maximal exact matches, compressed maximal exact matches, genome sequencing, human genome assemblies, nesting structure, multiple finite-context models, genome sequences, relative compressibility, information-theoretic measure, normalized compression distance, perfect repeats, assembly quality metrics, alignment-free genomic distance, Genome sequencing and assembly
CITATION
S. P. Garcia, J. M. O. S. Rodrigues, S. Santos, D. Pratas, V. Afreixo, C. A. C. Bastos, P. J. S. G. Ferreira, A. J. Pinho, "A Genomic Distance for Assembly Comparison Based on Compressed Maximal Exact Matches", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 3, pp. 793-798, May-June 2013, doi:10.1109/TCBB.2013.77