Scaffold Filling under the Breakpoint and Related Distances
July-Aug. 2012 (vol. 9 no. 4)
pp. 1220-1229
Chunfang Zheng, Dept. d'Inf. et de Rech. Operationnelle, Univ. de Montreal, Montreal, QC, Canada
Haitao Jiang, Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
D. Sankoff, Dept. of Math. & Stat., Univ. of Ottawa, Ottawa, ON, Canada
Binhai Zhu, Dept. of Comput. Sci., Montana State Univ., Bozeman, MT, USA
Motivated by the trend of genome sequencing without completing the sequence of whole genomes, the problem of filling an incomplete multichromosomal genome (or scaffold) I with respect to a complete target genome G was studied. The objective is to minimize the resulting genomic distance between I' and G, where I' is the corresponding filled scaffold. We call this problem the one-sided scaffold filling problem. In this paper, we conduct a systematic study of the scaffold filling problem under the breakpoint distance and its variants, for both unichromosomal and multichromosomal genomes (with and without gene repetitions). When the input genome contains no gene repetition (i.e., is a fragment of a permutation), we show that the two-sided scaffold filling problem (i.e., G is also incomplete) is polynomially solvable for unichromosomal genomes under the breakpoint distance and for multichromosomal genomes under the genomic (or DCJ, Double-Cut-and-Join) distance. However, when the input genome contains some repeated genes, even the one-sided scaffold filling problem becomes NP-complete when the similarity measure is the maximum number of adjacencies between two sequences. For this problem, we also present efficient constant-factor approximation algorithms: factor 2 for the general case and factor 1.33 for the one-sided case.
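To make the one-sided problem concrete, the following Python sketch counts shared adjacencies between two gene sequences and fills a scaffold by exhaustive search. It is an illustration only, not the paper's algorithm: the function names are hypothetical, genes are treated as unsigned, an adjacency is assumed to be an unordered pair of consecutive genes counted with multiplicity, and the search is exponential, so it is usable only on toy inputs.

```python
from collections import Counter


def adjacency_multiset(seq):
    """Multiset of unordered consecutive gene pairs (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(seq, seq[1:]))


def common_adjacencies(a, b):
    """Number of adjacencies shared by a and b, counted with multiplicity."""
    shared = adjacency_multiset(a) & adjacency_multiset(b)  # multiset intersection
    return sum(shared.values())


def fill_scaffold_bruteforce(scaffold, target):
    """Insert the genes missing from scaffold (relative to target) so that the
    filled scaffold shares as many adjacencies with target as possible.

    The order of the genes already in the scaffold is preserved.
    Exhaustive search: exponential in the number of missing genes.
    """
    missing = list((Counter(target) - Counter(scaffold)).elements())
    candidates = {tuple(scaffold)}
    for gene in missing:
        # Try the gene at every position of every partial filling so far.
        candidates = {
            c[:i] + (gene,) + c[i:]
            for c in candidates
            for i in range(len(c) + 1)
        }
    return max(candidates, key=lambda c: common_adjacencies(c, target))


if __name__ == "__main__":
    target = list("abcde")
    scaffold = list("ace")
    filled = fill_scaffold_bruteforce(scaffold, target)
    print("".join(filled), common_adjacencies(filled, target))
```

With repeated genes the number of candidate fillings grows quickly, which is consistent with the hardness result stated in the abstract and is why approximation algorithms are of interest for that case.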

Index Terms:
genomics, biology computing, cellular biophysics, computational complexity, constant-factor approximation algorithms, genome sequencing, incomplete multichromosomal genome I, one-sided scaffold filling problem, breakpoint distance, unichromosomal genomes, two-sided scaffold filling problem, genomic distance, NP-complete problem, Bioinformatics, Approximation methods, Polynomials, Approximation algorithms, Educational institutions, Computational biology, algorithms, Comparative genomics, scaffold filling, DCJ, NP-completeness
Chunfang Zheng, Haitao Jiang, D. Sankoff, Binhai Zhu, "Scaffold Filling under the Breakpoint and Related Distances," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1220-1229, July-Aug. 2012, doi:10.1109/TCBB.2012.57