Issue No. 04, July-Aug. 2013 (vol. 10)
pp: 905-913
Nan Liu, Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
Haitao Jiang, Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
Daming Zhu, Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
Binhai Zhu, Dept. of Comput. Sci., Montana State Univ., Bozeman, MT, USA
ABSTRACT
Scaffold filling is a new combinatorial optimization problem in genome sequencing. The one-sided scaffold filling problem can be described as follows: given an incomplete genome I and a complete (reference) genome G, fill the missing genes into I such that the number of common (string) adjacencies between the resulting genome I' and G is maximized. This problem is NP-complete for genomes with duplicated genes, and the best known approximation factor is 1.33, achieved by a greedy strategy. In this paper, we prove a better lower bound on the optimal solution and devise a new algorithm, based on maximum matching and a local improvement technique, that improves the approximation factor to 1.25. For genomes with gene repetitions, this is the only known NP-complete problem that admits an approximation with a small constant factor (less than 1.5).
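To make the objective concrete, the following is a minimal Python sketch of the problem statement only, not of the approximation algorithm the paper describes. It assumes that an adjacency is an unordered pair of neighboring genes counted with multiplicity, and that the number of common adjacencies is the size of the multiset intersection; the paper's formal definition may differ in details such as the handling of genome ends. All function names are hypothetical, and the exhaustive search is exponential and only usable on tiny inputs.

```python
from collections import Counter


def adjacencies(genome):
    """Multiset of unordered adjacent gene pairs (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(genome, genome[1:]))


def common_adjacencies(a, b):
    """Size of the multiset intersection of the two adjacency multisets."""
    return sum((adjacencies(a) & adjacencies(b)).values())


def missing_genes(incomplete, reference):
    """Genes, with multiplicity, that the reference has and the scaffold lacks."""
    return list((Counter(reference) - Counter(incomplete)).elements())


def brute_force_fill(incomplete, reference):
    """Try every way of inserting the missing genes; return (score, filled)."""
    best_score, best_filled = -1, None

    def insert_all(current, remaining):
        nonlocal best_score, best_filled
        if not remaining:
            score = common_adjacencies(current, reference)
            if score > best_score:
                best_score, best_filled = score, current
            return
        gene, rest = remaining[0], remaining[1:]
        # Insert the next missing gene at every possible position.
        for pos in range(len(current) + 1):
            insert_all(current[:pos] + [gene] + current[pos:], rest)

    insert_all(list(incomplete), missing_genes(incomplete, reference))
    return best_score, best_filled


if __name__ == "__main__":
    G = list("abcabd")   # complete reference genome (toy example)
    I = list("abd")      # incomplete genome with genes a, b, c missing
    score, filled = brute_force_fill(I, G)
    print(score, "".join(filled))
```

The brute force serves only as a baseline for checking small instances; the point of an approximation algorithm is to avoid this exponential enumeration while guaranteeing a solution within a constant factor of the optimum.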
INDEX TERMS
Bioinformatics, Genomics, Approximation methods, Approximation algorithms, Educational institutions, Algorithm design and analysis, Sequential analysis, algorithms, Comparative genomics, scaffold filling, breakpoints, adjacencies, NP-completeness
CITATION
Nan Liu, Haitao Jiang, Daming Zhu, Binhai Zhu, "An Improved Approximation Algorithm for Scaffold Filling to Maximize the Common Adjacencies", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 4, pp. 905-913, July-Aug. 2013, doi:10.1109/TCBB.2013.100