Issue No.05 - September/October (2011 vol.8)
pp: 1400-1410
Zhi-Zhong Chen, Tokyo Denki University, Hatoyama, Saitama
Lusheng Wang, City University of Hong Kong, Hong Kong
ABSTRACT
We present two parameterized algorithms for the closest string problem. The first runs in $O(nL + nd\cdot 17.97^d)$ time for DNA strings and in $O(nL + nd\cdot 61.86^d)$ time for protein strings, where $n$ is the number of input strings, $L$ is the length of each input string, and $d$ is the given upper bound on the number of mismatches between the center string and each input string. The second runs in $O(nL + nd\cdot 13.92^d)$ time for DNA strings and in $O(nL + nd\cdot 47.21^d)$ time for protein strings. We then extend the first algorithm to a new parameterized algorithm for the closest substring problem that runs in $O((n-1)m^2(L + d\cdot 17.97^d\cdot m^{\lfloor \log_2(d+1)\rfloor }))$ time for DNA strings and in $O((n-1)m^2(L + d\cdot 61.86^d\cdot m^{\lfloor \log_2(d+1)\rfloor }))$ time for protein strings, where $n$ is the number of input strings, $L$ is the length of the center substring, $L - 1 + m$ is the maximum length of a single input string, and $d$ is the given upper bound on the number of mismatches between the center substring and at least one substring of each input string. All of these algorithms significantly improve on the previous best running times. To verify the theoretical improvements in time complexity experimentally, we implement our algorithm in C and apply the resulting program to the planted $(L, d)$-motif problem proposed by Pevzner and Sze in 2000. We compare our program with the previously best exact program for the problem, namely PMSPrune (designed by Davila et al. in 2007). Our experimental data show that our program runs faster for practical cases and also for several challenging cases. Our algorithm also uses less memory.
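To make the closest string problem concrete, the following C sketch decides whether a center string within Hamming distance $d$ of every input string exists. It is not the algorithm of this paper: it is the simpler, well-known bounded search tree approach in the style of Gramm, Niedermeier, and Rossmanith, whose search tree has at most $(d+1)^d$ leaves. The function names `cs_hamming`, `cs_search`, and `closest_string` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hamming distance between two strings of equal length len. */
int cs_hamming(const char *a, const char *b, size_t len) {
    int dist = 0;
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i]) dist++;
    return dist;
}

/* Bounded search tree step (illustrative, NOT the paper's algorithm).
 * cand is the current candidate center; budget is how many more
 * positions of cand may still be changed. */
static bool cs_search(char *cand, const char *const *s, size_t n,
                      size_t len, int d, int budget) {
    for (size_t i = 0; i < n; i++) {
        int dist = cs_hamming(cand, s[i], len);
        if (dist <= d) continue;              /* s[i] is already close enough */
        if (dist > d + budget) return false;  /* cannot be repaired in budget moves */
        /* Any valid center must agree with s[i] on at least one of any
         * d+1 mismatching positions, so branch on the first d+1 of them. */
        int tried = 0;
        for (size_t p = 0; p < len && tried <= d; p++) {
            if (cand[p] == s[i][p]) continue;
            char saved = cand[p];
            cand[p] = s[i][p];
            tried++;
            if (cs_search(cand, s, n, len, d, budget - 1)) return true;
            cand[p] = saved;                  /* undo the change and try the next position */
        }
        return false;
    }
    return true;  /* every input string is within distance d of cand */
}

/* Returns true and writes a NUL-terminated center into out
 * (capacity len + 1) if some string of length len is within
 * Hamming distance d of all n input strings. */
bool closest_string(const char *const *s, size_t n, size_t len,
                    int d, char *out) {
    assert(n > 0 && d >= 0);
    memcpy(out, s[0], len);  /* start the search from the first input string */
    out[len] = '\0';
    return cs_search(out, s, n, len, d, d);
}
```

For example, for the inputs `AAB`, `ABA`, and `BAA` with $d = 1$, the search starts from `AAB` and finds a center at distance at most 1 from all three strings, while for `AAAA` and `TTTT` with $d = 1$ it correctly reports that no center exists. The branching factor $d+1$ at depth at most $d$ is what gives the $(d+1)^d$ bound that the algorithms in this abstract improve on.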
INDEX TERMS
Parameterized algorithm, closest string, closest substring, DNA motif discovery.
CITATION
Zhi-Zhong Chen, Lusheng Wang, "Fast Exact Algorithms for the Closest String and Substring Problems with Application to the Planted (L,d)-Motif Model", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 5, pp. 1400-1410, September/October 2011, doi:10.1109/TCBB.2011.21
REFERENCES
[1] A. Andoni, P. Indyk, and M. Patrascu, “On the Optimality of the Dimensionality Reduction Method,” Proc. 47th IEEE Symp. Foundations of Computer Science, pp. 449-458, 2006.
[2] A. Ben-Dor, G. Lancia, J. Perone, and R. Ravi, “Banishing Bias from Consensus Sequences,” Proc. Eighth Symp. Combinatorial Pattern Matching, pp. 247-261, 1997.
[3] J. Buhler and M. Tompa, “Finding Motifs Using Random Projections,” J. Computational Biology, vol. 9, pp. 225-242, 2002.
[4] J. Davila, S. Balla, and S. Rajasekaran, “Fast and Practical Algorithms for Planted (l, d) Motif Search,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 544-552, Oct.-Dec. 2007.
[5] X. Deng, G. Li, Z. Li, B. Ma, and L. Wang, “Genetic Design of Drugs without Side-Effects,” SIAM J. Computing, vol. 32, pp. 1073-1090, 2003.
[6] J. Dopazo, A. Rodríguez, J.C. Sáiz, and F. Sobrino, “Design of Primers for PCR Amplification of Highly Variable Genomes,” Computer Applications in the Biosciences, vol. 9, pp. 123-125, 1993.
[7] R.G. Downey and M.R. Fellows, Parameterized Complexity. Springer, 1999.
[8] M.R. Fellows, J. Gramm, and R. Niedermeier, “On the Parameterized Intractability of Motif Search Problems,” Combinatorica, vol. 26, pp. 141-167, 2006.
[9] M. Frances and A. Litman, “On Covering Problems of Codes,” Theoretical Computer Science, vol. 30, pp. 113-119, 1997.
[10] J. Gramm, F. Huffner, and R. Niedermeier, “Closest Strings, Primer Design, and Motif Search,” Currents in Computational Molecular Biology, poster abstracts of RECOMB 2002, pp. 74-75, 2002.
[11] J. Gramm, R. Niedermeier, and P. Rossmanith, “Fixed-Parameter Algorithms for Closest String and Related Problems,” Algorithmica, vol. 37, pp. 25-42, 2003.
[12] K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang, “Distinguishing String Selection Problems,” Information and Computation, vol. 185, pp. 41-55, 2003.
[13] M. Li, B. Ma, and L. Wang, “Finding Similar Regions in Many Sequences,” J. Computer and System Sciences, vol. 65, pp. 73-96, 2002.
[14] M. Li, B. Ma, and L. Wang, “On the Closest String and Substring Problems,” J. ACM, vol. 49, pp. 157-171, 2002.
[15] X. Liu, H. He, and O. Sýkora, “Parallel Genetic Algorithm and Parallel Simulated Annealing Algorithm for the Closest String Problem,” Proc. First Int'l Conf. Advanced Data Mining and Applications (ADMA '05), pp. 591-597, 2005.
[16] K. Lucas, M. Busch, S. Mössinger, and J.A. Thompson, “An Improved Microcomputer Program for Finding Gene- or Gene Family-Specific Oligonucleotides Suitable as Primers for Polymerase Chain Reactions or as Probes,” Computer Applications in the Biosciences, vol. 7, pp. 525-529, 1991.
[17] B. Ma and X. Sun, “More Efficient Algorithms for Closest String and Substring Problems,” Proc. 12th Ann. Int'l Conf. Research in Computational Molecular Biology, pp. 396-409, 2008.
[18] D. Marx, “The Closest Substring Problem with Small Distances,” Proc. 46th IEEE Symp. Foundations of Computer Science, pp. 63-72, 2005.
[19] H. Mauch, M.J. Melzer, and J.S. Hu, “Genetic Algorithm Approach for the Closest String Problem,” Proc. Second IEEE Computer Soc. Bioinformatics Conf., pp. 560-561, 2003.
[20] C.N. Meneses, Z. Lu, C.A.S. Oliveira, and P.M. Pardalos, “Optimal Solutions for the Closest String Problem via Integer Programming,” INFORMS J. Computing, vol. 16, pp. 419-429, 2004.
[21] P. Pevzner and S.-H. Sze, “Combinatorial Approaches to Finding Subtle Signals in DNA Sequences,” Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology, pp. 269-278, 2000.
[22] V. Proutski and E.C. Holmes, “Primer Master: A New Program for the Design and Analysis of PCR Primers,” Computer Applications in the Biosciences, vol. 12, pp. 253-255, 1996.
[23] N. Stojanovic, P. Berman, D. Gumucio, R. Hardison, and W. Miller, “A Linear-Time Algorithm for the 1-Mismatch Problem,” Proc. Fifth Int'l Workshop Algorithms and Data Structures, pp. 126-135, 1997.
[24] Y. Wang, W. Chen, X. Li, and B. Cheng, “Degenerated Primer Design to Amplify the Heavy Chain Variable Region from Immunoglobulin cDNA,” BMC Bioinformatics, vol. 7, suppl. 4, p. S9, 2006.
[25] L. Wang and L. Dong, “Randomized Algorithms for Motif Detection,” J. Bioinformatics and Computational Biology, vol. 3, pp. 1038-1052, 2005.
[26] L. Wang and B. Zhu, “Efficient Algorithms for the Closest String and Distinguishing String Selection Problems,” Proc. Third Int'l Workshop Frontiers in Algorithms, pp. 261-270, 2009.