Issue No. 06 - Nov.-Dec. (2012 vol. 9)

ISSN: 1545-5963

pp: 1843-1846

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2012.84

C. Boucher, Dept. of Comput. Sci., Colorado State Univ., Fort Collins, CO, USA

M. Omar, Dept. of Math., California Inst. of Technol., Pasadena, CA, USA

ABSTRACT

Given a set S of n strings, each of length ℓ, and a nonnegative value d, we define a center string as a string of length ℓ that has Hamming distance at most d from each string in S. The #CLOSEST STRING problem aims to determine the number of center strings for a given set of strings S and input parameters n, ℓ, and d. We show #CLOSEST STRING is impossible to solve exactly or even approximately in polynomial time, and that restricting #CLOSEST STRING so that any one of the parameters n, ℓ, or d is fixed leads to a fully polynomial-time randomized approximation scheme (FPRAS). We show equivalent results for the problem of efficiently sampling center strings uniformly at random (u.a.r.).
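To make the definitions concrete, the following minimal sketch counts center strings by exhaustive enumeration. It is an illustration of the problem statement only, not the approach taken in the paper: it runs in time exponential in ℓ, which is consistent with the hardness result above. The function names `hamming` and `count_center_strings` and the explicit `alphabet` parameter are assumptions introduced here for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_center_strings(strings, d, alphabet):
    """Count strings within Hamming distance d of every string in `strings`.

    Brute force over all len(alphabet)**l candidates, where l is the common
    length of the inputs; only feasible for very small l.
    """
    if not strings:
        raise ValueError("need at least one input string")
    length = len(strings[0])
    count = 0
    for candidate in product(alphabet, repeat=length):
        # A candidate is a center string if it is close to every input.
        if all(hamming(candidate, s) <= d for s in strings):
            count += 1
    return count


# Over the alphabet {A, C}, the strings AC and CA are the only ones
# within distance 1 of both AA and CC.
print(count_center_strings(["AA", "CC"], d=1, alphabet="AC"))  # 2
```

An FPRAS, by contrast, would return an estimate within a factor of (1 ± ε) of this count with high probability, in time polynomial in the input size and 1/ε.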

INDEX TERMS

Hamming distance, Approximation methods, Polynomials, Approximation algorithms, Bioinformatics, Computational biology, Sequential analysis, computational complexity, Biological sequence analysis, motif recognition, fully polynomial-time randomized approximation scheme (FPRAS), journal, fully polynomial almost uniform sampler (FPAUS)

CITATION

C. Boucher, M. Omar, "On the Hardness of Counting and Sampling Center Strings", *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 9, no. 6, pp. 1843-1846, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.84