Issue No. 03 - July-September (2007 vol. 4)
In homology search, good spaced seeds achieve higher sensitivity than consecutive seeds at the same cost (weight). However, explaining the mechanism that gives spaced seeds their power, and characterizing optimal spaced seeds, remain open problems. This paper investigates these two questions by formally analyzing the average number of non-overlapping hits and the hit probability of a spaced seed in the Bernoulli sequence model. We prove that when the length of a non-uniformly spaced seed is bounded above by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed of the same weight in both (i) the average number of non-overlapping hits and (ii) the asymptotic hit probability. This answers the first question for the Bernoulli sequence model. The theoretical study in this paper also gives a new approach to finding long optimal seeds.
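To make the notion of hit probability concrete, the following is a minimal sketch, not the paper's analytical method. It assumes a seed is written as a string of `1` (position must match) and `0` (don't care), that each position of a length-`n` alignment matches independently with probability `p` (the Bernoulli model), and that a hit occurs when every `1` position of the seed lines up with a match. The function name `hit_probability` and its parameters are illustrative. The computation is a plain dynamic program over the last `len(seed) - 1` alignment positions, so it is only practical for short seeds.

```python
from itertools import product


def hit_probability(seed: str, n: int, p: float) -> float:
    """Probability that `seed` hits at least once in a random alignment
    of length n whose positions match independently with probability p."""
    span = len(seed)
    if n < span:
        return 0.0
    # Indices inside the seed window that are required to be matches.
    care = [i for i, c in enumerate(seed) if c == "1"]

    # no_hit[w]: probability that the most recent span-1 positions equal w
    # and the seed has not hit so far. Before any full window is seen,
    # this is just the Bernoulli probability of w.
    no_hit = {}
    for w in product((0, 1), repeat=span - 1):
        ones = sum(w)
        no_hit[w] = p**ones * (1 - p) ** (span - 1 - ones)

    # Extend the alignment one position at a time; each step completes
    # exactly one new window of length `span`.
    for _ in range(n - span + 1):
        nxt = {}
        for w, mass in no_hit.items():
            for bit, prob in ((1, p), (0, 1 - p)):
                window = w + (bit,)
                if all(window[i] for i in care):
                    continue  # a hit: this mass leaves the no-hit set
                key = window[1:]
                nxt[key] = nxt.get(key, 0.0) + mass * prob
        no_hit = nxt

    return 1.0 - sum(no_hit.values())
```

Calling `hit_probability("111", n, p)` and `hit_probability("1101", n, p)` for the same `n` and `p` compares a consecutive seed with a spaced seed of equal weight. For very short regions the consecutive seed can come out ahead because it fits in more positions; the result stated in the abstract concerns the asymptotic hit probability.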
Homology search, pattern matching, sequence alignment, spaced seeds, renewal theory, run statistics
Louxin Zhang, "Superiority of Spaced Seeds for Homology Search", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 3, pp. 496-505, July-September 2007, doi:10.1109/tcbb.2007.1013