Issue No. 03 - July-September (2009 vol. 6)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.4
Ewa Szczurek, Max Planck Institute for Molecular Genetics, Berlin
Eugenia Furletova, Institute of Mathematical Problems in Biology, Pushchino, Moscow
Laurent Noé, LIFL/CNRS/INRIA, France
Gregory Kucherov, LIFL/CNRS/INRIA, France
Slawomir Lasota, Warsaw University, Poland
Mikhail Roytberg, Institute of Mathematical Problems in Biology, Pushchino, Moscow
Anna Gambin, Warsaw University, Poland
We apply the previously proposed concept of subset seeds to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method, as well as with the previously proposed family of vector seeds. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus Blastp.
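As a rough illustration of the subset seed idea, the following Python sketch treats each seed letter as a partition of the 20 amino acids: an aligned pair of residues matches a letter when both residues fall into the same group, and a seed hits when every position matches. The three letters used here (EXACT, COARSE, ANY), the residue grouping inside COARSE, and the function names are assumptions made for this example only; they are not the alphabets or the implementation from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def partition_letter(groups):
    """Build a seed letter from a partition: map each residue to its group id."""
    mapping = {}
    for group_id, group in enumerate(groups):
        for residue in group:
            mapping[residue] = group_id
    missing = set(AMINO_ACIDS) - set(mapping)
    if missing:
        raise ValueError(f"partition does not cover: {sorted(missing)}")
    return mapping


# Hypothetical seed alphabet, for illustration only.
EXACT = partition_letter(list(AMINO_ACIDS))   # every residue in its own group
ANY = partition_letter([AMINO_ACIDS])         # one group: any pair matches
COARSE = partition_letter(                    # assumed grouping, not from the paper
    ["LVIM", "C", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"]
)


def seed_hits(seed, s, t):
    """Return True if every aligned residue pair matches its seed letter."""
    if not (len(seed) == len(s) == len(t)):
        raise ValueError("seed and both words must have the same length")
    return all(letter[a] == letter[b] for letter, a, b in zip(seed, s, t))


def seed_key(seed, s):
    """Project a word onto group ids; two words hit exactly when their keys are equal."""
    return tuple(letter[a] for letter, a in zip(seed, s))


seed = [EXACT, COARSE, ANY, EXACT]
print(seed_hits(seed, "KLWA", "KIPA"))  # L and I share a COARSE group
print(seed_hits(seed, "KLWA", "KLWG"))  # A and G differ at an EXACT position
```

Because each letter in this sketch is a partition, a hit reduces to equality of the two projected keys, so candidate words can be looked up in an ordinary hash table. This is one way to read the remark that subset seeds are less costly to implement than a cumulative score threshold, which has to enumerate a neighbourhood of words for each query position.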
Protein sequences, protein databases, local alignment, similarity search, seeds, subset seeds, multiple seeds, seed alphabet, sensitivity, selectivity.
Ewa Szczurek, Eugenia Furletova, Laurent Noé, Gregory Kucherov, Slawomir Lasota, Mikhail Roytberg, Anna Gambin, "On Subset Seeds for Protein Alignment", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 483-494, July-September 2009, doi:10.1109/TCBB.2009.4