Efficient Filtration of Sequence Similarity Search Through Singular Value Decomposition
Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04)
Taichung, Taiwan, ROC, May 19-21
ISBN: 0-7695-2173-8
Similarity search in textual databases and bioinformatics has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to mitigate the curse of dimensionality. This paper proposes a novel approach that maps the problem of whole-genome sequence similarity search onto approximate vector comparison in the well-established multidimensional vector space. We propose applying the Singular Value Decomposition (SVD) dimensionality reduction technique as a pre-processing filtration step to reduce both the search space and the running time of the search operation. Our empirical results on a Prokaryote and a Eukaryote DNA contig dataset demonstrate effective filtration that prunes irrelevant portions of the database, with up to 2.3 times faster running time compared with the q-gram approach. SVD filtration may easily be integrated as a pre-processing step for any of the well-known sequence search heuristics, such as BLAST, QUASAR and FastA. We analyze the precision of applying SVD filtration as a transformation-based dimensionality reduction technique, and finally discuss the imposed trade-offs.
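The general idea of SVD-based filtration can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the vector mapping, window scheme, or distance measure, so the sketch assumes q-gram frequency vectors over pre-cut database windows and Euclidean distance. The names `qgram_vector`, `build_filter`, and `candidates`, and the parameters `q`, `k`, and `radius`, are hypothetical.

```python
import itertools
import numpy as np

def qgram_vector(seq, q=2, alphabet="ACGT"):
    """Count the occurrences of every q-gram over `alphabet` in `seq`."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product(alphabet, repeat=q))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - q + 1):
        j = index.get(seq[i:i + q])
        if j is not None:  # skip q-grams containing symbols outside the alphabet
            vec[j] += 1
    return vec

def build_filter(windows, k, q=2):
    """Project the q-gram vectors of `windows` onto the top-k right singular vectors.

    Assumes k <= min(len(windows), 4**q). Returns (basis, reduced), where
    `basis` has orthonormal columns and `reduced` holds one k-dim row per window.
    """
    A = np.array([qgram_vector(w, q) for w in windows])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    basis = vt[:k].T
    return basis, A @ basis

def candidates(query, basis, reduced, radius, q=2):
    """Return indices of windows within `radius` of the query in the reduced space."""
    qv = qgram_vector(query, q) @ basis
    dist = np.linalg.norm(reduced - qv, axis=1)
    return np.flatnonzero(dist <= radius)
```

Because the basis columns are orthonormal, the distance in the reduced space never exceeds the distance between the full q-gram vectors. Under the assumptions above, a window pruned by `candidates` is therefore also farther than `radius` from the query in the full vector space, and the surviving windows can be handed to a more expensive search step for verification.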
Index Terms:
Approximate String Search, Sequence Homology, Singular Value Decomposition, Bioinformatics, Comparative genomics
Citation:
S. Alireza Aghili, Ozgur D. Sahin, Divyakant Agrawal, Amr El Abbadi, "Efficient Filtration of Sequence Similarity Search Through Singular Value Decomposition," bibe, pp.403, Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04), 2004