First International Workshop on Similarity Search and Applications (SISAP 2008)
Approximate Similarity Search in Genomic Sequence Databases Using Landmark-Guided Embedding
April 11-12, 2008
ISBN: 978-0-7695-3101-4
Similarity search in sequence databases is of paramount importance in bioinformatics research. As genomic databases grow, similarity search of proteins in these databases becomes a bottleneck in large-scale studies, calling for more efficient methods of content-based retrieval. In this study, we present a metric-preserving, landmark-guided embedding approach that represents sequences in a vector space in order to allow efficient indexing and similarity search. We analyze various properties of the embedding and show that the approximation achieved by the embedded representation is sufficient to yield biologically relevant results. The approximate representation is shown to provide a speedup of several orders of magnitude in similarity search compared to the exact representation, while maintaining comparable search accuracy.
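To illustrate the general idea of searching sequences through a landmark-based vector representation, here is a minimal sketch of a related, simpler technique: a reference-point (Lipschitz-style) embedding, in which each sequence becomes the vector of its distances to a few landmark sequences. It is not the authors' landmark-guided embedding. The sketch assumes plain Levenshtein edit distance as a stand-in metric, arbitrarily chosen landmarks, and hypothetical function names (edit_distance, embed, lower_bound, knn_search). Because the metric obeys the triangle inequality, |d(q, l) - d(s, l)| <= d(q, s) for every landmark l, so the L-infinity distance between embedded vectors lower-bounds the true distance. Candidates are ranked by this cheap bound and only a few are refined with the exact distance. The optional max_refine cap turns the exact filter-and-refine search into an approximate one.

```python
from typing import List, Optional, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (assumed stand-in metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def embed(seq: str, landmarks: Sequence[str]) -> List[int]:
    """Hypothetical embedding: vector of distances from seq to each landmark."""
    return [edit_distance(seq, lm) for lm in landmarks]


def lower_bound(u: Sequence[int], v: Sequence[int]) -> int:
    """L-infinity distance between embeddings; by the triangle inequality
    it never exceeds the true distance between the original sequences."""
    return max(abs(x - y) for x, y in zip(u, v))


def knn_search(query: str, db: Sequence[str], landmarks: Sequence[str],
               k: int = 1, max_refine: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return up to k (distance, index) pairs. Exact when max_refine is None;
    approximate when the number of exact distance computations is capped."""
    qv = embed(query, landmarks)
    vecs = [embed(s, landmarks) for s in db]  # a real index would precompute these
    # Cheap filter step: visit candidates in order of their lower bound.
    order = sorted(range(len(db)), key=lambda i: lower_bound(qv, vecs[i]))
    best: List[Tuple[int, int]] = []
    refined = 0
    for i in order:
        # No remaining candidate can beat the current k-th best distance.
        if len(best) == k and lower_bound(qv, vecs[i]) >= best[-1][0]:
            break
        # Approximate mode: stop after a fixed budget of exact computations.
        if max_refine is not None and refined >= max_refine:
            break
        best.append((edit_distance(query, db[i]), i))  # refine step
        refined += 1
        best.sort()
        del best[k:]
    return best
```

For example, knn_search("ACDE", ["ACDE", "ACDF", "WXYZ"], ["AAAA", "WWWW"]) returns [(0, 0)] after a single exact distance computation. In a realistic setting the exact distance would be an alignment-based score made to behave as a metric, and the savings come from computing it only for the few candidates that survive the vector-space filter.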
Index Terms:
approximate similarity search, proteins, sequences, database, indexing, metric space, multi-dimensional scaling
Citation:
Ahmet Sacan and I. Hakki Toroslu, "Approximate Similarity Search in Genomic Sequence Databases Using Landmark-Guided Embedding," in First International Workshop on Similarity Search and Applications (SISAP 2008), pp. 43-50, 2008.