2005 IEEE Computational Systems Bioinformatics Conference (CSB'05)
On Optimizing Distance-Based Similarity Search for Biological Databases
Stanford, California
August 8-11, 2005
ISBN: 0-7695-2344-7
Rui Mao, University of Texas at Austin
Weijia Xu, University of Texas at Austin
Smriti Ramakrishnan, University of Texas at Austin
Glen Nuckolls, University of Texas at Austin
Daniel P. Miranker, University of Texas at Austin
Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three important biological data types: protein k-mers with the metric PAM model, DNA k-mers with Hamming distance, and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high-dimensional feature vectors. We develop results showing that the character of these biological workloads differs from that of multimedia workloads. In particular, they are not intrinsically very high dimensional and deserve different optimization heuristics. Based on MVP-trees, we develop a center-seeking pivot selection heuristic and show that it outperforms the most widely used corner-seeking heuristic. Similarly, we develop a data partitioning approach that is sensitive to the actual data distribution, in lieu of median splits.
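As an illustration of the terms used in the abstract, the sketch below contrasts a corner-seeking and a center-seeking pivot choice on DNA k-mers under Hamming distance, and shows the baseline median split together with the triangle-inequality pruning that distance-based indexes rely on. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not define the heuristics, so the function names `corner_pivot`, `center_pivot`, `median_split`, and `range_query_prune`, as well as the specific selection rules (farthest point from an arbitrary start; point with the smallest total distance to the others), are hypothetical choices made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length k-mers."""
    if len(a) != len(b):
        raise ValueError("k-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def corner_pivot(points, dist):
    """Corner-seeking (assumed form): the point farthest from an arbitrary start."""
    start = points[0]
    return max(points, key=lambda p: dist(start, p))


def center_pivot(points, dist):
    """Center-seeking (assumed form): the point with the smallest total distance."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


def median_split(points, pivot, dist):
    """Baseline partition: split on the median distance to the pivot."""
    ds = sorted(dist(pivot, p) for p in points)
    radius = ds[len(ds) // 2]
    inner = [p for p in points if dist(pivot, p) < radius]
    outer = [p for p in points if dist(pivot, p) >= radius]
    return radius, inner, outer


def range_query_prune(query, pivot, radius, r, dist):
    """Decide which partitions a range query (query, r) must visit.

    By the triangle inequality, any result p satisfies
    d(query, pivot) - r <= d(pivot, p) <= d(query, pivot) + r.
    """
    d = dist(query, pivot)
    search_inner = d - r < radius   # some result may lie strictly inside radius
    search_outer = d + r >= radius  # some result may lie at or beyond radius
    return search_inner, search_outer


kmers = ["AAAA", "AAAT", "AATT", "ATTT", "TTTT"]
corner = corner_pivot(kmers, hamming)
center = center_pivot(kmers, hamming)
radius, inner, outer = median_split(kmers, center, hamming)
visit = range_query_prune("AATT", center, radius, 0, hamming)
```

On this toy set the two rules pick opposite kinds of pivots: the corner rule selects an extreme k-mer, while the center rule selects one in the middle of the set. The median split is shown only as the baseline the abstract mentions; a distribution-sensitive partitioning would choose `radius` differently.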
Citation:
Rui Mao, Weijia Xu, Smriti Ramakrishnan, Glen Nuckolls, Daniel P. Miranker, "On Optimizing Distance-Based Similarity Search for Biological Databases," 2005 IEEE Computational Systems Bioinformatics Conference (CSB'05), pp. 351-361, 2005.