Issue No.04 - April (2008 vol.20)
pp: 519-531
ABSTRACT
Similarity search over 3D structure data sets is fundamental to many database applications, such as molecular biology, image registration, and computer-aided design. However, computing structural similarity is well known to be extremely expensive because structure similarity measures have exponential time complexity. As structure databases keep growing rapidly, real-time search over large structure databases becomes problematic. In this paper, we present a novel statistical model, the multi-resolution Localized Co-occurrence Model (LCM), which approximately measures the similarity between two 3D structures in linear time for fast retrieval. LCM captures both the distribution characteristics and the spatial structure of 3D data by localizing the point co-occurrence relationship within a predefined neighborhood system. A novel structure query processing method called iBound is also proposed to further reduce computation; iBound avoids a large amount of expensive computation at higher-resolution LCMs. By superposing two LCMs, their largest common substructure can also be found quickly. Finally, our experimental results demonstrate the effectiveness and efficiency of our methods.
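The abstract does not spell out how the model is built, so the following is only an illustrative sketch of the general idea of a localized co-occurrence descriptor, not the authors' LCM or iBound. It assumes points are quantized to a uniform grid, takes the 26 adjacent cells as the "neighborhood system", scores two structures by histogram intersection, and stops a coarse-to-fine comparison early when a level scores below a threshold. All function names, the cell sizes, and the threshold are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed neighborhood system: all non-zero offsets in a 3x3x3 block of cells.
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def occupied_cells(points, cell):
    """Quantize 3D points to the set of grid cells (of side `cell`) they fall in."""
    return {tuple(int(c // cell) for c in p) for p in points}


def local_cooccurrence(points, cell):
    """Count, per offset, how many occupied cells have an occupied neighbor there.

    Each occupied cell does 26 set lookups, so the cost is linear in the
    number of occupied cells (expected, given hash-set lookups).
    """
    cells = occupied_cells(points, cell)
    hist = Counter()
    for x, y, z in cells:
        for dx, dy, dz in OFFSETS:
            if (x + dx, y + dy, z + dz) in cells:
                hist[(dx, dy, dz)] += 1
    return hist


def similarity(h1, h2):
    """Histogram intersection scaled to [0, 1]; two empty histograms score 1.0."""
    total = max(sum(h1.values()), sum(h2.values()))
    if total == 0:
        return 1.0
    return sum(min(h1[k], h2[k]) for k in OFFSETS) / total


def coarse_to_fine(a, b, cells=(4.0, 2.0, 1.0), threshold=0.5):
    """Compare from coarse to fine cells; stop once a level scores below threshold.

    Returns the last score computed and the cell size at which it was computed.
    """
    score = 0.0
    for cell in cells:
        score = similarity(local_cooccurrence(a, cell), local_cooccurrence(b, cell))
        if score < threshold:
            return score, cell
    return score, cells[-1]


if __name__ == "__main__":
    line = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
    shifted = [(5, 5, 5), (6, 5, 5), (7, 5, 5)]   # same shape, translated
    vertical = [(0, 0, 0), (0, 1, 0), (0, 2, 0)]  # same shape, rotated
    print(coarse_to_fine(line, shifted))
    print(coarse_to_fine(line, vertical))
```

Because this descriptor is indexed by axis-aligned offsets, it is unchanged by translation but not by rotation, so the rotated copy in the usage example is rejected before the finest level is reached. A practical descriptor would need to handle orientation, which this sketch deliberately leaves out.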
INDEX TERMS
Similarity Search, Approximate Search, Structural and Statistical Database
CITATION
Zi Huang, Heng Tao Shen, Xiaofang Zhou, "Localized Co-Occurrence Model for Fast Approximate Search in 3D Structure Databases", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 4, pp. 519-531, April 2008, doi:10.1109/TKDE.2007.190729