
Issue No. 04, April 2008 (vol. 20)

pp. 519-531

ABSTRACT

Similarity search over 3D structure data sets is fundamental to many database applications, such as molecular biology, image registration, and computer-aided design. However, computing structural similarity is well known to be extremely expensive because structure similarity measures have exponential time complexity. As structure databases keep growing rapidly, real-time search over large structure databases becomes problematic. In this paper, we present a novel statistical model, the multi-resolution *Localized Co-occurrence Model* (LCM), which approximately measures the similarity between two 3D structures in linear time for fast retrieval. LCM captures both the distribution characteristics and the spatial structure of 3D data by localizing the point co-occurrence relationship within a predefined neighborhood system. We also propose a novel structure query processing method, *iBound*, for further computational reduction; iBound avoids a large amount of expensive computation on higher-resolution LCMs. By superposing two LCMs, their largest common substructure can also be found quickly. Finally, our experimental results demonstrate the effectiveness and efficiency of our methods.
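The abstract does not specify how LCM bins co-occurrences or how iBound derives its bounds, so the following is only a rough sketch of the general idea under our own assumptions, not the authors' method. It assumes each structure is a list of labeled 3D points (for example, atom or residue types), that two points "co-occur" when they lie closer than a hypothetical `radius`, and that "resolution" can be modeled as the number of distance `shells` inside that radius. The hypothetical functions `build_model`, `l1`, and `range_search` build coarse and fine co-occurrence histograms, compare them with an L1 distance, and use a generic filter-and-refine lower bound as a stand-in for iBound.

```python
# Hypothetical sketch: a generic localized co-occurrence histogram with
# coarse-to-fine pruning. This is NOT the LCM or iBound algorithm from the paper.
import math
from collections import Counter, defaultdict
from itertools import product


def build_model(points, radius, shells):
    """Return (coarse, fine) co-occurrence histograms for labeled 3D points.

    points: list of (label, x, y, z). Two points co-occur if closer than `radius`.
    fine:   Counter keyed by (label_a, label_b, shell), with [0, radius) split
            into `shells` equal-width distance shells (assumed "resolution").
    coarse: the same counts with all shells merged, keyed by (label_a, label_b).
    """
    # Hash points into cubic cells of side `radius`; any co-occurring pair then
    # lies in the same or an adjacent cell (27-cell neighborhood).
    grid = defaultdict(list)
    for i, (_, x, y, z) in enumerate(points):
        grid[(math.floor(x / radius), math.floor(y / radius), math.floor(z / radius))].append(i)

    fine = Counter()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # dict.get does not insert keys, so the grid is not modified while iterating.
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i >= j:
                        continue  # count each unordered pair exactly once
                    label_i, *p_i = points[i]
                    label_j, *p_j = points[j]
                    d = math.dist(p_i, p_j)
                    if d < radius:
                        shell = min(int(d / radius * shells), shells - 1)
                        a, b = sorted((label_i, label_j))
                        fine[(a, b, shell)] += 1

    coarse = Counter()
    for (a, b, _), n in fine.items():
        coarse[(a, b)] += n
    return coarse, fine


def l1(h1, h2):
    """L1 distance between two sparse histograms; linear in the occupied bins."""
    return sum(abs(h1.get(k, 0) - h2.get(k, 0)) for k in h1.keys() | h2.keys())


def range_search(query_points, database, eps, radius, shells):
    """Return (names within L1 distance eps at fine resolution, number refined).

    database maps name -> (coarse, fine), built with the same radius and shells.
    Each coarse bin is a sum of fine bins, so by the triangle inequality the
    coarse L1 distance never exceeds the fine one; objects failing the coarse
    test can be discarded without touching their fine histograms.
    """
    q_coarse, q_fine = build_model(query_points, radius, shells)
    hits, refined = [], 0
    for name, (coarse, fine) in database.items():
        if l1(q_coarse, coarse) > eps:
            continue  # lower bound already exceeds eps: safe to prune
        refined += 1
        if l1(q_fine, fine) <= eps:
            hits.append(name)
    return hits, refined
```

Grid hashing keeps model construction close to linear in the number of points as long as the number of points per cell stays bounded, and comparing two models is linear in the number of occupied bins. Because merging shells can only hide differences, a structure whose label pairs occur at different distances than the query's can pass the coarse filter with distance 0 and still be rejected at the fine level; the pruning is exact (no false dismissals) but not always tight.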

INDEX TERMS

Similarity Search, Approximate Search, Structural and Statistical Database

CITATION

Zi Huang, Heng Tao Shen, Xiaofang Zhou, "Localized Co-Occurrence Model for Fast Approximate Search in 3D Structure Databases," *IEEE Transactions on Knowledge & Data Engineering*, vol. 20, no. 4, pp. 519-531, April 2008, doi:10.1109/TKDE.2007.190729