One successful application of data mining to bioinformatics is protein classification. A number of techniques have been developed to classify proteins according to important features in their sequences, secondary structures, or three-dimensional structures. In this paper, we introduce a novel approach to protein classification based on significant patterns discovered on the surface of a protein. We define a notion called the α-surface. We discuss the geometric properties of the α-surface and present an algorithm that calculates the α-surface from a finite set of points in R³. We apply the algorithm to extract the α-surface of a protein and use a pattern discovery algorithm to find frequently occurring patterns on the surfaces. The pattern discovery algorithm uses a new index structure called the ΔB⁺ tree. We use these patterns to classify the proteins. While most existing techniques focus on the binary classification problem, we apply our approach to classifying three families of proteins. Experimental results show that the proposed approach performs well.
Index Terms: KDD, classification, data mining, structural pattern discovery, biochemistry, medicine.

X. Wang, "Finding Patterns on Protein Surfaces: Algorithms and Applications to Protein Classification," in IEEE Transactions on Knowledge & Data Engineering, vol. 17, pp. 1065-1078, 2005.