2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2013)
Shanghai, China
Dec. 18, 2013 to Dec. 21, 2013
ISBN: 978-1-4799-1309-1
pp: 55-61
Pei-Yuan Zhou, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
En-Shiun Annie Lee, Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada
Andrew K. C. Wong, Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada
ABSTRACT
Discovering protein patterns involving amino acids and their biochemical properties is important for revealing the underlying biophysical models. To this end, pattern clustering was introduced to relate the discovered protein patterns to taxonomic classes in a localized region of a protein. Building on our previous work, this paper proposes an algorithm that synthesizes and regroups pattern clusters, maximizing their separability in order to reveal the class characteristics of the localized region of the protein. To evaluate the results of pattern clustering and of regrouping pattern clusters, we introduce three evaluation measures: F-measure, class entropy measure, and attribute entropy measure. To validate the proposed algorithm, experiments are run on synthetic data and on protein family data described by amino acid attributes and by chemical property attributes. The experimental results show that: a) regrouping pattern clusters separates classes more accurately than pattern clustering alone; b) the clusters after regrouping are more distinctly separable from each other than those produced by pattern clustering alone; c) two types of pattern clusters are found, one pertaining to distinct classes and the other associated with two or more related classes; and d) class characteristics are clearly revealed in the data subspace containing the patterns in the pattern clusters. The datasets with chemical properties show that unsupervised techniques can reveal common chemical attributes in the inherent classes as more of the common properties shared by different amino acids are taken into account.
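The abstract names F-measure and class entropy as evaluation measures without defining them here. The following is a minimal sketch of the commonly used clustering versions of these two measures, assuming each item has one true class label and one cluster assignment; it is not necessarily the exact formulation used in the paper, and the function names (`f_measure`, `class_entropy`, `contingency`) are hypothetical. The attribute entropy measure is not sketched because its definition is not given in this abstract.

```python
from collections import Counter
from math import log2


def contingency(clusters, labels):
    """Count n_ij: number of items of class i assigned to cluster j."""
    return Counter(zip(labels, clusters))


def f_measure(clusters, labels):
    """Size-weighted best F-score per class (a common clustering F-measure;
    assumed here, not taken from the paper)."""
    n = len(labels)
    table = contingency(clusters, labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = table.get((cls, clu), 0)
            if n_ij == 0:
                continue
            precision = n_ij / n_j  # fraction of the cluster that is this class
            recall = n_ij / n_i     # fraction of the class found in this cluster
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


def class_entropy(clusters, labels):
    """Size-weighted entropy (bits) of the class distribution inside each
    cluster; 0 means every cluster contains a single class."""
    n = len(labels)
    table = contingency(clusters, labels)
    cluster_sizes = Counter(clusters)
    total = 0.0
    for clu, n_j in cluster_sizes.items():
        h = 0.0
        for (_cls, c), n_ij in table.items():
            if c == clu:
                p = n_ij / n_j
                h -= p * log2(p)
        total += (n_j / n) * h
    return total
```

For example, with labels `["A", "A", "B", "B"]`, the perfectly separated assignment `[0, 0, 1, 1]` gives an F-measure of 1.0 and a class entropy of 0.0, while lumping everything into one cluster `[0, 0, 0, 0]` gives an F-measure of about 0.667 and a class entropy of 1.0 bit. Higher F-measure and lower class entropy both indicate better class separation.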
INDEX TERMS
Entropy, Proteins, Clustering algorithms, Pattern clustering, Fungi, Complexity theory, Amino acids
CITATION

P. Zhou, E. A. Lee and A. K. Wong, "Regrouping of pattern clusters to reveal characteristics of distinct classes and related classes," 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Shanghai, China, 2013, pp. 55-61.
doi:10.1109/BIBM.2013.6732718