2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2017)
Kansas City, MO, USA
Nov. 13, 2017 to Nov. 16, 2017
ISBN: 978-1-5090-3051-4
pp: 62-69
Pei-Yuan Zhou , VaryWave Technology Co., Ltd, No. 6W, Hong Kong Science Park, Shatin, NT, Hong Kong
Antonio Sze-Tzo , Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada
Andrew K. C. Wong , Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada
ABSTRACT
Proteins from the same family have similar functions. Hence, it is important to discover conserved sequence patterns with variations in a protein family in order to unveil the functionality of a functional domain. Aligned Pattern Clusters (APCs) are knowledge-rich representations compared with probabilistic models. If significant aligned residue associations (ARAs) are discovered in APCs, they can reveal subtle functional or subgroup characteristics. However, when ARAs corresponding to different subgroups or classes are entangled due to subtle factors, disentangling them to reveal succinct ARA groups is a major challenge. This paper presents a novel method, Aligned Residual Association Discovery and Disentanglement (ARADD), to meet this challenge. ARADD first constructs an ARA Frequency Matrix (ARAFM) and converts it into a Statistical Residual (SR) Vector Space (SRV) to suppress noise. An SR measures the deviation of the observed frequency of an event from the frequency expected if the occurrence were random. Applying Principal Component Decomposition (PCD) to the SRV yields principal components (PCs) ranked by their variance. The ARAs of an aligned residue (AR) with other ARs can be represented by an AR-vector whose coordinates account for its associations with the others. When the projection of an AR-vector onto a PC space is reprojected to the SRV (the result is abbreviated RSRV), its coordinates reflect the SRs of that AR's associations with other ARs. Experiments showed that ARADD can a) disentangle entangled ARAs in APCs and b) reveal subtle AR clusters relating to classes, or ARAs within or between subgroups, results that are significant to proteomic research, drug discovery and personalized medicine.
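The pipeline described in the abstract (frequency matrix, statistical residuals, principal components, reprojection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact residual formula, so the sketch assumes a Pearson-style standardized residual, (observed - expected) / sqrt(expected), with the expected count taken from the row and column totals. The toy co-occurrence matrix `freq`, the function names, and the use of an SVD of the mean-centered matrix to stand in for PCD are all hypothetical choices made for illustration.

```python
import numpy as np

def standardized_residuals(freq):
    """Assumed SR: (observed - expected) / sqrt(expected), where expected
    is the count implied by independence of the row and column totals."""
    freq = np.asarray(freq, dtype=float)
    total = freq.sum()
    row = freq.sum(axis=1, keepdims=True)
    col = freq.sum(axis=0, keepdims=True)
    expected = row @ col / total
    sr = np.zeros_like(freq)
    mask = expected > 0  # leave cells with zero expectation at 0
    sr[mask] = (freq[mask] - expected[mask]) / np.sqrt(expected[mask])
    return sr

def principal_components(srv):
    """PCA via SVD of the mean-centered SR matrix (rows are AR-vectors).
    Returns components (one per row), their variances, and the column mean."""
    mean = srv.mean(axis=0)
    _, s, vt = np.linalg.svd(srv - mean, full_matrices=False)
    variance = s ** 2 / (srv.shape[0] - 1)
    return vt, variance, mean

def reproject(srv, component, mean):
    """Project each centered AR-vector onto one PC, then map the projection
    back into the SR space (one reprojected row per AR)."""
    scores = (srv - mean) @ component
    return np.outer(scores, component)

# Hypothetical co-occurrence counts among four ARs forming two groups:
# ARs 0 and 1 co-occur often, as do ARs 2 and 3.
freq = np.array([
    [0, 30, 2, 1],
    [30, 0, 1, 2],
    [2, 1, 0, 25],
    [1, 2, 25, 0],
])

srv = standardized_residuals(freq)
components, variance, mean = principal_components(srv)
rsrv_pc1 = reproject(srv, components[0], mean)
```

In this toy setting, each row of `rsrv_pc1` is the part of one AR's residual profile that the first PC captures; inspecting the reprojections PC by PC is one way to look for groups of ARs whose associations vary together.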
INDEX TERMS
Amino acids, Frequency conversion, Frequency measurement, Probabilistic logic, Matrix converters, Protein sequence
CITATION

P. Zhou, A. Sze-Tzo and A. K. Wong, "Discovery and disentanglement of protein aligned pattern clusters to reveal subtle functional subgroups," 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 2017, pp. 62-69.
doi:10.1109/BIBM.2017.8217625