2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2017)
Kansas City, MO, USA
Nov. 13, 2017 to Nov. 16, 2017
ISBN: 978-1-5090-3051-4
pp: 28-35
Antonio Sze-To, Systems Design Engineering, University of Waterloo, Waterloo, Canada
Andrew K. C. Wong, Systems Design Engineering, University of Waterloo, Waterloo, Canada
ABSTRACT
Functional region identification is of fundamental importance for protein sequence analysis of a protein family. Such knowledge not only provides a better scientific understanding but also assists drug discovery. Domain annotation is one approach, but it relies on existing databases. For de novo discovery, motif discovery locates and aligns locally similar sub-sequences and represents them as a position-weight matrix (PWM). However, a PWM is a fixed-length model, whereas the size of protein functional regions varies. Furthermore, to obtain a PWM, a width range parameter needs to be identified through exhaustive search. Hence, it is computationally intensive for large datasets. This paper presents a new method known as Pattern-Directed Aligned Pattern Clustering (PD-APCn) to discover and align residues in conserved protein functional regions. It adopts the Aligned Pattern Cluster (APC) as its representation model, which allows variable pattern length. It uses patterns with strong support to direct the incremental expansion of the APCs, allowing substitution and frame-shift mutations, until a robust termination condition is reached. The concept of a breakpoint gap is introduced to identify uncovered conserved patterns with substitution and frame-shift mutations, which are often rare mutants. To evaluate the performance of PD-APCn, we conducted experiments on synthetic datasets of different sizes and noise levels. Compared with the popular motif discovery algorithm MEME, PD-APCn demonstrated competitive performance throughout the experiments, obtaining higher recall and F-measure with a computational speed-up of up to 400×.
INDEX TERMS
Proteins, Pattern clustering, Pulse width modulation, Clustering algorithms, Amino acids, Protein engineering, Bioinformatics
CITATION

A. Sze-To and A. K. Wong, "Pattern-directed aligned pattern clustering," 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 2017, pp. 28-35.
doi:10.1109/BIBM.2017.8217620