Improved Multiple Sequence Alignments Using Coupled Pattern Mining
Sept.-Oct. 2013 (vol. 10 no. 5)
pp. 1098-1112
K.S.M. Tozammel Hossain, Virginia Tech, Blacksburg
Debprakash Patnaik, Amazon Inc., Seattle
Srivatsan Laxman, Microsoft Research, Bangalore
Prateek Jain, Microsoft Research, Bangalore
Chris Bailey-Kellogg, Dartmouth College, Hanover
Naren Ramakrishnan, Virginia Tech, Blacksburg
We present alignment refinement by mining coupled residues (ARMiCoRe), a novel approach to a classical bioinformatics problem, viz., multiple sequence alignment (MSA) of gene and protein sequences. Aligning multiple biological sequences is a key step in elucidating evolutionary relationships, annotating newly sequenced segments, and understanding the relationship between biological sequences and functions. Classical MSA algorithms are designed primarily to capture conservation in sequences, whereas couplings, or correlated mutations, are well known to be an additional important aspect of sequence evolution. (Two sequence positions are coupled when mutations in one are accompanied by compensatory mutations in the other.) As a result, the wish to expose couplings better is sometimes one of the reasons practitioners hand-tweak MSAs. ARMiCoRe introduces a distinctly pattern-mining approach to improving MSAs: using frequent episode mining as a foundational basis, we define the notion of a coupled pattern and demonstrate how the discovery and tiling of coupled patterns using a max-flow approach can yield MSAs that are better than conservation-based alignments. Although we were motivated to improve MSAs for the sake of better exposing couplings, we demonstrate that our MSAs are also improvements in terms of traditional metrics of assessment. We demonstrate the effectiveness of ARMiCoRe on a large collection of data sets.
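To make the notion of coupled positions concrete, the following is a minimal illustrative sketch, not ARMiCoRe itself. It scores a pair of alignment columns by mutual information, a common generic proxy for correlated mutations that is swapped in here for illustration; the abstract's method is instead based on frequent episode mining and max-flow tiling. The function name `mutual_information` and the four-sequence `toy_msa` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences. This is a generic
    coupling proxy used for illustration, not the ARMiCoRe scoring scheme.
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)               # residue counts in column i
    count_j = Counter(col_j)               # residue counts in column j
    count_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 2 are fully conserved,
# while columns 1 and 3 mutate together (K with E, R with D).
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

coupled_score = mutual_information(toy_msa, 1, 3)
conserved_score = mutual_information(toy_msa, 0, 2)
```

In this toy alignment the co-varying columns 1 and 3 receive a high score, while the fully conserved columns 0 and 2 score zero. A purely conservation-based objective would reward only the latter pair, which is the gap the abstract describes.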
Index Terms:
Sequential analysis, Hidden Markov models, Classification algorithms, Amino acids, Bioinformatics, Proteins, max-flow problems, Multiple sequence alignment, coupled residues, pattern set mining, coupled patterns
Citation:
K.S.M. Tozammel Hossain, Debprakash Patnaik, Srivatsan Laxman, Prateek Jain, Chris Bailey-Kellogg, Naren Ramakrishnan, "Improved Multiple Sequence Alignments Using Coupled Pattern Mining," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 5, pp. 1098-1112, Sept.-Oct. 2013, doi:10.1109/TCBB.2013.36