pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs
Oct. 2012 (vol. 23 no. 10)
pp. 1923-1933
Changjun Wu, Ananth Kalyanaraman, William R. Cannon
IEEE Transactions on Parallel and Distributed Systems (ISSN 1045-9219), IEEE Computer Society, Los Alamitos, CA, USA
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.19
Detecting sequence homology between protein sequences is a fundamental problem in computational molecular biology, with pervasive applications in nearly all analyses that aim to structurally and functionally characterize protein molecules. While detecting homology between two protein sequences is relatively inexpensive, detecting pairwise homology across a large number of protein sequences can become computationally prohibitive for modern inputs, often requiring millions of CPU hours. Yet there is currently no robust support for parallelizing this kernel. In this paper, we identify the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm suited to detecting homology on large data sets using distributed-memory parallel computers. Our method, called pGraph, is a novel hybrid between the hierarchical multiple-master/worker model and the producer-consumer model, and is designed to break the irregularities imposed by alignment computation and work generation. Experimental results show that pGraph achieves linear scaling on a 2,048-processor distributed-memory cluster for inputs ranging from 20,000 to 2,560,000 sequences. In addition to demonstrating strong scaling, we present an extensive report on the performance of the various system components and related parametric studies.
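To make the producer-consumer idea in the abstract concrete, the following is a minimal single-machine sketch, not the pGraph implementation. It assumes several things the abstract does not specify: producers enumerate all sequence pairs (no filtering), consumers score each pair with a basic Smith-Waterman local alignment using a linear gap penalty, and an edge is added when the score meets a user-chosen threshold. The names `local_alignment_score`, `build_homology_graph`, the scoring parameters, and the use of threads in place of distributed-memory processes are all illustrative assumptions.

```python
import queue
import threading


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (linear gap penalty, assumed scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def build_homology_graph(seqs, threshold, n_producers=2, n_consumers=4,
                         queue_size=64):
    """Return sorted edges (i, j), i < j, whose alignment score >= threshold."""
    work = queue.Queue(maxsize=queue_size)  # bounded buffer between the two roles
    edges = set()
    edges_lock = threading.Lock()
    done = object()  # sentinel telling a consumer to stop

    def producer(rank):
        # Work generation: each producer emits the pairs for its share of rows.
        for i in range(rank, len(seqs), n_producers):
            for j in range(i + 1, len(seqs)):
                work.put((i, j))

    def consumer():
        # Alignment computation: pull a pair, score it, record an edge if it passes.
        while True:
            item = work.get()
            if item is done:
                return
            i, j = item
            if local_alignment_score(seqs[i], seqs[j]) >= threshold:
                with edges_lock:
                    edges.add((i, j))

    producers = [threading.Thread(target=producer, args=(r,))
                 for r in range(n_producers)]
    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        work.put(done)  # one sentinel per consumer, after all work is queued
    for t in consumers:
        t.join()
    return sorted(edges)


if __name__ == "__main__":
    toy = ["MKTAYIAK", "MKTAYLAK", "GGGGPPPP", "GGGGPPPW"]
    print(build_homology_graph(toy, threshold=10))
```

The bounded queue decouples pair generation from alignment, so a slow alignment does not stall generation until the buffer fills, and uneven alignment costs are absorbed by whichever consumer is free. A hierarchical variant would presumably place one such queue under each of several masters, but that layout is not described in the abstract and is left out here.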
Index Terms:
Protein sequence, Computational modeling, Amino acids, DNA, Image edge detection, Dynamic programming, producer-consumer model, Parallel protein sequence homology detection, parallel sequence graph construction, hierarchical master-worker paradigm
Citation:
Changjun Wu, Ananth Kalyanaraman, William R. Cannon, "pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 10, pp. 1923-1933, Oct. 2012, doi:10.1109/TPDS.2012.19

