Parallel Pattern-Based Systems for Computational Biology: A Case Study
August 2006 (vol. 17 no. 8)
pp. 750-763

Abstract: Computational biology research now faces a rapidly growing volume of genome data. Rigorous postprocessing of this data requires an increased role for high-performance computing (HPC). Because developing HPC applications for computational biology problems is much more complex than developing the corresponding sequential applications, traditional programming techniques have proven inadequate. Many high-level programming techniques, such as skeleton and pattern-based programming, have therefore been designed to give users new ways to build HPC applications without much effort. However, most of them remain absent from mainstream practice in computational biology. In this paper, we present a new parallel pattern-based system prototype for computational biology. The underlying programming techniques are based on generic programming, a technique suited to the generic representation of abstract concepts. This allows the system to be built generically at the application level and thus provides good extensibility and flexibility. We show how this system can be used to develop HPC applications for popular computational biology algorithms and how it leads to significant runtime savings on distributed memory architectures.

Index Terms:
High-performance computational biology, dynamic programming algorithms, hierarchical parallel genetic algorithms, parallel patterns, generic programming.
Citation:
Weiguo Liu, Bertil Schmidt, "Parallel Pattern-Based Systems for Computational Biology: A Case Study," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 8, pp. 750-763, Aug. 2006, doi:10.1109/TPDS.2006.109