Parallel Computation in Biological Sequence Analysis
March 1998 (vol. 9 no. 3)
pp. 283-294

Abstract: A massive volume of biological sequence data is available in over 36 different databases worldwide, including the sequence data generated by the Human Genome Project. These databases, which also contain biological and bibliographical information, are growing at an exponential rate. Consequently, the computational demands of exploring and analyzing the data they contain are quickly becoming a major concern. To meet these demands, we must use high-performance computing systems, such as parallel computers and distributed networks of workstations. We present two parallel computational methods for analyzing these biological sequences. The first method retrieves sequences that are homologous to a query sequence. The biological information associated with the homologous sequences found in the database may provide important clues to the structure and function of the query sequence. The second method aligns a number of homologous sequences with each other, which helps in predicting the function, structure, and evolutionary history of biological sequences. Both methods were implemented and evaluated on an Intel iPSC/860 parallel computer. The resulting performance demonstrates that parallel computational methods can significantly reduce the computational time needed to analyze the sequences contained in large databases.
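
The first method, parallel retrieval of sequences homologous to a query, can be illustrated with a minimal sketch. The abstract does not specify the scoring algorithm or the partitioning scheme, so everything below is an assumption made for illustration: a Smith-Waterman local alignment score with a linear gap penalty, a round-robin split of the database, and the hypothetical names `local_score`, `search_partition`, and `parallel_search`. Python threads stand in for the processors of a parallel machine; they show the structure of the computation, not its speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the article.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search_partition(query, partition):
    """Score the query against every sequence in one database partition."""
    return [(local_score(query, seq), name) for name, seq in partition]


def parallel_search(query, database, workers=4, top=3):
    """Split the database round-robin, score partitions concurrently,
    then merge the partial results and keep the best-scoring hits."""
    items = list(database.items())
    partitions = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: search_partition(query, p), partitions)
    hits = [hit for part in results for hit in part]
    # Highest score first; ties broken by sequence name.
    return sorted(hits, key=lambda h: (-h[0], h[1]))[:top]


if __name__ == "__main__":
    # Toy database; the names and sequences are made up.
    db = {"s1": "ACGTACGT", "s2": "TTTTTTTT", "s3": "ACGTTCGT"}
    for score, name in parallel_search("ACGTACGT", db, workers=2):
        print(name, score)
```

Because each database sequence is scored independently of the others, the search divides cleanly across workers, and the only coordination needed is the final merge of ranked hits.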

Index Terms:
Sequence, comparison, alignment, search, retrieval, database, algorithm, parallel, speculative, computation.
Tieng K. Yap, Ophir Frieder, Robert L. Martino, "Parallel Computation in Biological Sequence Analysis," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 3, pp. 283-294, March 1998, doi:10.1109/71.674320