The UCSC Kestrel Parallel Processor
January 2005 (vol. 16 no. 1)
pp. 80-92

Abstract: The architectural landscape of high-performance computing stretches from superscalar uniprocessors to explicitly parallel systems to dedicated hardware implementations of algorithms. Single-purpose hardware can achieve the highest performance, and uniprocessors can be the most programmable. Between these extremes, programmable and reconfigurable architectures provide a wide range of choices in flexibility, programmability, computational density, and performance. The UCSC Kestrel parallel processor strives to attain single-purpose performance while maintaining user programmability. Kestrel is a single-instruction stream, multiple-data stream (SIMD) parallel processor with a 512-element linear array of 8-bit processing elements. The system design focuses on efficient high-throughput DNA and protein sequence analysis, but its programmability enables high performance on computational chemistry, image processing, machine learning, and other applications. The Kestrel system has remained useful for longer than expected, owing to a careful design and analysis process. Experience with the system leads to the conclusion that programmable SIMD architectures can excel in both programmability and performance. This paper presents the architecture, implementation, applications, and observations of the Kestrel project at the University of California at Santa Cruz.
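
As a rough illustration of why a linear SIMD array suits sequence comparison, the sketch below computes edit distance one anti-diagonal at a time. It is a software model only, not Kestrel's instruction set or the authors' code: the function name, the mapping of one query character per processing element, and the unit costs are assumptions made for this example, and Kestrel's 8-bit arithmetic is not modeled. All cells on one anti-diagonal depend only on the two previous anti-diagonals, so the inner loop is the part an array of processing elements could perform in lockstep.

```python
def wavefront_edit_distance(query: str, target: str) -> int:
    """Edit distance computed one anti-diagonal at a time.

    Hypothetical model: processing element (PE) j holds query[j-1]; on
    step t it updates cell (i, j) with i = t - j. Cells on the same
    anti-diagonal are independent of each other.
    """
    m, n = len(query), len(target)
    # prev2 and prev1 hold anti-diagonals t-2 and t-1, indexed by column j.
    prev2 = [0] * (m + 1)
    prev1 = [0] * (m + 1)
    for t in range(m + n + 1):
        cur = [0] * (m + 1)
        # This loop stands in for one lockstep SIMD step across the PEs.
        for j in range(max(0, t - n), min(m, t) + 1):
            i = t - j
            if i == 0:
                cur[j] = j                      # boundary: empty target prefix
            elif j == 0:
                cur[j] = i                      # boundary: empty query prefix
            else:
                mismatch = query[j - 1] != target[i - 1]
                substitute = prev2[j - 1] + mismatch  # from cell (i-1, j-1)
                delete = prev1[j] + 1                 # from cell (i-1, j)
                insert = prev1[j - 1] + 1             # from cell (i, j-1)
                cur[j] = min(substitute, delete, insert)
        prev2, prev1 = prev1, cur
    return prev1[m]


if __name__ == "__main__":
    print(wavefront_edit_distance("kitten", "sitting"))
```

In this model each step reads only a PE's own previous value and its left neighbor's two previous values, which is the nearest-neighbor communication pattern a linear array provides. A serial machine needs on the order of m times n cell updates in sequence, while the wavefront finishes in m + n + 1 steps when one PE is available per query character.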

Index Terms:
Parallel processing, SIMD, systolic array, biological sequence analysis, DNA, computational chemistry, image processing, VLSI system design, computer architecture, high performance computing.
Andrea Di Blas, David M. Dahle, Mark Diekhans, Leslie Grate, Jeffrey Hirschberg, Kevin Karplus, Hansjörg Keller, Mark Kendrick, Francisco J. Mesa-Martinez, David Pease, Eric Rice, Angela Schultz, Don Speck, Richard Hughey, "The UCSC Kestrel Parallel Processor," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 1, pp. 80-92, Jan. 2005, doi:10.1109/TPDS.2005.12