# VLSI Bit-Level Systolic Array for Radar Front-End Signal Processing

William S. Song

MIT Lincoln Laboratory, Lexington, MA 02173

### **Abstract**

A very-high-speed radar front-end signal processing CMOS VLSI chip-set using a fully efficient bit-level systolic array architecture has been developed by MIT Lincoln Laboratory. The chip-set performs baseband quadrature sampling, channel equalization, pulse compression, and digital beamforming. The highly pipelined, fully efficient bit-level systolic architecture and the highly optimized scalable CMOS VLSI cell library design give the chip-set extremely high performance. The chip-set uses an efficient 4:1 down-sampling baseband quadrature sampling architecture with reduced computational requirements. The chip-set and the cell library have potential in a variety of applications such as communications and medical imaging.

### 1. Introduction

A very-high-speed radar front-end signal processing CMOS VLSI chip-set using a fully efficient bit-level systolic array architecture has been developed by MIT Lincoln Laboratory. The chip-set performs baseband quadrature sampling, channel equalization, pulse compression, and digital beamforming.

A front-end signal processing architecture for an adaptive multichannel radar is shown in Figure 1. The RF signal received by the antenna is converted to an IF signal by the receiver. The IF signal is then digitized by the A/D converter. From the A/D output, the baseband quadrature sampler (BQS) module generates the digital baseband in-phase (I) and quadrature (Q) signals. The I/Q pair is then channel equalized, pulse compressed, and sent to the digital beamformer.

Figure 1. A Modern Adaptive Radar Architecture.

The digital beamformer generates the adapted output beam by computing the weighted sum of the input channels. The adaptive weights used by the digital beamformer are computed by a separate adaptive weight computation processor, which is not part of the chip-set.
This chip-set uses the fully efficient bit-level systolic array architecture of Wang et al. [1]. The high level of pipelining in the bit-level systolic array architecture and the highly optimized custom CMOS VLSI cell library design resulted in an extremely high-performance chip-set with relatively small die sizes and low power consumption. In addition, the BQS module in the chip-set uses an efficient architecture with reduced computational requirements. The chip-set is expected to greatly reduce the size, weight, power consumption, and cost of airborne adaptive radar systems, and it can also be used for applications such as communications and medical imaging.

### 2. Fully Efficient Bit-Level Systolic Array

The architecture used by the chip-set's VLSI cell library is a fully efficient bit-level systolic array [1]. While conventional bit-level systolic arrays have only 50 percent of the bit-cells computing at any given time [2–5], the fully efficient architecture has all bit-cells computing at all times. This architecture is suited for finite impulse response (FIR) filters, infinite impulse response (IIR) filters, and inner product computations. An FIR filter example of the bit-level systolic architecture is shown in Figure 2, with the function of each element shown in Figure 3. The array shown consists of four 4-bit taps. The number of taps and the number of bits can be changed by varying the number of rows and columns in the array.

Figure 2. Bit-Level Systolic Array FIR Filter.

Figure 3. Element Functions of the Systolic Array.

### 3. Efficient Baseband Quadrature Sampling Architecture

The BQS function in the chip-set is implemented with an efficient 4:1 down-sampling architecture with reduced computational requirements. The functional block diagram of the conventional 4:1 down-sampling digital BQS method [6] is shown in Figures 4 and 5. In Figure 4, the input IF signal, which is centered around 1/4 of the A/D sampling rate, is sampled by the A/D converter.
The sampled data is then digitally mixed to produce the baseband in-phase and quadrature components. The I/Q signals are then filtered by N-tap FIR filters to eliminate the negative-frequency sideband and the DC component. The filter output is then decimated by a factor of 4 to produce the digital baseband signal.

Since half the filter inputs in Figure 4 are zeros, the simplification shown in Figure 5 can be made [6]. Here $h(n)$ denotes the original N-tap filter coefficients, and $h_I(n)$ and $h_Q(n)$ are the N/2-tap filter coefficients related to $h(n)$ by

$$h_I(n) = (-1)^{n+1}h(2n+1)$$

$$h_Q(n) = (-1)^{n+1}h(2n)$$

for $n = 0, 1, \dots, \frac{N}{2} - 1$.

Since $h_I(n)$ and $h_Q(n)$ are each followed by decimation by a factor of 2, half the filter outputs are discarded. Using a polyphase filtering technique [7], a further optimization can be made, as shown in Figure 6. The inputs are alternately routed to the four filters, whose coefficients can be expressed as

$$h_{Ia}(n) = h_I(2n)$$

$$h_{Ib}(n) = h_I(2n+1)$$

$$h_{Qa}(n) = h_Q(2n)$$

$$h_{Qb}(n) = h_Q(2n+1)$$

for $n = 0, 1, \dots, \frac{N}{4} - 1$.

The two N-tap filters in Figure 4 are now replaced by four (N/4)-tap filters running 1/4 as fast. The BQS architecture in Figure 6 has half the computational requirement of the architecture in Figure 5.

### 4. High-Speed CMOS VLSI Implementation

The full-custom CMOS VLSI cell library was optimized for maximum speed, minimum power consumption, minimum area, and easy floor planning. Because the bit-level systolic array architecture is based on a single bit-cell, much effort was expended on optimizing this bit-cell. The entire array was then constructed by replication, so a very-high-performance design resulted from relatively little design effort. The cells were designed using vendor-independent lambda-based design rules, which scale well with shrinking device fabrication technology. The cells are designed to be used with processes down to quarter-micron with little or no modification.

Figure 4. Baseband Quadrature Sampling.
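As a numerical check on the polyphase step described in Section 3, the following minimal Python sketch compares a filter followed by decimation by 2 against two half-length subfilters running at half the rate. It uses the even-indexed split of the form $h_I(2n)$ given in Section 3 together with the complementary odd-indexed split. The function names, the zero initial state, and the integer test data are assumptions made for illustration and are not details of the chip-set.

```python
def fir(h, x):
    """Causal FIR filter: y[m] = sum_k h[k] * x[m - k], with x[j] = 0 for j < 0."""
    return [
        sum(h[k] * x[m - k] for k in range(len(h)) if m - k >= 0)
        for m in range(len(x))
    ]


def filter_then_decimate(g, u):
    """Reference path: run the full-rate filter, then keep every second output."""
    return fir(g, u)[::2]


def polyphase_decimate(g, u):
    """Same outputs from two half-length filters running at half the input rate."""
    g_a, g_b = g[0::2], g[1::2]              # g(2n) and g(2n+1)
    u_even = u[0::2]                         # u(2m)
    u_odd = ([0] + u[1::2])[:len(u_even)]    # u(2m-1), taking u(-1) = 0
    y_a = fir(g_a, u_even)
    y_b = fir(g_b, u_odd)
    return [a + b for a, b in zip(y_a, y_b)]


# Hypothetical integer test data (not taken from the chip-set).
g = [3, -1, 4, 1, -5, 9, 2, -6]
u = [((7 * n * n + 3 * n) % 11) - 5 for n in range(24)]
assert polyphase_decimate(g, u) == filter_then_decimate(g, u)
```

Both paths spend the same number of multiply-accumulates on each output they produce, but the polyphase path never computes the outputs that the decimator would discard, which is where the factor-of-2 saving comes from.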

Figure 5. Improved Baseband Quadrature Sampling.

Figure 6. Implemented Baseband Quadrature Sampling.

The number of filter taps, inner product terms, and input/coefficient bits can all be adjusted to fit the application by varying the array dimensions. The array's internal arithmetic is full precision; there is no internal roundoff or truncation. The output bit-field is user-selectable. The coefficient registers are programmable dual static registers, so one set of coefficients can be loaded while the other set is in use.

The clock distribution was carefully designed with distributed buffers to accommodate the high clock speed. On-chip bypass capacitors are provided for high-speed operation. Because the fully efficient bit-level systolic array requires an interlaced bit-serial data format, the cell library contains the appropriate data format conversion cells. The cell library also contains all peripheral support and glue logic cells, such as I/O, timing control, and clock driver cells.

The bit-level systolic radar processor chip-set consists of the processing element (PE) chip, the pulse compression (PC) chip, and the digital beamformer (BF) chip. The PE chip implements the BQS and channel equalization (EQ) functions. The BQS module uses 64 taps for the FIR filter (N = 64 in Figure 6), and the EQ module uses four 32-tap FIR filters to implement a complex 32-tap FIR filter. The PC chip is a cascadable 256-tap FIR filter.

The BF chip is a complex digital beamformer. The current BF chip is a small test version with four input channels. More input channels can be accommodated by cascading multiple chips in a tree fashion. The inputs to the BF chip use an interlaced nibble-serial data format, where a nibble is four bits. The nibble-serial link was chosen because a bit-parallel input would have required too many I/O pins and high-speed bit-serial I/O would have required additional phase-locking circuitry.
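The tree cascading of four-input beamformers mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the weights are applied without conjugation, the second-level unit uses unit weights so that it simply adds the partial beams, the channel count is fixed at 16, and the names `beamform4` and `beamform_tree` are hypothetical. None of these details are taken from the BF chip design.

```python
def beamform4(weights, samples):
    """One 4-input beamformer: complex weighted sum of four channel samples."""
    assert len(weights) == len(samples) == 4
    return sum(w * x for w, x in zip(weights, samples))


def beamform_tree(weights, samples):
    """16-channel beam built from four first-level units and one combiner.

    Assumption of this sketch: the second-level unit uses unit weights,
    so it only adds the four partial beams.
    """
    partial = [
        beamform4(weights[i:i + 4], samples[i:i + 4])
        for i in range(0, 16, 4)
    ]
    return beamform4([1, 1, 1, 1], partial)


# Hypothetical data with small integer parts, so the complex arithmetic is exact.
weights = [complex(n % 5 - 2, (3 * n) % 7 - 3) for n in range(16)]
samples = [complex((5 * n) % 9 - 4, n % 4 - 1) for n in range(16)]

# The tree result matches a single flat 16-term weighted sum.
flat = sum(w * x for w, x in zip(weights, samples))
assert beamform_tree(weights, samples) == flat
```

Because the weighted sum is associative, the channels can be partitioned across first-level units in any grouping, and deeper trees extend the same pattern to larger arrays.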
All the chips have 24-bit external I/O and up to 53-bit internal dynamic range with no roundoff or truncation. All the chips have fully programmable dual static coefficient registers.

The PE and BF chips have already been fabricated and tested using the 1.0-micron (drawn minimum geometry) HP CMOS process through MOSIS. The PC chip is currently being designed. The PE chip runs at 260 MHz, performing over 2 GOPS. The chip block diagram is shown in Figure 7, and a picture of the chip is shown in Figure 8. The design specifications are listed in Table 1. The BF chip runs faster than the PE chip, at 300 MHz, because it is a small test version. A demonstration system using four PE chips and one BF chip has been successfully tested.

Figure 7. Processing Element Chip Block Diagram.

Figure 8. Processing Element Chip.

- Digital Baseband Quadrature Sampling (64 Taps)
- Channel Equalization (4 × 32 Taps)
- Dual Static Coefficient Registers
- Recording and Control I/F
- Testing and Fault-Tolerance I/F
- 24-bit External and 52-bit Internal Arithmetic Precision
- 2 GOPS at 240-MHz Operation
- 6.5-W Power Consumption (27 mW/MHz)
- On-Chip Power Decoupling
- HP 1.0-micron CMOS Process via MOSIS
- 1.1 cm × 1.2 cm Die
- 239-pin Ceramic PGA

**Table 1. Processing Element Chip Features.**

Because the VLSI cells are designed with device scaling in mind, chip designs with more transistors and higher clock speeds should be possible as device fabrication geometries shrink. Our projection suggests that around 20 GOPS should be possible with 0.5-micron CMOS technology.

### 5. Summary

A very-high-performance radar front-end signal processing chip-set has been developed using a fully efficient bit-level systolic array architecture. The chip-set also uses an efficient baseband quadrature sampling architecture. The optimized custom CMOS VLSI cell library gives an extremely high operation count per unit die area and is expected to scale well with shrinking fabrication geometry.
The chip-set is expected to greatly reduce the size, weight, power consumption, and cost of airborne adaptive radar systems. The chip-set also has potential in a variety of other applications such as communications and medical imaging.

### References

[1] C. L. Wang, C. H. Wei, and S. H. Chen, "Efficient Bit-Level Systolic Array Implementation of FIR and IIR Digital Filters," *IEEE Journal on Selected Areas in Communications*, vol. 6, no. 3, April 1988.

[2] J. G. McWhirter, J. V. McCanny, and K. W. Wood, "Novel Multibit Convolver/Correlator Chip Based on Systolic Array Principles," *Proc. SPIE*, vol. 341, Real Time Signal Processing V, 1982, pp. 66–73.

[3] J. G. McWhirter, D. Wood, K. Wood, R. A. Evans, J. V. McCanny, and A. P. H. McCabe, "Multibit Convolution Using a Bit Level Systolic Array," *IEEE Trans. Circuits Syst.*, vol. CAS-32, pp. 95–98, Jan. 1985.

[4] C. Caraiscos and B. Liu, "Bit Serial VLSI Implementations of FIR and IIR Digital Filters," *Proc. IEEE Int. Symp. Circuits Syst.*, May 1983, pp. 717–721.

[5] P. E. Danielsson, "Serial/Parallel Convolver," *IEEE Trans. Comput.*, vol. C-33, pp. 652–667.

[6] K. Teitelbaum, "A Flexible Processor for a Digital Adaptive Radar," *Proc. IEEE National Radar Conference*, Los Angeles, CA, March 1991, pp. 103–107.

[7] M. G. Bellanger, G. Bonnerot, and M. Coudreuse, "Digital Filtering by Polyphase Network: Application to Sample-Rate Alteration and Filter Banks," *IEEE Trans. Acoustics, Speech, and Signal Processing*, vol. ASSP-24, no. 2, April 1976, pp. 109–114.